Re: [PATCH v4 2/6] dt-bindings: power: Add qcom rpm power domain driver bindings

2018-07-03 Thread Viresh Kumar
On 03-07-18, 16:35, Rob Herring wrote:
> > +qcom,level values specified in the OPP tables for RPMh power domains
> > +should use the RPMH_REGULATOR_LEVEL_* constants from
> > +
> > +
> > +   rpmhpd: power-controller {
> > +   compatible = "qcom,sdm845-rpmhpd";
> > +   #power-domain-cells = <1>;
> > +   operating-points-v2 = <&rpmhpd_opp_table>;
> > +   };
> > +
> > +   rpmhpd_opp_table: opp-table {
> > +   compatible = "operating-points-v2-qcom-level";
> > +
> > +   rpmhpd_opp_ret: opp1 {
> > +   qcom,level = ;
> > +   };
> 
> I don't see the point in using the OPP binding here when you aren't 
> using *any* of the properties from it.

Yeah, that's the case for now. But there are cases (as Stephen
mentioned earlier [1]) where the voltage values (and maybe other
values like current, etc) would be known and filled in DT. And that's
why we all agreed to use OPP tables for PM domains as well, as these
are really "operating performance points" of these PM domains.

-- 
viresh

[1] lkml.kernel.org/r/20180110025454.gg21...@codeaurora.org


linux-next: Tree for Jul 4

2018-07-03 Thread Stephen Rothwell
Hi all,

Changes since 20180703:

The net-next tree lost its build failure.

The akpm-current tree lost its build failure.

Non-merge commits (relative to Linus' tree): 3555
 3796 files changed, 134005 insertions(+), 74047 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 282 trees (counting Linus' and 65 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (410da1e12ffe net/smc: fix up merge error with poll 
changes)
Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging kbuild-current/fixes (021c91791a5e Linux 4.18-rc3)
Merging arc-current/for-curr (3d561d86e99f ARC: Enable CONFIG_SWAP)
Merging arm-current/fixes (92d44a42af81 ARM: fix kill( ,SIGFPE) breakage)
Merging arm64-fixes/for-next/fixes (24fe1b0efad4 arm64: Remove unnecessary ISBs 
from set_{pte,pmd,pud})
Merging m68k-current/for-linus (b12c8a70643f m68k: Set default dma mask for 
platform devices)
Merging powerpc-fixes/fixes (22db552b50fa powerpc/powermac: Fix rtc read/write 
functions)
Merging sparc/master (1aaccb5fa0ea Merge tag 'rtc-4.18' of 
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (248c690a2dc8 Merge tag 
'wireless-drivers-for-davem-2018-07-03' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers)
Merging bpf/master (ed2b82c03dc1 bpf: hash map: decrement counter on error)
Merging ipsec/master (7284fdf39a91 esp6: fix memleak on error path in 
esp6_input)
Merging netfilter/master (24ac3a08e658 net/smc: rebuild nonblocking connect)
Merging ipvs/master (312564269535 net: netsec: reduce DMA mask to 40 bits)
Merging wireless-drivers/master (248c690a2dc8 Merge tag 
'wireless-drivers-for-davem-2018-07-03' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers)
Merging mac80211/master (19103a4bfb42 mac80211: add stations tied to AP_VLANs 
during hw reconfig)
Merging rdma-fixes/for-rc (b697d7d8c741 IB/hfi1: Fix incorrect mixing of 
ERR_PTR and NULL return values)
Merging sound-current/for-linus (aaa23f86001b ALSA: hda - Handle pm failure 
during hotplug)
Merging sound-asoc-fixes/for-linus (16a27dd8d9e3 Merge branch 'asoc-4.18' into 
asoc-linus)
Merging regmap-fixes/for-linus (7daf201d7fe8 Linux 4.18-rc2)
Merging regulator-fixes/for-linus (69e00058b86c Merge branch 'regulator-4.18' 
into regulator-linus)
Merging spi-fixes/for-linus (d28cbf8dcb01 Merge branch 'spi-4.18' into 
spi-linus)
Merging pci-current/for-linus (83235822b8b4 nfp: stop limiting VFs to 0)
Merging driver-core.current/driver-core-linus (7daf201d7fe8 Linux 4.18-rc2)
Merging tty.current/tty-linus (021c91791a5e Linux 4.18-rc3)
Merging usb.current/usb-linus (b3a653288e1a i2c-cht-wc: Fix bq24190 supplier)
Merging usb-gadget-fixes/fixes (1d8e5c002758 dwc2: gadget: Fix ISOC IN DDMA PID 
bitfield value calculation)
Merging usb-serial-fixes/usb-linus (021c91791a5e Linux 4.18-rc3)
Merging usb-chipidea-fixes/ci-for-usb-stable (90f26cc6bb90 usb: chipidea: host: 
fix disconnection detect issue)
Merging phy/fixes (ad5003300b07 phy: mapphone-mdm6600: Fix wrong enum used for 
status lines)
Merging staging.current/staging-linus (920c92448839 staging: rtl8723bs: P

[PATCH v2 1/3] nds32: To implement these icache invalidation APIs since nds32 cores don't snoop data cache. This issue is found by Guo Ren. Based on the Documentation/core-api/cachetlb.rst and it says

2018-07-03 Thread Greentime Hu
"Any necessary cache flushing or other coherency operations
that need to occur should happen here.  If the processor's
instruction cache does not snoop cpu stores, it is very
likely that you will need to flush the instruction cache
for copy_to_user_page()."

"If the icache does not snoop stores then this
routine(flush_icache_range) will need to flush it."

Signed-off-by: Guo Ren 
Signed-off-by: Greentime Hu 
---
 arch/nds32/include/asm/cacheflush.h |  9 +--
 arch/nds32/mm/cacheflush.c  | 53 ++---
 2 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/arch/nds32/include/asm/cacheflush.h 
b/arch/nds32/include/asm/cacheflush.h
index 10b48f0d8e85..8b26198d51bb 100644
--- a/arch/nds32/include/asm/cacheflush.h
+++ b/arch/nds32/include/asm/cacheflush.h
@@ -8,6 +8,8 @@
 
 #define PG_dcache_dirty PG_arch_1
 
+void flush_icache_range(unsigned long start, unsigned long end);
+void flush_icache_page(struct vm_area_struct *vma, struct page *page);
 #ifdef CONFIG_CPU_CACHE_ALIASING
 void flush_cache_mm(struct mm_struct *mm);
 void flush_cache_dup_mm(struct mm_struct *mm);
@@ -34,13 +36,16 @@ void flush_anon_page(struct vm_area_struct *vma,
 void flush_kernel_dcache_page(struct page *page);
 void flush_kernel_vmap_range(void *addr, int size);
 void invalidate_kernel_vmap_range(void *addr, int size);
-void flush_icache_range(unsigned long start, unsigned long end);
-void flush_icache_page(struct vm_area_struct *vma, struct page *page);
 #define flush_dcache_mmap_lock(mapping)   xa_lock_irq(&(mapping)->i_pages)
 #define flush_dcache_mmap_unlock(mapping) xa_unlock_irq(&(mapping)->i_pages)
 
 #else
 #include 
+#undef flush_icache_range
+#undef flush_icache_page
+#undef flush_icache_user_range
+void flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
+unsigned long addr, int len);
 #endif
 
 #endif /* __NDS32_CACHEFLUSH_H__ */
diff --git a/arch/nds32/mm/cacheflush.c b/arch/nds32/mm/cacheflush.c
index ce8fd34497bf..7fcaa4e6be78 100644
--- a/arch/nds32/mm/cacheflush.c
+++ b/arch/nds32/mm/cacheflush.c
@@ -13,6 +13,38 @@
 
 extern struct cache_info L1_cache_info[2];
 
+void flush_icache_range(unsigned long start, unsigned long end)
+{
+   unsigned long line_size, flags;
+   line_size = L1_cache_info[DCACHE].line_size;
+   start = start & ~(line_size - 1);
+   end = (end + line_size - 1) & ~(line_size - 1);
+   local_irq_save(flags);
+   cpu_cache_wbinval_range(start, end, 1);
+   local_irq_restore(flags);
+}
+EXPORT_SYMBOL(flush_icache_range);
+
+void flush_icache_page(struct vm_area_struct *vma, struct page *page)
+{
+   unsigned long flags;
+   unsigned long kaddr;
+   local_irq_save(flags);
+   kaddr = (unsigned long)kmap_atomic(page);
+   cpu_cache_wbinval_page(kaddr, vma->vm_flags & VM_EXEC);
+   kunmap_atomic((void *)kaddr);
+   local_irq_restore(flags);
+}
+EXPORT_SYMBOL(flush_icache_page);
+
+void flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
+unsigned long addr, int len)
+{
+   unsigned long kaddr;
+   kaddr = (unsigned long)kmap_atomic(page) + (addr & ~PAGE_MASK);
+   flush_icache_range(kaddr, kaddr + len);
+   kunmap_atomic((void *)kaddr);
+}
 #ifndef CONFIG_CPU_CACHE_ALIASING
 void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
  pte_t * pte)
@@ -318,27 +350,6 @@ void invalidate_kernel_vmap_range(void *addr, int size)
 }
 EXPORT_SYMBOL(invalidate_kernel_vmap_range);
 
-void flush_icache_range(unsigned long start, unsigned long end)
-{
-   unsigned long line_size, flags;
-   line_size = L1_cache_info[DCACHE].line_size;
-   start = start & ~(line_size - 1);
-   end = (end + line_size - 1) & ~(line_size - 1);
-   local_irq_save(flags);
-   cpu_cache_wbinval_range(start, end, 1);
-   local_irq_restore(flags);
-}
-EXPORT_SYMBOL(flush_icache_range);
-
-void flush_icache_page(struct vm_area_struct *vma, struct page *page)
-{
-   unsigned long flags;
-   local_irq_save(flags);
-   cpu_cache_wbinval_page((unsigned long)page_address(page),
-  vma->vm_flags & VM_EXEC);
-   local_irq_restore(flags);
-}
-
 void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
  pte_t * pte)
 {
-- 
2.16.2
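
The line-rounding arithmetic in flush_icache_range() above can be tried in isolation. What follows is a minimal userspace sketch, not kernel code: struct range and align_to_cache_lines() are hypothetical names introduced for illustration, and the sketch assumes line_size is a power of two (the patch reads it from L1_cache_info[DCACHE].line_size).

```c
/* Hypothetical helper type: a half-open address range [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Widen [start, end) so that both ends sit on cache-line boundaries,
 * using the same masking as the patch. Assumes line_size is a power
 * of two, so (line_size - 1) is a mask of the low offset bits.
 */
static struct range align_to_cache_lines(unsigned long start,
					 unsigned long end,
					 unsigned long line_size)
{
	struct range r;

	/* Round the start down to the beginning of its cache line. */
	r.start = start & ~(line_size - 1);
	/* Round the end up to the next cache-line boundary. */
	r.end = (end + line_size - 1) & ~(line_size - 1);
	return r;
}
```

With a 32-byte line, for example, start 0x1005 and end 0x1043 widen to 0x1000 and 0x1060, so every line touched by the original range is covered.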



[PATCH v2 2/3] nds32: Fix the dts pointer is not passed correctly issue.

2018-07-03 Thread Greentime Hu
We found that the original implementation always uses the built-in dtb
pointer instead of the pointer passed from the bootloader. This patch
fixes that bug.

Signed-off-by: Greentime Hu 
---
 arch/nds32/kernel/setup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/nds32/kernel/setup.c b/arch/nds32/kernel/setup.c
index 2f5b2ccebe47..63a1a5ef5219 100644
--- a/arch/nds32/kernel/setup.c
+++ b/arch/nds32/kernel/setup.c
@@ -278,7 +278,8 @@ static void __init setup_memory(void)
 
 void __init setup_arch(char **cmdline_p)
 {
-   early_init_devtree( __dtb_start);
+   early_init_devtree(__atags_pointer ? \
+   phys_to_virt(__atags_pointer) : __dtb_start);
 
setup_cpuinfo();
 
-- 
2.16.2
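
The selection logic in this patch reduces to "prefer the bootloader's pointer when one was passed, otherwise fall back to the built-in blob". A minimal userspace sketch of that choice follows; pick_dtb() is a hypothetical name, and the sketch leaves out the phys_to_virt() translation that the kernel code applies to the bootloader pointer.

```c
#include <stddef.h>

/*
 * Hypothetical illustration: return the bootloader-supplied device
 * tree when it is non-NULL, else the one linked into the image.
 */
static const void *pick_dtb(const void *from_bootloader,
			    const void *builtin)
{
	return from_bootloader ? from_bootloader : builtin;
}
```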



[PATCH 01/11] hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h

2018-07-03 Thread Alexandre Ghiti
asm-generic/hugetlb.h provides generic implementations of hugetlb
related functions: use __HAVE_ARCH_HUGE* defines in order to make arch
specific implementations of hugetlb functions consistent with the
pgtable.h scheme.

Signed-off-by: Alexandre Ghiti 
---
 arch/arm64/include/asm/hugetlb.h | 2 +-
 include/asm-generic/hugetlb.h| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index e73f68569624..3fcf14663dfa 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -81,9 +81,9 @@ extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
 extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep);
+#define __HAVE_ARCH_HUGE_PTE_CLEAR
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep, unsigned long sz);
-#define huge_pte_clear huge_pte_clear
 extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte, unsigned long sz);
 #define set_huge_swap_pte_at set_huge_swap_pte_at
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 9d0cde8ab716..3da7cff52360 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -32,7 +32,7 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t 
newprot)
return pte_modify(pte, newprot);
 }
 
-#ifndef huge_pte_clear
+#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR
 static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long sz)
 {
-- 
2.16.2
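
The override scheme this patch moves to can be shown in a single file. In the sketch below, the first part stands in for an arch header and the second for the generic header; the demo_* functions and __HAVE_ARCH_DEMO_* macros are hypothetical stand-ins, not kernel symbols.

```c
/* --- stand-in for an arch header (hypothetical) --- */
/* The arch announces that it supplies its own demo_clear(). */
#define __HAVE_ARCH_DEMO_CLEAR
static inline int demo_clear(int *slot)
{
	*slot = -1;	/* arch-specific behaviour */
	return 1;	/* 1 = arch version ran */
}

/* --- stand-in for the generic header (hypothetical) --- */
/* Generic fallback, compiled only when the arch did not claim it. */
#ifndef __HAVE_ARCH_DEMO_CLEAR
static inline int demo_clear(int *slot)
{
	*slot = 0;
	return 0;	/* 0 = generic version ran */
}
#endif

/* No arch override was announced for this one, so the fallback is used. */
#ifndef __HAVE_ARCH_DEMO_MODIFY
static inline int demo_modify(int value, int delta)
{
	return value + delta;
}
#endif
```

Compared with the older "#define huge_pte_clear huge_pte_clear" idiom, a dedicated __HAVE_ARCH_* macro keeps the guard name separate from the function name, which is the convention the commit message attributes to pgtable.h.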



[PATCH v2 3/3] nds32: To simplify the implementation of update_mmu_cache()

2018-07-03 Thread Greentime Hu
kmap_atomic() already checks whether the page is in highmem, so we
don't need to check it again in update_mmu_cache(). There is also no
need for separate cache-aliasing and non-aliasing versions; one
implementation serves both.

Signed-off-by: Greentime Hu 
---
 arch/nds32/mm/cacheflush.c | 47 --
 1 file changed, 8 insertions(+), 39 deletions(-)

diff --git a/arch/nds32/mm/cacheflush.c b/arch/nds32/mm/cacheflush.c
index 7fcaa4e6be78..254703653b6f 100644
--- a/arch/nds32/mm/cacheflush.c
+++ b/arch/nds32/mm/cacheflush.c
@@ -45,7 +45,7 @@ void flush_icache_user_range(struct vm_area_struct *vma, 
struct page *page,
flush_icache_range(kaddr, kaddr + len);
kunmap_atomic((void *)kaddr);
 }
-#ifndef CONFIG_CPU_CACHE_ALIASING
+
 void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
  pte_t * pte)
 {
@@ -67,19 +67,15 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned 
long addr,
 
if ((test_and_clear_bit(PG_dcache_dirty, &page->flags)) ||
(vma->vm_flags & VM_EXEC)) {
-
-   if (!PageHighMem(page)) {
-   cpu_cache_wbinval_page((unsigned long)
-  page_address(page),
-  vma->vm_flags & VM_EXEC);
-   } else {
-   unsigned long kaddr = (unsigned long)kmap_atomic(page);
-   cpu_cache_wbinval_page(kaddr, vma->vm_flags & VM_EXEC);
-   kunmap_atomic((void *)kaddr);
-   }
+   unsigned long kaddr;
+   local_irq_save(flags);
+   kaddr = (unsigned long)kmap_atomic(page);
+   cpu_cache_wbinval_page(kaddr, vma->vm_flags & VM_EXEC);
+   kunmap_atomic((void *)kaddr);
+   local_irq_restore(flags);
}
 }
-#else
+#ifdef CONFIG_CPU_CACHE_ALIASING
 extern pte_t va_present(struct mm_struct *mm, unsigned long addr);
 
 static inline unsigned long aliasing(unsigned long addr, unsigned long page)
@@ -349,31 +345,4 @@ void invalidate_kernel_vmap_range(void *addr, int size)
local_irq_restore(flags);
 }
 EXPORT_SYMBOL(invalidate_kernel_vmap_range);
-
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
- pte_t * pte)
-{
-   struct page *page;
-   unsigned long flags;
-   unsigned long pfn = pte_pfn(*pte);
-
-   if (!pfn_valid(pfn))
-   return;
-
-   if (vma->vm_mm == current->active_mm) {
-   local_irq_save(flags);
-   __nds32__mtsr_dsb(addr, NDS32_SR_TLB_VPN);
-   __nds32__tlbop_rwr(*pte);
-   __nds32__isb();
-   local_irq_restore(flags);
-   }
-
-   page = pfn_to_page(pfn);
-   if (test_and_clear_bit(PG_dcache_dirty, &page->flags) ||
-   (vma->vm_flags & VM_EXEC)) {
-   local_irq_save(flags);
-   cpu_dcache_wbinval_page((unsigned long)page_address(page));
-   local_irq_restore(flags);
-   }
-}
 #endif
-- 
2.16.2



Re: [PATCH v7] add param that allows bootline control of hardened usercopy

2018-07-03 Thread Kees Cook
On Tue, Jul 3, 2018 at 12:43 PM, Chris von Recklinghausen
 wrote:
> Enabling HARDENED_USERCOPY causes measurable regressions in
>  networking performance, up to 8% under UDP flood.
>
> I'm running a small packet UDP flood using pktgen vs. a host b2b
> connected. On the receiver side the UDP packets are processed by a
> simple user space process that just reads and drops them:
>
> https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c
>
> Not very useful from a functional PoV, but it helps to pin-point
> bottlenecks in the networking stack.
>
> When running a kernel with CONFIG_HARDENED_USERCOPY=y, I see a 5-8%
> regression in the receive tput, compared to the same kernel without
> this option enabled.
>
> With CONFIG_HARDENED_USERCOPY=y, perf shows ~6% of CPU time spent
> cumulatively in __check_object_size (~4%) and __virt_addr_valid (~2%).
>
> The call-chain is:
>
> __GI___libc_recvfrom
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> __x64_sys_recvfrom
> __sys_recvfrom
> inet_recvmsg
> udp_recvmsg
> __check_object_size
>
> udp_recvmsg() actually calls copy_to_iter() (inlined) and the latter
> calls check_copy_size() (again, inlined).
>
> A generic distro may want to enable HARDENED_USERCOPY in their default
> kernel config, but at the same time, such distro may want to be able to
> avoid the performance penalties with the default configuration and
> disable the stricter check on a per-boot basis.
>
> This change adds a boot parameter that conditionally disables
> HARDENED_USERCOPY at boot time.
>
> v6->v7:
> remove EXPORT_SYMBOL(bypass_usercopy_checks);
> remove mention of CONFIG_JUMP_LABEL from commit comments
> v5->v6:
> no need to key off of anything - build errors were when jump label
> code was in include/linux/thread_info.h.
> v4->v5:
> key off of CONFIG_JUMP_LABEL, not CONFIG_SMP_BROKEN.
>
> v3->v4:
> fix a couple of nits in commit comments
> declaration of bypass_usercopy_checks moved inside mm/usercopy.c and
> made static
> add blurb to commit comments about not enabling this functionality on
> platforms with CONFIG_BROKEN_ON_SMP set.
> v2->v3:
> add benchmark details to commit comments
> Don't add new item to Documentation/admin-guide/kernel-parameters.rst
> rename boot param to "hardened_usercopy="
> update description in Documentation/admin-guide/kernel-parameters.txt
> static_branch_likely -> static_branch_unlikely
> add __ro_after_init versions of DEFINE_STATIC_KEY_FALSE,
> DEFINE_STATIC_KEY_TRUE
> disable_huc_atboot -> enable_checks (strtobool "on" == true)
>
> v1->v2:
> remove CONFIG_HUC_DEFAULT_OFF
> default is now enabled, boot param disables
> move check to __check_object_size so as to not break optimization of
> __builtin_constant_p()
> include linux/atomic.h before linux/jump_label.h
>
> Signed-off-by: Chris von Recklinghausen 

Thanks, this looks good! I'll get this added to the usercopy tree.

(FYI, the version->version changes normally go under the "---" line.
I'll remove them when I apply this patch, so no need to resend.)

-Kees

> ---
>  .../admin-guide/kernel-parameters.txt | 11 
>  include/linux/jump_label.h|  6 +
>  mm/usercopy.c | 25 +++
>  3 files changed, 42 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index efc7aa7a0670..560d4dc66f02 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -816,6 +816,17 @@
> disable=[IPV6]
> See Documentation/networking/ipv6.txt.
>
> +   hardened_usercopy=
> +[KNL] Under CONFIG_HARDENED_USERCOPY, whether
> +hardening is enabled for this boot. Hardened
> +usercopy checking is used to protect the kernel
> +from reading or writing beyond known memory
> +allocation boundaries as a proactive defense
> +against bounds-checking flaws in the kernel's
> +copy_to_user()/copy_from_user() interface.
> +on  Perform hardened usercopy checks (default).
> +off Disable hardened usercopy checks.
> +
> disable_radix   [PPC]
> Disable RADIX MMU mode on POWER9
>
> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> index b46b541c67c4..1a0b6f17a5d6 100644
> --- a/include/linux/jump_label.h
> +++ b/include/linux/jump_label.h
> @@ -299,12 +299,18 @@ struct static_key_false {
>  #define DEFINE_STATIC_KEY_TRUE(name)   \
> struct static_key_true name = 
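
The mechanism the commit message describes (a boot-line value of "on" or "off" flips a flag, and the check returns early when the flag says to bypass) can be sketched in userspace C. This is only an illustration under stated assumptions, not the patch's code: a plain bool stands in for the static key, a string compare stands in for strtobool(), and parse_hardened_usercopy() and object_size_ok() are hypothetical names. Only bypass_usercopy_checks and the hardened_usercopy= parameter name come from the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the static key; false means the checks run (the default). */
static bool bypass_usercopy_checks;

/*
 * Hypothetical parser for the value after "hardened_usercopy=".
 * Returns 0 on success, -1 if the value is not recognised (flag unchanged).
 */
static int parse_hardened_usercopy(const char *val)
{
	if (strcmp(val, "on") == 0)
		bypass_usercopy_checks = false;
	else if (strcmp(val, "off") == 0)
		bypass_usercopy_checks = true;
	else
		return -1;
	return 0;
}

/*
 * Hypothetical bounds check: does a copy of n bytes starting at offset
 * stay inside an object of obj_size bytes? Always passes when bypassed.
 */
static bool object_size_ok(size_t obj_size, size_t offset, size_t n)
{
	if (bypass_usercopy_checks)
		return true;
	return offset <= obj_size && n <= obj_size - offset;
}
```

With the default setting, object_size_ok(16, 8, 9) is rejected; after parse_hardened_usercopy("off") the same call passes, which is the trade-off the boot parameter exposes.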

Re: [PATCH v5 07/12] PM / devfreq: export devfreq_class

2018-07-03 Thread Chanwoo Choi
Hi,

I haven't seen any framework that exports its class instance.
It is very dangerous: unknown device drivers would be able to reset
the 'devfreq_class' instance. I can't agree with this approach.

Regards,
Chanwoo Choi


On 2018-07-04 08:47, Matthias Kaehlcke wrote:
> Exporting the device class allows other parts of the kernel to enumerate
> the devfreq devices and receive notification when a devfreq device is
> added or removed.
> 
> Signed-off-by: Matthias Kaehlcke 
> Reviewed-by: Brian Norris 
> ---
> Changes in v5:
> - none
> 
> Changes in v4:
> - added 'Reviewed-by: Brian Norris ' tag
> 
> Changes in v3:
> - none
> 
> Changes in v2:
> - patch added to series
> ---
>  drivers/devfreq/devfreq.c | 3 ++-
>  include/linux/devfreq.h   | 2 ++
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index 4cbaa7ad1972..38b90b64fc6e 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -31,7 +31,8 @@
>  #define MAX(a,b) ((a > b) ? a : b)
>  #define MIN(a,b) ((a < b) ? a : b)
>  
> -static struct class *devfreq_class;
> +struct class *devfreq_class;
> +EXPORT_SYMBOL_GPL(devfreq_class);
>  
>  /*
>   * devfreq core provides delayed work based load monitoring helper
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index c4f84a769cb5..964e064a951f 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -206,6 +206,8 @@ struct devfreq_freqs {
>  };
>  
>  #if defined(CONFIG_PM_DEVFREQ)
> +extern struct class *devfreq_class;
> +
>  extern struct devfreq *devfreq_add_device(struct device *dev,
> struct devfreq_dev_profile *profile,
> const char *governor_name,
> 



Re: [PATCH V2 19/19] irqchip: add C-SKY irqchip drivers

2018-07-03 Thread Guo Ren
On Tue, Jul 03, 2018 at 11:28:03AM +0200, Thomas Gleixner wrote:
> -EEMPTYCHANGELOG
Ok, I'll separate this patchset's changelog from the cover letter in the
next version of the patch.

> > +#ifdef CONFIG_CSKY_VECIRQ_LEGENCY
> 
> I assume you meant _LEGACY
Yes.

> > +#include 
> > +#endif
> 
> Also why making the include conditional. Just include it always and be done
> with it.
Ok.

> > +static void __iomem *reg_base;
> > +
> > +#define INTC_ICR   0x00
> > +#define INTC_ISR   0x00
> > +#define INTC_NEN31_00  0x10
> > +#define INTC_NEN63_32  0x28
> > +#define INTC_IFR31_00  0x08
> > +#define INTC_IFR63_32  0x20
> > +#define INTC_SOURCE0x40
> > +
> > +#define INTC_IRQS  64
> > +
> > +#define INTC_ICR_AVE   BIT(31)
> > +
> > +#define VEC_IRQ_BASE   32
> > +
> > +static struct irq_domain *root_domain;
> > +
> > +static void __init ck_set_gc(void __iomem *reg_base, u32 irq_base,
> > +u32 mask_reg)
> > +{
> > +   struct irq_chip_generic *gc;
> > +
> > +   gc = irq_get_domain_generic_chip(root_domain, irq_base);
> > +   gc->reg_base = reg_base;
> > +   gc->chip_types[0].regs.mask = mask_reg;
> > +   gc->chip_types[0].chip.irq_mask = irq_gc_mask_clr_bit;
> > +   gc->chip_types[0].chip.irq_unmask = irq_gc_mask_set_bit;
> > +}
> > +
> > +static struct irq_domain *root_domain;
> 
> You have the same declaration 10 lines above already
Oops, thx. I'll remove it.

> > +static void ck_irq_handler(struct pt_regs *regs)
> > +{
> > +#ifdef CONFIG_CSKY_VECIRQ_LEGENCY
> > +   irq_hw_number_t irq = ((mfcr("psr") >> 16) & 0xff) - VEC_IRQ_BASE;
> > +#else
> > +   irq_hw_number_t irq = readl_relaxed(reg_base + INTC_ISR) & 0x3f;
> > +#endif
> 
> You can avoid the ifdeffery by doing:
> 
>   irq_hw_number_t irq;
> 
>   if (IS_ENABLED(CONFIG_CSKY_VECIRQ_LEGENCY))
>   irq = ((mfcr("psr") >> 16) & 0xff) - VEC_IRQ_BASE;
>   else
>   irq = readl_relaxed(reg_base + INTC_ISR) & 0x3f;
> 
> which makes the whole thing more readable.
Ok, good idea.
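
A minimal userspace sketch of the pattern suggested above: the configuration test becomes an ordinary C expression, so both branches are always compiled and the compiler drops the dead one. The SK_* macros are a stand-in written for this sketch (the kernel provides IS_ENABLED() itself), decode_hwirq() is a hypothetical name, and psr and isr are passed in as parameters in place of mfcr("psr") and readl_relaxed().

```c
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's IS_ENABLED(): expands to 1 when
 * the option macro is defined as 1, and to 0 when it is not defined.
 */
#define SK_ARG_PLACEHOLDER_1 0,
#define SK_SECOND_ARG(ignored, val, ...) val
#define SK_IS_DEFINED_3(arg1_or_junk) SK_SECOND_ARG(arg1_or_junk 1, 0)
#define SK_IS_DEFINED_2(val) SK_IS_DEFINED_3(SK_ARG_PLACEHOLDER_##val)
#define SK_IS_ENABLED(option) SK_IS_DEFINED_2(option)

#define VEC_IRQ_BASE 32

/* Define this as 1 to select the legacy path; both branches still compile. */
/* #define CONFIG_CSKY_VECIRQ_LEGACY 1 */

static uint32_t decode_hwirq(uint32_t psr, uint32_t isr)
{
	if (SK_IS_ENABLED(CONFIG_CSKY_VECIRQ_LEGACY))
		return ((psr >> 16) & 0xff) - VEC_IRQ_BASE;

	return isr & 0x3f;
}
```

With the option left undefined, decode_hwirq() ignores psr and returns the low six bits of isr. A typo inside the unselected branch would still be a compile error, which is the readability and coverage gain over #ifdef.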

> > +#define expand_byte_to_word(i) (i|(i<<8)|(i<<16)|(i<<24))
> > +static inline void setup_irq_channel(void __iomem *reg_base)
> 
> > Please do not glue a define and a function declaration together w/o a new
> line in between. That's really hard to parse.
> 
> Also please make that expand macro an inline function.
> 
> > +{
> > +   int i;
> 
> Bah: writel_relaxed(u32 value, ...). So why 'int' ? Just because?
> 
> > +
> > +   /*
> > +* There are 64 irq nums and irq-channels and one byte per channel.
> > +* Setup every channel with the same hwirq num.
> 
> This is magic and not understandable for an outsider.
> 
> > +*/
> > +   for (i = 0; i < INTC_IRQS; i += 4) {
> > +   writel_relaxed(expand_byte_to_word(i) + 0x00010203,
> 
> No magic numbers please w/o explanation. And '+' is the wrong operator
> here, really.  Stick that into the expand function:
> 
> static inline u32 build_intc_ctrl(u32 idx)
> {
>   u32 res;
> 
>   /*
>* Set the same index for each channel (or whatever
>* this means in reality).
>*/
>   res = idx | (idx << 8) | (idx << 16) | (idx << 24);
> 
>   /*
>* Set the channel magic number in descending order because
>* the most significant bit comes first. (Replace with
>* something which has not been pulled out of thin air).
>*/
>   return res | 0x00010203;
> }
> 
> Hmm?
That's Ok. And here is my implementation:

static inline u32 build_intc_ctrl(u32 idx)
{
	/*
	 * One channel is one byte in a word-width register, so
	 * there are four channels in a word-width register.
	 *
	 * Give the four channels consecutive indexes starting at
	 * idx, which makes "irq num = channel num".
	 */
	return (idx | ((idx + 1) << 8) |
		((idx + 2) << 16) | ((idx + 3) << 24));
}
Hmm? (No magic number)
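
As a worked check of the arithmetic, the two formulas in this thread can be evaluated side by side in userspace. This is a sketch only: intc_ctrl_original() and intc_ctrl_proposed() are hypothetical names for the expression in the original patch and the replacement proposed just above, and u32 is spelled uint32_t.

```c
#include <stdint.h>

/* Original patch: expand_byte_to_word(i) + 0x00010203 */
static uint32_t intc_ctrl_original(uint32_t i)
{
	return (i | (i << 8) | (i << 16) | (i << 24)) + 0x00010203u;
}

/* Replacement proposed in the reply above. */
static uint32_t intc_ctrl_proposed(uint32_t idx)
{
	return idx | ((idx + 1) << 8) | ((idx + 2) << 16) | ((idx + 3) << 24);
}
```

For i = 4 the original expression gives 0x04050607 (lowest index in the most significant byte) while the proposed function gives 0x07060504 (lowest index in the least significant byte). The two are byte-reversed with respect to each other; which order the hardware expects is not stated in this thread.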

> > +#ifndef CONFIG_CSKY_VECIRQ_LEGENCY
> > +   writel_relaxed(INTC_ICR_AVE, reg_base + INTC_ICR);
> > +#else
> > +   writel_relaxed(0, reg_base + INTC_ICR);
> > +#endif
> 
> See above.
Ok, use if (IS_ENABLED(CONFIG_CSKY_VECIRQ_LEGENCY))

> > +++ b/drivers/irqchip/irq-csky-v2.c
> > @@ -0,0 +1,191 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
> 
> Please stick an empty newline here for separation
Ok

> > +   static void __iomem *reg_base;
> > +   irq_hw_number_t hwirq;
> > +
> > +   reg_base = *this_cpu_ptr(&intcl_reg);
> 
> Wheee!
> 
>   static void __iomem *reg_base = this_cpu_read(intcl_reg);
>   irq_hw_number_t hwirq;
> 
> Hmm?
Thx for the tips and I'll use this_cpu_read() without static.
void __iomem *reg_base = this_cpu_read(intcl_reg);

> > +   writel_relaxed(cpu, INTCG_base + INTCG_CIDSTR + (4*(d->hwirq - 
> > COMM_IRQ_BASE)));
> 
> Spaces between '4' and '*' and '(d->)' please. And to avoid the overly long
> line use a local variable to calculate the value.
Ok.
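
One way the local-variable suggestion could look, sketched in userspace. The numeric values of INTCG_CIDSTR and COMM_IRQ_BASE are placeholders (their real definitions are not shown in this thread) and cidstr_offset() is a hypothetical helper name.

```c
#include <stdint.h>

/* Placeholder values: the real definitions are not shown in the thread. */
#define INTCG_CIDSTR	0x1000u
#define COMM_IRQ_BASE	32u

/* Byte offset of the 32-bit per-interrupt register for a given hwirq. */
static uint32_t cidstr_offset(uint32_t hwirq)
{
	uint32_t offset = 4 * (hwirq - COMM_IRQ_BASE);

	return INTCG_CIDSTR + offset;
}
```

The write then fits on one line, for example writel_relaxed(cpu, INTCG_base + cidstr_offset(d->hwirq)), with the spacing around '*' that the review asks for.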

> > +   } else
> > +

Re: [PATCH v2 0/5] Improve Broadcom PAXC support

2018-07-03 Thread Ray Jui

Hi Lorenzo,

A friendly reminder: have you had a chance to review the patch series
below?


Thanks,

Ray

On 6/21/2018 11:22 AM, Ray Jui wrote:

Hi Lorenzo,

On 6/21/2018 9:48 AM, Lorenzo Pieralisi wrote:

On Wed, Jun 20, 2018 at 10:26:28AM -0700, Ray Jui wrote:

Hi Lorenzo/Bjorn,

Could you please help to review this patch series when you have time?

I believe I have addressed the major comment on v1 from Bjorn and answered
all questions from Lorenzo.


I will have a look next week, sorry for the delay.


Thanks a lot. Much appreciated!

Ray



Thanks,
Lorenzo


Thanks,

Ray

On 6/11/2018 5:21 PM, Ray Jui wrote:

This patch series improves the Broadcom PAXC support by 1) adding more
quirks for specific versions of PAXC controllers; 2) adding logic to
reject internally unconfigured physical functions from the embedded
network processor acting as endpoint; 3) reducing verbose print level
in the outbound/inbound mapping code

This patch series is based off v4.17 and is available on GitHub:
repo: https://github.com/Broadcom/arm64-linux.git
branch: sr-paxc-v2

Changes since v1:
  - consolidate 2 PAXC related patch series into 1
  - change how the capability list corruption is handled, per
recommendation from Bjorn. Now handle and fix up the corruption at
the config register read
  - rebase to v4.17

Ray Jui (5):
   PCI: iproc: Activate PAXC bridge quirk for more devices
   PCI: iproc: Fix up corrupted PAXC root complex config registers
   PCI: iproc: Disable MSI parsing in certain PAXC blocks
   PCI: iproc: Reject unconfigured physical functions from PAXC
   PCI: iproc: Reduce inbound/outbound mapping print level

  drivers/pci/host/pcie-iproc.c | 159 +++---
  drivers/pci/host/pcie-iproc.h |   8 +++
  drivers/pci/quirks.c          |   3 +
  3 files changed, 144 insertions(+), 26 deletions(-)



Re: Build regressions/improvements in v4.18-rc3

2018-07-03 Thread Michael Ellerman
Helge Deller  writes:

> On 03.07.2018 03:09, Michael Ellerman wrote:
>> Helge Deller  writes:
>> 
>>> On 02.07.2018 16:09, Geert Uytterhoeven wrote:
>>>> On Mon, Jul 2, 2018 at 4:01 PM Geert Uytterhoeven wrote:
>>>>> JFYI, when comparing v4.18-rc3[1] to v4.18-rc2[3], the summaries are:
>>>> ...
>>>
>>> Both of the following are simply happening because of old compiler which is 
>>> being used:
>>>
>>>>> [Deleted 26903 lines about "warning: ... [-Wpointer-sign]" on
>>>>> parisc-allmodconfig]
>> 
>> It's GCC 4.6.3. Are you saying that's not supported anymore?
>> 
>> I can update to 8.1.0 if that's more useful.
>
> Yes, please.
> Any version >= 7.2.1 should be better.

OK, done. It's now using 8.1.0 from kernel.org.

Do you want kisskb to email you build failures for parisc in linux-next
and/or Linus' tree?

It only mails when a config goes from working to broken or vice versa,
so shouldn't be too chatty.

cheers


Re: Build regressions/improvements in v4.18-rc3

2018-07-03 Thread Michael Ellerman
John David Anglin  writes:

> On 2018-07-02 9:09 PM, Michael Ellerman wrote:
>> It's GCC 4.6.3. Are you saying that's not supported anymore?
> See  for supported releases.

Thanks, but I mean "supported by the parisc Linux port".

Allegedly the kernel builds with GCC 3.2:

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/changes.rst#n32


But in reality each arch has a different minimum version they actually
care about.

cheers


Re: [PATCH 3/3] m68k: switch to MEMBLOCK + NO_BOOTMEM

2018-07-03 Thread Mike Rapoport
Hi Greg,

On Wed, Jul 04, 2018 at 02:39:05PM +1000, Greg Ungerer wrote:
> Hi Mike,
> 
> On 04/07/18 14:22, Mike Rapoport wrote:
> >On Wed, Jul 04, 2018 at 12:02:52PM +1000, Greg Ungerer wrote:
> >>On 04/07/18 11:39, Greg Ungerer wrote:
> >>>On 03/07/18 20:29, Mike Rapoport wrote:
> >>>>In m68k the physical memory is described by [memory_start, memory_end] for
> >>>>!MMU variant and by m68k_memory array of memory ranges for the MMU version.
> >>>>This information is directly used to register the physical memory with
> >>>>memblock.
> >>>>
> >>>>The reserve_bootmem() calls are replaced with memblock_reserve() and the
> >>>>bootmap bitmap allocation is simply dropped.
> >>>>
> >>>>Since the MMU variant creates early mappings only for the small part of the
> >>>>memory we force bottom-up allocations in memblock because otherwise we will
> >>>>attempt to access memory that is not yet mapped
> >>>>
> >>>>Signed-off-by: Mike Rapoport 
> >>>
> >>>This builds cleanly for me with a m5475_defconfig, but it fails
> >>>to boot on real hardware. No console, no nothing on startup.
> >>>I haven't debugged any further yet.
> >>>
> >>>The M5475 is a ColdFire with MMU enabled target.
> >>
> >>With some early serial debug trace I see:
> >>
> >>Linux version 4.18.0-rc3-3-g109f5e551b18-dirty (gerg@goober) (gcc 
> >>version 5.4.0 (GCC)) #5 Wed Jul 4 12:00:03 AEST 2018
> >>On node 0 totalpages: 4096
> >>   DMA zone: 18 pages used for memmap
> >>   DMA zone: 0 pages reserved
> >>   DMA zone: 4096 pages, LIFO batch:0
> >>pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
> >>pcpu-alloc: [0] 0
> >>Built 1 zonelists, mobility grouping off.  Total pages: 4078
> >>Kernel command line: root=/dev/mtdblock0
> >>Dentry cache hash table entries: 4096 (order: 1, 16384 bytes)
> >>Inode-cache hash table entries: 2048 (order: 0, 8192 bytes)
> >>Sorting __ex_table...
> >>Memory: 3032K/32768K available (1489K kernel code, 96K rwdata, 240K rodata, 
> >>56K init, 77K bss, 29736K reserved, 0K cma-reserved)
> > 
> >  ^^
> >It seems I was over enthusiastic when I reserved the memory for the kernel.
> >Can you please try with the below patch:
> >
> >diff --git a/arch/m68k/mm/mcfmmu.c b/arch/m68k/mm/mcfmmu.c
> >index e9e60e1..18c7bf6 100644
> >--- a/arch/m68k/mm/mcfmmu.c
> >+++ b/arch/m68k/mm/mcfmmu.c
> >@@ -174,7 +174,7 @@ void __init cf_bootmem_alloc(void)
> > high_memory = (void *)_ramend;
> > /* Reserve kernel text/data/bss */
> >-memblock_reserve(memstart, _ramend - memstart);
> >+memblock_reserve(memstart, memstart - _rambase);
> > m68k_virt_to_node_shift = fls(_ramend - 1) - 6;
> > module_fixup(NULL, __start_fixup, __stop_fixup);
> >diff --git a/mm/memblock.c b/mm/memblock.c
> >index 03d48d8..98661be 100644
> >--- a/mm/memblock.c
> >+++ b/mm/memblock.c
> >@@ -54,7 +54,7 @@ struct memblock memblock __initdata_memblock = {
> > .current_limit  = MEMBLOCK_ALLOC_ANYWHERE,
> >  };
> >-int memblock_debug __initdata_memblock;
> >+int memblock_debug __initdata_memblock = 1;
> >  static bool system_has_some_mirror __initdata_memblock = false;
> >  static int memblock_can_resize __initdata_memblock;
> >  static int memblock_memory_in_slab __initdata_memblock = 0;
> >
> >
> >The memblock hunk is needed to see early memblock debug messages as all the
> >setup happens before parsing of the command line.
> 
> Ok, that works, boots all the way up now.

Thanks for testing. 
I'll send v2 later on today.
 
> Linux version 4.18.0-rc3-3-g109f5e551b18-dirty (gerg@goober) (gcc version 
> 5.4.0 (GCC)) #7 Wed Jul 4 14:34:48 AEST 2018
> memblock_add: [0x-0x01ff] 0x001ebaa0
> memblock_reserve: [0x00332000-0x00663fff] 0x001ebafa
> memblock_reserve: [0x01ffe000-0x01ff] 0x001efd38
> memblock_reserve: [0x01ff8000-0x01ffdfff] 0x001efd38
> memblock_virt_alloc_try_nid_nopanic: 147456 bytes align=0x0 nid=0 from=0x0 
> max_addr=0x0 0x00190dea
> memblock_reserve: [0x01fd4000-0x01ff7fff] 0x001f0466
> memblock_virt_alloc_try_nid_nopanic: 4 bytes align=0x0 nid=0 from=0x0 
> max_addr=0x0 0x001ee234
> memblock_reserve: [0x01fd3ff0-0x01fd3ff3] 0x001f0466
> memblock_virt_alloc_try_nid: 20 bytes align=0x0 nid=-1 from=0x0 max_addr=0x0 
> 0x001ea488
> memblock_reserve: [0x01fd3fd0-0x01fd3fe3] 0x001f0466
> memblock_virt_alloc_try_nid: 20 bytes align=0x0 nid=-1 from=0x0 max_addr=0x0 
> 0x001ea4a8
> memblock_reserve: [0x01fd3fb0-0x01fd3fc3] 0x001f0466
> memblock_virt_alloc_try_nid: 20 bytes align=0x0 nid=-1 from=0x0 max_addr=0x0 
> 0x001ea4c0
> memblock_reserve: [0x01fd3f90-0x01fd3fa3] 0x001f0466
> memblock_virt_alloc_try_nid_nopanic: 8192 bytes align=0x2000 nid=-1 from=0x0 
> max_addr=0x0 0x001eef30
> memblock_reserve: [0x01fd-0x01fd1fff] 0x001f0466
> memblock_virt_alloc_try_nid_nopanic: 32768 bytes align=0x2000 nid=-1 from=0x0 
> max_addr=0x0 0x001ef5d6
> memblock_reserve: [0x01fc8000-0x01fc] 0x001f0466
> 
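
For reference, the arithmetic behind the two memblock_reserve() calls can be
checked in userspace. This is only a sketch: memstart is taken from the
"memblock_reserve: [0x00332000-0x00663fff]" line in the log above, _ramend is
assumed to be 0x02000000 (32768K of RAM), and _rambase is assumed to be 0. The
helper and macro names are made up for illustration.

```c
#include <stdint.h>

/* Assumed values: MEMSTART is read from the boot log, the rest are guesses. */
#define RAMBASE  0x00000000u
#define RAMEND   0x02000000u
#define MEMSTART 0x00332000u

struct range {
	uint32_t first;	/* first reserved byte */
	uint32_t last;	/* last reserved byte, inclusive */
};

/* Compute the inclusive range covered by a reserve(base, size) call. */
static struct range reserved_range(uint32_t base, uint32_t size)
{
	struct range r = { base, base + size - 1 };

	return r;
}

/* Range reserved by the original call: memstart up to the end of RAM. */
static struct range reserve_before_fix(void)
{
	return reserved_range(MEMSTART, RAMEND - MEMSTART);
}

/* Range reserved by the patched call: size is memstart - _rambase. */
static struct range reserve_after_fix(void)
{
	return reserved_range(MEMSTART, MEMSTART - RAMBASE);
}
```

Under these assumptions the original call covers 0x00332000-0x01ffffff, which
is 29496K and roughly in line with the "29736K reserved" in the failing boot,
while the patched call covers 0x00332000-0x00663fff as seen in the log.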

[PATCH v3 0/4] Add devicetree functionality to w1 busses and ds2760

2018-07-03 Thread Daniel Mack
This patch set contains four patches that bring devicetree
functionality to w1 bus masters and slaves in general. As an example,
the ds2760 driver is made aware of devicetree probing. Other drivers
can easily be ported later.

W1 masters scan their bus in order to discover slave devices. Once
one is found, a driver matching the family of the device is instantiated
which handles it. To add devicetree functionality, all that's needed
for now is a call to of_find_matching_node() when a slave device is
attached, so the corresponding of_node pointer is set.

The series also contains a patch that merges the w1 slave driver for the
ds2760 battery monitor into its only user, the ds2760 supply driver.
The indirection with two drivers never had any benefit, and here is
a good opportunity to clean this up.

Patch #1 adds some DT bindings documentation
Patch #2 adds the call to of_find_matching_node()
Patch #3 merges the w1 slave and the supply driver for ds2760
Patch #4 makes the ds2760 supply driver aware of DT environments

This works fine on a PXA3xx based board with a battery attached to
the w1-gpio bus master controller.

Changelog:

v2 → v3:
* Fixed a typo in the documentation and added a more real-world
  example for the bindings. Pointed out by Rob Herring.

Daniel Mack (4):
  dt-bindings: w1: document generic onewire and DS2760 bindings
  w1: core: match sub-nodes of bus masters in devicetree
  power: supply: ds2760_battery: merge ds2760 supply driver with its w1
slave companion
  power: supply: ds2760_battery: add devicetree probing

 .../bindings/power/supply/maxim,ds2760.txt|  29 ++
 .../devicetree/bindings/w1/w1-gpio.txt|  11 +-
 Documentation/devicetree/bindings/w1/w1.txt   |  25 ++
 drivers/power/supply/Kconfig  |   2 +-
 drivers/power/supply/ds2760_battery.c | 351 +-
 drivers/w1/slaves/Kconfig |  12 -
 drivers/w1/slaves/Makefile|   1 -
 drivers/w1/slaves/w1_ds2760.c | 175 -
 drivers/w1/slaves/w1_ds2760.h |  59 ---
 drivers/w1/w1.c   |   3 +
 include/linux/w1.h|   2 +
 11 files changed, 329 insertions(+), 341 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/power/supply/maxim,ds2760.txt
 create mode 100644 Documentation/devicetree/bindings/w1/w1.txt
 delete mode 100644 drivers/w1/slaves/w1_ds2760.c
 delete mode 100644 drivers/w1/slaves/w1_ds2760.h

-- 
2.17.1



[PATCH v3 2/4] w1: core: match sub-nodes of bus masters in devicetree

2018-07-03 Thread Daniel Mack
Once a new slave device is detected, match it against all sub-nodes of the
master bus controller. If a match is found, set the slave device's of_node
pointer.

Signed-off-by: Daniel Mack 
---
 drivers/w1/w1.c| 3 +++
 include/linux/w1.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index caef0e0fd817..890c038c25f8 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -686,6 +687,8 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
    sl->dev.bus = &w1_bus_type;
    sl->dev.release = &w1_slave_release;
sl->dev.groups = w1_slave_groups;
+   sl->dev.of_node = of_find_matching_node(sl->master->dev.of_node,
+   sl->family->of_match_table);
 
    dev_set_name(&sl->dev, "%02x-%012llx",
 (unsigned int) sl->reg_num.family,
diff --git a/include/linux/w1.h b/include/linux/w1.h
index 694101f744c7..3111585c371f 100644
--- a/include/linux/w1.h
+++ b/include/linux/w1.h
@@ -274,6 +274,8 @@ struct w1_family {
 
    struct w1_family_ops *fops;
 
+   const struct of_device_id *of_match_table;
+
    atomic_t refcnt;
 };
 
-- 
2.17.1
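
As an illustration of the matching step described in the commit message, here
is a small userspace model. It is a sketch only and not the kernel code: the
node and match_id types and find_matching_child() are hypothetical stand-ins,
the table is terminated by a NULL compatible string for simplicity, and the
search is limited to direct children of the master node as the commit message
describes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a devicetree node. */
struct node {
	const char *compatible;
	const struct node *next_sibling;
	const struct node *first_child;
};

/* Hypothetical stand-in for a match table entry; NULL ends the table. */
struct match_id {
	const char *compatible;
};

/* Return the first child of 'master' whose compatible is listed in 'table'. */
static const struct node *find_matching_child(const struct node *master,
					      const struct match_id *table)
{
	const struct node *child;
	const struct match_id *id;

	if (!master || !table)
		return NULL;

	for (child = master->first_child; child; child = child->next_sibling)
		for (id = table; id->compatible; id++)
			if (child->compatible &&
			    !strcmp(child->compatible, id->compatible))
				return child;

	return NULL;
}
```

A family driver without a match table simply never gets a node assigned in
this model, which mirrors the intent that existing w1 slave drivers keep
working without devicetree data.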



[PATCH v3 1/4] dt-bindings: w1: document generic onewire and DS2760 bindings

2018-07-03 Thread Daniel Mack
This patch adds a generic w1 bindings document that merely describes how
slave devices are grouped under master nodes. It also adds a specific
binding for the ds2760 battery monitor.

Signed-off-by: Daniel Mack 
---
 .../bindings/power/supply/maxim,ds2760.txt| 29 +++
 .../devicetree/bindings/w1/w1-gpio.txt| 11 +--
 Documentation/devicetree/bindings/w1/w1.txt   | 25 
 3 files changed, 62 insertions(+), 3 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/power/supply/maxim,ds2760.txt
 create mode 100644 Documentation/devicetree/bindings/w1/w1.txt

diff --git a/Documentation/devicetree/bindings/power/supply/maxim,ds2760.txt 
b/Documentation/devicetree/bindings/power/supply/maxim,ds2760.txt
new file mode 100644
index ..a39d1ada48f2
--- /dev/null
+++ b/Documentation/devicetree/bindings/power/supply/maxim,ds2760.txt
@@ -0,0 +1,29 @@
+Devicetree bindings for Maxim DS2760
+
+
+The ds2760 is a w1 slave device and must hence have its sub-node in DT
+under a w1 bus master node.
+
+The device exposes a power supply, so the details described in
+Documentation/devicetree/bindings/power/supply/power_supply.txt apply.
+
+Required properties:
+- compatible: must be "maxim,ds2760"
+
+Optional properties:
+- power-supplies:  Refers to one or more power supplies connected to
+   this battery.
+- maxim,pmod-enabled:  This boolean property enables the DS2760 to enter
+   sleep mode when the DQ line goes low for greater
+   than 2 seconds and leave sleep mode when the DQ
+   line goes high.
+- maxim,cache-time:    Time in milliseconds to cache the data for. When
+   this time expires, the values are read again from
+   the hardware. Defaults to 1000.
+- maxim,rated-capacity:    The rated capacity of the battery, in mAh.
+   If not specified, the value stored in the
+   non-volatile chip memory is used.
+- maxim,current-accumulator:
+   The current accumulator value in mAh.
+   If not specified, the value stored in the
+   non-volatile chip memory is used.
diff --git a/Documentation/devicetree/bindings/w1/w1-gpio.txt 
b/Documentation/devicetree/bindings/w1/w1-gpio.txt
index 6e09c35d9f1a..3d6554eac240 100644
--- a/Documentation/devicetree/bindings/w1/w1-gpio.txt
+++ b/Documentation/devicetree/bindings/w1/w1-gpio.txt
@@ -13,10 +13,15 @@ Optional properties:
  - linux,open-drain: if specified, the data pin is considered in
 open-drain mode.
 
+Also refer to the generic w1.txt document.
+
 Examples:
 
-   onewire@0 {
+   onewire {
compatible = "w1-gpio";
-   gpios = < 126 0>, < 105 0>;
-   };
+   gpios = < 0 GPIO_ACTIVE_HIGH>;
 
+   battery {
+   // ...
+   };
+   };
diff --git a/Documentation/devicetree/bindings/w1/w1.txt 
b/Documentation/devicetree/bindings/w1/w1.txt
new file mode 100644
index ..bee18f7070ec
--- /dev/null
+++ b/Documentation/devicetree/bindings/w1/w1.txt
@@ -0,0 +1,25 @@
+Generic devicetree bindings for onewire (w1) busses
+===
+
+Onewire busses are described through nodes of their master bus controller.
+Slave devices are listed as sub-nodes of such master devices. For now, only
+one slave is allowed per bus master.
+
+
+Example:
+
+   charger: charger {
+   compatible = "gpio-charger";
+   charger-type = "mains";
+   gpios = < 1 GPIO_ACTIVE_LOW>;
+   };
+
+   onewire@0 {
+   compatible = "w1-gpio";
+   gpios = < 100 0>, < 101 0>;
+
+   battery {
+   compatible = "maxim,ds2760";
+   power-supplies = <&charger>;
+   };
+   };
-- 
2.17.1



[PATCH v3 4/4] power: supply: ds2760_battery: add devicetree probing

2018-07-03 Thread Daniel Mack
Add a matching table for devicetree probing, and optionally set the module
parameter variables from DT properties.

Signed-off-by: Daniel Mack 
---
 drivers/power/supply/ds2760_battery.c | 32 ++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/power/supply/ds2760_battery.c 
b/drivers/power/supply/ds2760_battery.c
index aa406a7c65a1..921cbcfd8c99 100644
--- a/drivers/power/supply/ds2760_battery.c
+++ b/drivers/power/supply/ds2760_battery.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static unsigned int cache_time = 1000;
 module_param(cache_time, uint, 0644);
@@ -705,6 +706,27 @@ static int w1_ds2760_add_slave(struct w1_slave *sl)
 
psy_cfg.drv_data = di;
 
+   if (dev->of_node) {
+   u32 tmp;
+
+   psy_cfg.of_node = dev->of_node;
+
+   if (of_property_read_bool(dev->of_node, "maxim,pmod-enabled"))
+   pmod_enabled = true;
+
+   if (!of_property_read_u32(dev->of_node,
+ "maxim,cache-time", &tmp))
+   cache_time = tmp;
+
+   if (!of_property_read_u32(dev->of_node,
+ "maxim,rated-capacity", &tmp))
+   rated_capacity = tmp / 10; /* property is in mAh */
+
+   if (!of_property_read_u32(dev->of_node,
+ "maxim,current-accumulator", &tmp))
+   current_accum = tmp;
+   }
+
di->charge_status = POWER_SUPPLY_STATUS_UNKNOWN;
 
sl->family_data = di;
@@ -719,7 +741,7 @@ static int w1_ds2760_add_slave(struct w1_slave *sl)
 
ds2760_battery_write_status(di, status);
 
-   /* set rated capacity from module param */
+   /* set rated capacity from module param or device tree */
if (rated_capacity)
ds2760_battery_write_rated_capacity(di, rated_capacity);
 
@@ -769,6 +791,13 @@ static void w1_ds2760_remove_slave(struct w1_slave *sl)
power_supply_unregister(di->bat);
 }
 
+#ifdef CONFIG_OF
+static const struct of_device_id w1_ds2760_of_ids[] = {
+   { .compatible = "maxim,ds2760" },
+   {}
+};
+#endif
+
 static struct w1_family_ops w1_ds2760_fops = {
.add_slave  = w1_ds2760_add_slave,
.remove_slave   = w1_ds2760_remove_slave,
@@ -778,6 +807,7 @@ static struct w1_family_ops w1_ds2760_fops = {
 static struct w1_family w1_ds2760_family = {
    .fid = W1_FAMILY_DS2760,
    .fops = &w1_ds2760_fops,
+   .of_match_table = of_match_ptr(w1_ds2760_of_ids),
 };
 module_w1_family(w1_ds2760_family);
 
-- 
2.17.1
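
The override behaviour this patch aims for can be modelled outside the kernel
as follows. This is a sketch under assumptions: the two structs and
apply_dt_overrides() are hypothetical, the "has_*" flags stand for "the
property is present in the node", and the defaults mirror the module parameter
defaults. The only unit conversion is for the rated capacity, which the binding
gives in mAh while the module parameter is in units of 10 mAh.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of the optional DT properties. */
struct ds2760_dt_props {
	bool has_pmod_enabled;		/* boolean property present */
	bool has_cache_time;
	uint32_t cache_time_ms;
	bool has_rated_capacity;
	uint32_t rated_capacity_mah;
	bool has_current_accum;
	uint32_t current_accum;
};

/* Hypothetical mirror of the driver's module parameters. */
struct ds2760_params {
	unsigned int cache_time;	/* milliseconds */
	bool pmod_enabled;
	unsigned int rated_capacity;	/* units of 10 mAh */
	unsigned int current_accum;
};

/* Override each parameter only when the matching property is present. */
static void apply_dt_overrides(struct ds2760_params *p,
			       const struct ds2760_dt_props *dt)
{
	if (dt->has_pmod_enabled)
		p->pmod_enabled = true;

	if (dt->has_cache_time)
		p->cache_time = dt->cache_time_ms;

	if (dt->has_rated_capacity)
		p->rated_capacity = dt->rated_capacity_mah / 10;

	if (dt->has_current_accum)
		p->current_accum = dt->current_accum;
}
```

With only a rated capacity of 1500 mAh given, the model leaves cache_time at
its default of 1000 and sets rated_capacity to 150. Properties that are absent
never clear a value that was set on the module command line.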



[PATCH v3 3/4] power: supply: ds2760_battery: merge ds2760 supply driver with its w1 slave companion

2018-07-03 Thread Daniel Mack
This patch removes the w1 slave driver that used to register the w1 family
and instantiate a platform device at runtime. The code now lives in the
supply driver instead to avoid that level of indirection.

The old device name "ds2760-battery.0" is preserved, so userspace
applications can access the same virtual device nodes as before.

Note that because the w1 core does not currently have a framework for
suspend/resume, the driver now registers a PM notifier callback.

Signed-off-by: Daniel Mack 
---
 drivers/power/supply/Kconfig  |   2 +-
 drivers/power/supply/ds2760_battery.c | 321 ++
 drivers/w1/slaves/Kconfig |  12 -
 drivers/w1/slaves/Makefile|   1 -
 drivers/w1/slaves/w1_ds2760.c | 175 --
 drivers/w1/slaves/w1_ds2760.h |  59 -
 6 files changed, 232 insertions(+), 338 deletions(-)
 delete mode 100644 drivers/w1/slaves/w1_ds2760.c
 delete mode 100644 drivers/w1/slaves/w1_ds2760.h

diff --git a/drivers/power/supply/Kconfig b/drivers/power/supply/Kconfig
index 428b426842f4..518a88c4adfa 100644
--- a/drivers/power/supply/Kconfig
+++ b/drivers/power/supply/Kconfig
@@ -92,7 +92,7 @@ config BATTERY_CPCAP
 
 config BATTERY_DS2760
tristate "DS2760 battery driver (HP iPAQ & others)"
-   depends on W1 && W1_SLAVE_DS2760
+   depends on W1
help
  Say Y here to enable support for batteries with ds2760 chip.
 
diff --git a/drivers/power/supply/ds2760_battery.c 
b/drivers/power/supply/ds2760_battery.c
index ae180dc929c9..aa406a7c65a1 100644
--- a/drivers/power/supply/ds2760_battery.c
+++ b/drivers/power/supply/ds2760_battery.c
@@ -27,9 +27,63 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
-#include "../../w1/slaves/w1_ds2760.h"
+
+static unsigned int cache_time = 1000;
+module_param(cache_time, uint, 0644);
+MODULE_PARM_DESC(cache_time, "cache time in milliseconds");
+
+static bool pmod_enabled;
+module_param(pmod_enabled, bool, 0644);
+MODULE_PARM_DESC(pmod_enabled, "PMOD enable bit");
+
+static unsigned int rated_capacity;
+module_param(rated_capacity, uint, 0644);
+MODULE_PARM_DESC(rated_capacity, "rated battery capacity, 10*mAh or index");
+
+static unsigned int current_accum;
+module_param(current_accum, uint, 0644);
+MODULE_PARM_DESC(current_accum, "current accumulator value");
+
+#define W1_FAMILY_DS2760            0x30
+
+/* Known commands to the DS2760 chip */
+#define W1_DS2760_SWAP              0xAA
+#define W1_DS2760_READ_DATA         0x69
+#define W1_DS2760_WRITE_DATA        0x6C
+#define W1_DS2760_COPY_DATA         0x48
+#define W1_DS2760_RECALL_DATA       0xB8
+#define W1_DS2760_LOCK              0x6A
+
+/* Number of valid register addresses */
+#define DS2760_DATA_SIZE            0x40
+
+#define DS2760_PROTECTION_REG       0x00
+
+#define DS2760_STATUS_REG           0x01
+#define DS2760_STATUS_IE            (1 << 2)
+#define DS2760_STATUS_SWEN          (1 << 3)
+#define DS2760_STATUS_RNAOP         (1 << 4)
+#define DS2760_STATUS_PMOD          (1 << 5)
+
+#define DS2760_EEPROM_REG           0x07
+#define DS2760_SPECIAL_FEATURE_REG  0x08
+#define DS2760_VOLTAGE_MSB          0x0c
+#define DS2760_VOLTAGE_LSB          0x0d
+#define DS2760_CURRENT_MSB          0x0e
+#define DS2760_CURRENT_LSB          0x0f
+#define DS2760_CURRENT_ACCUM_MSB    0x10
+#define DS2760_CURRENT_ACCUM_LSB    0x11
+#define DS2760_TEMP_MSB             0x18
+#define DS2760_TEMP_LSB             0x19
+#define DS2760_EEPROM_BLOCK0        0x20
+#define DS2760_ACTIVE_FULL          0x20
+#define DS2760_EEPROM_BLOCK1        0x30
+#define DS2760_STATUS_WRITE_REG     0x31
+#define DS2760_RATED_CAPACITY       0x32
+#define DS2760_CURRENT_OFFSET_BIAS  0x33
+#define DS2760_ACTIVE_EMPTY         0x3b
 
 struct ds2760_device_info {
struct device *dev;
@@ -55,28 +109,113 @@ struct ds2760_device_info {
int full_counter;
struct power_supply *bat;
struct power_supply_desc bat_desc;
-   struct device *w1_dev;
struct workqueue_struct *monitor_wqueue;
struct delayed_work monitor_work;
struct delayed_work set_charged_work;
+   struct notifier_block pm_notifier;
 };
 
-static unsigned int cache_time = 1000;
-module_param(cache_time, uint, 0644);
-MODULE_PARM_DESC(cache_time, "cache time in milliseconds");
+static int w1_ds2760_io(struct device *dev, char *buf, int addr, size_t count,
+   int io)
+{
+   struct w1_slave *sl = container_of(dev, struct w1_slave, dev);
 
-static bool pmod_enabled;
-module_param(pmod_enabled, bool, 0644);
-MODULE_PARM_DESC(pmod_enabled, "PMOD enable bit");
+   if (!dev)
+   return 0;
 
-static unsigned int rated_capacity;
-module_param(rated_capacity, uint, 0644);
-MODULE_PARM_DESC(rated_capacity, "rated battery capacity, 10*mAh or index");
+   mutex_lock(&sl->master->bus_mutex);
 

[PATCH v3 3/4] power: supply: ds2760_battery: merge ds2760 supply driver with its w1 slave companion

2018-07-03 Thread Daniel Mack
This patch removes the w1 slave driver that used to register the w1 family
and instanciate a platform device at runtime. The code now lives in the
supply driver instead to avoid that level of indirection.

The old device name "ds2760-battery.0" is preserved, so userspace
applications can access the same virtual device nodes as before.

Note that because the w1 core does not currently have a framework for
suspend/resume, the driver now registers a PM notifier callback.

Signed-off-by: Daniel Mack 
---
 drivers/power/supply/Kconfig  |   2 +-
 drivers/power/supply/ds2760_battery.c | 321 ++
 drivers/w1/slaves/Kconfig |  12 -
 drivers/w1/slaves/Makefile|   1 -
 drivers/w1/slaves/w1_ds2760.c | 175 --
 drivers/w1/slaves/w1_ds2760.h |  59 -
 6 files changed, 232 insertions(+), 338 deletions(-)
 delete mode 100644 drivers/w1/slaves/w1_ds2760.c
 delete mode 100644 drivers/w1/slaves/w1_ds2760.h

diff --git a/drivers/power/supply/Kconfig b/drivers/power/supply/Kconfig
index 428b426842f4..518a88c4adfa 100644
--- a/drivers/power/supply/Kconfig
+++ b/drivers/power/supply/Kconfig
@@ -92,7 +92,7 @@ config BATTERY_CPCAP
 
 config BATTERY_DS2760
tristate "DS2760 battery driver (HP iPAQ & others)"
-   depends on W1 && W1_SLAVE_DS2760
+   depends on W1
help
  Say Y here to enable support for batteries with ds2760 chip.
 
diff --git a/drivers/power/supply/ds2760_battery.c 
b/drivers/power/supply/ds2760_battery.c
index ae180dc929c9..aa406a7c65a1 100644
--- a/drivers/power/supply/ds2760_battery.c
+++ b/drivers/power/supply/ds2760_battery.c
@@ -27,9 +27,63 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
-#include "../../w1/slaves/w1_ds2760.h"
+
+static unsigned int cache_time = 1000;
+module_param(cache_time, uint, 0644);
+MODULE_PARM_DESC(cache_time, "cache time in milliseconds");
+
+static bool pmod_enabled;
+module_param(pmod_enabled, bool, 0644);
+MODULE_PARM_DESC(pmod_enabled, "PMOD enable bit");
+
+static unsigned int rated_capacity;
+module_param(rated_capacity, uint, 0644);
+MODULE_PARM_DESC(rated_capacity, "rated battery capacity, 10*mAh or index");
+
+static unsigned int current_accum;
+module_param(current_accum, uint, 0644);
+MODULE_PARM_DESC(current_accum, "current accumulator value");
+
+#define W1_FAMILY_DS2760               0x30
+
+/* Known commands to the DS2760 chip */
+#define W1_DS2760_SWAP                 0xAA
+#define W1_DS2760_READ_DATA            0x69
+#define W1_DS2760_WRITE_DATA           0x6C
+#define W1_DS2760_COPY_DATA            0x48
+#define W1_DS2760_RECALL_DATA          0xB8
+#define W1_DS2760_LOCK                 0x6A
+
+/* Number of valid register addresses */
+#define DS2760_DATA_SIZE               0x40
+
+#define DS2760_PROTECTION_REG          0x00
+
+#define DS2760_STATUS_REG              0x01
+#define DS2760_STATUS_IE               (1 << 2)
+#define DS2760_STATUS_SWEN             (1 << 3)
+#define DS2760_STATUS_RNAOP            (1 << 4)
+#define DS2760_STATUS_PMOD             (1 << 5)
+
+#define DS2760_EEPROM_REG              0x07
+#define DS2760_SPECIAL_FEATURE_REG     0x08
+#define DS2760_VOLTAGE_MSB             0x0c
+#define DS2760_VOLTAGE_LSB             0x0d
+#define DS2760_CURRENT_MSB             0x0e
+#define DS2760_CURRENT_LSB             0x0f
+#define DS2760_CURRENT_ACCUM_MSB       0x10
+#define DS2760_CURRENT_ACCUM_LSB       0x11
+#define DS2760_TEMP_MSB                0x18
+#define DS2760_TEMP_LSB                0x19
+#define DS2760_EEPROM_BLOCK0           0x20
+#define DS2760_ACTIVE_FULL             0x20
+#define DS2760_EEPROM_BLOCK1           0x30
+#define DS2760_STATUS_WRITE_REG        0x31
+#define DS2760_RATED_CAPACITY          0x32
+#define DS2760_CURRENT_OFFSET_BIAS     0x33
+#define DS2760_ACTIVE_EMPTY            0x3b
 
 struct ds2760_device_info {
struct device *dev;
@@ -55,28 +109,113 @@ struct ds2760_device_info {
int full_counter;
struct power_supply *bat;
struct power_supply_desc bat_desc;
-   struct device *w1_dev;
struct workqueue_struct *monitor_wqueue;
struct delayed_work monitor_work;
struct delayed_work set_charged_work;
+   struct notifier_block pm_notifier;
 };
 
-static unsigned int cache_time = 1000;
-module_param(cache_time, uint, 0644);
-MODULE_PARM_DESC(cache_time, "cache time in milliseconds");
+static int w1_ds2760_io(struct device *dev, char *buf, int addr, size_t count,
+   int io)
+{
+   struct w1_slave *sl = container_of(dev, struct w1_slave, dev);
 
-static bool pmod_enabled;
-module_param(pmod_enabled, bool, 0644);
-MODULE_PARM_DESC(pmod_enabled, "PMOD enable bit");
+   if (!dev)
+   return 0;
 
-static unsigned int rated_capacity;
-module_param(rated_capacity, uint, 0644);
-MODULE_PARM_DESC(rated_capacity, "rated battery capacity, 10*mAh or index");
+   mutex_lock(&sl->master->bus_mutex);
 

Re: [PATCH 3/3] m68k: switch to MEMBLOCK + NO_BOOTMEM

2018-07-03 Thread Greg Ungerer

Hi Mike,

On 04/07/18 14:22, Mike Rapoport wrote:
> On Wed, Jul 04, 2018 at 12:02:52PM +1000, Greg Ungerer wrote:
>> On 04/07/18 11:39, Greg Ungerer wrote:
>>> On 03/07/18 20:29, Mike Rapoport wrote:
>>>> In m68k the physical memory is described by [memory_start, memory_end] for
>>>> !MMU variant and by m68k_memory array of memory ranges for the MMU version.
>>>> This information is directly used to register the physical memory with
>>>> memblock.
>>>>
>>>> The reserve_bootmem() calls are replaced with memblock_reserve() and the
>>>> bootmap bitmap allocation is simply dropped.
>>>>
>>>> Since the MMU variant creates early mappings only for the small part of the
>>>> memory we force bottom-up allocations in memblock because otherwise we will
>>>> attempt to access memory that is not yet mapped.
>>>>
>>>> Signed-off-by: Mike Rapoport 
>>>
>>> This builds cleanly for me with a m5475_defconfig, but it fails
>>> to boot on real hardware. No console, no nothing on startup.
>>> I haven't debugged any further yet.
>>>
>>> The M5475 is a ColdFire with MMU enabled target.
>>
>> With some early serial debug trace I see:
>>
>> Linux version 4.18.0-rc3-3-g109f5e551b18-dirty (gerg@goober) (gcc version 
>> 5.4.0 (GCC)) #5 Wed Jul 4 12:00:03 AEST 2018
>> On node 0 totalpages: 4096
>>    DMA zone: 18 pages used for memmap
>>    DMA zone: 0 pages reserved
>>    DMA zone: 4096 pages, LIFO batch:0
>> pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
>> pcpu-alloc: [0] 0
>> Built 1 zonelists, mobility grouping off.  Total pages: 4078
>> Kernel command line: root=/dev/mtdblock0
>> Dentry cache hash table entries: 4096 (order: 1, 16384 bytes)
>> Inode-cache hash table entries: 2048 (order: 0, 8192 bytes)
>> Sorting __ex_table...
>> Memory: 3032K/32768K available (1489K kernel code, 96K rwdata, 240K rodata, 56K 
>> init, 77K bss, 29736K reserved, 0K cma-reserved)
>
>   ^^
> It seems I was over enthusiastic when I reserved the memory for the kernel.
> Can you please try with the below patch:
>
> diff --git a/arch/m68k/mm/mcfmmu.c b/arch/m68k/mm/mcfmmu.c
> index e9e60e1..18c7bf6 100644
> --- a/arch/m68k/mm/mcfmmu.c
> +++ b/arch/m68k/mm/mcfmmu.c
> @@ -174,7 +174,7 @@ void __init cf_bootmem_alloc(void)
>      high_memory = (void *)_ramend;
>  
>      /* Reserve kernel text/data/bss */
> -    memblock_reserve(memstart, _ramend - memstart);
> +    memblock_reserve(memstart, memstart - _rambase);
>  
>      m68k_virt_to_node_shift = fls(_ramend - 1) - 6;
>      module_fixup(NULL, __start_fixup, __stop_fixup);
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 03d48d8..98661be 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -54,7 +54,7 @@ struct memblock memblock __initdata_memblock = {
>      .current_limit  = MEMBLOCK_ALLOC_ANYWHERE,
>  };
>  
> -int memblock_debug __initdata_memblock;
> +int memblock_debug __initdata_memblock = 1;
>  static bool system_has_some_mirror __initdata_memblock = false;
>  static int memblock_can_resize __initdata_memblock;
>  static int memblock_memory_in_slab __initdata_memblock = 0;
>
> The memblock hunk is needed to see early memblock debug messages as all the
> setup happens before parsing of the command line.

Ok, that works, boots all the way up now.

Linux version 4.18.0-rc3-3-g109f5e551b18-dirty (gerg@goober) (gcc version 
5.4.0 (GCC)) #7 Wed Jul 4 14:34:48 AEST 2018
memblock_add: [0x-0x01ff] 0x001ebaa0
memblock_reserve: [0x00332000-0x00663fff] 0x001ebafa
memblock_reserve: [0x01ffe000-0x01ff] 0x001efd38
memblock_reserve: [0x01ff8000-0x01ffdfff] 0x001efd38
memblock_virt_alloc_try_nid_nopanic: 147456 bytes align=0x0 nid=0 from=0x0 
max_addr=0x0 0x00190dea
memblock_reserve: [0x01fd4000-0x01ff7fff] 0x001f0466
memblock_virt_alloc_try_nid_nopanic: 4 bytes align=0x0 nid=0 from=0x0 
max_addr=0x0 0x001ee234
memblock_reserve: [0x01fd3ff0-0x01fd3ff3] 0x001f0466
memblock_virt_alloc_try_nid: 20 bytes align=0x0 nid=-1 from=0x0 max_addr=0x0 
0x001ea488
memblock_reserve: [0x01fd3fd0-0x01fd3fe3] 0x001f0466
memblock_virt_alloc_try_nid: 20 bytes align=0x0 nid=-1 from=0x0 max_addr=0x0 
0x001ea4a8
memblock_reserve: [0x01fd3fb0-0x01fd3fc3] 0x001f0466
memblock_virt_alloc_try_nid: 20 bytes align=0x0 nid=-1 from=0x0 max_addr=0x0 
0x001ea4c0
memblock_reserve: [0x01fd3f90-0x01fd3fa3] 0x001f0466
memblock_virt_alloc_try_nid_nopanic: 8192 bytes align=0x2000 nid=-1 from=0x0 
max_addr=0x0 0x001eef30
memblock_reserve: [0x01fd-0x01fd1fff] 0x001f0466
memblock_virt_alloc_try_nid_nopanic: 32768 bytes align=0x2000 nid=-1 from=0x0 
max_addr=0x0 0x001ef5d6
memblock_reserve: [0x01fc8000-0x01fc] 0x001f0466
memblock_virt_alloc_try_nid: 4 bytes align=0x0 nid=-1 from=0x0 max_addr=0x0 
0x001ef2ac
memblock_reserve: [0x01fd3f80-0x01fd3f83] 0x001f0466
memblock_virt_alloc_try_nid: 4 bytes align=0x0 nid=-1 from=0x0 max_addr=0x0 
0x001ef2c2
memblock_reserve: [0x01fd3f70-0x01fd3f73] 0x001f0466
memblock_virt_alloc_try_nid: 4 bytes align=0x0 nid=-1 from=0x0 max_addr=0x0 
0x001ef2d6
memblock_reserve: [0x01fd3f60-0x01fd3f63] 0x001f0466
memblock_virt_alloc_try_nid: 4 bytes align=0x0 nid=-1 

Re: [PATCH 0/5] thunderbolt: Add support for runtime PM

2018-07-03 Thread Mika Westerberg
On Mon, Jun 18, 2018 at 02:07:26PM +0300, Mika Westerberg wrote:
> Hi all,
> 
> In recent PCs such as Lenovo X1 Carbon 6th generation the Thunderbolt
> controller is in RTD3 mode (Runtime D3). This is different from the
> previous modes because now the controller is present most of the time (it
> still will be hot-removed/hot-added during NVM firmware upgrade). Because
> of that we need to dynamically power it down whenever possible to save some
> power.
> 
> This patch series adds Linux runtime PM support for the Thunderbolt host
> controller driver using ICM firmware but it should be generic enough for
> future additions to allow similar functionality for the older Apple
> hardware as well (even though those system do not support full RTD3, it
> still makes it possible for the host controller to go to low power state if
> cable is not connected).
> 
> With these patches the driver automatically runtime suspends the host
> controller after being idle for 15s. The connected Thunderbolt devices (if
> any) need to support RTD3 mode as well. Typically all 3rd generation
> devices (Alpine Ridge, Titan Ridge) support this.
> 
> However, while this provides some power savings, there is more work to do
> in order to allow powering down the PCIe root port leading to the
> Thunderbolt PCIe hierarchy. This work is still in progress.
> 
> Mika Westerberg (5):
>   thunderbolt: Use 64-bit DMA mask if supported by the platform
>   thunderbolt: Do not unnecessarily call ICM get route
>   thunderbolt: No need to take tb->lock in domain suspend/complete
>   thunderbolt: Use correct ICM commands in system suspend
>   thunderbolt: Add support for runtime PM

All applied to thunderbolt.git/next.
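
For readers who want the idle policy from the quoted cover letter spelled out: the following is a minimal user-space sketch of "suspend after 15 s of inactivity", not the Thunderbolt driver code. The `fake_rpm_*` names and the explicit seconds-based clock are hypothetical and exist only for illustration; the 15 s delay is the one figure taken from the cover letter.

```c
#include <assert.h>
#include <stdbool.h>

/* Idle time before suspending, as stated in the cover letter. */
#define AUTOSUSPEND_DELAY_S 15

/* Hypothetical model of a runtime-PM managed device (not a kernel type). */
struct fake_rpm {
	int usage_count;   /* active users holding the device awake */
	long last_busy_s;  /* time of the last activity, in seconds */
	bool suspended;
};

/* Take a reference; a suspended device is resumed first. */
void fake_rpm_get(struct fake_rpm *d)
{
	d->usage_count++;
	d->suspended = false;
}

/* Drop a reference and record "now" as the last time the device was busy. */
void fake_rpm_put_autosuspend(struct fake_rpm *d, long now_s)
{
	assert(d->usage_count > 0);
	d->usage_count--;
	d->last_busy_s = now_s;
}

/*
 * Periodic check: suspend only when nobody holds a reference and the
 * device has been idle for at least the autosuspend delay.
 */
bool fake_rpm_tick(struct fake_rpm *d, long now_s)
{
	if (!d->suspended && d->usage_count == 0 &&
	    now_s - d->last_busy_s >= AUTOSUSPEND_DELAY_S)
		d->suspended = true;
	return d->suspended;
}
```

In this model a device released at t = 5 s stays awake at t = 19 s and is suspended by the check at t = 20 s; taking a new reference wakes it again.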


[PATCH] mfd: ti_am335x_tscadc: Fix struct clk memory leak

2018-07-03 Thread Zumeng Chen
Use devm_clk_get() to let the kernel manage the struct clk memory and avoid the
following memory leak report:

unreferenced object 0xdd75efc0 (size 64):
  comm "systemd-udevd", pid 186, jiffies 4294945126 (age 1195.750s)
  hex dump (first 32 bytes):
61 64 63 5f 74 73 63 5f 66 63 6b 00 00 00 00 00  adc_tsc_fck.
00 00 00 00 92 03 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[] kmemleak_alloc+0x40/0x74
[] __kmalloc_track_caller+0x198/0x388
[] kstrdup+0x40/0x5c
[] kstrdup_const+0x30/0x3c
[] __clk_create_clk+0x60/0xac
[] clk_get_sys+0x74/0x144
[] clk_get+0x5c/0x68
[] ti_tscadc_probe+0x260/0x468 [ti_am335x_tscadc]
[] platform_drv_probe+0x60/0xac
[] driver_probe_device+0x214/0x2dc
[] __driver_attach+0x94/0xc0
[] bus_for_each_dev+0x90/0xa0
[] driver_attach+0x28/0x30
[] bus_add_driver+0x184/0x1ec
[] driver_register+0xb0/0xf0
[] __platform_driver_register+0x40/0x54

Signed-off-by: Zumeng Chen 
---
 drivers/mfd/ti_am335x_tscadc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/mfd/ti_am335x_tscadc.c b/drivers/mfd/ti_am335x_tscadc.c
index 47012c0..7a30546 100644
--- a/drivers/mfd/ti_am335x_tscadc.c
+++ b/drivers/mfd/ti_am335x_tscadc.c
@@ -209,14 +209,13 @@ static int ti_tscadc_probe(struct platform_device *pdev)
 * The TSC_ADC_SS controller design assumes the OCP clock is
 * at least 6x faster than the ADC clock.
 */
-   clk = clk_get(&pdev->dev, "adc_tsc_fck");
+   clk = devm_clk_get(&pdev->dev, "adc_tsc_fck");
if (IS_ERR(clk)) {
dev_err(&pdev->dev, "failed to get TSC fck\n");
err = PTR_ERR(clk);
goto err_disable_clk;
}
clock_rate = clk_get_rate(clk);
-   clk_put(clk);
tscadc->clk_div = clock_rate / ADC_CLK;
 
/* TSCADC_CLKDIV needs to be configured to the value minus 1 */
-- 
2.7.4
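
As an illustration of the device-managed idea the patch relies on, here is a small user-space sketch. It does not use the kernel clk API: every `fake_*` name is hypothetical, and the "device" is just a struct that remembers what it handed out. It only shows the principle that a resource tied to a device is released when the device is torn down, so the caller needs no explicit put.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct clk; not a kernel type. */
struct fake_clk {
	unsigned long rate;
};

static int live_clks; /* fake_clk objects allocated and not yet freed */

int live_clk_count(void)
{
	return live_clks;
}

/* Unmanaged get/put pair: the caller must balance these by hand. */
struct fake_clk *fake_clk_get(void)
{
	struct fake_clk *c = calloc(1, sizeof(*c));

	if (c)
		live_clks++;
	return c;
}

void fake_clk_put(struct fake_clk *c)
{
	if (!c)
		return;
	free(c);
	live_clks--;
}

#define MAX_MANAGED 8

/* Hypothetical device that tracks the resources it owns. */
struct fake_device {
	struct fake_clk *managed[MAX_MANAGED];
	int nr_managed;
};

/* Managed get: the clk is recorded on the device for later release. */
struct fake_clk *fake_devm_clk_get(struct fake_device *dev)
{
	struct fake_clk *c;

	if (dev->nr_managed == MAX_MANAGED)
		return NULL;
	c = fake_clk_get();
	if (c)
		dev->managed[dev->nr_managed++] = c;
	return c;
}

/* Device teardown releases everything in reverse order of acquisition. */
void fake_device_release(struct fake_device *dev)
{
	while (dev->nr_managed > 0)
		fake_clk_put(dev->managed[--dev->nr_managed]);
}
```

With this shape a probe-style function can return early on any error path without leaking the clk, as long as the device itself is eventually released.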



Re: [PATCH v12 09/13] x86/sgx: EPC page allocation routines

2018-07-03 Thread Borislav Petkov
On Tue, Jul 03, 2018 at 10:41:14PM +0200, Thomas Gleixner wrote:
> On Tue, 3 Jul 2018, Jarkko Sakkinen wrote:
> >  
> > +#define SGX_NR_TO_SCAN 16
> > +#define SGX_NR_LOW_PAGES 32
> > +#define SGX_NR_HIGH_PAGES 64
> > +
> >  bool sgx_enabled __ro_after_init;
> >  EXPORT_SYMBOL(sgx_enabled);
> >  bool sgx_lc_enabled __ro_after_init;
> >  EXPORT_SYMBOL(sgx_lc_enabled);
> > +LIST_HEAD(sgx_active_page_list);
> > +EXPORT_SYMBOL(sgx_active_page_list);
> > +DEFINE_SPINLOCK(sgx_active_page_list_lock);
> > +EXPORT_SYMBOL(sgx_active_page_list_lock);
> 
> Why is all of this exported. If done right then no call site has to fiddle
> with the list and the lock at all.

... and also, why are the other exports not EXPORT_SYMBOL_GPL?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH 3/3] m68k: switch to MEMBLOCK + NO_BOOTMEM

2018-07-03 Thread Mike Rapoport
On Wed, Jul 04, 2018 at 12:02:52PM +1000, Greg Ungerer wrote:
> Hi Mike,
> 
> On 04/07/18 11:39, Greg Ungerer wrote:
> >On 03/07/18 20:29, Mike Rapoport wrote:
> >>In m68k the physical memory is described by [memory_start, memory_end] for
> >>!MMU variant and by m68k_memory array of memory ranges for the MMU version.
> >>This information is directly used to register the physical memory with
> >>memblock.
> >>
> >>The reserve_bootmem() calls are replaced with memblock_reserve() and the
> >>bootmap bitmap allocation is simply dropped.
> >>
> >>Since the MMU variant creates early mappings only for the small part of the
> >>memory we force bottom-up allocations in memblock because otherwise we will
> >>attempt to access memory that is not yet mapped.
> >>
> >>Signed-off-by: Mike Rapoport 
> >
> >This builds cleanly for me with a m5475_defconfig, but it fails
> >to boot on real hardware. No console, no nothing on startup.
> >I haven't debugged any further yet.
> >
> >The M5475 is a ColdFire with MMU enabled target.
> 
> With some early serial debug trace I see:
> 
> Linux version 4.18.0-rc3-3-g109f5e551b18-dirty (gerg@goober) (gcc version 
> 5.4.0 (GCC)) #5 Wed Jul 4 12:00:03 AEST 2018
> On node 0 totalpages: 4096
>   DMA zone: 18 pages used for memmap
>   DMA zone: 0 pages reserved
>   DMA zone: 4096 pages, LIFO batch:0
> pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
> pcpu-alloc: [0] 0
> Built 1 zonelists, mobility grouping off.  Total pages: 4078
> Kernel command line: root=/dev/mtdblock0
> Dentry cache hash table entries: 4096 (order: 1, 16384 bytes)
> Inode-cache hash table entries: 2048 (order: 0, 8192 bytes)
> Sorting __ex_table...
> Memory: 3032K/32768K available (1489K kernel code, 96K rwdata, 240K rodata, 
> 56K init, 77K bss, 29736K reserved, 0K cma-reserved)

 ^^
It seems I was over enthusiastic when I reserved the memory for the kernel.
Can you please try with the below patch:

diff --git a/arch/m68k/mm/mcfmmu.c b/arch/m68k/mm/mcfmmu.c
index e9e60e1..18c7bf6 100644
--- a/arch/m68k/mm/mcfmmu.c
+++ b/arch/m68k/mm/mcfmmu.c
@@ -174,7 +174,7 @@ void __init cf_bootmem_alloc(void)
high_memory = (void *)_ramend;
 
/* Reserve kernel text/data/bss */
-   memblock_reserve(memstart, _ramend - memstart);
+   memblock_reserve(memstart, memstart - _rambase);
 
m68k_virt_to_node_shift = fls(_ramend - 1) - 6;
module_fixup(NULL, __start_fixup, __stop_fixup);
diff --git a/mm/memblock.c b/mm/memblock.c
index 03d48d8..98661be 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -54,7 +54,7 @@ struct memblock memblock __initdata_memblock = {
.current_limit  = MEMBLOCK_ALLOC_ANYWHERE,
 };
 
-int memblock_debug __initdata_memblock;
+int memblock_debug __initdata_memblock = 1;
 static bool system_has_some_mirror __initdata_memblock = false;
 static int memblock_can_resize __initdata_memblock;
 static int memblock_memory_in_slab __initdata_memblock = 0;


The memblock hunk is needed to see early memblock debug messages as all the
setup happens before parsing of the command line.
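
To make the effect of the one-line change easier to see, here is a small worked calculation as a user-space sketch. The three constants are assumptions inferred from the logs in this thread (32768K of RAM, and the `memblock_reserve: [0x00332000-0x00663fff]` line reported with the patch applied, which is consistent with memstart = 0x332000 and _rambase = 0); they are not read from the hardware or from the source tree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed values, inferred from the boot logs in this thread. */
#define RAMBASE  0x00000000u  /* _rambase (assumption) */
#define MEMSTART 0x00332000u  /* memstart (assumption) */
#define RAMEND   0x02000000u  /* _ramend, i.e. 32768K of RAM */

/* Inclusive [start, end] range covered by a reserve(base, size) call. */
struct range {
	uint32_t start;
	uint32_t end;
};

struct range reserved_range(uint32_t base, uint32_t size)
{
	struct range r;

	assert(size > 0);
	r.start = base;
	r.end = base + size - 1;
	return r;
}

uint32_t range_kib(struct range r)
{
	return (r.end - r.start + 1) / 1024;
}

void print_reservations(void)
{
	/* Original call: memblock_reserve(memstart, _ramend - memstart) */
	struct range before = reserved_range(MEMSTART, RAMEND - MEMSTART);
	/* Patched call: memblock_reserve(memstart, memstart - _rambase) */
	struct range after = reserved_range(MEMSTART, MEMSTART - RAMBASE);

	printf("before: [0x%08x-0x%08x] %u KiB\n", (unsigned)before.start,
	       (unsigned)before.end, (unsigned)range_kib(before));
	printf("after:  [0x%08x-0x%08x] %u KiB\n", (unsigned)after.start,
	       (unsigned)after.end, (unsigned)range_kib(after));
}
```

Under these assumptions the original call reserves everything from memstart to the end of RAM (29496 KiB, which accounts for most of the 29736K shown as reserved in the failing boot), while the patched call reserves 3272 KiB. Note that with these values the patched range starts at memstart rather than at _rambase; whether that is intended is not stated in the thread.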

> SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=8
> NR_IRQS: 256
> clocksource: slt: mask: 0x max_cycles: 0x, max_idle_ns: 
> 14370379300 ns
> Calibrating delay loop... 264.19 BogoMIPS (lpj=1320960)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 2048 (order: 0, 8192 bytes)
> Mountpoint-cache hash table entries: 2048 (order: 0, 8192 bytes)
> clocksource: jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 
> 1911260446275 ns
> ColdFire: PCI bus initialization...
> Coldfire: PCI IO/config window mapped to 0xe000
> PCI host bridge to bus :00
> pci_bus :00: root bus resource [io  0x-0x]
> pci_bus :00: root bus resource [mem 0x-0x]
> pci_bus :00: root bus resource [bus 00-ff]
> pci :00:14.0: [8086:1229] type 00 class 0x02
> pci :00:14.0: reg 0x10: [mem 0x-0x0fff]
> pci :00:14.0: reg 0x14: [io  0x-0x003f]
> pci :00:14.0: reg 0x18: [mem 0x-0x000f]
> pci :00:14.0: reg 0x30: [mem 0x-0x000f pref]
> pci :00:14.0: supports D1 D2
> pci :00:14.0: PME# supported from D0 D1 D2 D3hot
> pci :00:14.0: BAR 2: assigned [mem 0xf000-0xf00f]
> pci :00:14.0: BAR 6: assigned [mem 0xf010-0xf01f pref]
> pci :00:14.0: BAR 0: assigned [mem 0xf020-0xf0200fff]
> pci :00:14.0: BAR 1: assigned [io  0x0400-0x043f]
> vgaarb: loaded
> clocksource: Switched to clocksource slt
> PCI: CLS 32 bytes, default 16
> workingset: timestamp_bits=27 max_order=9 bucket_order=0
> kobject_add_internal failed for slab (error: -12 parent: kernel)
> Cannot register slab subsystem.
> romfs: ROMFS MTD (C) 2007 Red Hat, Inc.
> io scheduler noop registered (default)
> io scheduler mq-deadline registered
> io scheduler kyber registered
> 

Re: linux-next: build warning after merge of the slave-dma tree

2018-07-03 Thread Vinod
Hi Stephen,

On 04-07-18, 13:30, Stephen Rothwell wrote:
> 
> After merging the slave-dma tree, today's linux-next build (x86_64
> allmodconfig) produced this warning:
> 
> WARNING: modpost: missing MODULE_LICENSE() in drivers/dma/fsl-edma-common.o

Thanks for the report, I have fixed it by adding this and pushed out.

-- 
~Vinod


kernel BUG at mm/gup.c:LINE!

2018-07-03 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:d3bc0e67f852 Merge tag 'for-4.18-rc2-tag' of git://git.ker..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=177c40
kernel config:  https://syzkaller.appspot.com/x/.config?x=a63be0c83e84d370
dashboard link: https://syzkaller.appspot.com/bug?extid=5dcb560fe12aa5091c06
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
userspace arch: i386
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=158577a240

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+5dcb560fe12aa5091...@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
------------[ cut here ]------------
kernel BUG at mm/gup.c:1242!
invalid opcode:  [#1] SMP KASAN
CPU: 1 PID: 4837 Comm: syz-executor0 Not tainted 4.18.0-rc2+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__mm_populate+0x472/0x520 mm/gup.c:1242
Code: ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e aa 00 00 00 44 8b 75 98 45 31 e4 e9 58 ff ff ff e8 b5 9e d1 ff 0f 0b e8 ae 9e d1 ff <0f> 0b 48 8b bd 60 ff ff ff e8 d0 72 0f 00 e9 52 fc ff ff 48 8b bd
RSP: 0018:8801aae77ae0 EFLAGS: 00010293
RAX: 8801cfb48280 RBX: 8000 RCX: 81aa6a68
RDX:  RSI: 81aa6dc2 RDI: 0006
RBP: 8801aae77ba0 R08: 8801cfb48280 R09: fbfff133d66a
R10: 0003 R11:  R12: 7bf81000
R13: 7676 R14: dc00 R15: 
FS:  () GS:8801daf0(0063) knlGS:0865b900
CS:  0010 DS: 002b ES: 002b CR0: 80050033
CR2: 080e3a94 CR3: 0001cb021000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 mm_populate include/linux/mm.h:2296 [inline]
 vm_brk_flags+0x1fe/0x240 mm/mmap.c:3038
 vm_brk+0x1f/0x30 mm/mmap.c:3045
 load_elf_library+0x711/0x8e0 fs/binfmt_elf.c:1266
 __do_sys_uselib fs/exec.c:161 [inline]
 __se_sys_uselib fs/exec.c:120 [inline]
 __ia32_sys_uselib+0x37e/0x4c0 fs/exec.c:120
 do_syscall_32_irqs_on arch/x86/entry/common.c:326 [inline]
 do_fast_syscall_32+0x34d/0xfb2 arch/x86/entry/common.c:397
 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fcbcb9
Code: 55 08 8b 88 64 cd ff ff 8b 98 68 cd ff ff 89 c8 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 1c 24 c3 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 002b:ff8df4ac EFLAGS: 0282 ORIG_RAX: 0056
RAX: ffda RBX: 2040 RCX: 
RDX:  RSI:  RDI: 
RBP:  R08:  R09: 
R10:  R11:  R12: 
R13:  R14:  R15: 
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace f964ea7008b66351 ]---
RIP: 0010:__mm_populate+0x472/0x520 mm/gup.c:1242
Code: ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e aa 00 00 00 44 8b 75 98 45 31 e4 e9 58 ff ff ff e8 b5 9e d1 ff 0f 0b e8 ae 9e d1 ff <0f> 0b 48 8b bd 60 ff ff ff e8 d0 72 0f 00 e9 52 fc ff ff 48 8b bd
RSP: 0018:8801aae77ae0 EFLAGS: 00010293
RAX: 8801cfb48280 RBX: 8000 RCX: 81aa6a68
RDX:  RSI: 81aa6dc2 RDI: 0006
RBP: 8801aae77ba0 R08: 8801cfb48280 R09: fbfff133d66a
R10: 0003 R11:  R12: 7bf81000
R13: 7676 R14: dc00 R15: 
FS:  () GS:8801daf0(0063) knlGS:0865b900
CS:  0010 DS: 002b ES: 002b CR0: 80050033
CR2: 080e3a94 CR3: 0001cb021000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.

syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


Re: [build-check] scripts: add check_build script

2018-07-03 Thread Andrew Morton



On July 3, 2018 5:21:20 PM PDT, Stephen Rothwell  wrote:
>Hi all,
>
>On Tue, 3 Jul 2018 16:27:06 -0700 Randy Dunlap wrote:
>>
>> On 07/03/2018 03:49 PM, Andrew Morton wrote:
>> > On Tue, 3 Jul 2018 15:12:10 +0200 Norbert Manthey wrote:
>> >> +build ()
>> >> +{
>> >> + local -r LOG_FILE="$1"
>> >> + local -i STATUS=0
>> >> +
>> >> + make clean -j $(nproc) &> /dev/null
>> >> + make -j $(nproc) &>> "$LOG_FILE" || STATUS=$?
>> >> +
>> >> + echo "build status: $STATUS" >> "$LOG_FILE"
>> >> + echo "[$SECONDS] build status: $STATUS"
>> >> + return "$STATUS"
>> >> +}  
>> > 
>> > The script never sets nproc.  So I guess this is a bare `make -j'.
>> > When I type that on my (quite beefy) workstation I get eleventy xillion
>> > processes and the machine locks up.  Can't even wiggle the mouse.
>> > After a 20 minute nap (thanks!) it was still comatose so I hit the big
>> > button (who writes this junk??).
>> >
>> > So you might want to take an educated guess from /proc/cpuinfo here.
>> 
>> or use
>> nproc=`getconf _NPROCESSORS_ONLN`
>> 
>> or double it if you want to keep processes ready/waiting.
>
>I have a program called nproc on my system.  It is part of coreutils.

Ohdoh, I misread, sorry. 
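
For reference, `getconf _NPROCESSORS_ONLN` should report the same
online-CPU count that sysconf(_SC_NPROCESSORS_ONLN) returns on glibc
systems. A minimal C sketch of turning that count into a bounded job
count, including the optional doubling mentioned above, could look like
this. The helper names build_jobs() and default_build_jobs() are made up
for illustration and are not part of the proposed script.

```c
#include <unistd.h>

/* Hypothetical helper: turn an online-CPU count into a "make -j" value. */
static long build_jobs(long online_cpus, long factor)
{
	if (online_cpus < 1)	/* sysconf() returns -1 on error */
		online_cpus = 1;
	if (factor < 1)		/* never scale the count down to zero */
		factor = 1;
	return online_cpus * factor;
}

/* Hypothetical helper: job count for the machine this runs on. */
static long default_build_jobs(void)
{
	return build_jobs(sysconf(_SC_NPROCESSORS_ONLN), 1);
}
```

Clamping to at least one job means a failed query can never degrade into
an unbounded `make -j`.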


Re: [PATCH] userfaultfd: hugetlbfs: Fix userfaultfd_huge_must_wait pte access

2018-07-03 Thread Andrea Arcangeli
Hello,

On Wed, Jun 27, 2018 at 10:47:44AM +0200, Janosch Frank wrote:
> On 26.06.2018 19:00, Mike Kravetz wrote:
> > On 06/26/2018 06:24 AM, Janosch Frank wrote:
> >> Use huge_ptep_get to translate huge ptes to normal ptes so we can
> >> check them with the huge_pte_* functions. Otherwise some architectures
> >> will check the wrong values and will not wait for userspace to bring
> >> in the memory.
> >>
> >> Signed-off-by: Janosch Frank 
> > >> Fixes: 369cd2121be4 ("userfaultfd: hugetlbfs: userfaultfd_huge_must_wait for hugepmd ranges")
> > Adding linux-mm and Andrew on Cc:
> > 
> > Thanks for catching and fixing this.
> 
> Sure
> I'd be happy if we get less of these problems with time, this one was
> rather painful to debug. :)

What I thought when I read the fix is it would be more robust and we
could catch any further error like this at build time by having
huge_pte_offset return a new type "hugepte_t *" instead of the current
"pte_t *". Of course then huge_ptep_get() would take a "hugepte_t *" as
parameter. The x86 implementation would then become:

static inline pte_t huge_ptep_get(hugepte_t *ptep)
{
	return *(pte_t *)ptep;
}

I haven't tried it, and perhaps it's not feasible for other reasons,
because there's significant fallout from such a change (i.e. a lot
of hugetlbfs methods need to change input type). But you said you'd
like to see fewer of these problems, and this could be a way to get
there if it can be done, so I should mention it.
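
To make the build-time check concrete, here is a minimal userspace
sketch of the idea, not kernel code. The type layouts, the simplified
huge_pte_offset() lookup and the must_wait() helper are stand-ins
invented for illustration, and huge_ptep_get() copies the value instead
of casting so the sketch stays within ISO C aliasing rules.

```c
#include <stdint.h>

/* Stand-ins for the kernel types; the layouts are hypothetical. */
typedef struct { uint64_t pte; } pte_t;
typedef struct { uint64_t val; } hugepte_t;	/* deliberately not pte_t */

/* x86-like case: a huge pte carries the same bits as a normal pte. */
static inline pte_t huge_ptep_get(hugepte_t *ptep)
{
	pte_t pte = { .pte = ptep->val };

	return pte;
}

static inline int huge_pte_none(pte_t pte)
{
	return pte.pte == 0;
}

/* Simplified lookup that hands out the new pointer type. */
static hugepte_t *huge_pte_offset(hugepte_t *table, unsigned long idx)
{
	return &table[idx];
}

/* Simplified caller: wait only when the entry is empty. */
static int must_wait(hugepte_t *table, unsigned long idx)
{
	hugepte_t *ptep = huge_pte_offset(table, idx);
	pte_t pte = huge_ptep_get(ptep);

	/*
	 * huge_pte_none(*ptep) would be rejected by the compiler here,
	 * because hugepte_t and pte_t are incompatible struct types.
	 */
	return huge_pte_none(pte);
}
```

With a plain "pte_t *" return type the commented-out call compiles
silently, which is the class of mistake the fix had to correct by hand.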

The need for huge_ptep_get() is of course very apparent when reading the
fix, but it was anything but apparent when reading the previous code, and
the previous code was correct for x86 because huge_ptep_get is
implemented as *ptep on x86.

For now the current fix is certainly good; any robustness cleanup is
cleaner if done orthogonally anyway.

Thanks!
Andrea


Re: [PATCHv5 0/4] Salted build ids via ELF notes

2018-07-03 Thread Masahiro Yamada
Hi.

2018-07-04 8:21 GMT+09:00 Laura Abbott :
>
> Hi,
>
> This is v5 of the series to allow unique build ids in the kernel. As a
> reminder of the context:




> ""
> In Fedora, the debug information is packaged separately (foo-debuginfo) and
> can be installed separately. There's been a long standing issue where only one
> version of a debuginfo info package can be installed at a time. Mark Wielaard
> made an effort for Fedora 27 to allow parallel installation of debuginfo (see
> https://fedoraproject.org/wiki/Changes/ParallelInstallableDebuginfo for
> more details)
>
> Part of the requirement to allow this to work is that build ids are
> unique between builds. The existing upstream rpm implementation ensures
> this by re-calculating the build-id using the version and release as a
> seed. This doesn't work 100% for the kernel because of the vDSO which is
> its own binary and doesn't get updated. After poking holes in a few of my
> ideas, there was a discussion with some people from the binutils team about
> adding --build-id-salt to let ld do the calculation debugedit is doing. There
> was a counter proposal made to add in the salt while building. The
> easiest proposal was to add an item in the linker script vs. linking in
> an object since we need the salt to go in every module as well as the
> kernel and vmlinux.
> ""

I think this information is helpful to explain the background
of this work, but the cover letter cannot be committed in git.

Could you add this in 1/4 please?


If I read only the simple log in 1/4,
I would wonder why it is useful...







> v5 uses the approach suggested by Masahiro Yamada which uses the
> existing ELF note macro to more easily add the salt (vs previous
> approaches which tried to adjust via linker section).
>
> If arch maintainers are okay, I'd like acks for this so this can go
> through the kbuild tree.
>
> Thanks,
> Laura
>
> Laura Abbott (4):
>   kbuild: Add build salt to the kernel and modules
>   x86: Add build salt to the vDSO
>   powerpc: Add build salt to the vDSO
>   arm64: Add build salt to the vDSO
>
>  arch/arm64/kernel/vdso/note.S     |  3 +++
>  arch/powerpc/kernel/vdso32/note.S |  3 +++
>  arch/x86/entry/vdso/vdso-note.S   |  3 +++
>  arch/x86/entry/vdso/vdso32/note.S |  3 +++
>  include/linux/build-salt.h        | 20
>  init/Kconfig                      |  9 +
>  init/version.c                    |  3 +++
>  scripts/mod/modpost.c             |  3 +++
>  8 files changed, 47 insertions(+)
>  create mode 100644 include/linux/build-salt.h
>
> --
> 2.17.1



-- 
Best Regards
Masahiro Yamada


RE: [RFC PATCH 1/2] fpga: fpga-mgr: Add readback support

2018-07-03 Thread Appana Durga Kedareswara Rao
Hi Alan,

Thanks for the review... 
Please find comments inline... 


> 
> Hi Appana,
> 
> > In order to debug issues with FPGAs, users would like to read the FPGA
> > configuration information.
> > This patch adds readback support for FPGA configuration data in the
> > framework through a debugfs interface.
> >
> > Usage:
> > cat /sys/kernel/debug/fpga/readback
> 
> Two things here: I'd prefer calling the attribute "image" rather than
> "readback"
> 
> This should be an entry per fpga manager, not one entry for the whole
> framework, so
> 
> cat /sys/kernel/debug/fpga/fpga0/image

Sure will fix in v2... 

> 
> >
> > Signed-off-by: Appana Durga Kedareswara rao
> > 
> > ---
> >  drivers/fpga/fpga-mgr.c       | 52 +++
> >  include/linux/fpga/fpga-mgr.h |  6 +
> >  2 files changed, 58 insertions(+)
> >
> > diff --git a/drivers/fpga/fpga-mgr.c b/drivers/fpga/fpga-mgr.c
> > index 9939d2c..7a9fd7c 100644
> > --- a/drivers/fpga/fpga-mgr.c
> > +++ b/drivers/fpga/fpga-mgr.c
> > @@ -484,6 +484,39 @@ void fpga_mgr_put(struct fpga_manager *mgr)
> >  }
> >  EXPORT_SYMBOL_GPL(fpga_mgr_put);
> >
> > +#ifdef CONFIG_DEBUG_FS
> > +#include 
> > +
> > +static int fpga_mgr_read(struct seq_file *s, void *data) {
> > +   struct fpga_manager *mgr = (struct fpga_manager *)s->private;
> > +   int ret = 0;
> 
> Here you should return an error for mgr's that don't support read function:
> 
> if (!mgr->mops->read)
> return -ENOENT;

Will fix in v2...

> 
> Then you probably should lock the mgr so that nobody tries to write it while
> you are reading.  If you can't lock it, return -EBUSY.

Ok sure will fix in v2... 

> 
> > +
> > +   if (mgr->state != FPGA_MGR_STATE_OPERATING)
> > +   return -EBUSY;
> 
> If the FPGA isn't programmed, I'm not sure that EBUSY would be the correct
> error.

OK, should I return EPERM or EAGAIN here?

> 
> > +
> > +   /* Read the FPGA configuration data from the fabric */
> > +   ret = mgr->mops->read(mgr, s);
> > +   if (ret) {
> > +   dev_err(&mgr->dev, "Error while reading configuration data from FPGA\n");
> > +   return ret;
> 
> Don't need this return here since we return right afterwards anyway.

Will fix in v2...
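
To check my understanding of the ordering suggested so far, here is a
minimal userspace sketch, not the v2 code. All mock_* names and
demo_read() are invented for illustration; the real code would work on
struct fpga_manager and the manager's own locking. The state check is
left out because the error code for it is still open above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct mock_mgr;

/* Hypothetical stand-in for the low-level driver ops; read is optional. */
struct mock_ops {
	int (*read)(struct mock_mgr *mgr, char *buf, size_t len);
};

/* Hypothetical stand-in for struct fpga_manager. */
struct mock_mgr {
	const struct mock_ops *mops;
	bool locked;		/* models exclusive use of the manager */
};

static int mock_mgr_trylock(struct mock_mgr *mgr)
{
	if (mgr->locked)
		return -EBUSY;
	mgr->locked = true;
	return 0;
}

static void mock_mgr_unlock(struct mock_mgr *mgr)
{
	mgr->locked = false;
}

/* Demo read op: pretends the configuration data is all zeroes. */
static int demo_read(struct mock_mgr *mgr, char *buf, size_t len)
{
	(void)mgr;
	memset(buf, 0, len);
	return 0;
}

static int mock_mgr_read(struct mock_mgr *mgr, char *buf, size_t len)
{
	int ret;

	if (!mgr->mops->read)		/* read op not provided */
		return -ENOENT;

	ret = mock_mgr_trylock(mgr);	/* keep writers out while reading */
	if (ret)
		return ret;

	ret = mgr->mops->read(mgr, buf, len);
	mock_mgr_unlock(mgr);

	return ret;			/* single exit for success and failure */
}
```

The optional-op check comes before the lock so that a manager without
readback support never takes the lock at all.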

> 
> > +   }
> > +
> > +   return ret;
> > +}
> > +
> > +static int fpga_mgr_read_open(struct inode *inode, struct file *file)
> > +{
> > +   return single_open(file, fpga_mgr_read, inode->i_private);
> > +}
> > +
> > +static const struct file_operations fpga_mgr_debugops = {
> 
> Suggest including the name of the debugfs file in the name since more
> debugfs files will be added over time, so
> 
> s/fpga_mgr_debugops/fpga_mgr_ops_image/ or something.

Sure will use fpga_mgr_ops_image 

> 
> > +   .owner = THIS_MODULE,
> > +   .open = fpga_mgr_read_open,
> > +   .read = seq_read,
> > +};
> > +#endif
> > +
> >  /**
> >   * fpga_mgr_lock - Lock FPGA manager for exclusive use
> >   * @mgr:   fpga manager
> > @@ -581,6 +614,21 @@ int fpga_mgr_register(struct device *dev, const char *name,
> > if (ret)
> > goto error_device;
> >
> > +#ifdef CONFIG_DEBUG_FS
> 
> I'd prefer an added config such as CONFIG_FPGA_MGR_DEBUG_FS.

Ok, will add an entry in the Kconfig with the name as you suggested in v2...  

> 
> > +   mgr->dir = debugfs_create_dir("fpga", NULL);
> > +   if (!mgr->dir)
> > +   goto error_device;
> > +
> > +   mgr->parent = mgr->dir;
> > +   mgr->dir = debugfs_create_file("readback", 0644, mgr->parent, mgr,
> > +  &fpga_mgr_debugops);
> > +   if (!mgr->dir) {
> > +   debugfs_remove_recursive(mgr->parent);
> > +   mgr->parent = NULL;
> > +   goto error_device;
> > +   }
> > +#endif
> > +
> > dev_info(&mgr->dev, "%s registered\n", mgr->name);
> >
> > return 0;
> > @@ -604,6 +652,10 @@ void fpga_mgr_unregister(struct device *dev)
> >
> > dev_info(&mgr->dev, "%s %s\n", __func__, mgr->name);
> >
> > +#ifdef CONFIG_DEBUG_FS
> > +   debugfs_remove_recursive(mgr->parent);
> > +   mgr->parent = NULL;
> > +#endif
> > /*
> >  * If the low level driver provides a method for putting fpga into
> >  * a desired state upon unregister, do it.
> > diff --git a/include/linux/fpga/fpga-mgr.h b/include/linux/fpga/fpga-mgr.h
> > index 3c6de23..6013809 100644
> > --- a/include/linux/fpga/fpga-mgr.h
> > +++ b/include/linux/fpga/fpga-mgr.h
> > @@ -114,6 +114,7 @@ struct fpga_image_info {
> >   * @write: write count bytes of configuration data to the FPGA
> >   * @write_sg: write the scatter list of configuration data to the FPGA
> >   * @write_complete: set FPGA to operating state after writing is done
> > + * @read: read FPGA configuration information
> 
> Please add note that the read ops is optional similar to below.

Sure will add notes in v2... 

> 
> >   * @fpga_remove: optional: Set FPGA 


Re: [RFC PATCH] ACPI: bus: match of_device_id using acpi device

2018-07-03 Thread Srinath Mannam
Hi Sudeep, Andy,

Yes, this patch gets the of_device_id and then fetches the data pointer.

The aim is to add ACPI support to multiple drivers that are device-tree
based and have a list of of_device_ids. Using this function needs only
minimal changes and avoids an acpi_device_id list in the driver.
I will send the driver changes that use this function to add ACPI
support in following patches.

Below are the changes that add ACPI support to the sdhci iproc driver
using this function.

diff --git a/drivers/mmc/host/sdhci-iproc.c b/drivers/mmc/host/sdhci-iproc.c
index db40218..f1ecac97 100644
--- a/drivers/mmc/host/sdhci-iproc.c
+++ b/drivers/mmc/host/sdhci-iproc.c
@@ -15,6 +15,7 @@
  * iProc SDHCI platform driver
  */

+#include 
 #include 
 #include 
 #include 
@@ -267,8 +268,13 @@ static int sdhci_iproc_probe(struct platform_device *pdev)
int ret;

match = of_match_device(sdhci_iproc_of_match, &pdev->dev);
-   if (!match)
-   return -EINVAL;
+   if (!match) {
+   match = acpi_match_of_device_id(sdhci_iproc_of_match,
+   &pdev->dev);
+   if (!match)
+   return -EINVAL;
+   }
+
iproc_data = match->data;

host = sdhci_pltfm_init(pdev, iproc_data->pdata, sizeof(*iproc_host));
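
The lookup order in the change above can be modelled in userspace as
follows. This is only a sketch: the struct layouts, the mock_device
fields and the compatible strings used with it are hypothetical, and the
table uses a NULL sentinel for simplicity. acpi_match_of_device_id() is
the function proposed in this RFC; the second match_table() call stands
in for it.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct of_device_id. */
struct of_device_id {
	const char *compatible;
	const void *data;
};

/* Hypothetical device: at most one of the two strings is set. */
struct mock_device {
	const char *of_compatible;	 /* probed from device tree */
	const char *acpi_dsd_compatible; /* PRP0001 with _DSD "compatible" */
};

/* Walk a NULL-terminated table looking for an exact compatible match. */
static const struct of_device_id *
match_table(const struct of_device_id *table, const char *compat)
{
	if (!compat)
		return NULL;
	for (; table->compatible; table++)
		if (strcmp(table->compatible, compat) == 0)
			return table;
	return NULL;
}

/* Try the OF match first, then fall back to the ACPI _DSD string. */
static const void *probe_match_data(const struct of_device_id *table,
				    const struct mock_device *dev)
{
	const struct of_device_id *match;

	match = match_table(table, dev->of_compatible);
	if (!match)
		match = match_table(table, dev->acpi_dsd_compatible);

	return match ? match->data : NULL;
}
```

Both paths index the same table, so the per-compatible data pointer is
shared and no separate acpi_device_id list is needed.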

Regards,
Srinath.



On Tue, Jul 3, 2018 at 11:11 PM, Andy Shevchenko
 wrote:
> On Tue, Jul 3, 2018 at 12:22 PM, Srinath Mannam
>  wrote:
>> This patch provides a function, to get of_device_id after
>> matching with ACPI device _DSD object compatible property
>> in the case driver does not contain acpi_device_id list
>> and driver probe called for ACPI device ID PRP0001 with
>> compatible property match with of_device_id compatible.
>
> I don't see any usefulness of this function. Care to provide a real use case?
>
> --
> With Best Regards,
> Andy Shevchenko


linux-next: build warning after merge of the slave-dma tree

2018-07-03 Thread Stephen Rothwell
Hi Vinod,

After merging the slave-dma tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

WARNING: modpost: missing MODULE_LICENSE() in drivers/dma/fsl-edma-common.o

Introduced by commit

  6ad069123f03 ("dmaengine: fsl-edma: extract common fsl-edma code (no changes 
in behavior intended)")

-- 
Cheers,
Stephen Rothwell


[PATCH v2] tg: show the sum wait time of an task group

2018-07-03 Thread 王贇

Although we can rely on cpuacct to present the CPU usage of a task
group, it is hard to tell how intense the competition for CPU
resources is between these groups.

Monitoring the wait time of each process, or reading sched_debug,
could cost too much, and there is no good way to accurately represent
the contention with that information; we need the wait time at the
group level.

Thus we introduce the group's wait_sum to represent the contention
between task groups. It is simply the sum of the wait time of the
group's cfs_rq on each CPU.

The 'cpu.stat' file is modified to show the statistic, like:

  nr_periods 0
  nr_throttled 0
  throttled_time 0
  wait_sum 2035098795584

Now we can monitor the change in wait_sum to tell how much a task
group is suffering in the fight for CPU resources.

For example:
  (wait_sum - last_wait_sum) * 100 / (nr_cpu * period_ns) == X%

means the task group spent X percent of the period waiting
for the CPU.

Signed-off-by: Michael Wang 
---

Since v1:
  Use schedstat_val to avoid compile error
  Check and skip root_task_group

 kernel/sched/core.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78d8fac..80ab995 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6781,6 +6781,8 @@ static int __cfs_schedulable(struct task_group *tg, u64 period, u64 quota)

 static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
 {
+       int i;
+       u64 ws = 0;
        struct task_group *tg = css_tg(seq_css(sf));
        struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;

@@ -6788,6 +6790,12 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
        seq_printf(sf, "nr_throttled %d\n", cfs_b->nr_throttled);
        seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);

+       if (schedstat_enabled() && tg != &root_task_group) {
+               for_each_possible_cpu(i)
+                       ws += schedstat_val(tg->se[i]->statistics.wait_sum);
+               seq_printf(sf, "wait_sum %llu\n", ws);
+       }
+
        return 0;
 }
 #endif /* CONFIG_CFS_BANDWIDTH */
--
1.8.3.1
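
The percentage formula from the commit message can be tried out in userspace with a small helper. This is a sketch, not part of the patch: wait_percent() is a hypothetical name, and it assumes the wait_sum values read from cpu.stat are in nanoseconds, as the period_ns term in the formula suggests.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of a sampling period that a task group spent waiting for a CPU:
 *   (wait_sum - last_wait_sum) * 100 / (nr_cpu * period_ns)
 * Both wait_sum samples and period_ns are assumed to be in nanoseconds.
 */
uint64_t wait_percent(uint64_t last_wait_sum, uint64_t wait_sum,
                      unsigned int nr_cpu, uint64_t period_ns)
{
    assert(nr_cpu > 0 && period_ns > 0);
    assert(wait_sum >= last_wait_sum);

    return (wait_sum - last_wait_sum) * 100 / ((uint64_t)nr_cpu * period_ns);
}
```

For instance, two samples taken one second apart on a 4-CPU machine whose wait_sum grew by 2 seconds give 2e9 * 100 / (4 * 1e9), that is, the group spent 50 percent of the available CPU time waiting.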



Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Benjamin Gilbert
On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
> I don't know how to solve it. As far as I know we don't support compiling
> kernel with LTO in mainline.
> 
> Any suggestions?
> 
> Benjamin, do you change LDFLAGS or CFLAGS when compiling the kernel?

We're using the standard build flags as far as I can tell.  In particular,
we don't enable LTO, and I've verified that -flto isn't in the build logs.

Here's a sample image:

https://users.developer.core-os.net/bgilbert/4.17/vmlinuz-4.17.3-coreos
https://users.developer.core-os.net/bgilbert/4.17/vmlinux-4.17.3-coreos
https://users.developer.core-os.net/bgilbert/4.17/System.map

--Benjamin Gilbert


Re: [PATCH V2 5/7] mmc: sdhci: add CMD23 support for v4 mode

2018-07-03 Thread Chunyan Zhang
On 23 June 2018 at 03:40, Adrian Hunter  wrote:
> On 06/15/2018 05:04 AM, Chunyan Zhang wrote:
>> Host Driver Version 4.10 adds a new bit in Host Control 2 Register
>> for selecting Auto CMD23 or Auto CMD12 for ADMA3 data transfer.
>
> We don't support ADMA3.  It would require changes to the block driver.
> So is this change needed?
>
>>
>> Signed-off-by: Chunyan Zhang 
>> ---
>>  drivers/mmc/host/sdhci.c | 16 +++-
>>  drivers/mmc/host/sdhci.h |  1 +
>>  2 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>> index b8ee124..3b2af7e 100644
>> --- a/drivers/mmc/host/sdhci.c
>> +++ b/drivers/mmc/host/sdhci.c
>> @@ -954,6 +954,20 @@ static inline bool sdhci_auto_cmd12(struct sdhci_host 
>> *host,
>>  !mrq->cap_cmd_during_tfr;
>>  }
>>
>> +static inline void sdhci_set_auto_cmd23(struct sdhci_host *host,
>> + struct mmc_command *cmd)
>> +{
>> + u16 ctrl2;
>> +
>> + if (host->v4_mode) {
>
> Isn't this only for a V4.1 controller, and doesn't the mode have to be "Auto
> Cmd Auto Select"?

I will send another version with changes for the new "Auto Cmd
Auto Select" mode; let's see if the next iteration is better.

Thanks for your review,
Chunyan

>
>
>> + ctrl2 = sdhci_readw(host, SDHCI_HOST_CONTROL2);
>> + ctrl2 |= SDHCI_CMD23_ENABLE;
>> + sdhci_writew(host, ctrl2, SDHCI_HOST_CONTROL2);
>> + } else {
>> + sdhci_writel(host, cmd->mrq->sbc->arg, SDHCI_ARGUMENT2);
>> + }
>> +}
>> +
>>  static void sdhci_set_transfer_mode(struct sdhci_host *host,
>>   struct mmc_command *cmd)
>>  {
>> @@ -989,7 +1003,7 @@ static void sdhci_set_transfer_mode(struct sdhci_host 
>> *host,
>>   mode |= SDHCI_TRNS_AUTO_CMD12;
>>   else if (cmd->mrq->sbc && (host->flags & SDHCI_AUTO_CMD23)) {
>>   mode |= SDHCI_TRNS_AUTO_CMD23;
>> - sdhci_writel(host, cmd->mrq->sbc->arg, 
>> SDHCI_ARGUMENT2);
>> + sdhci_set_auto_cmd23(host, cmd);
>>   }
>>   }
>>
>> diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
>> index 1e84539..d5e1c10 100644
>> --- a/drivers/mmc/host/sdhci.h
>> +++ b/drivers/mmc/host/sdhci.h
>> @@ -185,6 +185,7 @@
>>  #define   SDHCI_CTRL_DRV_TYPE_D         0x0030
>>  #define  SDHCI_CTRL_EXEC_TUNING         0x0040
>>  #define  SDHCI_CTRL_TUNED_CLK           0x0080
>> +#define  SDHCI_CMD23_ENABLE             0x0800
>>  #define  SDHCI_CTRL_V4_MODE             0x1000
>>  #define  SDHCI_CTRL_64BIT_ADDR          0x2000
>>  #define  SDHCI_CTRL_PRESET_VAL_ENABLE   0x8000
>>
>


Re: [PATCH V2 3/7] mmc: sdhci: add ADMA2 64-bit addressing support for V4 mode

2018-07-03 Thread Chunyan Zhang
On 21 June 2018 at 21:20, Adrian Hunter  wrote:
> On 15/06/18 05:04, Chunyan Zhang wrote:
>> ADMA2 64-bit addressing support is divided into V3 mode and V4 mode.
>> So there are two kinds of descriptors for ADMA2 64-bit addressing
>> i.e. 96-bit Descriptor for V3 mode, and 128-bit Descriptor for V4
>> mode. 128-bit Descriptor is aligned to 8-byte.
>>
>> For V4 mode, ADMA2 64-bit addressing is enabled via Host Control 2
>> register.
>>
>> Signed-off-by: Chunyan Zhang 
>> ---
>>  drivers/mmc/host/sdhci.c | 50 +++-
>>  drivers/mmc/host/sdhci.h | 23 +-
>>  2 files changed, 55 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>> index f57201f..5d3b0d8 100644
>> --- a/drivers/mmc/host/sdhci.c
>> +++ b/drivers/mmc/host/sdhci.c
>> @@ -585,6 +585,8 @@ static void sdhci_adma_table_pre(struct sdhci_host *host,
>>   void *desc, *align;
>>   char *buffer;
>>   int len, offset, i;
>> + unsigned int adma2_align = SDHCI_ADMA2_ALIGN(host);
>> + unsigned int adma2_mask = SDHCI_ADMA2_MASK(host);
>>
>>   /*
>>* The spec does not specify endianness of descriptor table.
>> @@ -608,8 +610,8 @@ static void sdhci_adma_table_pre(struct sdhci_host *host,
>>* buffer for the (up to three) bytes that screw up the
>>* alignment.
>>*/
>> - offset = (SDHCI_ADMA2_ALIGN - (addr & SDHCI_ADMA2_MASK)) &
>> -  SDHCI_ADMA2_MASK;
>> + offset = (adma2_align - (addr & adma2_align)) &
>> +  adma2_mask;
>>   if (offset) {
>>   if (data->flags & MMC_DATA_WRITE) {
>>   buffer = sdhci_kmap_atomic(sg, &flags);
>> @@ -623,8 +625,8 @@ static void sdhci_adma_table_pre(struct sdhci_host *host,
>>
>>   BUG_ON(offset > 65536);
>>
>> - align += SDHCI_ADMA2_ALIGN;
>> - align_addr += SDHCI_ADMA2_ALIGN;
>> + align += adma2_align;
>> + align_addr += adma2_align;
>>
>>   desc += host->desc_sz;
>>
>> @@ -668,13 +670,15 @@ static void sdhci_adma_table_post(struct sdhci_host 
>> *host,
>>   void *align;
>>   char *buffer;
>>   unsigned long flags;
>> + unsigned int adma2_align = SDHCI_ADMA2_ALIGN(host);
>> + unsigned int adma2_mask = SDHCI_ADMA2_MASK(host);
>>
>>   if (data->flags & MMC_DATA_READ) {
>>   bool has_unaligned = false;
>>
>>   /* Do a quick scan of the SG list for any unaligned mappings */
>>   for_each_sg(data->sg, sg, host->sg_count, i)
>> - if (sg_dma_address(sg) & SDHCI_ADMA2_MASK) {
>> + if (sg_dma_address(sg) & adma2_mask) {
>>   has_unaligned = true;
>>   break;
>>   }
>> @@ -686,15 +690,15 @@ static void sdhci_adma_table_post(struct sdhci_host 
>> *host,
>>   align = host->align_buffer;
>>
>>   for_each_sg(data->sg, sg, host->sg_count, i) {
>> - if (sg_dma_address(sg) & SDHCI_ADMA2_MASK) {
>> - size = SDHCI_ADMA2_ALIGN -
>> -(sg_dma_address(sg) & 
>> SDHCI_ADMA2_MASK);
>> + if (sg_dma_address(sg) & adma2_mask) {
>> + size = adma2_align -
>> +(sg_dma_address(sg) & 
>> adma2_mask);
>>
>>   buffer = sdhci_kmap_atomic(sg, &flags);
>>   memcpy(buffer, align, size);
>>   sdhci_kunmap_atomic(buffer, &flags);
>>
>> - align += SDHCI_ADMA2_ALIGN;
>> + align += adma2_align;
>>   }
>>   }
>>   }
>> @@ -3400,6 +3404,26 @@ static int sdhci_allocate_bounce_buffer(struct 
>> sdhci_host *host)
>>   return 0;
>>  }
>>
>> +static inline bool sdhci_use_64bit_dma(struct sdhci_host *host)
>> +{
>> + u32 addr64bit_en;
>> +
>> + /*
>> +  * According to SD Host Controller spec v4.10, bit[27] added from
>> +  * version 4.10 in Capabilities Register is used as 64-bit System
>> +  * Address support for V4 mode, 64-bit DMA Addressing for V4 mode
>> +  * is enabled only if 64-bit Addressing =1 in the Host Control 2
>> +  * register.
>> +  */
>> + if (host->version == SDHCI_SPEC_410 && host->v4_mode) {
>> + addr64bit_en = (sdhci_readw(host, SDHCI_HOST_CONTROL2) &
>> + SDHCI_CTRL_64BIT_ADDR);
>
> This seems the wrong way around.  SDHCI_CTRL_64BIT_ADDR should be set based
> on the driver's requirements, not read to determine 


Re: [PATCH V2 1/7] mmc: sdhci: add sd host v4 mode

2018-07-03 Thread Chunyan Zhang
On 21 June 2018 at 21:15, Adrian Hunter  wrote:
> On 21/06/18 14:14, Chunyan Zhang wrote:
>> On 21 June 2018 at 18:49, Adrian Hunter  wrote:
>>> On 15/06/18 05:04, Chunyan Zhang wrote:
>>>> For SD host controller version 4.00 or later ones, there're two
>>>> modes of implementation - Version 3.00 compatible mode or
>>>> Version 4 mode.  This patch introduces a flag to record this.
>>>>
>>>> Signed-off-by: Chunyan Zhang
>>>> ---
>>>>  drivers/mmc/host/sdhci.c | 6 ++
>>>>  drivers/mmc/host/sdhci.h | 6 ++
>>>>  2 files changed, 12 insertions(+)
>>>>
>>>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>>>> index 2ededa7f..cf5695f 100644
>>>> --- a/drivers/mmc/host/sdhci.c
>>>> +++ b/drivers/mmc/host/sdhci.c
>>>> @@ -3302,6 +3302,12 @@ void __sdhci_read_caps(struct sdhci_host *host, u16
>>>> *ver, u32 *caps, u32 *caps1)
>>>>       v = ver ? *ver : sdhci_readw(host, SDHCI_HOST_VERSION);
>>>>       host->version = (v & SDHCI_SPEC_VER_MASK) >> SDHCI_SPEC_VER_SHIFT;
>>>>
>>>> +     if (host->version >= SDHCI_SPEC_400) {
>>>> +             if (sdhci_readw(host, SDHCI_HOST_CONTROL2) &
>>>> +                     SDHCI_CTRL_V4_MODE)
>>>> +                     host->v4_mode = true;
>>>> +     }
>>>
>>> At this point the host controller has just been reset which would mean it
>>> must be in version 3 compatibility mode, which would mean this code doesn't
>>> do anything.
>>
>> Why is it version 3 mode at this point?
>
> According to the specification, reset clears RW fields to zero.
>
>>
>> I've tested this code on the sd host controller which was introduced
>> in 6/7 in this patch-set, the result showed that it was v4_mode.
>> Moreover without this patch, the Spreadtrum's sdhci driver in patch
>> 6/7 couldn't work.
>>
>> Am I missing something here?
>
> It seems the Spreadtrum controller doesn't clear the "Host Version 4 Enable"
> bit upon software reset for all.
>
> Also this seems the wrong way around.  The driver should decide whether or
> not to use V4 mode and then the "Host Version 4 Enable" bit should be set
> accordingly.
>
> V4 has been around so long that we can't just enable all supporting hardware
> without risking the possibility it will break some platform.  So I suggest

OK, understood.

> adding a function sdhci_enable_v4_mode() which is called during probe.

OK, will do; that is a safer way.

>
>>
>> Best,
>> Chunyan
>>
>>>
>>>> +
>>>>       if (host->quirks & SDHCI_QUIRK_MISSING_CAPS)
>>>>               return;
>>>>
>>>> diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
>>>> index c95b0a4..128b0ba 100644
>>>> --- a/drivers/mmc/host/sdhci.h
>>>> +++ b/drivers/mmc/host/sdhci.h
>>>> @@ -184,6 +184,7 @@
>>>>  #define   SDHCI_CTRL_DRV_TYPE_D         0x0030
>>>>  #define  SDHCI_CTRL_EXEC_TUNING         0x0040
>>>>  #define  SDHCI_CTRL_TUNED_CLK           0x0080
>>>> +#define  SDHCI_CTRL_V4_MODE             0x1000
>>>>  #define  SDHCI_CTRL_PRESET_VAL_ENABLE   0x8000
>>>>
>>>>  #define SDHCI_CAPABILITIES      0x40
>>>> @@ -270,6 +271,8 @@
>>>>  #define   SDHCI_SPEC_100        0
>>>>  #define   SDHCI_SPEC_200        1
>>>>  #define   SDHCI_SPEC_300        2
>>>> +#define   SDHCI_SPEC_400        3
>>>> +#define   SDHCI_SPEC_410        4
>>>>
>>>>  /*
>>>>   * End of controller registers.
>>>> @@ -551,6 +554,9 @@ struct sdhci_host {
>>>>       u32 sdma_boundary;
>>>>
>>>>       unsigned long private[0] cacheline_aligned;
>>>> +
>>>> +     /* Host Version 4 Enable */
>>>> +     bool v4_mode;
>>>>  };
>>>>
>>>>  struct sdhci_ops {
>>>>
>>>
>>
>


Re: [PATCH V2 2/7] mmc: sdhci: made changes for System Address register of SDMA

2018-07-03 Thread Chunyan Zhang
On 21 June 2018 at 19:22, Adrian Hunter  wrote:
> On 15/06/18 05:04, Chunyan Zhang wrote:
>> According to the SD host controller specification version 4.10, when
>> Host Version 4 is enabled, SDMA uses ADMA System Address register
>> (05Fh-058h) instead of using SDMA System Address register to
>> support both 32-bit and 64-bit addressing.
>>
>> Signed-off-by: Chunyan Zhang 
>> ---
>>  drivers/mmc/host/sdhci.c | 10 --
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>> index cf5695f..f57201f 100644
>> --- a/drivers/mmc/host/sdhci.c
>> +++ b/drivers/mmc/host/sdhci.c
>> @@ -805,6 +805,7 @@ static void sdhci_set_timeout(struct sdhci_host *host, 
>> struct mmc_command *cmd)
>>  static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command 
>> *cmd)
>>  {
>>   u8 ctrl;
>> + u32 reg;
>
> reg could just be an int.

Why? Shouldn't addresses be unsigned?

>
>>   struct mmc_data *data = cmd->data;
>>
>>   if (sdhci_data_line_cmd(cmd))
>> @@ -894,8 +895,10 @@ static void sdhci_prepare_data(struct sdhci_host *host, 
>> struct mmc_command *cmd)
>>SDHCI_ADMA_ADDRESS_HI);
>>   } else {
>>   WARN_ON(sg_cnt != 1);
>> + reg = host->v4_mode ? SDHCI_ADMA_ADDRESS :
>> + SDHCI_DMA_ADDRESS;
>>   sdhci_writel(host, sdhci_sdma_address(host),
>> -  SDHCI_DMA_ADDRESS);
>> +  reg);
>
> Shouldn't we support 64-bit SDMA in version 4 mode?

I will address this.

>
>
>>   }
>>   }
>>
>> @@ -2721,6 +2724,7 @@ static void sdhci_data_irq(struct sdhci_host *host, 
>> u32 intmask)
>>*/
>>   if (intmask & SDHCI_INT_DMA_END) {
>>   u32 dmastart, dmanow;
>> + u32 reg;
>>
>>   dmastart = sdhci_sdma_address(host);
>>   dmanow = dmastart + host->data->bytes_xfered;
>> @@ -2733,7 +2737,9 @@ static void sdhci_data_irq(struct sdhci_host *host, 
>> u32 intmask)
>>   host->data->bytes_xfered = dmanow - dmastart;
>>   DBG("DMA base 0x%08x, transferred 0x%06x bytes, next 
>> 0x%08x\n",
>>   dmastart, host->data->bytes_xfered, dmanow);
>> - sdhci_writel(host, dmanow, SDHCI_DMA_ADDRESS);
>> + reg = host->v4_mode ? SDHCI_ADMA_ADDRESS :
>> + SDHCI_DMA_ADDRESS;
>> + sdhci_writel(host, dmanow, reg);
>
> Shouldn't we support 64-bit SDMA in version 4 mode?
>
>>   }
>>
>>   if (intmask & SDHCI_INT_DATA_END) {
>>
>
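
The register selection this patch adds to both the prepare-data and DMA-end paths can be modelled in userspace as below. This is a sketch under assumptions: struct fake_regs, fake_writel(), fake_readl() and program_sdma_address() are hypothetical helpers backed by a byte array rather than MMIO. The 0x58 offset follows the "05Fh-058h" range in the patch description, and 0x00 is what sdhci.h uses for the SDMA System Address register as far as I know. It covers the 32-bit write only, not the 64-bit SDMA case raised in review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SDHCI_DMA_ADDRESS  0x00 /* SDMA System Address register */
#define SDHCI_ADMA_ADDRESS 0x58 /* ADMA System Address register, low word */

/* Fake register file standing in for the controller's MMIO window. */
struct fake_regs {
    uint8_t mem[0x100];
};

void fake_writel(struct fake_regs *r, uint32_t val, uint32_t reg)
{
    assert(reg + sizeof(val) <= sizeof(r->mem));
    memcpy(&r->mem[reg], &val, sizeof(val));
}

uint32_t fake_readl(const struct fake_regs *r, uint32_t reg)
{
    uint32_t val;

    assert(reg + sizeof(val) <= sizeof(r->mem));
    memcpy(&val, &r->mem[reg], sizeof(val));
    return val;
}

/* Pick the register that holds the SDMA address for the given mode. */
uint32_t sdma_address_reg(bool v4_mode)
{
    return v4_mode ? SDHCI_ADMA_ADDRESS : SDHCI_DMA_ADDRESS;
}

/* Write the 32-bit SDMA address to whichever register the mode selects. */
void program_sdma_address(struct fake_regs *r, bool v4_mode, uint32_t addr)
{
    fake_writel(r, addr, sdma_address_reg(v4_mode));
}
```

Keeping the choice in one helper means the two call sites in the patch cannot drift apart, which may also be a convenient place to add the high word write for 64-bit SDMA.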


Re: [PATCH -mm -v4 00/21] mm, THP, swap: Swapout/swapin THP in one piece

2018-07-03 Thread Huang, Ying
Sergey Senozhatsky  writes:

> On (07/04/18 10:20), Huang, Ying wrote:
>> > On (06/27/18 21:51), Andrew Morton wrote:
>> >> On Fri, 22 Jun 2018 11:51:30 +0800 "Huang, Ying"  
>> >> wrote:
>> >> 
>> >> > This is the final step of THP (Transparent Huge Page) swap
>> >> > optimization.  After the first and second step, the splitting huge
>> >> > page is delayed from almost the first step of swapout to after swapout
>> >> > has been finished.  In this step, we avoid splitting THP for swapout
>> >> > and swapout/swapin the THP in one piece.
>> >> 
>> >> It's a tremendously good performance improvement.  It's also a
>> >> tremendously large patchset :(
>> >
>> > Will zswap gain a THP swap out/in support at some point?
>> >
>> >
>> > mm/zswap.c: static int zswap_frontswap_store(...)
>> > ...
>> >/* THP isn't supported */
>> >if (PageTransHuge(page)) {
>> >ret = -EINVAL;
>> >goto reject;
>> >}
>> 
>> That's not on my TODO list.  Do you have interest to work on this?
>
> I'd say I'm interested. Can't promise that I'll have enough spare time
> any time soon, tho.

Thanks!

> The numbers you posted do look fantastic indeed, embedded devices
> [which normally use zswap/zram quite heavily] _probably_ should see
> some performance improvement as well once zswap [and may be zram] can
> handle THP.

Yes.  I think so too.

Best Regards,
Huang, Ying


Re: [PATCH v5 04/12] PM / devfreq: Add struct devfreq_policy

2018-07-03 Thread Chanwoo Choi
Hi,

On 2018-07-04 08:46, Matthias Kaehlcke wrote:
> Move variables related with devfreq policy changes from struct devfreq
> to the new struct devfreq_policy and add a policy field to struct devfreq.
> 
> The following variables are moved:
> 
> df->min/max_freq   =>   p->user.min/max_freq
> df->scaling_min/max_freq   =>   p->devinfo.min/max_freq
> df->governor   =>   p->governor
> df->governor_name  =>   p->governor_name
> 
> Signed-off-by: Matthias Kaehlcke 
> Reviewed-by: Brian Norris 
> ---
> Changes in v5:
> - none
> 
> Changes in v4:
> - added 'Reviewed-by: Brian Norris ' tag
> 
> Changes in v3:
> - none
> 
> Changes in v2:
> - performance, powersave and simpleondemand governors don't need changes
>   with "PM / devfreq: Don't adjust to user limits in governors"
> - formatting fixes
> ---
>  drivers/devfreq/devfreq.c  | 137 -
>  drivers/devfreq/governor_passive.c |   4 +-
>  include/linux/devfreq.h|  38 +---
>  3 files changed, 103 insertions(+), 76 deletions(-)
> 

(skip)

>  
> diff --git a/drivers/devfreq/governor_passive.c 
> b/drivers/devfreq/governor_passive.c
> index 3bc29acbd54e..e0987c749ec2 100644
> --- a/drivers/devfreq/governor_passive.c
> +++ b/drivers/devfreq/governor_passive.c
> @@ -99,12 +99,12 @@ static int update_devfreq_passive(struct devfreq 
> *devfreq, unsigned long freq)
>  {
>   int ret;
>  
> - if (!devfreq->governor)
> + if (!devfreq->policy.governor)
>   return -EINVAL;
>  
>   mutex_lock_nested(&devfreq->lock, SINGLE_DEPTH_NESTING);
>  
> - ret = devfreq->governor->get_target_freq(devfreq, &freq);
> + ret = devfreq->policy.governor->get_target_freq(devfreq, &freq);
>   if (ret < 0)
>   goto out;
>  
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index 3aae5b3af87c..9bf23b976f4d 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -109,6 +109,30 @@ struct devfreq_dev_profile {
>   unsigned int max_state;
>  };
>  
> +/**
> + * struct devfreq_freq_limits - Devfreq frequency limits
> + * @min_freq:minimum frequency
> + * @max_freq:maximum frequency
> + */
> +struct devfreq_freq_limits {
> + unsigned long min_freq;
> + unsigned long max_freq;
> +};
> +
> +/**
> + * struct devfreq_policy - Devfreq policy
> + * @user:frequency limits requested by the user
> + * @devinfo: frequency limits of the device (available OPPs)
> + * @governor:method how to choose frequency based on the usage.

Nitpick: remove the '.' at the end of the line.

> + * @governor_name:   devfreq governor name for use with this devfreq
> + */
> +struct devfreq_policy {
> + struct devfreq_freq_limits user;
> + struct devfreq_freq_limits devinfo;
> + const struct devfreq_governor *governor;
> + char governor_name[DEVFREQ_NAME_LEN];
> +};
> +
>  /**
>   * struct devfreq - Device devfreq structure
>   * @node:list node - contains the devices with devfreq that have been
> @@ -117,8 +141,6 @@ struct devfreq_dev_profile {
>   * @dev: device registered by devfreq class. dev.parent is the device
>   *   using devfreq.
>   * @profile: device-specific devfreq profile
> - * @governor:method how to choose frequency based on the usage.
> - * @governor_name:   devfreq governor name for use with this devfreq
>   * @nb:  notifier block used to notify devfreq object that it 
> should
>   *   reevaluate operable frequencies. Devfreq users may use
>   *   devfreq.nb to the corresponding register notifier call chain.
> @@ -126,10 +148,7 @@ struct devfreq_dev_profile {
>   * @previous_freq:   previously configured frequency value.
>   * @data:Private data of the governor. The devfreq framework does not
>   *   touch this.
> - * @min_freq:Limit minimum frequency requested by user (0: none)
> - * @max_freq:Limit maximum frequency requested by user (0: none)
> - * @scaling_min_freq:Limit minimum frequency requested by OPP 
> interface
> - * @scaling_max_freq:Limit maximum frequency requested by OPP 
> interface
> + * @policy:  Policy for frequency adjustments

struct devfreq_policy contains both the frequency limits and the governor
information, but this description covers only the frequency part. Please
describe 'policy' more accurately.
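
To make the point concrete, here is a minimal userspace sketch of what the
policy carries. The two structs follow the quoted patch; DEVFREQ_NAME_LEN is
given an assumed value, and policy_clamp_freq() is a hypothetical helper that
is not part of the patch. It assumes every limit is non-zero and that
min <= max within each pair.

```c
#include <assert.h>

#define DEVFREQ_NAME_LEN 16	/* assumed value for this sketch */

struct devfreq_governor;	/* opaque here; only a pointer is stored */

/* As in the quoted patch. */
struct devfreq_freq_limits {
	unsigned long min_freq;
	unsigned long max_freq;
};

/* As in the quoted patch: limits plus governor information. */
struct devfreq_policy {
	struct devfreq_freq_limits user;	/* requested by the user */
	struct devfreq_freq_limits devinfo;	/* available OPPs */
	const struct devfreq_governor *governor;
	char governor_name[DEVFREQ_NAME_LEN];
};

/*
 * Hypothetical helper, not from the patch: clamp a target frequency to
 * the intersection of the user limits and the device limits. If the two
 * ranges do not overlap, the upper bound wins.
 */
static unsigned long policy_clamp_freq(const struct devfreq_policy *p,
				       unsigned long freq)
{
	unsigned long lo = p->user.min_freq > p->devinfo.min_freq ?
			   p->user.min_freq : p->devinfo.min_freq;
	unsigned long hi = p->user.max_freq < p->devinfo.max_freq ?
			   p->user.max_freq : p->devinfo.max_freq;

	if (freq < lo)
		freq = lo;
	if (freq > hi)
		freq = hi;
	return freq;
}
```

The governor pointer and name sit next to the limits, which is why a
description that mentions only frequency adjustments is incomplete.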

>   * @stop_polling: devfreq polling status of a device.
>   * @total_trans: Number of devfreq transitions
>   * @trans_table: Statistics of devfreq transitions
> @@ -151,8 +170,6 @@ struct devfreq {
>   struct mutex lock;
>   struct device dev;
>   struct devfreq_dev_profile *profile;
> - const struct devfreq_governor *governor;
> - char governor_name[DEVFREQ_NAME_LEN];
>   struct notifier_block nb;
>   struct delayed_work work;
>  
> @@ -161,10 +178,7 @@ struct devfreq {
>  
>   void *data; /* private data for governors */
>  
> - unsigned 

Re: [PATCH v2] x86/mce: add CMCI support for Centaur CPUs

2018-07-03 Thread David Wang



> -----Original Mail-----
>  Sender: Borislav Petkov [mailto:b...@alien8.de]
>  Time: June 26, 2018 22:30
>  Receiver: David Wang 
> CC: tony.l...@intel.com; mi...@redhat.com; t...@linutronix.de;
> h...@zytor.com; x...@kernel.org; linux-kernel@vger.kernel.org;
> linux-e...@vger.kernel.org; cooper...@zhaoxin.com;
> qiyuanw...@zhaoxin.com; benjamin...@viatech.com;
> luke...@viacpu.com; tim...@zhaoxin.com
> Topic : Re: [PATCH v2] x86/mce: add CMCI support for Centaur CPUs
> 
> On Mon, Jun 04, 2018 at 10:37:33AM +0800, David Wang wrote:
> > New Centaur CPU support CMCI mechanism, which is compatible with
> INTEL CMCI.
> >
> > Signed-off-by: David Wang 
> 
> ...
> 
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
> > b/arch/x86/kernel/cpu/mcheck/mce.c
> > index cd76380..2ebafc7 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -1727,6 +1727,7 @@ static void __mcheck_cpu_init_early(struct
> cpuinfo_x86 *c)
> > }
> >  }
> >
> > +#ifdef CONFIG_X86_MCE_CENTAUR
> >  static void mce_centaur_feature_init(struct cpuinfo_x86 *c)  {
> > struct mca_config *cfg = &mca_cfg;
> > @@ -1740,7 +1741,12 @@ static void mce_centaur_feature_init(struct
> cpuinfo_x86 *c)
> > if (cfg->monarch_timeout < 0)
> > cfg->monarch_timeout = USEC_PER_SEC;
> > }
> > +   mce_intel_feature_init(c);
> > +   mce_adjust_timer = cmci_intel_adjust_timer;
> 
> This ...
> 
> >  }
> > +#else
> > +static inline void mce_centaur_feature_init(struct cpuinfo_x86 *c) {
> > +} #endif
> >
> >  static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)  { diff
> > --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > index d05be30..5b1b68f 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> > @@ -85,7 +85,8 @@ static int cmci_supported(int *banks)
> >  * initialization is vendor keyed and this
> >  * makes sure none of the backdoors are entered otherwise.
> >  */
> > -   if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> > +   if ((boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
> > +   boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR))
> > return 0;
> > if (!boot_cpu_has(X86_FEATURE_APIC) || lapic_get_maxlvt() < 6)
> > return 0;
> > @@ -506,10 +507,20 @@ static void intel_ppin_init(struct cpuinfo_x86
> > *c)
> >
> >  void mce_intel_feature_init(struct cpuinfo_x86 *c)  {
> > -   intel_init_thermal(c);
> > -   intel_init_cmci();
> > -   intel_init_lmce();
> > -   intel_ppin_init(c);
> > +
> > +   switch (c->x86_vendor) {
> > +   case X86_VENDOR_INTEL:
> > +   intel_init_thermal(c);
> > +   intel_init_cmci();
> > +   intel_init_lmce();
> > +   intel_ppin_init(c);
> > +   break;
> > +   case X86_VENDOR_CENTAUR:
> > +   intel_init_cmci();
> 
> ... and this I really don't like for the simple reason that if the Intel side 
> gets
> changed, it could potentially break Centaur. And we don't want that. And
> the vendor should be free to change their code without asking another
> vendor for permission even if the other vendor is almost copying the
> code...
> 
> Long story short, I think you should extract the facilities you're going to
> need into generic, library-like ones and call them from centaur-specific
> compilation units which get enabled when CPU_SUP_CENTAUR is enabled.
> 
> So that the code can still be shared but there's no dependency on other
> vendors and so that one vendor doesn't break the other one and
> vice-versa.
> 
> IMO.
> 
> Thx.
> 
OK, I will adjust the code.
Thank you.
> --
> Regards/Gruss,
> Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.



Re: [PATCH v2] leds: ledtrig-morse: send out morse code

2018-07-03 Thread Willy Tarreau
On Tue, Jul 03, 2018 at 09:43:06PM +0300, Andy Shevchenko wrote:
> > +struct morse_char {
> > +   char    c;
> > +   char    *z;
> > +};
> > +
> 
> > +static struct morse_char morse_table[] = {
> 
> const ?
> 
> > +   {'a', ".-"},
> > +   {'b', "-..."},
> > +   {'c', "-.-."},
> > +   {'d', "-.."},
> > +   {'e', "."},
> > +   {'f', "..-."},
> > +   {'g', "--."},
> > +   {'h', "...."},
> > +   {'i', ".."},
> > +   {'j', ".---"},
> > +   {'k', "-.-"},
> > +   {'l', ".-.."},
> > +   {'m', "--"},
> > +   {'n', "-."},
> > +   {'o', "---"},
> > +   {'p', ".--."},
> > +   {'q', "--.-"},
> > +   {'r', ".-."},
> > +   {'s', "..."},
> > +   {'t', "-"},
> > +   {'u', "..-"},
> > +   {'v', "...-"},
> > +   {'w', ".--"},
> > +   {'x', "-..-"},
> > +   {'y', "-.--"},
> > +   {'z', "--.."},
> > +   {'1', ".----"},
> > +   {'2', "..---"},
> > +   {'3', "...--"},
> > +   {'4', "....-"},
> > +   {'5', "....."},
> > +   {'6', "-...."},
> > +   {'7', "--..."},
> > +   {'8', "---.."},
> > +   {'9', "----."},
> > +   {'0', "-----"},
> 
> Do you expect this to be changed somehow?
> Otherwise we might just keep two char arrays of alphas and digits
> in an order of ascii appearance.
> 
> In the code something like
> 
> ch = tolower(x);
> if (isalpha(ch))
>  code = alphas[ch - 'a'];
> else if (isdigit(ch))
>  code = digits[ch - '0'];
> else
>  code = unknown;
> 
> > +   {0, NULL},
> 
> And this will be gone; you just provide it with a known size.

Well, in this case it's even possible to go further and avoid storing
36 strings. Indeed, no representation is longer than 5 symbols, so you
can use 5 bits for the encoding (0=".", 1="-") and 3 bits for the
length, which gives you a single byte per character instead of a pointer
to a string plus up to 6 chars. Then, to keep it readable, 5 macros
can be provided to emit the code:

#define MORSE1(a)         (1 | ((a)<<3))
#define MORSE2(a,b)   (2 | ((a)<<3)|((b)<<4))
#define MORSE3(a,b,c) (3 | ((a)<<3)|((b)<<4)|((c)<<5))
#define MORSE4(a,b,c,d)   (4 | ((a)<<3)|((b)<<4)|((c)<<5)|((d)<<6))
#define MORSE5(a,b,c,d,e) (5 | ((a)<<3)|((b)<<4)|((c)<<5)|((d)<<6)|((e)<<7))

Then all chars may be defined like this :

['a'] = MORSE2(0,1),
['b'] = MORSE4(1,0,0,0),
['c'] = MORSE4(1,0,1,0),
['d'] = MORSE3(1,0,0),
['e'] = MORSE1(0),
...

and when processing these :

code = morse_table[tolower(c)];
code_len = code & 7;
code >>= 3;

while (code_len) {
    if (code & 1)
        emit_long();
    else
        emit_short();
    code >>= 1;
    code_len--;
}

In this case the table could even cover the whole ASCII range at once, since
it's not certain that the bytes saved by a smaller table would compensate for
the extra code and alignment needed to save them :-)

Note that I'm not suggesting that it is required to proceed like this, but
I think it makes the whole code more compact, which aligns with the purpose
of focusing on embedded devices.
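
For reference, the pieces above can be put together into a small userspace
sketch. This is only an illustration under a few assumptions: MORSE1 is
written with a single argument so that MORSE1(0) compiles, the table holds
only a handful of entries, and morse_expand() is a hypothetical helper that
writes the symbols into a string instead of calling emit_short() and
emit_long().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Bits 0..2: number of symbols. Bits 3..7: symbols, first one in bit 3
 * (0 = dot, 1 = dash). */
#define MORSE1(a)         (1 | ((a)<<3))
#define MORSE2(a,b)       (2 | ((a)<<3)|((b)<<4))
#define MORSE3(a,b,c)     (3 | ((a)<<3)|((b)<<4)|((c)<<5))
#define MORSE4(a,b,c,d)   (4 | ((a)<<3)|((b)<<4)|((c)<<5)|((d)<<6))
#define MORSE5(a,b,c,d,e) (5 | ((a)<<3)|((b)<<4)|((c)<<5)|((d)<<6)|((e)<<7))

/* One byte per ASCII character; unlisted characters stay 0 (length 0).
 * Only a few entries are filled in for this sketch. */
static const unsigned char morse_table[128] = {
	['a'] = MORSE2(0,1),
	['b'] = MORSE4(1,0,0,0),
	['c'] = MORSE4(1,0,1,0),
	['d'] = MORSE3(1,0,0),
	['e'] = MORSE1(0),
	['o'] = MORSE3(1,1,1),
	['s'] = MORSE3(0,0,0),
	['5'] = MORSE5(0,0,0,0,0),
	['0'] = MORSE5(1,1,1,1,1),
};

/*
 * Hypothetical helper: expand one character into a ".-" string.
 * Returns the number of symbols written, 0 for an unknown character.
 */
static size_t morse_expand(char c, char out[6])
{
	unsigned char uc = (unsigned char)c;
	unsigned int code, len;
	size_t n = 0;

	out[0] = '\0';
	if (uc >= sizeof(morse_table))
		return 0;

	code = morse_table[tolower(uc)];
	len = code & 7;		/* low 3 bits: symbol count */
	code >>= 3;		/* remaining bits: symbols, first one lowest */

	while (len--) {
		out[n++] = (code & 1) ? '-' : '.';
		code >>= 1;
	}
	out[n] = '\0';
	return n;
}
```

A zero entry decodes to length 0, so unknown characters need no separate
check beyond the range test.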

Cheers,
Willy


Re: [PATCH v22 4/4] soc: mediatek: Add Mediatek CMDQ helper

2018-07-03 Thread CK Hu
Hi, Houlong:

On Wed, 2018-07-04 at 08:47 +0800, houlong wei wrote:
> On Fri, 2018-06-29 at 17:22 +0800, CK Hu wrote:
> > Hi, Houlong:
> > 
> > On Fri, 2018-06-29 at 07:32 +0800, houlong wei wrote:
> > > On Thu, 2018-06-28 at 18:41 +0800, CK Hu wrote:
> > > > Hi, Houlong:
> > > > 
> > > > Some inline comment.
> > > > 
> > > > On Wed, 2018-06-27 at 19:16 +0800, Houlong Wei wrote:
> > > > > Add Mediatek CMDQ helper to create CMDQ packet and assemble GCE op 
> > > > > code.
> > > > > 
> > > > > Signed-off-by: Houlong Wei 
> > > > > Signed-off-by: HS Liao 
> > > > > ---
> > > > >  drivers/soc/mediatek/Kconfig   |   12 ++
> > > > >  drivers/soc/mediatek/Makefile  |1 +
> > > > >  drivers/soc/mediatek/mtk-cmdq-helper.c |  258 
> > > > > 
> > > > >  include/linux/soc/mediatek/mtk-cmdq.h  |  132 
> > > > >  4 files changed, 403 insertions(+)
> > > > >  create mode 100644 drivers/soc/mediatek/mtk-cmdq-helper.c
> > > > >  create mode 100644 include/linux/soc/mediatek/mtk-cmdq.h
> > > > > 
> > > > > diff --git a/drivers/soc/mediatek/Kconfig 
> > > > > b/drivers/soc/mediatek/Kconfig
> > > > > index a7d0667..17bd759 100644
> > > > > --- a/drivers/soc/mediatek/Kconfig
> > > > > +++ b/drivers/soc/mediatek/Kconfig
> > > > > @@ -4,6 +4,18 @@
> > > > >  menu "MediaTek SoC drivers"
> > > > >   depends on ARCH_MEDIATEK || COMPILE_TEST
> > > > >  
> > > > 
> > 
> > [...]
> > 
> > > > > +
> > > > > +static int cmdq_pkt_finalize(struct cmdq_pkt *pkt)
> > > > > +{
> > > > > + int err;
> > > > > +
> > > > > + if (cmdq_pkt_is_finalized(pkt))
> > > > > + return 0;
> > > > > +
> > > > > + /* insert EOC and generate IRQ for each command iteration */
> > > > > + err = cmdq_pkt_append_command(pkt, CMDQ_CODE_EOC, 0, 
> > > > > CMDQ_EOC_IRQ_EN);
> > > > > + if (err < 0)
> > > > > + return err;
> > > > > +
> > > > > + /* JUMP to end */
> > > > > + err = cmdq_pkt_append_command(pkt, CMDQ_CODE_JUMP, 0, 
> > > > > CMDQ_JUMP_PASS);
> > > > > + if (err < 0)
> > > > > + return err;
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +int cmdq_pkt_flush_async(struct cmdq_client *client, struct cmdq_pkt 
> > > > > *pkt,
> > > > > +  cmdq_async_flush_cb cb, void *data)
> > > > > +{
> > > > > + int err;
> > > > > + struct device *dev;
> > > > > + dma_addr_t dma_addr;
> > > > > +
> > > > > + err = cmdq_pkt_finalize(pkt);
> > > > > + if (err < 0)
> > > > > + return err;
> > > > > +
> > > > > + dev = client->chan->mbox->dev;
> > > > > + dma_addr = dma_map_single(dev, pkt->va_base, pkt->cmd_buf_size,
> > > > > + DMA_TO_DEVICE);
> > > > 
> > > > You map here, but I could not find unmap, so the unmap should be done in
> > > > client driver. I would prefer a symmetric map/unmap which means that
> > > > both map and unmap are done in client driver. I think you put map here
> > > > because you should map after finalize. 
> > > 
> > > The unmap is done before callback to client, in function
> > > cmdq_task_exec_done, mtk-cmdq-mailbox.c.
> > 
> > You put unmap in mailbox controller, and map here (here would be mailbox
> > client), so this is not symmetric. If the code is asymmetric, it's easy
> > to cause bug and not easy to maintain. So I would like move both
> > map/unmap to client driver.
> > 
> 
> Since map/unmap is common code for client drivers, can we move unmap to
> CMDQ helper and do not put in client driver?
> 
> > > 
> > > > Therefore, export
> > > > cmdq_pkt_finalize() to client driver and let client do finalize, so
> > > > there is no finalize in flush function. This method have a benefit that
> > > > if client reuse command buffer, it need not to map/unmap frequently.
> > > 
> > > If client reuse command buffer or cmdq_pkt(command queue packet), client
> > > will add new commands to the cmdq_pkt, so map/unmap are necessary for
> > > each cmdq_pkt flush.
> > 
> > If the buffer size is 4K bytes, client driver could map the whole 4K at
> > initialization. Before it write new command, it call
> > dma_sync_single_for_cpu(), after it write new command, it call
> > dma_sync_single_for_device(). And then it could flush this buffer to
> > mailbox controller. So client driver just call dma sync function when it
> > reuse the command buffer. dma sync function is much lightweight then dma
> > map because map would search the memory area which could be mapped.
> > 
> > Regards,
> > CK
> 
> Maybe we can do dma map/unmap/sync in cmdq helper, and make client
> driver simple.
> 

Why is map/unmap common code for client drivers? As I mentioned, some
client drivers may just call the dma sync functions before flush, so they
do not map for every flush. Frequent map/unmap has some drawbacks:

1. It wastes CPU resources, which also wastes power.
2. Mapping may fail: to reduce this risk, the client driver could map at
initialization.
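
The difference between the two approaches can be sketched with a small
userspace model. Nothing here is the real DMA API: model_map(),
model_unmap() and model_sync() are stand-ins that only count calls, and all
names are hypothetical. The model shows how often each kind of call happens,
not how much each one costs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CMD_BUF_SIZE 4096

/* Call counters for the stand-in DMA operations. */
struct dma_model {
	unsigned int maps;
	unsigned int unmaps;
	unsigned int syncs;
};

struct cmd_buf {
	unsigned char va[CMD_BUF_SIZE];
	int mapped;
};

/* Stand-in for dma_map_single(). */
static void model_map(struct dma_model *m, struct cmd_buf *b)
{
	m->maps++;
	b->mapped = 1;
}

/* Stand-in for dma_unmap_single(). */
static void model_unmap(struct dma_model *m, struct cmd_buf *b)
{
	m->unmaps++;
	b->mapped = 0;
}

/* Stand-in for the dma_sync_single_for_*() calls; needs a live mapping. */
static void model_sync(struct dma_model *m, const struct cmd_buf *b)
{
	assert(b->mapped);
	m->syncs++;
}

/* Approach A: map and unmap around every flush. */
static void flush_map_each_time(struct dma_model *m, struct cmd_buf *b,
				const void *cmd, size_t len)
{
	assert(len <= CMD_BUF_SIZE);
	memcpy(b->va, cmd, len);
	model_map(m, b);
	/* the hardware would execute the commands here */
	model_unmap(m, b);
}

/* Approach B: the buffer is already mapped; each reuse only syncs. */
static void flush_sync_only(struct dma_model *m, struct cmd_buf *b,
			    const void *cmd, size_t len)
{
	assert(len <= CMD_BUF_SIZE);
	model_sync(m, b);	/* stands for dma_sync_single_for_cpu() */
	memcpy(b->va, cmd, len);
	model_sync(m, b);	/* stands for dma_sync_single_for_device() */
	/* the hardware would execute the commands here */
}

/* Run n flushes with one approach and return the call counts. */
static struct dma_model run_flushes(unsigned int n, int map_once)
{
	static const unsigned char cmd[8];	/* placeholder command words */
	struct dma_model m = { 0, 0, 0 };
	struct cmd_buf b = { { 0 }, 0 };
	unsigned int i;

	if (map_once)
		model_map(&m, &b);
	for (i = 0; i < n; i++) {
		if (map_once)
			flush_sync_only(&m, &b, cmd, sizeof(cmd));
		else
			flush_map_each_time(&m, &b, cmd, sizeof(cmd));
	}
	if (map_once)
		model_unmap(&m, &b);
	return m;
}
```

With map-once, the only call that can fail (the mapping) happens a single
time at initialization, which is the second point above.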

I think 

Re: [PATCH v22 4/4] soc: mediatek: Add Mediatek CMDQ helper

2018-07-03 Thread CK Hu
Hi, Houlong:

On Wed, 2018-07-04 at 08:47 +0800, houlong wei wrote:
> On Fri, 2018-06-29 at 17:22 +0800, CK Hu wrote:
> > Hi, Houlong:
> > 
> > On Fri, 2018-06-29 at 07:32 +0800, houlong wei wrote:
> > > On Thu, 2018-06-28 at 18:41 +0800, CK Hu wrote:
> > > > Hi, Houlong:
> > > > 
> > > > Some inline comment.
> > > > 
> > > > On Wed, 2018-06-27 at 19:16 +0800, Houlong Wei wrote:
> > > > > Add Mediatek CMDQ helper to create CMDQ packet and assemble GCE op 
> > > > > code.
> > > > > 
> > > > > Signed-off-by: Houlong Wei 
> > > > > Signed-off-by: HS Liao 
> > > > > ---
> > > > >  drivers/soc/mediatek/Kconfig   |   12 ++
> > > > >  drivers/soc/mediatek/Makefile  |1 +
> > > > >  drivers/soc/mediatek/mtk-cmdq-helper.c |  258 
> > > > > 
> > > > >  include/linux/soc/mediatek/mtk-cmdq.h  |  132 
> > > > >  4 files changed, 403 insertions(+)
> > > > >  create mode 100644 drivers/soc/mediatek/mtk-cmdq-helper.c
> > > > >  create mode 100644 include/linux/soc/mediatek/mtk-cmdq.h
> > > > > 
> > > > > diff --git a/drivers/soc/mediatek/Kconfig 
> > > > > b/drivers/soc/mediatek/Kconfig
> > > > > index a7d0667..17bd759 100644
> > > > > --- a/drivers/soc/mediatek/Kconfig
> > > > > +++ b/drivers/soc/mediatek/Kconfig
> > > > > @@ -4,6 +4,18 @@
> > > > >  menu "MediaTek SoC drivers"
> > > > >   depends on ARCH_MEDIATEK || COMPILE_TEST
> > > > >  
> > > > 
> > 
> > [...]
> > 
> > > > > +
> > > > > +static int cmdq_pkt_finalize(struct cmdq_pkt *pkt)
> > > > > +{
> > > > > + int err;
> > > > > +
> > > > > + if (cmdq_pkt_is_finalized(pkt))
> > > > > + return 0;
> > > > > +
> > > > > + /* insert EOC and generate IRQ for each command iteration */
> > > > > + err = cmdq_pkt_append_command(pkt, CMDQ_CODE_EOC, 0, 
> > > > > CMDQ_EOC_IRQ_EN);
> > > > > + if (err < 0)
> > > > > + return err;
> > > > > +
> > > > > + /* JUMP to end */
> > > > > + err = cmdq_pkt_append_command(pkt, CMDQ_CODE_JUMP, 0, 
> > > > > CMDQ_JUMP_PASS);
> > > > > + if (err < 0)
> > > > > + return err;
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +int cmdq_pkt_flush_async(struct cmdq_client *client, struct cmdq_pkt 
> > > > > *pkt,
> > > > > +  cmdq_async_flush_cb cb, void *data)
> > > > > +{
> > > > > + int err;
> > > > > + struct device *dev;
> > > > > + dma_addr_t dma_addr;
> > > > > +
> > > > > + err = cmdq_pkt_finalize(pkt);
> > > > > + if (err < 0)
> > > > > + return err;
> > > > > +
> > > > > + dev = client->chan->mbox->dev;
> > > > > + dma_addr = dma_map_single(dev, pkt->va_base, pkt->cmd_buf_size,
> > > > > + DMA_TO_DEVICE);
> > > > 
> > > > You map here, but I could not find unmap, so the unmap should be done in
> > > > client driver. I would prefer a symmetric map/unmap which means that
> > > > both map and unmap are done in client driver. I think you put map here
> > > > because you should map after finalize. 
> > > 
> > > The unmap is done before callback to client, in function
> > > cmdq_task_exec_done, mtk-cmdq-mailbox.c.
> > 
> > You put unmap in the mailbox controller, and map here (here would be the
> > mailbox client), so this is not symmetric. Asymmetric code easily causes
> > bugs and is hard to maintain. So I would like to move both
> > map/unmap to the client driver.
> > 
> 
> Since map/unmap is common code for client drivers, can we move unmap to
> the CMDQ helper instead of putting it in the client driver?
> 
> > > 
> > > > Therefore, export
> > > > cmdq_pkt_finalize() to the client driver and let the client do finalize, so
> > > > there is no finalize in the flush function. This method has the benefit that
> > > > if the client reuses the command buffer, it need not map/unmap frequently.
> > > 
> > > If client reuse command buffer or cmdq_pkt(command queue packet), client
> > > will add new commands to the cmdq_pkt, so map/unmap are necessary for
> > > each cmdq_pkt flush.
> > 
> > If the buffer size is 4K bytes, the client driver could map the whole 4K at
> > initialization. Before it writes a new command, it calls
> > dma_sync_single_for_cpu(); after it writes the new command, it calls
> > dma_sync_single_for_device(). Then it could flush this buffer to the
> > mailbox controller. So the client driver just calls the dma sync functions
> > when it reuses the command buffer. The dma sync functions are much more
> > lightweight than dma map, because map has to search for a memory area which
> > can be mapped.
> > 
> > Regards,
> > CK
> 
> Maybe we can do dma map/unmap/sync in cmdq helper, and make client
> driver simple.
> 

Why are map/unmap common code for client drivers? I've mentioned that
some client drivers may just call the dma sync functions before flush, so
they do not map for every flush. Frequent map/unmap has some drawbacks:

1. Wasted CPU resources: this also wastes power.
2. Risk of mapping failure: to reduce this risk, the client driver could
map at initialization.
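
A rough userspace model of the map-once, sync-per-reuse flow described
above, for illustration only. The model_* names and the struct are made
up for this sketch. In a real client they would roughly correspond to
dma_map_single(), dma_sync_single_for_cpu() and
dma_sync_single_for_device(); nothing here touches real DMA, it only
tracks who owns the buffer and counts the calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CMD_BUF_SIZE 4096

/* Hypothetical model of a client-owned command buffer. */
struct model_cmd_buf {
	uint8_t data[MODEL_CMD_BUF_SIZE];
	size_t used;
	int mapped;              /* 1 while the (modelled) mapping exists */
	int owned_by_cpu;        /* 1 while the CPU may write the buffer */
	unsigned int map_calls;  /* expensive operation, ideally done once */
	unsigned int sync_calls; /* cheap operation, done per reuse */
};

/* Stands in for dma_map_single(): done once at initialization. */
static void model_map(struct model_cmd_buf *b)
{
	b->mapped = 1;
	b->owned_by_cpu = 0;
	b->map_calls++;
}

/* Stands in for dma_sync_single_for_cpu(). */
static void model_sync_for_cpu(struct model_cmd_buf *b)
{
	assert(b->mapped);
	b->owned_by_cpu = 1;
	b->sync_calls++;
}

/* Stands in for dma_sync_single_for_device(). */
static void model_sync_for_device(struct model_cmd_buf *b)
{
	assert(b->mapped);
	b->owned_by_cpu = 0;
	b->sync_calls++;
}

/* Reuse the buffer for one flush: sync to CPU, write, sync to device. */
static int model_reuse_and_flush(struct model_cmd_buf *b,
				 const void *cmds, size_t len)
{
	if (len > MODEL_CMD_BUF_SIZE)
		return -1;

	model_sync_for_cpu(b);
	memcpy(b->data, cmds, len);
	b->used = len;
	model_sync_for_device(b);

	/* At this point the buffer could be handed to the mailbox controller. */
	return 0;
}
```

With this shape, N flushes cost one map plus 2*N syncs instead of N
map/unmap pairs.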

I think 

Re: [PATCH v5 03/12] PM / devfreq: Don't adjust to user limits in governors

2018-07-03 Thread Chanwoo Choi
Hi Matthias,

On 2018년 07월 04일 08:46, Matthias Kaehlcke wrote:
> Several governors use the user space limits df->min/max_freq to adjust
> the target frequency. This is not necessary, since update_devfreq()
> already takes care of this. Instead the governor can request the available
> min/max frequency by setting the target frequency to DEVFREQ_MIN/MAX_FREQ
> and let update_devfreq() take care of any adjustments.
> 
> Signed-off-by: Matthias Kaehlcke 
> Reviewed-by: Brian Norris 
> ---
> Changes in v5:
> - none
> 
> Changes in v4:
> - added 'Reviewed-by: Brian Norris ' tag
> 
> Changes in v3:
> - none
> 
> Changes in v2:
> - squashed "PM / devfreq: Remove redundant frequency adjustment from 
> governors"
>   and "PM / devfreq: governors: Return device frequency limits instead of user
>   limits"
> - updated subject and commit message
> - use DEVFREQ_MIN/MAX_FREQ instead of df->scaling_min/max_freq
> ---
>  drivers/devfreq/governor.h|  3 +++
>  drivers/devfreq/governor_performance.c|  5 +
>  drivers/devfreq/governor_powersave.c  |  2 +-
>  drivers/devfreq/governor_simpleondemand.c | 12 +++-
>  drivers/devfreq/governor_userspace.c  | 16 
>  5 files changed, 12 insertions(+), 26 deletions(-)

Actually, I preferred to use 'df->scaling_min/max_freq'
instead of DEVFREQ_MIN/MAX_FREQ. But DEVFREQ_MIN/MAX_FREQ is another way
to do it.

So, looks good to me.
Reviewed-by: Chanwoo Choi 

[snip]

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics
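
For readers following along, a minimal userspace sketch of the split the
commit message describes: the governor only asks for "min" or "max", and
the core applies the limits in one place. The model_* names are
hypothetical, and the sentinel values (0 and ULONG_MAX) are an assumption
here, since the macro definitions are not shown in this mail.

```c
#include <assert.h>
#include <limits.h>

/* Assumed sentinel values; the definitions are not part of this excerpt. */
#define MODEL_DEVFREQ_MIN_FREQ 0UL
#define MODEL_DEVFREQ_MAX_FREQ ULONG_MAX

struct model_devfreq {
	unsigned long min_freq;         /* user space lower limit */
	unsigned long max_freq;         /* user space upper limit */
	unsigned long scaling_min_freq; /* lowest available OPP */
	unsigned long scaling_max_freq; /* highest available OPP */
};

/* "performance"-style governor: request the maximum, no clamping here. */
static unsigned long model_governor_performance(void)
{
	return MODEL_DEVFREQ_MAX_FREQ;
}

/* "powersave"-style governor: request the minimum, no clamping here. */
static unsigned long model_governor_powersave(void)
{
	return MODEL_DEVFREQ_MIN_FREQ;
}

/* Core-side adjustment: the single place where limits are applied. */
static unsigned long model_update_devfreq(const struct model_devfreq *df,
					  unsigned long freq)
{
	unsigned long max_freq = df->scaling_max_freq < df->max_freq ?
				 df->scaling_max_freq : df->max_freq;
	unsigned long min_freq = df->scaling_min_freq > df->min_freq ?
				 df->scaling_min_freq : df->min_freq;

	assert(min_freq <= max_freq);

	if (freq < min_freq)
		freq = min_freq;
	if (freq > max_freq)
		freq = max_freq;
	return freq;
}
```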


Re: [PATCH -mm -v4 00/21] mm, THP, swap: Swapout/swapin THP in one piece

2018-07-03 Thread Sergey Senozhatsky
On (07/04/18 10:20), Huang, Ying wrote:
> > On (06/27/18 21:51), Andrew Morton wrote:
> >> On Fri, 22 Jun 2018 11:51:30 +0800 "Huang, Ying"  
> >> wrote:
> >> 
> >> > This is the final step of THP (Transparent Huge Page) swap
> >> > optimization.  After the first and second step, the splitting huge
> >> > page is delayed from almost the first step of swapout to after swapout
> >> > has been finished.  In this step, we avoid splitting THP for swapout
> >> > and swapout/swapin the THP in one piece.
> >> 
> >> It's a tremendously good performance improvement.  It's also a
> >> tremendously large patchset :(
> >
> > Will zswap gain a THP swap out/in support at some point?
> >
> >
> > mm/zswap.c: static int zswap_frontswap_store(...)
> > ...
> > /* THP isn't supported */
> > if (PageTransHuge(page)) {
> > ret = -EINVAL;
> > goto reject;
> > }
> 
> That's not on my TODO list.  Do you have interest to work on this?

I'd say I'm interested. Can't promise that I'll have enough spare time
any time soon, tho.

The numbers you posted do look fantastic indeed. Embedded devices
[which normally use zswap/zram quite heavily] _probably_ should see
some performance improvement as well once zswap [and maybe zram] can
handle THP.

-ss


Re: [patch v3] mm, oom: fix unnecessary killing of additional processes

2018-07-03 Thread penguin-kernel
David Rientjes wrote:
> Ping?
> 
> This can be something that can easily be removed if it becomes obsoleted 
> because the oom reaper is always able to free memory to the extent of 
> exit_mmap().  I argue that it cannot, because it cannot do free_pgtables() 
> for large amounts of virtual memory, but am fine to be proved wrong.

This is "[PATCH 3/8] mm,oom: Fix unnecessary killing of additional processes." 
in my series.

> 
> In the meantime, however, this patch should introduce no significant
> change in functionality, and the only interface it adds is in debugfs
> and can easily be removed if it is obsoleted.
> 
> The work to make the oom reaper more effective or reliable can still
> continue with this patch.
> 


Re: [PATCH v11 1/2] Refactor part of the oom report in dump_header

2018-07-03 Thread 禹舟键
Hi Andy,
The const char array needs to be used by the new function
mem_cgroup_print_oom_context() and by some functions in oom_kill.c in the
second patch.

Thanks

>
> On Sat, Jun 30, 2018 at 7:38 PM,   wrote:
> > From: yuzhoujian 
> >
> > The current system wide oom report prints information about the victim
> > and the allocation context and restrictions. It, however, doesn't
> > provide any information about memory cgroup the victim belongs to. This
> > information can be interesting for container users because they can find
> > the victim's container much more easily.
> >
> > I followed the advice of David Rientjes and Michal Hocko, and refactored
> > part of the oom report. After this patch, users can get the memcg's
> > path from the oom report and check the affected container more quickly.
> >
> > The oom print info after this patch:
> > oom-kill:constraint=,nodemask=,oom_memcg=,task_memcg=,task=,pid=,uid=
>
>
> > +static const char * const oom_constraint_text[] = {
> > +   [CONSTRAINT_NONE] = "CONSTRAINT_NONE",
> > +   [CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET",
> > +   [CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
> > +   [CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG",
> > +};
>
> I'm not sure why we have this in the header.
>
> This produces a lot of noise when W=1.
>
> In file included from
> /home/andy/prj/linux-topic-mfld/include/linux/memcontrol.h:31:0,
> from /home/andy/prj/linux-topic-mfld/include/net/sock.h:58,
> from /home/andy/prj/linux-topic-mfld/include/linux/tcp.h:23,
> from /home/andy/prj/linux-topic-mfld/include/linux/ipv6.h:87,
> from /home/andy/prj/linux-topic-mfld/include/net/ipv6.h:16,
> from
> /home/andy/prj/linux-topic-mfld/net/ipv4/netfilter/nf_log_ipv4.c:17:
> /home/andy/prj/linux-topic-mfld/include/linux/oom.h:32:27: warning:
> ‘oom_constraint_text’ defined but not used [-Wunused-const-variable=]
> static const char * const oom_constraint_text[] = {
>   ^~~
>  CC [M]  net/ipv4/netfilter/iptable_nat.o
>
>
> If you need (but looking at the code you actually don't if I didn't
> miss anything) it in several places, just export.
> Otherwise put it back to memcontrol.c.
>
> --
> With Best Regards,
> Andy Shevchenko
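
One possible reading of "just export", sketched in a single file for
illustration: the header would carry only an extern declaration, and
exactly one .c file would carry the definition, so files that include the
header but never use the array no longer get their own unused static
copy. oom_constraint_name() is a hypothetical helper added for this
sketch, and the enum is written out on the assumption that its values
start at 0 in this order.

```c
#include <assert.h>
#include <string.h>

/* Assumed to mirror the enum used by the quoted patch. */
enum oom_constraint {
	CONSTRAINT_NONE,
	CONSTRAINT_CPUSET,
	CONSTRAINT_MEMORY_POLICY,
	CONSTRAINT_MEMCG,
};

/* What the header would carry: a declaration only, no storage. */
extern const char * const oom_constraint_text[];

/* What exactly one .c file would carry: the single definition. */
const char * const oom_constraint_text[] = {
	[CONSTRAINT_NONE] = "CONSTRAINT_NONE",
	[CONSTRAINT_CPUSET] = "CONSTRAINT_CPUSET",
	[CONSTRAINT_MEMORY_POLICY] = "CONSTRAINT_MEMORY_POLICY",
	[CONSTRAINT_MEMCG] = "CONSTRAINT_MEMCG",
};

/* Hypothetical accessor, only here to show a user of the array. */
static const char *oom_constraint_name(enum oom_constraint c)
{
	assert((unsigned int)c <= CONSTRAINT_MEMCG);
	return oom_constraint_text[c];
}
```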


Re: [PATCH -mm -v4 08/21] mm, THP, swap: Support to read a huge swap cluster for swapin a THP

2018-07-03 Thread Huang, Ying
Daniel Jordan  writes:

> On Fri, Jun 22, 2018 at 11:51:38AM +0800, Huang, Ying wrote:
>> @@ -411,14 +414,32 @@ struct page *__read_swap_cache_async(swp_entry_t 
>> entry, gfp_t gfp_mask,
> ...
>> +if (thp_swap_supported() && huge_cluster) {
>> +gfp_t gfp = alloc_hugepage_direct_gfpmask(vma);
>> +
>> +new_page = alloc_hugepage_vma(gfp, vma,
>> +addr, HPAGE_PMD_ORDER);
>
> When allocating a huge page, we ignore the gfp_mask argument.
>
> That doesn't matter right now since AFAICT we're not losing any flags: 
> gfp_mask
> from existing callers of __read_swap_cache_async seems to always be a subset 
> of
> GFP_HIGHUSER_MOVABLE and alloc_hugepage_direct_gfpmask always returns a
> superset of that.
>
> But maybe we should warn here in case we end up violating a restriction from a
> future caller.  Something like this?:
>
>> +gfp_t gfp = alloc_hugepage_direct_gfpmask(vma);
> VM_WARN_ONCE((gfp | gfp_mask) != gfp,
>"ignoring gfp_mask bits");

This looks good!  Thanks!  Will add this.

Best Regards,
Huang, Ying
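
As a side note on the proposed check, a small userspace sketch of why
(gfp | gfp_mask) != gfp detects ignored bits: OR-ing the caller's mask
into the mask actually used changes the value only if the caller asked
for a bit that is not already set. The type and flag names below are
hypothetical and are not the kernel's real GFP values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for gfp_t. */
typedef unsigned int model_gfp_t;

/* Hypothetical flag bits, chosen only for the example. */
#define MODEL_FLAG_A 0x1u
#define MODEL_FLAG_B 0x2u
#define MODEL_FLAG_C 0x4u

/*
 * Returns true when 'requested' carries at least one bit that 'used'
 * lacks, i.e. the condition under which the proposed warning would fire.
 */
static bool model_gfp_bits_ignored(model_gfp_t used, model_gfp_t requested)
{
	return (used | requested) != used;
}
```

So a caller mask that is a subset of the mask in use stays silent, and
any extra bit triggers the warning.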


RE: [PATCH v5 01/12] PM / devfreq: Init user limits from OPP limits, not viceversa

2018-07-03 Thread MyungJoo Ham
>Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding
>the devfreq device") introduced the initialization of the user
>limits min/max_freq from the lowest/highest available OPPs. Later
>commit f1d981eaecf8 ("PM / devfreq: Use the available min/max
>frequency") added scaling_min/max_freq, which actually represent
>the frequencies of the lowest/highest available OPP. scaling_min/
>max_freq are initialized with the values from min/max_freq, which
>is totally correct in the context, but a bit awkward to read.
>
>Swap the initialization and assign scaling_min/max_freq with the
>OPP freqs and then the user limits min/max_freq with scaling_min/
>max_freq.
>
>Needless to say that this change is a NOP, intended to improve
>readability.
>
>Signed-off-by: Matthias Kaehlcke 
>Reviewed-by: Chanwoo Choi 
>Reviewed-by: Brian Norris 
>---
>Changes in v5:
>- none
>
>Changes in v4:
>- added 'Reviewed-by: Brian Norris ' tag
>
>Changes in v3:
>- none
>
>Changes in v2:
>- added 'Reviewed-by: Chanwoo Choi ' tag
>---
> drivers/devfreq/devfreq.c | 12 ++--
> 1 file changed, 6 insertions(+), 6 deletions(-)

Acked-by: MyungJoo Ham 

This can be applied independently from other commits in this series.



Re: [PATCH v5 02/12] PM / devfreq: Fix handling of min/max_freq == 0

2018-07-03 Thread Chanwoo Choi
Hi Matthias,

On 2018년 07월 04일 08:46, Matthias Kaehlcke wrote:
> Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding the
> devfreq device") initializes df->min/max_freq with the min/max OPP when
> the device is added. Later commit f1d981eaecf8 ("PM / devfreq: Use the
> available min/max frequency") adds df->scaling_min/max_freq and the
> following to the frequency adjustment code:
> 
>   max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
> 
> With the current handling of min/max_freq this is incorrect:
> 
> Even though df->max_freq is now initialized to a value != 0 user space
> can still set it to 0, in this case max_freq would be 0 instead of
> df->scaling_max_freq as intended. In consequence the frequency adjustment
> is not performed:
> 
>   if (max_freq && freq > max_freq) {
>   freq = max_freq;
> 
> To fix this set df->min/max_freq to the min/max OPP in min/max_freq_store,
> when the user passes a value of 0. This also prevents df->max_freq from
> being set below the min OPP when df->min_freq is 0, and similar for
> min_freq. Since it is now guaranteed that df->min/max_freq can't be 0 the
> checks for this case can be removed.
> 
> Fixes: f1d981eaecf8 ("PM / devfreq: Use the available min/max frequency")
> Signed-off-by: Matthias Kaehlcke 
> Reviewed-by: Brian Norris 
> ---
> Changes in v5:
> - none
> 
> Changes in v4:
> - added 'Reviewed-by: Brian Norris ' tag
> 
> Changes in v3:
> - none
> 
> Changes in v2:
> - handle freq tables sorted in ascending and descending order in
>   min/max_freq_store()
> - use same order for conditional statements in min/max_freq_store()
> ---
>  drivers/devfreq/devfreq.c | 42 ---
>  1 file changed, 30 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index 0057ef5b0a98..6f604f8b2b81 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -283,11 +283,11 @@ int update_devfreq(struct devfreq *devfreq)
>   max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
>   min_freq = MAX(devfreq->scaling_min_freq, devfreq->min_freq);
>  
> - if (min_freq && freq < min_freq) {
> + if (freq < min_freq) {
>   freq = min_freq;
>   flags &= ~DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use GLB */
>   }
> - if (max_freq && freq > max_freq) {
> + if (freq > max_freq) {
>   freq = max_freq;
>   flags |= DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use LUB */
>   }
> @@ -1122,18 +1122,27 @@ static ssize_t min_freq_store(struct device *dev, 
> struct device_attribute *attr,
>  {
>   struct devfreq *df = to_devfreq(dev);
>   unsigned long value;
> + unsigned long *freq_table;

You can move 'freq_table' under the 'else' statement.

>   int ret;
> - unsigned long max;
>  
>   ret = sscanf(buf, "%lu", &value);
>   if (ret != 1)
>   return -EINVAL;
>  
>   mutex_lock(&df->lock);
> - max = df->max_freq;
> - if (value && max && value > max) {
> - ret = -EINVAL;
> - goto unlock;
> +
> + if (value) {
> + if (value > df->max_freq) {
> + ret = -EINVAL;
> + goto unlock;
> + }
> + } else {
> + freq_table = df->profile->freq_table;
> + /* typical order is ascending, some drivers use descending */

You'd better explain what the following code is doing.
How about modifying the comment as follows?

/* Get minimum frequency according to sorting order */

> + if (freq_table[0] < freq_table[df->profile->max_state - 1])
> + value = freq_table[0];
> + else
> + value = freq_table[df->profile->max_state - 1];
>   }
>  
>   df->min_freq = value;
> @@ -1157,18 +1166,27 @@ static ssize_t max_freq_store(struct device *dev, 
> struct device_attribute *attr,
>  {
>   struct devfreq *df = to_devfreq(dev);
>   unsigned long value;
> + unsigned long *freq_table;

ditto. You can move 'freq_table' under the 'else' statement.

>   int ret;
> - unsigned long min;
>  
>   ret = sscanf(buf, "%lu", &value);
>   if (ret != 1)
>   return -EINVAL;
>  
>   mutex_lock(&df->lock);
> - min = df->min_freq;
> - if (value && min && value < min) {
> - ret = -EINVAL;
> - goto unlock;
> +
> + if (value) {
> + if (value < df->min_freq) {
> + ret = -EINVAL;
> + goto unlock;
> + }
> + } else {
> + freq_table = df->profile->freq_table;
> + /* typical order is ascending, some drivers use descending */

ditto.
/* Get maximum frequency according to sorting order */

> + if (freq_table[0] < freq_table[df->profile->max_state - 1])
> + value = freq_table[df->profile->max_state - 1];
> + else
> +   
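
To make the zero case from the commit message concrete, a small userspace
sketch follows. It is not the driver code: the function names are
hypothetical. It shows (a) picking the lowest/highest entry of a table
that may be sorted ascending or descending, and (b) how the old
"max_freq && ..." check silently skips the clamp when the user limit is
0, while the unconditional check works once 0 has been replaced by the
highest table entry.

```c
#include <assert.h>
#include <stddef.h>

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Lowest frequency of a table sorted in ascending or descending order. */
static unsigned long model_table_min_freq(const unsigned long *tbl, size_t n)
{
	assert(n > 0);
	return min_ul(tbl[0], tbl[n - 1]);
}

/* Highest frequency of a table sorted in ascending or descending order. */
static unsigned long model_table_max_freq(const unsigned long *tbl, size_t n)
{
	assert(n > 0);
	return max_ul(tbl[0], tbl[n - 1]);
}

/* Old upper clamp: a user limit of 0 makes max_freq 0 and disables it. */
static unsigned long model_clamp_max_old(unsigned long freq,
					 unsigned long scaling_max,
					 unsigned long user_max)
{
	unsigned long max_freq = min_ul(scaling_max, user_max);

	if (max_freq && freq > max_freq)
		freq = max_freq;
	return freq;
}

/* New upper clamp: user_max is never 0, so the check is unconditional. */
static unsigned long model_clamp_max_new(unsigned long freq,
					 unsigned long scaling_max,
					 unsigned long user_max)
{
	unsigned long max_freq = min_ul(scaling_max, user_max);

	if (freq > max_freq)
		freq = max_freq;
	return freq;
}
```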

Re: [PATCH -mm -v4 00/21] mm, THP, swap: Swapout/swapin THP in one piece

2018-07-03 Thread Huang, Ying
Sergey Senozhatsky  writes:

> On (06/27/18 21:51), Andrew Morton wrote:
>> On Fri, 22 Jun 2018 11:51:30 +0800 "Huang, Ying"  
>> wrote:
>> 
>> > This is the final step of THP (Transparent Huge Page) swap
>> > optimization.  After the first and second step, the splitting huge
>> > page is delayed from almost the first step of swapout to after swapout
>> > has been finished.  In this step, we avoid splitting THP for swapout
>> > and swapout/swapin the THP in one piece.
>> 
>> It's a tremendously good performance improvement.  It's also a
>> tremendously large patchset :(
>
> Will zswap gain a THP swap out/in support at some point?
>
>
> mm/zswap.c: static int zswap_frontswap_store(...)
> ...
>   /* THP isn't supported */
>   if (PageTransHuge(page)) {
>   ret = -EINVAL;
>   goto reject;
>   }

That's not on my TODO list.  Do you have interest to work on this?

Best Regards,
Huang, Ying

