Re: [tip:x86/mm] x86/mm: Break out user address space handling

2018-10-18 Thread Ingo Molnar


* Eric W. Biederman  wrote:

> tip-bot for Dave Hansen  writes:
> 
> > Commit-ID:  aa37c51b9421d66f7931c5fdcb9ce80c450974be
> > Gitweb: 
> > https://git.kernel.org/tip/aa37c51b9421d66f7931c5fdcb9ce80c450974be
> > Author: Dave Hansen 
> > AuthorDate: Fri, 28 Sep 2018 09:02:23 -0700
> > Committer:  Peter Zijlstra 
> > CommitDate: Tue, 9 Oct 2018 16:51:15 +0200
> >
> > x86/mm: Break out user address space handling
> >
> > The last patch broke out kernel address space handling into its own
> > helper.  Now, do the same for user address space handling.
> >
> > Cc: x...@kernel.org
> > Cc: Jann Horn 
> > Cc: Sean Christopherson 
> > Cc: Thomas Gleixner 
> > Cc: Andy Lutomirski 
> > Signed-off-by: Dave Hansen 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > Link: http://lkml.kernel.org/r/20180928160223.9c4f6...@viggo.jf.intel.com
> > ---
> >  arch/x86/mm/fault.c | 47 ---
> >  1 file changed, 28 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index c7e32f453852..0d1f5d39fc63 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -966,6 +966,7 @@ bad_area_access_error(struct pt_regs *regs, unsigned 
> > long error_code,
> > __bad_area(regs, error_code, address, vma, SEGV_ACCERR);
> >  }
> >  
> > +/* Handle faults in the kernel portion of the address space */
>^^
> I believe you mean the __user__ portion of the address space.
> Given that the call chain is:
> 
> do_user_addr_fault
>handle_mm_fault
>   do_sigbus  

It's both:

  /* Handle faults in the kernel portion of the address space */
  static void
  do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
	    u32 *pkey, unsigned int fault)
  {
struct task_struct *tsk = current;
int code = BUS_ADRERR;

/* Kernel mode? Handle exceptions or die: */
if (!(error_code & X86_PF_USER)) {
no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
return;
}

/* User-space => ok to do another page fault: */
if (is_prefetch(regs, error_code, address))
return;

tsk->thread.cr2 = address;
tsk->thread.error_code  = error_code;
tsk->thread.trap_nr = X86_TRAP_PF;


Note the X86_PF_USER check: that's what determines whether the fault was 
for user or system mappings.

I agree that the comment is misleading and should be clarified.

Thanks,

Ingo


Re: [RFC v4 PATCH 2/5] mm/__free_one_page: skip merge for order-0 page unless compaction failed

2018-10-18 Thread Aaron Lu
On Thu, Oct 18, 2018 at 12:16:32PM +0100, Mel Gorman wrote:
> On Wed, Oct 17, 2018 at 10:59:04PM +0800, Aaron Lu wrote:
> > > Any particular reason why? I assume it's related to the number of zone
> > > locks with the increased number of zones and the number of threads used
> > > for the test.
> > 
> > I think so too.
> > 
> > The 4 sockets server has 192 CPUs in total while the 2 sockets server
> > has 112 CPUs in total. Assume only ZONE_NORMAL are used, for the 4
> > sockets server it would be 192/4=48(CPUs per zone) while for the 2
> > sockets server it is 112/2=56(CPUs per zone). The test is started with
> > nr_task=nr_cpu so for the 2 sockets servers, it ends up having more CPUs
> > consuming one zone.
> > 
> 
> Nice that the prediction is accurate. It brings us to another option --
> breaking up the zone lock by either hash or address space ranges. The
> address space ranges would probably be easier to implement. Where it
> gets hairy is that PFN walkers would need different zone locks. However,
> overall it might be a better option because it's not order-0 specific.

I think the 'address space range' lock is worth a try.

> It would be a lot of legwork because all uses of the zone lock would
> have to be audited to see which ones protect the free lists and which
> ones protect "something else".

Yes a lot of details.

> > > That's important to know. It does reduce the utility of the patch
> > > somewhat but not all arches support THP and THP is not always enabled on
> > > x86.
> > 
> > I always wondered how systems are making use of THP.
> > After all, when a system has been running a while (days or months), file
> > cache should have consumed a lot of memory and high-order pages will
> > become more and more scarce. If order-9 pages can't be reliably
> > allocated, will workloads rely on them?
> > Just a thought.
> > 
> 
> File cache can usually be trivially reclaimed and moved. It's a "how
> long is a piece of string" to determine at what point a system can get
> fragmented and whether that can be prevented. It's somewhat outside the
> scope of this patch but anecdotally I'm looking at a machine with 20 days
> uptime and it still has 2390GB worth of THPs free after a large amount
> of reclaim activity over the system lifetime so fragmentation avoidance
> does work in some cases.

Good to know, thanks.

> 
> > THP is of course pretty neat in that it reduces TLB cost, needs fewer
> > page tables, etc. I just wondered if people really rely on it, or use it
> > after their system has been up for a long time.
> > 
> 
> If people didn't rely on it then we might as well delete THP and
> declare the whole tmpfs-backed-THP as worthless.
> 
> > > Yes, but note that the concept is still problematic.
> > > isolate_migratepages_block is not guaranteed to find a pageblock with
> > > unmerged buddies in it. If there are pageblocks towards the end of the
> > > zone with unmerged pages, they may never be found. This will be very hard
> > > to detect at runtime because it's heavily dependant on the exact state
> > > of the system.
> > 
> > Quite true.
> > 
> > The intent here though, is not to have compaction merge back all
> > unmerged pages, but to do the merge for these unmerged pages in a
> > piggyback way, i.e. since isolate_migratepages_block() is doing the
> > scan, why don't we let it handle these unmerged pages when it meets
> > them?
> > 
> > If for some reason isolate_migratepages_block() didn't meet a single
> > unmerged page before compaction succeeded, we probably do not need to
> > worry much yet since compaction succeeded anyway.
> > 
> 
> I don't think this is the right way of thinking about it because it's
> possible to have the system split in such a way so that the migration
> scanner only encounters unmovable pages before it meets the free scanner
> where unmerged buddies were in the higher portion of the address space.

Yes it is possible unmerged pages are in the higher portion.

My understanding is, when the two scanners meet, all unmerged pages will
be either used by the free scanner as migrate targets or sent to merge
by the migration scanner.

> 
> You either need to keep unmerged buddies on a separate list or search
> the order-0 free list for merge candidates prior to compaction.
> 
> > > It's needed to form them efficiently but excessive reclaim or writing 3
> > > to drop_caches can also do it. Be careful of tying lazy buddy too
> > > closely to compaction.
> > 
> > That's the current design of this patchset, do you see any immediate
> > problem of this? Is it that you are worried about high-order allocation
> > success rate using this design?
> 
> I've pointed out what I see are the design flaws but yes, in general, I'm
> worried about the high order allocation success rate using this design,
> the reliance on compaction and the fact that the primary motivation is
> when THP is disabled.

When THP is in use, zone lock contention is pretty much nowhere to be seen :-)

I'll see what I can get with the 'address space range' lock.

[tip:x86/urgent] x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels

2018-10-18 Thread tip-bot for Christoph Hellwig
Commit-ID:  485734f3fc77c1eb77ffe138c027b9a4bf0178f3
Gitweb: https://git.kernel.org/tip/485734f3fc77c1eb77ffe138c027b9a4bf0178f3
Author: Christoph Hellwig 
AuthorDate: Sun, 14 Oct 2018 09:52:08 +0200
Committer:  Ingo Molnar 
CommitDate: Fri, 19 Oct 2018 07:49:32 +0200

x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels

We already build the swiotlb code for 32-bit kernels with PAE support,
but the code to actually use swiotlb has only been enabled for 64-bit
kernels for an unknown reason.

Before Linux v4.18 we papered over this fact because the networking code,
the SCSI layer and some random block drivers implemented their own
bounce buffering schemes.

[ mingo: Changelog fixes. ]

Fixes: 21e07dba9fb1 ("scsi: reduce use of block bounce buffers")
Fixes: ab74cfebafa3 ("net: remove the PCI_DMA_BUS_IS_PHYS check in 
illegal_highdma")
Reported-by: Matthew Whitehead 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Thomas Gleixner 
Tested-by: Matthew Whitehead 
Cc: konrad.w...@oracle.com
Cc: io...@lists.linux-foundation.org
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20181014075208.2715-1-...@lst.de
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/pci-swiotlb.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 661583662430..71c0b01d93b1 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -42,10 +42,8 @@ IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
 int __init pci_swiotlb_detect_4gb(void)
 {
/* don't initialize swiotlb if iommu=off (no_iommu=1) */
-#ifdef CONFIG_X86_64
if (!no_iommu && max_possible_pfn > MAX_DMA32_PFN)
swiotlb = 1;
-#endif
 
/*
 * If SME is active then swiotlb will be set to 1 so that bounce


Re: [PATCH 1/2] sched/cpufreq: Reorganize the cpufreq files

2018-10-18 Thread kbuild test robot
Hi Daniel,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.19-rc8 next-20181018]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Daniel-Lezcano/sched-cpufreq-Reorganize-the-cpufreq-files/20181019-103851
config: x86_64-randconfig-s2-10191209 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> scripts/Makefile.modbuiltin:26: kernel/sched/cpufreq/Makefile: No such file 
>> or directory
   make[4]: *** No rule to make target 'kernel/sched/cpufreq/Makefile'.
   make[4]: Failed to remake makefile 'kernel/sched/cpufreq/Makefile'.

vim +26 scripts/Makefile.modbuiltin

607b30fc Michal Marek 2010-06-10  22  
bc081dd6 Michal Marek 2009-12-07  23  # The filename Kbuild has precedence over 
Makefile
bc081dd6 Michal Marek 2009-12-07  24  kbuild-dir := $(if $(filter 
/%,$(src)),$(src),$(srctree)/$(src))
bc081dd6 Michal Marek 2009-12-07  25  kbuild-file := $(if $(wildcard 
$(kbuild-dir)/Kbuild),$(kbuild-dir)/Kbuild,$(kbuild-dir)/Makefile)
bc081dd6 Michal Marek 2009-12-07 @26  include $(kbuild-file)
bc081dd6 Michal Marek 2009-12-07  27  

:: The code at line 26 was first introduced by commit
:: bc081dd6e9f622c73334dc465359168543ccaabf kbuild: generate modules.builtin

:: TO: Michal Marek 
:: CC: Michal Marek 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation






Re: [PATCH] Input: synaptics - avoid using uninitialized variable when probing

2018-10-18 Thread Peter Hutterer
On Tue, Oct 16, 2018 at 05:14:43PM -0700, Dmitry Torokhov wrote:
> synaptics_detect() does not check whether sending commands to the
> device succeeds and instead relies on getting unique data from the
> device. Let's make sure we seed the entire buffer with zeroes to make
> sure we do not use garbage on the stack that just happens to be 0x47.
> 
> Reported-by: syzbot+13cb3b01d0784e4ff...@syzkaller.appspotmail.com
> Signed-off-by: Dmitry Torokhov 

doh, was just about to send out the same patch.

Reviewed-by: Peter Hutterer 

Cheers,
   Peter

> ---
>  drivers/input/mouse/synaptics.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c
> index 55d33500d55e..5e85f3cca867 100644
> --- a/drivers/input/mouse/synaptics.c
> +++ b/drivers/input/mouse/synaptics.c
> @@ -99,9 +99,7 @@ static int synaptics_mode_cmd(struct psmouse *psmouse, u8 
> mode)
>  int synaptics_detect(struct psmouse *psmouse, bool set_properties)
>  {
>   struct ps2dev *ps2dev = &psmouse->ps2dev;
> - u8 param[4];
> -
> - param[0] = 0;
> + u8 param[4] = { 0 };
>  
>   ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES);
>   ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES);
> -- 
> 2.19.1.331.ge82ca0e54c-goog
> 
> 
> -- 
> Dmitry




[RFC v4 0/2] WhiteEgret LSM module

2018-10-18 Thread Shinya Takumi
WhiteEgret is an LSM to simply provide a whitelisting-type
execution control.

An execution-whitelist, simply called whitelist, is a list
of executable components (e.g., applications, libraries)
that are approved to run on a host. The whitelist is used
to decide whether executable components are permitted to
execute or not. This mechanism can stop the execution of
unknown software, so it helps prevent the execution of
malicious code and other unauthorized software.

It is important to maintain a whitelist properly according to
the execution environments. Managing whitelists for information
systems is a difficult task because their environments are
changed frequently. On the other hand, for such devices that
continue to do the same tasks for a certain period of time,
we can use the same whitelist for the period once the whitelist
is established. Thus the whitelisting-type execution control
works best in such execution environments. Examples of the above
execution environments include control devices in industrial
control systems.

Although the frequency of whitelist changes is not so large,
it is necessary to change them according to a system life cycle
or each phase of system operations. There is a requirement to
change whitelists while system operations continue, because
the systems often cannot easily be stopped. For example, such cases
include temporarily allowing maintenance programs for maintenance
or troubleshooting purposes while running the systems.

WhiteEgret is aiming at satisfying the above requirement.
WhiteEgret adopts a model that a whitelist is managed in user space.
Namely, WhiteEgret assumes that a privileged user manages a whitelist
in user space. This makes it possible to change the whitelist while
running the systems.

Mechanism of WhiteEgret

WhiteEgret requires a user application called WhiteEgret User
Application (WEUA, for short). WhiteEgret utilizes the
bprm_check_security hook and the mmap_file hook.
WhiteEgret asks WEUA whether an executable component hooked
by the above hooks is permitted to execute or not.
If the response from the WEUA is "permit", then WhiteEgret
continues to process the executable component. If the response
is "not permit", then WhiteEgret returns an error and blocks
the execution of the executable component.
The bprm_check_security hook is triggered by the execve system
call, so executions of almost all executable components are
hooked there. However, because shared objects do not
invoke the execve system call, WhiteEgret utilizes the mmap_file
hook to hook the memory mapping by a shared object.
Thus WhiteEgret ignores mmap_file hooks caused by
non-executables and by executables which call the execve system call.

To ask the WEUA for permission, WhiteEgret sends the
absolute path, the inode number and the device number of the
executable component to the WEUA.
Then the WEUA is expected to work as follows.
The WEUA sees if the tuple of the absolute path and/or the inode
number and/or the device number is contained in the whitelist.
If it exists, the WEUA compares a hash value of the executable
component indicated by the absolute path (and/or the inode number
and/or device number) with that in the whitelist to see whether
the executable component is changed or not after the whitelist is
made. The WEUA returns "permit" if both tests are passed,
otherwise returns "not permit".

WhiteEgret v4 is also able to control script execution. Some
LSM hooks (file_open/file_permission/task_alloc/task_free) are
added. Kernel configs are required to enable the hooks.

Most interpreters open script files to execute them. Therefore
WhiteEgret hooks the reading or opening of a file. Then WhiteEgret
asks the WEUA whether execution of the script is permitted or
not. WhiteEgret is able to choose a hook entry for
execution control between file_open and file_permission.

Hook entries of task_alloc and task_free are used to control
executions of scripts effectively. Some interpreters fork child
processes to execute script files, so the WEUA manages a process
family of an interpreter via bprm_check_security, task_alloc and
task_free.

To use WhiteEgret

Users have to prepare a whitelist and a WEUA to use WhiteEgret.
A sample WEUA is included in samples/whiteegret/.

To enable WhiteEgret, you are required to build the kernel using
normal procedures with CONFIG_SECURITY_WHITEEGRET=y.

Additionally, the SECURITY_WHITEEGRET_INTERPRETER=y option is
required to enable control of script executions,
and the SECURITY_WHITEEGRET_WRITE=y option is also required to
detect writes to script files.

The patchset is also available in our github repo:
  https://github.com/whiteegret/whiteegret

---
Changes in v4:
- Add LSM hooks (file_open/file_permission/task_alloc/task_free)
  to control for script execution. Kernel configs are required
to enable the hooks.
- Modify the data struct for kernel space and user space
  communication.

Changes in v3:
- Change to a minor LSM module.

Changes in v2:
- 

[RFC v4 0/2] WhiteEgret LSM module

2018-10-18 Thread Shinya Takumi
WhiteEgret is an LSM to simply provide a whitelisting-type
execution control.

An execution-whitelist, simply called whitelist, is a list
of executable components (e.g., applications, libraries)
that are approved to run on a host. The whitelist is used
to decide whether executable components are permitted to
execute or not. This mechanism can stop an execution of
unknown software, so it helps stop the execution of
malicious code and other unauthorized software.

It is important to maintain a whitelist properly according to
the execution environments. Managing whitelists for information
systems is a difficult task because their environments are
changed frequently. On the other hand, for such devices that
continue to do the same tasks for a certain period of time,
we can use the same whitelist for the period once the whitelist
is established. Thus the whitelisting-type execution control
works best in such execution environments. Examples of the above
execution environments include control devices in industrial
control systems.

Although the number of changing whitelists is not so large,
it is necessary to change them according to a system life cycle
or each phase of system operations. There is a requirement to
change whitelists with the system operations continued because
they often cannot easily be stopped. For example, such cases
include temporarily allowing maintenance programs for maintenance
or troubleshooting purposes while running the systems.

WhiteEgret aims to satisfy the above requirement. It adopts a
model in which the whitelist is managed in user space; namely,
WhiteEgret assumes that a privileged user manages the whitelist
in user space. This makes it possible to change the whitelist
while the systems are running.

Mechanism of WhiteEgret

WhiteEgret requires a user application called WhiteEgret User
Application (WEUA, for short). WhiteEgret utilizes the
bprm_check_security hook and the mmap_file hook.
WhiteEgret asks WEUA whether an executable component hooked
by the above hooks is permitted to execute or not.
If the response from the WEUA is "permit", then WhiteEgret
continues to process the executable component. If the response
is "not permit", then WhiteEgret returns an error and blocks
the execution of the executable component.
The bprm_check_security hook is triggered by the execve system
call, so the executions of almost all executable components are
hooked there. However, because shared objects do not invoke the
execve system call, WhiteEgret utilizes the mmap_file hook to
hook the memory mapping done by a shared object. WhiteEgret
therefore ignores mmap_file events caused by non-executable
mappings and by executables that themselves call execve.
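For illustration, the filtering described above (every execve is checked, while mmap events are checked only for executable mappings that did not come from an execve) can be sketched in plain C. The event names and the function signature below are invented for this sketch, not taken from the patch:

```c
#include <stdbool.h>

/* Hypothetical event types modeling the two LSM hooks described above. */
enum we_event { WE_EXEC_HOOK, WE_MMAP_HOOK };

/*
 * Sketch of the decision table: consult the whitelist user application
 * (WEUA) for every execve, and for mmap only when the mapping is
 * executable and was not already covered by an execve check.
 */
static bool we_should_ask_weua(enum we_event ev, bool prot_exec,
                               bool from_execve)
{
    if (ev == WE_EXEC_HOOK)
        return true;            /* execve: always consult the WEUA */
    if (!prot_exec)
        return false;           /* non-executable mapping: ignore */
    return !from_execve;        /* shared object mapping: consult */
}
```

A real LSM would make this decision inside the hook callbacks themselves; the sketch only captures the decision table.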

To ask a WEUA for permission, WhiteEgret sends the absolute
path, the inode number and the device number of the executable
component to the WEUA.
Then the WEUA is expected to work as follows.
The WEUA checks whether the tuple of the absolute path and/or
the inode number and/or the device number is contained in the
whitelist. If an entry exists, the WEUA compares a hash value of
the executable component identified by that tuple with the hash
in the whitelist, to see whether the executable component has
been changed since the whitelist was made. The WEUA returns
"permit" if both tests pass, and "not permit" otherwise.
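As a rough model, the WEUA-side check described above can be sketched as follows. The entry layout and the sample values are invented for illustration; a real WEUA would use the tuple received over the WhiteEgret interface and a cryptographic hash of the file contents:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* One approved component: identifying tuple plus expected content hash. */
struct wl_entry {
    const char *path;   /* absolute path of the approved component */
    uint64_t    ino;    /* inode number */
    uint64_t    dev;    /* device number */
    uint64_t    hash;   /* expected content hash */
};

static const struct wl_entry whitelist[] = {
    { "/usr/bin/approved", 1234, 8, 0xdeadbeefULL },
};

/* Returns true ("permit") only if the tuple matches and the hash is intact. */
static bool weua_permit(const char *path, uint64_t ino, uint64_t dev,
                        uint64_t measured_hash)
{
    size_t i;

    for (i = 0; i < sizeof(whitelist) / sizeof(whitelist[0]); i++) {
        const struct wl_entry *e = &whitelist[i];

        if (strcmp(e->path, path) == 0 && e->ino == ino && e->dev == dev)
            return e->hash == measured_hash;  /* detect modification */
    }
    return false;                             /* not listed: "not permit" */
}
```

The hash comparison is what catches a listed binary that was modified after the whitelist was built; a missing entry alone already denies execution.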

WhiteEgret v4 is also able to control script execution. Some
LSM hooks (file_open/file_permission/task_alloc/task_free) are
added, and kernel configs are required to enable them.

Most interpreters open script files in order to execute them.
WhiteEgret therefore hooks file reads and opens, and asks the
WEUA whether execution of the script is permitted or not. The
hook entry used for this execution control can be chosen
between file_open and file_permission.

The task_alloc and task_free hook entries are used to control
script executions effectively. Some interpreters fork child
processes to execute script files, so the WEUA manages an
interpreter's process family by means of bprm_check_security,
task_alloc and task_free.
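The process-family bookkeeping described above can be modeled with a small table: register the interpreter when it passes bprm_check_security, add children on task_alloc if the parent is tracked, and drop processes on task_free. This is only a sketch of the idea; the actual WEUA interface and data structures may differ:

```c
#include <stdbool.h>

/* Fixed-size table standing in for whatever bookkeeping a real WEUA keeps. */
#define MAX_TRACKED 64

static int tracked[MAX_TRACKED];
static int ntracked;

/* Called when an interpreter itself is seen at bprm_check_security. */
static void family_add(int pid)
{
    if (ntracked < MAX_TRACKED)
        tracked[ntracked++] = pid;
}

static bool family_contains(int pid)
{
    for (int i = 0; i < ntracked; i++)
        if (tracked[i] == pid)
            return true;
    return false;
}

/* task_alloc: a tracked process forked, so track the child too. */
static void family_on_fork(int parent, int child)
{
    if (family_contains(parent))
        family_add(child);
}

/* task_free: drop the exiting process from the family. */
static void family_remove(int pid)
{
    for (int i = 0; i < ntracked; i++)
        if (tracked[i] == pid) {
            tracked[i] = tracked[--ntracked];
            return;
        }
}
```

With this kind of tracking, a file read performed by any process in an interpreter's family can be attributed back to that interpreter.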

To use WhiteEgret

Users have to prepare a whitelist and a WEUA to use WhiteEgret.
A sample WEUA is included in samples/whiteegret/.

To enable WhiteEgret, you are required to build the kernel using
normal procedures with CONFIG_SECURITY_WHITEEGRET=y.

Additionally, the SECURITY_WHITEEGRET_INTERPRETER=y option is
required to enable control of script executions, and the
SECURITY_WHITEEGRET_WRITE=y option is required to detect
writes to script files.
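Putting the options together, a kernel .config fragment enabling script control might look like the following (a sketch based on the option names given in this cover letter; check the Kconfig in the patch for the exact symbols):

```
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_WHITEEGRET=y
CONFIG_SECURITY_WHITEEGRET_INTERPRETER=y
CONFIG_SECURITY_WHITEEGRET_WRITE=y
```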

The patchset is also available in our github repo:
  https://github.com/whiteegret/whiteegret

---
Changes in v4:
- Add LSM hooks (file_open/file_permission/task_alloc/task_free)
  to control script execution. Kernel configs are required to
  enable the hooks.
- Modify the data struct for kernel space and user space
  communication.

Changes in v3:
- Change to a minor LSM module.

Changes in v2:
- 

[RFC v4 2/2] WhiteEgret: Add an example of user application.

2018-10-18 Thread Shinya Takumi
A user application is required to use WhiteEgret.
This RFC provides a sample user application program.

Usage
  sample-we-user   

Re: [PATCH 1/2] sched/cpufreq: Reorganize the cpufreq files

2018-10-18 Thread kbuild test robot
Hi Daniel,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.19-rc8 next-20181018]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Daniel-Lezcano/sched-cpufreq-Reorganize-the-cpufreq-files/20181019-103851
config: x86_64-randconfig-s1-10191209 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> scripts/Makefile.build:45: kernel/sched/cpufreq/Makefile: No such file or directory
>> make[4]: *** No rule to make target 'kernel/sched/cpufreq/Makefile'.
   make[4]: Failed to remake makefile 'kernel/sched/cpufreq/Makefile'.

vim +45 scripts/Makefile.build

0c53c8e6e Sam Ravnborg   2007-10-14  41  
2a6914703 Sam Ravnborg   2005-07-25  42  # The filename Kbuild has precedence over Makefile
db8c1a7b2 Sam Ravnborg   2005-07-27  43  kbuild-dir := $(if $(filter /%,$(src)),$(src),$(srctree)/$(src))
0c53c8e6e Sam Ravnborg   2007-10-14  44  kbuild-file := $(if $(wildcard $(kbuild-dir)/Kbuild),$(kbuild-dir)/Kbuild,$(kbuild-dir)/Makefile)
0c53c8e6e Sam Ravnborg   2007-10-14 @45  include $(kbuild-file)
^1da177e4 Linus Torvalds 2005-04-16  46  

:: The code at line 45 was first introduced by commit
:: 0c53c8e6eb456cde30f2305421c605713856abc8 kbuild: check for wrong use of CFLAGS

:: TO: Sam Ravnborg 
:: CC: Sam Ravnborg 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[RFC v4 1/2] WhiteEgret: Add WhiteEgret core functions.

2018-10-18 Thread Shinya Takumi
This RFC provides implementation of WhiteEgret.

Signed-off-by: Shinya Takumi 
---
 security/Kconfig   |   1 +
 security/Makefile  |   2 +
 security/whiteegret/Kconfig|  82 +++
 security/whiteegret/Makefile   |   2 +
 security/whiteegret/init.c | 148 
 security/whiteegret/main.c | 466 +
 security/whiteegret/request.c  |  98 
 security/whiteegret/request.h  |  47 
 security/whiteegret/we.h   |  72 ++
 security/whiteegret/we_fs.c| 269 +
 security/whiteegret/we_fs.h|  23 ++
 security/whiteegret/we_fs_common.h |  60 +
 12 files changed, 1270 insertions(+)
 create mode 100644 security/whiteegret/Kconfig
 create mode 100644 security/whiteegret/Makefile
 create mode 100644 security/whiteegret/init.c
 create mode 100644 security/whiteegret/main.c
 create mode 100644 security/whiteegret/request.c
 create mode 100644 security/whiteegret/request.h
 create mode 100644 security/whiteegret/we.h
 create mode 100644 security/whiteegret/we_fs.c
 create mode 100644 security/whiteegret/we_fs.h
 create mode 100644 security/whiteegret/we_fs_common.h

diff --git a/security/Kconfig b/security/Kconfig
index d9aa521..d656e20 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -236,6 +236,7 @@ source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/loadpin/Kconfig
 source security/yama/Kconfig
+source security/whiteegret/Kconfig
 
 source security/integrity/Kconfig
 
diff --git a/security/Makefile b/security/Makefile
index 4d2d378..2da669c 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -10,6 +10,7 @@ subdir-$(CONFIG_SECURITY_TOMOYO)+= tomoyo
 subdir-$(CONFIG_SECURITY_APPARMOR) += apparmor
 subdir-$(CONFIG_SECURITY_YAMA) += yama
 subdir-$(CONFIG_SECURITY_LOADPIN)  += loadpin
+subdir-$(CONFIG_SECURITY_WHITEEGRET)   += whiteegret
 
 # always enable default capabilities
 obj-y  += commoncap.o
@@ -26,6 +27,7 @@ obj-$(CONFIG_SECURITY_APPARMOR)   += apparmor/
 obj-$(CONFIG_SECURITY_YAMA)+= yama/
 obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/
 obj-$(CONFIG_CGROUP_DEVICE)+= device_cgroup.o
+obj-$(CONFIG_SECURITY_WHITEEGRET)  += whiteegret/
 
 # Object integrity file lists
 subdir-$(CONFIG_INTEGRITY) += integrity
diff --git a/security/whiteegret/Kconfig b/security/whiteegret/Kconfig
new file mode 100644
index 000..e55acc4
--- /dev/null
+++ b/security/whiteegret/Kconfig
@@ -0,0 +1,82 @@
+config SECURITY_WHITEEGRET
+   bool "WhiteEgret support"
+   depends on SECURITY
+   select SECURITYFS
+   default n
+   help
+ This enables the WhiteEgret security module.
+ WhiteEgret provides a whitelisting execution control capability,
+ which helps stop the execution of unauthorized software
+ such as malware.
+ You will also need a user application and an execution whitelist.
+ If you are unsure how to answer this question, answer N.
+
+config SECURITY_WHITEEGRET_INTERPRETER
+   bool "WhiteEgret hook file read and create/exit task for interpreter"
+   depends on SECURITY_WHITEEGRET
+   default n
+   help
+ This adds LSM hook points for controlling interpreters.
+ Target hook points are the file read and task create/exit functions.
+ Select the detailed hook points via the configs that depend on
+ SECURITY_WHITEEGRET_INTERPRETER.
+
+config SECURITY_WHITEEGRET_HOOK_FILE_READ
+   bool "WhiteEgret hook file read"
+   depends on SECURITY_WHITEEGRET_INTERPRETER
+   default n
+   help
+ This enables hooking file reads. The kernel notifies the
+ WhiteEgret user application of the hook information, and the
+ application can then control the execution of the hooked operation.
+
+config SECURITY_WHITEEGRET_HOOK_READ_OPEN
+   bool "WhiteEgret hook open for file read"
+   depends on SECURITY_WHITEEGRET_INTERPRETER
+   default y
+   help
+ This enables hooking file open for reading. The kernel notifies
+ the WhiteEgret user application of the hook information, and the
+ application can then control the execution of the hooked
+ operation.
+
+config SECURITY_WHITEEGRET_CHECK_LIVING_TASK
+   bool "WhiteEgret hook creating and exiting task"
+   depends on SECURITY_WHITEEGRET_INTERPRETER
+   default y
+   help
+ This enables hooking the task create/exit LSM hooks. The kernel
+ notifies the WhiteEgret user application of the hook information,
+ and the application can then control the execution of the hooked
+ operation.
+
+config SECURITY_WHITEEGRET_HOOK_WRITE
+   bool "WhiteEgret hook write"
+   depends on SECURITY_WHITEEGRET
+   select SECURITY_PATH
+   

Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

2018-10-18 Thread Alexei Starovoitov
> 
> >
> > Another example is __BPF_PROG_RUN_ARRAY(), which also uses
> > preempt_enable_no_resched().
> 
> Alexei, I think this code is just wrong.

why 'just wrong' ?

> Do you know why it uses
> preempt_enable_no_resched()?

dont recall precisely.
we could be preemptable at the point where macro is called.
I think the goal of no_resched was to avoid adding scheduling points
where they didn't exist before just because a prog ran for few nsec.
May be Daniel or Roman remember.



Re: linux-next: build warning after merge of the scsi tree

2018-10-18 Thread Stephen Rothwell
Hi James,

On Thu, 18 Oct 2018 21:54:03 -0700 James Bottomley  wrote:
>
> It's the merge commit ... it was obviously the wrong choice; I'll fix
> it.

OK, thanks.

-- 
Cheers,
Stephen Rothwell


pgpS8HDlYzZFg.pgp
Description: OpenPGP digital signature


[PATCH] coresight: tmc: Fix bad register address for CLAIM

2018-10-18 Thread Leo Yan
Commit 4d3ebd3658d8 ("coreisght: tmc: Claim device before use") uses
the CLAIM tag to validate whether the device is available; this
requires passing the device base address in order to access the
related registers.

In the function tmc_etb_disable_hw() the driver data pointer is
wrongly passed as the register base address, which easily produces a
kernel warning like the one below:

[   83.579898] WARNING: CPU: 4 PID: 2970 at drivers/hwtracing/coresight/coresight.c:207 coresight_disclaim_device_unlocked+0x44/0x80
[   83.591448] Modules linked in:
[   83.594485] CPU: 4 PID: 2970 Comm: uname Not tainted 4.19.0-rc6-00417-g721b509 #110
[   83.602067] Hardware name: ARM Juno development board (r2) (DT)
[   83.607932] pstate: 8085 (Nzcv daIf -PAN -UAO)
[   83.612681] pc : coresight_disclaim_device_unlocked+0x44/0x80
[   83.618375] lr : coresight_disclaim_device_unlocked+0x44/0x80
[   83.624064] sp : 0fe3ba20
[   83.627347] x29: 0fe3ba20 x28: 80002d430dc0
[   83.632618] x27: 800033177c00 x26: 80002eb44480
[   83.637889] x25: 0001 x24: 800033c72600
[   83.643160] x23: 099b11f8 x22: 099b11c8
[   83.648430] x21: 0002 x20: 800033a90418
[   83.653701] x19: 099b11c8 x18: 
[   83.658971] x17:  x16: 
[   83.664241] x15:  x14: 
[   83.669511] x13:  x12: 
[   83.674782] x11:  x10: 
[   83.680052] x9 :  x8 : 0001
[   83.685322] x7 : 0001 x6 : 800033ebab18
[   83.690593] x5 : 800033ebab18 x4 : 800033e6c698
[   83.695862] x3 : 0001 x2 : 
[   83.701133] x1 :  x0 : 0001
[   83.706404] Call trace:
[   83.708830]  coresight_disclaim_device_unlocked+0x44/0x80
[   83.714180]  coresight_disclaim_device+0x34/0x48
[   83.718756]  tmc_disable_etf_sink+0xc4/0xf0
[   83.722902]  coresight_disable_path_from+0xc8/0x240
[   83.727735]  coresight_disable_path+0x24/0x30
[   83.732053]  etm_event_stop+0x130/0x170
[   83.735854]  etm_event_del+0x24/0x30
[   83.739399]  event_sched_out.isra.51+0xcc/0x1e8
[   83.743887]  group_sched_out.part.53+0x44/0xb0
[   83.748291]  ctx_sched_out+0x298/0x2b8
[   83.752005]  task_ctx_sched_out+0x74/0xa8
[   83.755980]  perf_event_exit_task+0x140/0x418
[   83.760298]  do_exit+0x3f4/0xcf0
[   83.763497]  do_group_exit+0x5c/0xc0
[   83.767041]  __arm64_sys_exit_group+0x24/0x28
[   83.771359]  el0_svc_common+0x110/0x178
[   83.775160]  el0_svc_handler+0x94/0xe8
[   83.778875]  el0_svc+0x8/0xc
[   83.781728] ---[ end trace 02d8d8eac46db9e5 ]---

Fix this bug by using 'drvdata->base' as the register base address
for the CLAIM-related operation.
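The mistake is a type confusion that the compiler cannot catch when the parameter is a plain pointer: the containing structure is passed where one of its members is expected. A hypothetical userspace model of the pattern (the struct layout and names are invented, not the coresight ones):

```c
#include <stdbool.h>

/*
 * Model of the bug: an API expecting a raw register base receives the
 * containing driver-data pointer instead.  Both are "just pointers",
 * so the compiler cannot object.
 */
struct fake_drvdata {
    int   unrelated_state;  /* any members before 'base' */
    void *base;             /* memory-mapped register base */
};

/* Stand-in for a claim/disclaim helper that wants the register base. */
static bool claims_valid_base(const struct fake_drvdata *d, const void *arg)
{
    return arg == d->base;  /* passing 'd' itself fails this check */
}
```

Passing the drvdata pointer where the base is expected compiles cleanly but touches the wrong addresses, which is why the problem only shows up as a runtime WARN rather than a build error.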

Fixes: 4d3ebd3658d8 ("coreisght: tmc: Claim device before use")
Cc: Suzuki Poulose 
Cc: Mathieu Poirier 
Cc: Mike Leach 
Cc: Robert Walker 
Signed-off-by: Leo Yan 
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 53fc83b..5864ac5 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -86,7 +86,7 @@ static void __tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 
 static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 {
-   coresight_disclaim_device(drvdata);
+   coresight_disclaim_device(drvdata->base);
__tmc_etb_disable_hw(drvdata);
 }
 
-- 
2.7.4



Re: linux-next: build warning after merge of the scsi tree

2018-10-18 Thread James Bottomley
On Fri, 2018-10-19 at 15:50 +1100, Stephen Rothwell wrote:
> Hi James,
> 
> After merging the scsi tree, today's linux-next build (powerpc
> ppc64_defconfig) produced this warning:
> 
> drivers/scsi/lpfc/lpfc_debugfs.c: In function
> 'lpfc_debugfs_nodelist_open':
> drivers/scsi/lpfc/lpfc_debugfs.c:706:17: warning: 'nrport' may be
> used uninitialized in this function [-Wmaybe-uninitialized]
>switch (nrport->port_state) {
>~~^~~~
> drivers/scsi/lpfc/lpfc_debugfs.c:553:30: note: 'nrport' was declared
> here
>   struct nvme_fc_remote_port *nrport;
>   ^~
> 
> I am not sure where this has come from :-(

It's the merge commit ... it was obviously the wrong choice; I'll fix
it.

James


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-18 Thread Mike Kravetz
On 10/18/18 6:47 PM, Andrew Morton wrote:
> On Thu, 18 Oct 2018 20:46:21 -0400 Andrea Arcangeli  wrote:
> 
>> On Thu, Oct 18, 2018 at 04:16:40PM -0700, Mike Kravetz wrote:
>>> I was not sure about this, and expected someone could come up with
>>> something better.  It just seems there are filesystems like huegtlbfs,
>>> where it makes no sense wasting cycles traversing the filesystem.  So,
>>> let's not even try.
>>>
>>> Hoping someone can come up with a better method than hard coding as
>>> I have done above.
>>
>> It's not strictly required after marking the pages dirty though. The
>> real fix is the other one? Could we just drop the hardcoding and let
>> it run after the real fix is applied?

Yeah.  The other part of the patch is the real fix.  This drop_caches
part is not necessary.

>> The performance of drop_caches doesn't seem critical, especially with
>> gigapages. tmpfs doesn't seem to be optimized away from drop_caches
>> and the gain would be bigger for tmpfs if THP is not enabled in the
>> mount, so I'm not sure if we should worry about hugetlbfs first.
> 
> I guess so.  I can't immediately see a clean way of expressing this so
> perhaps it would need a new BDI_CAP_NO_BACKING_STORE.  Such a
> thing hardly seems worthwhile for drop_caches.
> 
> And drop_caches really shouldn't be there anyway.  It's a standing
> workaround for ongoing suckage in pagecache and metadata reclaim
> behaviour :(

I'm OK with dropping the other part.  It just seemed like there was no
real reason to try and drop_caches for hugetlbfs (and perhaps others).

Andrew, would you like another version?  Or can you just drop the
fs/drop_caches.c part?

-- 
Mike Kravetz


linux-next: build warning after merge of the scsi tree

2018-10-18 Thread Stephen Rothwell
Hi James,

After merging the scsi tree, today's linux-next build (powerpc
ppc64_defconfig) produced this warning:

drivers/scsi/lpfc/lpfc_debugfs.c: In function 'lpfc_debugfs_nodelist_open':
drivers/scsi/lpfc/lpfc_debugfs.c:706:17: warning: 'nrport' may be used uninitialized in this function [-Wmaybe-uninitialized]
   switch (nrport->port_state) {
   ~~^~~~
drivers/scsi/lpfc/lpfc_debugfs.c:553:30: note: 'nrport' was declared here
  struct nvme_fc_remote_port *nrport;
  ^~

I am not sure where this has come from :-(
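For what it's worth, the pattern gcc complains about can be reduced to a minimal, self-contained example (the names below are invented and unrelated to lpfc):

```c
/*
 * Hypothetical reduction of the -Wmaybe-uninitialized pattern in the
 * warning above: a pointer assigned on only one branch and then
 * dereferenced under a condition gcc cannot always prove equivalent.
 */
static int port_state_of(int have_port)
{
    int state = 5;
    int *p;                 /* only conditionally assigned ... */

    if (have_port)
        p = &state;

    /* ... and used here: gcc may report 'p' as maybe-uninitialized */
    if (have_port)
        return *p;
    return -1;
}
```

The usual fix is to initialize the pointer at its declaration, e.g. 'int *p = NULL;', or to restructure the code so every path assigns it before use.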

-- 
Cheers,
Stephen Rothwell


pgpcGTZnRNLfi.pgp
Description: OpenPGP digital signature



Re: [PATCH] tpm: tpm_i2c_nuvoton: use correct command duration for TPM 2.x

2018-10-18 Thread Nayna Jain




On 10/17/2018 10:02 PM, Tomas Winkler wrote:

diff --git a/drivers/char/tpm/tpm_i2c_nuvoton.c b/drivers/char/tpm/tpm_i2c_nuvoton.c
index caa86b19c76d..f74f451baf6a 100644
--- a/drivers/char/tpm/tpm_i2c_nuvoton.c
+++ b/drivers/char/tpm/tpm_i2c_nuvoton.c
@@ -369,6 +369,7 @@ static int i2c_nuvoton_send(struct tpm_chip *chip, u8 *buf, size_t len)
struct device *dev = chip->dev.parent;
struct i2c_client *client = to_i2c_client(dev);
u32 ordinal;
+   unsigned long duration;
size_t count = 0;
int burst_count, bytes2write, retries, rc = -EIO;

@@ -455,10 +456,12 @@ static int i2c_nuvoton_send(struct tpm_chip *chip, u8 
*buf, size_t len)
return rc;
}
ordinal = be32_to_cpu(*((__be32 *) (buf + 6)));
-   rc = i2c_nuvoton_wait_for_data_avail(chip,
-tpm_calc_ordinal_duration(chip,
-  ordinal),
-&chip->read_queue);
+   if (chip->flags & TPM_CHIP_FLAG_TPM2)
+   duration = tpm2_calc_ordinal_duration(chip, ordinal);
+   else
+   duration = tpm_calc_ordinal_duration(chip, ordinal);
+
+   rc = i2c_nuvoton_wait_for_data_avail(chip, duration, &chip->read_queue);
if (rc) {
dev_err(dev, "%s() timeout command duration\n", __func__);
i2c_nuvoton_ready(chip);


I only have Nuvoton TPM 2.0, tested for that.

Reviewed-by: Nayna Jain 
Tested-by: Nayna Jain  (For TPM 2.0)


Thanks & Regards,
    - Nayna




Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

2018-10-18 Thread Nadav Amit
at 9:29 PM, Andy Lutomirski  wrote:

>> On Oct 18, 2018, at 6:08 PM, Nadav Amit  wrote:
>> 
>> at 10:00 AM, Andy Lutomirski  wrote:
>> 
 On Oct 18, 2018, at 9:47 AM, Nadav Amit  wrote:
 
 at 8:51 PM, Andy Lutomirski  wrote:
 
>> On Wed, Oct 17, 2018 at 8:12 PM Nadav Amit  wrote:
>> at 6:22 PM, Andy Lutomirski  wrote:
>> 
 On Oct 17, 2018, at 5:54 PM, Nadav Amit  wrote:
 
 It is sometimes beneficial to prevent preemption for very few
 instructions, or prevent preemption for some instructions that precede
 a branch (this latter case will be introduced in the next patches).
 
 To provide such functionality on x86-64, we use an empty REX-prefix
 (opcode 0x40) as an indication that preemption is disabled for the
 following instruction.
>>> 
>>> Nifty!
>>> 
>>> That being said, I think you have a few bugs. First, you can’t just 
>>> ignore
>>> a rescheduling interrupt, as you introduce unbounded latency when this
>>> happens — you’re effectively emulating preempt_enable_no_resched(), 
>>> which
>>> is not a drop-in replacement for preempt_enable(). To fix this, you may
>>> need to jump to a slow-path trampoline that calls schedule() at the end 
>>> or
>>> consider rewinding one instruction instead. Or use TF, which is only a
>>> little bit terrifying…
>> 
>> Yes, I didn’t pay enough attention here. For my use-case, I think that 
>> the
>> easiest solution would be to make synchronize_sched() ignore preemptions
>> that happen while the prefix is detected. It would slightly change the
>> meaning of the prefix.
 
 So thinking about it further, rewinding the instruction seems the easiest
 and most robust solution. I’ll do it.
 
>>> You also aren’t accounting for the case where you get an exception that
>>> is, in turn, preempted.
>> 
>> Hmm.. Can you give me an example for such an exception in my use-case? I
>> cannot think of an exception that might be preempted (assuming #BP, #MC
>> cannot be preempted).
> 
> Look for cond_local_irq_enable().
 
 I looked at it. Yet, I still don’t see how exceptions might happen in my
 use-case, but having said that - this can be fixed too.
>>> 
>>> I’m not totally certain there’s a case that matters.  But it’s worth 
>>> checking
>> 
>> I am still checking. But, I wanted to ask you whether the existing code is
>> correct, since it seems to me that others do the same mistake I did, unless
>> I don’t understand the code.
>> 
>> Consider for example do_int3(), and see my inlined comments:
>> 
>> dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
>> {
>>   ...
>>   ist_enter(regs);// => preempt_disable()
>>   cond_local_irq_enable(regs);// => assume it enables IRQs
>> 
>>   ...
>>   // resched irq can be delivered here. It will not cause rescheduling
>>   // since preemption is disabled
>> 
>>   cond_local_irq_disable(regs);// => assume it disables IRQs
>>   ist_exit(regs);// => preempt_enable_no_resched()
>> }
>> 
>> At this point resched will not happen for unbounded length of time (unless
>> there is another point when exiting the trap handler that checks if
>> preemption should take place).
> 
> I think it's only a bug in the cases where someone uses extable to fix
> up an int3 (which would be nuts) or that we oops.  But I should still
> fix it.  In the normal case where int3 was in user code, we'll miss
> the reschedule in do_trap(), but we'll reschedule in
> prepare_exit_to_usermode() -> exit_to_usermode_loop().

Thanks for your quick response, and sorry for bothering instead of dealing
with it. Note that do_debug() does something similar to do_int3().

And then there is optimized_callback() that also uses
preempt_enable_no_resched(). I think the original use was correct, but then
a19b2e3d7839 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized
kprobes”) removed the IRQ disabling, while leaving
preempt_enable_no_resched() . No?




Re: [PATCH 1/2] iio: adc: Add ad7124 support

2018-10-18 Thread kbuild test robot
Hi Stefan,

I love your patch! Yet something to improve:

[auto build test ERROR on iio/togreg]
[also build test ERROR on v4.19-rc8 next-20181018]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Stefan-Popa/iio-adc-Add-ad7124-support/20181019-051737
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio.git togreg
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=arm 

All error/warnings (new ones prefixed by >>):

>> drivers/iio//adc/ad7124.c:215:3: error: 'const struct ad_sigma_delta_info' 
>> has no member named 'data_reg'
 .data_reg = AD7124_DATA,
  ^~~~
>> drivers/iio//adc/ad7124.c:25:23: warning: excess elements in struct 
>> initializer
#define AD7124_DATA   0x02
  ^
>> drivers/iio//adc/ad7124.c:215:14: note: in expansion of macro 'AD7124_DATA'
 .data_reg = AD7124_DATA,
 ^~~
   drivers/iio//adc/ad7124.c:25:23: note: (near initialization for 
'ad7124_sigma_delta_info')
#define AD7124_DATA   0x02
  ^
>> drivers/iio//adc/ad7124.c:215:14: note: in expansion of macro 'AD7124_DATA'
 .data_reg = AD7124_DATA,
 ^~~

vim +215 drivers/iio//adc/ad7124.c

20  
21  /* AD7124 registers */
22  #define AD7124_COMMS0x00
23  #define AD7124_STATUS   0x00
24  #define AD7124_ADC_CONTROL  0x01
  > 25  #define AD7124_DATA 0x02
26  #define AD7124_IO_CONTROL_1 0x03
27  #define AD7124_IO_CONTROL_2 0x04
28  #define AD7124_ID   0x05
29  #define AD7124_ERROR0x06
30  #define AD7124_ERROR_EN 0x07
31  #define AD7124_MCLK_COUNT   0x08
32  #define AD7124_CHANNEL(x)   (0x09 + (x))
33  #define AD7124_CONFIG(x)(0x19 + (x))
34  #define AD7124_FILTER(x)(0x21 + (x))
35  #define AD7124_OFFSET(x)(0x29 + (x))
36  #define AD7124_GAIN(x)  (0x31 + (x))
37  
38  /* AD7124_STATUS */
39  #define AD7124_STATUS_POR_FLAG_MSK  BIT(4)
40  
41  /* AD7124_ADC_CONTROL */
42  #define AD7124_ADC_CTRL_PWR_MSK GENMASK(7, 6)
43  #define AD7124_ADC_CTRL_PWR(x)  
FIELD_PREP(AD7124_ADC_CTRL_PWR_MSK, x)
44  #define AD7124_ADC_CTRL_MODE_MSKGENMASK(5, 2)
45  #define AD7124_ADC_CTRL_MODE(x) FIELD_PREP(AD7124_ADC_CTRL_MODE_MSK, x)
46  
47  /* AD7124_CHANNEL_X */
48  #define AD7124_CHANNEL_EN_MSK   BIT(15)
49  #define AD7124_CHANNEL_EN(x)
FIELD_PREP(AD7124_CHANNEL_EN_MSK, x)
50  #define AD7124_CHANNEL_SETUP_MSKGENMASK(14, 12)
51  #define AD7124_CHANNEL_SETUP(x) FIELD_PREP(AD7124_CHANNEL_SETUP_MSK, x)
52  #define AD7124_CHANNEL_AINP_MSK GENMASK(9, 5)
53  #define AD7124_CHANNEL_AINP(x)  
FIELD_PREP(AD7124_CHANNEL_AINP_MSK, x)
54  #define AD7124_CHANNEL_AINM_MSK GENMASK(4, 0)
55  #define AD7124_CHANNEL_AINM(x)  
FIELD_PREP(AD7124_CHANNEL_AINM_MSK, x)
56  
57  /* AD7124_CONFIG_X */
58  #define AD7124_CONFIG_BIPOLAR_MSK   BIT(11)
59  #define AD7124_CONFIG_BIPOLAR(x)
FIELD_PREP(AD7124_CONFIG_BIPOLAR_MSK, x)
60  #define AD7124_CONFIG_REF_SEL_MSK   GENMASK(4, 3)
61  #define AD7124_CONFIG_REF_SEL(x)
FIELD_PREP(AD7124_CONFIG_REF_SEL_MSK, x)
62  #define AD7124_CONFIG_PGA_MSK   GENMASK(2, 0)
63  #define AD7124_CONFIG_PGA(x)
FIELD_PREP(AD7124_CONFIG_PGA_MSK, x)
64  
65  /* AD7124_FILTER_X */
66  #define AD7124_FILTER_FS_MSKGENMASK(10, 0)
67  #define AD7124_FILTER_FS(x) 
FIELD_PREP(AD7124_FILTER_FS_MSK, x)
68  
69  enum ad7124_ids {
70  ID_AD7124_4,
71  ID_AD7124_8,
72  };
73  
74  enum ad7124_ref_sel {
75  AD7124_REFIN1,
76  AD7124_REFIN2,
77  AD7124_INT_REF,
78  AD7124_AVDD_REF,
79  };
80  
81  enum ad7124_power_mode {
82  AD7124_LOW_POWER,
83  AD7124_MID_POWER,
84  AD7124_FULL_POWER,
85  };
86  
87  static const unsigned int ad7124_gain[8] = {
88  1, 2, 4, 8, 16, 32, 64, 128
89  };
90  
91  static const int ad7124_master_clk_freq_hz[3] = {
92  [AD7124_LOW_POWER] = 76800,
93  [AD7124_MID_POWER] = 153600,
94  [AD7124_FULL_POWER] = 614400,
95  

Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

2018-10-18 Thread Andy Lutomirski
> On Oct 18, 2018, at 6:08 PM, Nadav Amit  wrote:
>
> at 10:00 AM, Andy Lutomirski  wrote:
>
>>
>>
>>> On Oct 18, 2018, at 9:47 AM, Nadav Amit  wrote:
>>>
>>> at 8:51 PM, Andy Lutomirski  wrote:
>>>
> On Wed, Oct 17, 2018 at 8:12 PM Nadav Amit  wrote:
> at 6:22 PM, Andy Lutomirski  wrote:
>
>>> On Oct 17, 2018, at 5:54 PM, Nadav Amit  wrote:
>>>
>>> It is sometimes beneficial to prevent preemption for very few
>>> instructions, or prevent preemption for some instructions that precede
>>> a branch (this latter case will be introduced in the next patches).
>>>
>>> To provide such functionality on x86-64, we use an empty REX-prefix
>>> (opcode 0x40) as an indication that preemption is disabled for the
>>> following instruction.
>>
>> Nifty!
>>
>> That being said, I think you have a few bugs. First, you can’t just 
>> ignore
>> a rescheduling interrupt, as you introduce unbounded latency when this
>> happens — you’re effectively emulating preempt_enable_no_resched(), which
>> is not a drop-in replacement for preempt_enable(). To fix this, you may
>> need to jump to a slow-path trampoline that calls schedule() at the end 
>> or
>> consider rewinding one instruction instead. Or use TF, which is only a
>> little bit terrifying…
>
> Yes, I didn’t pay enough attention here. For my use-case, I think that the
> easiest solution would be to make synchronize_sched() ignore preemptions
> that happen while the prefix is detected. It would slightly change the
> meaning of the prefix.
>>>
>>> So thinking about it further, rewinding the instruction seems the easiest
>>> and most robust solution. I’ll do it.
>>>
>> You also aren’t accounting for the case where you get an exception that
>> is, in turn, preempted.
>
> Hmm.. Can you give me an example for such an exception in my use-case? I
> cannot think of an exception that might be preempted (assuming #BP, #MC
> cannot be preempted).

 Look for cond_local_irq_enable().
>>>
>>> I looked at it. Yet, I still don’t see how exceptions might happen in my
>>> use-case, but having said that - this can be fixed too.
>>
>> I’m not totally certain there’s a case that matters.  But it’s worth checking
>
> I am still checking. But, I wanted to ask you whether the existing code is
> correct, since it seems to me that others do the same mistake I did, unless
> I don’t understand the code.
>
> Consider for example do_int3(), and see my inlined comments:
>
> dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
> {
>...
>ist_enter(regs);// => preempt_disable()
>cond_local_irq_enable(regs);// => assume it enables IRQs
>
>...
>// resched irq can be delivered here. It will not cause rescheduling
>// since preemption is disabled
>
>cond_local_irq_disable(regs);// => assume it disables IRQs
>ist_exit(regs);// => preempt_enable_no_resched()
> }
>
> At this point resched will not happen for unbounded length of time (unless
> there is another point when exiting the trap handler that checks if
> preemption should take place).

I think it's only a bug in the cases where someone uses extable to fix
up an int3 (which would be nuts) or that we oops.  But I should still
fix it.  In the normal case where int3 was in user code, we'll miss
the reschedule in do_trap(), but we'll reschedule in
prepare_exit_to_usermode() -> exit_to_usermode_loop().

>
> Another example is __BPF_PROG_RUN_ARRAY(), which also uses
> preempt_enable_no_resched().

Alexei, I think this code is just wrong. Do you know why it uses
preempt_enable_no_resched()?

Oleg, the code in kernel/signal.c:

preempt_disable();
read_unlock(&tasklist_lock);
preempt_enable_no_resched();
freezable_schedule();

looks bogus.  I don't get what it's trying to achieve with
preempt_disable(), and I also don't see why no_resched does anything.
Sure, it prevents a reschedule triggered during read_unlock() from
causing a reschedule, but it doesn't prevent an interrupt immediately
after the preempt_enable_no_resched() call from scheduling.

--Andy

>
> Am I missing something?
>
> Thanks,
> Nadav



[PATCH v2 1/2] seq_buf: Make seq_buf_puts() null-terminate the buffer

2018-10-18 Thread Michael Ellerman
Currently seq_buf_puts() will happily create a non null-terminated
string for you in the buffer. This is particularly dangerous if the
buffer is on the stack.

For example:

  char buf[8];
  char secret[] = "secret";
  struct seq_buf s;

  seq_buf_init(&s, buf, sizeof(buf));
  seq_buf_puts(&s, "foo");
  printk("Message is %s\n", buf);

Can result in:

  Message is fooªsecret

We could require all users to memset() their buffer to zero before
use. But that seems likely to be forgotten and lead to bugs.

Instead we can change seq_buf_puts() to always leave the buffer in a
null-terminated state.

The only downside is that this makes the buffer 1 character smaller
for seq_buf_puts(), but that seems like a good trade off.

Acked-by: Kees Cook 
Signed-off-by: Michael Ellerman 
---
 lib/seq_buf.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

v2: Fix NULL/null terminology.

diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 11f2ae0f9099..6aabb609dd87 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -144,9 +144,13 @@ int seq_buf_puts(struct seq_buf *s, const char *str)
 
WARN_ON(s->size == 0);
 
+   /* Add 1 to len for the trailing null byte which must be there */
+   len += 1;
+
if (seq_buf_can_fit(s, len)) {
memcpy(s->buffer + s->len, str, len);
-   s->len += len;
+   /* Don't count the trailing null byte against the capacity */
+   s->len += len - 1;
return 0;
}
seq_buf_set_overflow(s);
-- 
2.17.2



[PATCH v2 2/2] seq_buf: Use size_t for len in seq_buf_puts()

2018-10-18 Thread Michael Ellerman
Jann Horn points out that we're using unsigned int for len in
seq_buf_puts(), which could potentially overflow if we're passed a
UINT_MAX sized string.

The rest of the code already uses size_t, so we should also use that
in seq_buf_puts() to avoid any issues.

Suggested-by: Jann Horn 
Signed-off-by: Michael Ellerman 
---
 lib/seq_buf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

v2: New in v2.

diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 6aabb609dd87..bd807f545a9d 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -140,7 +140,7 @@ int seq_buf_bprintf(struct seq_buf *s, const char *fmt, 
const u32 *binary)
  */
 int seq_buf_puts(struct seq_buf *s, const char *str)
 {
-   unsigned int len = strlen(str);
+   size_t len = strlen(str);
 
WARN_ON(s->size == 0);
 
-- 
2.17.2




Re: [PATCH] seq_buf: Make seq_buf_puts() NULL terminate the buffer

2018-10-18 Thread Michael Ellerman
Jann Horn  writes:
> On Wed, Oct 17, 2018 at 2:10 PM Michael Ellerman  wrote:
>> Currently seq_buf_puts() will happily create a non NULL terminated
>> string for you in the buffer. This is particularly dangerous if the
>> buffer is on the stack.
>>
>> For example:
>>
>>   char buf[8];
>>   char secret[] = "secret";
>>   struct seq_buf s;
>>
>>   seq_buf_init(&s, buf, sizeof(buf));
>>   seq_buf_puts(&s, "foo");
>>   printk("Message is %s\n", buf);
>>
>> Can result in:
>>
>>   Message is fooªsecret
>>
>> We could require all users to memset() their buffer to NULL before
>> use. But that seems likely to be forgotten and lead to bugs.
>>
>> Instead we can change seq_buf_puts() to always leave the buffer in a
>> NULL terminated state.
>>
>> The only downside is that this makes the buffer 1 character smaller
>> for seq_buf_puts(), but that seems like a good trade off.
>
> After this, you can also simplify rdt_last_cmd_status_show(), right?

Yes.

We also have a seq_buf_printf() in powerpc code that is printing a fixed
string purely to get NULL termination, so that can become a
seq_buf_puts().

>> diff --git a/lib/seq_buf.c b/lib/seq_buf.c
>> index 11f2ae0f9099..b1570204cde3 100644
>> --- a/lib/seq_buf.c
>> +++ b/lib/seq_buf.c
>> @@ -144,9 +144,13 @@ int seq_buf_puts(struct seq_buf *s, const char *str)
>>
>> WARN_ON(s->size == 0);
>>
>> +   /* Add 1 to len for the trailing NULL which must be there */
>
> Nit: In the comments, I would prefer either "null byte" or "NUL"
> instead of "NULL" when talking about something that is not a pointer.

Hmm yeah I guess. I think of them as being more or less the same thing,
or at least interchangeable, but that's a bit sloppy.

I'll send a v2 with "null byte".

>> +   len += 1;
>
> It looks like you're using an "unsigned int" for the length, meaning
> that this can in theory (e.g. when operating on a string from a big
> vmalloc buffer) overflow. You should be using size_t here.

Yes you're right.

And if len overflows to zero above ..

>> if (seq_buf_can_fit(s, len)) {

This will return true.

>> memcpy(s->buffer + s->len, str, len);
>> -   s->len += len;
>> +   /* Don't count the trailing NULL against the capacity */
>> +   s->len += len - 1;

And then here s->len becomes UINT_MAX.

I think. Which is probably not what we want.

I'll send a patch to switch to size_t in there.

cheers



Re: Crash in msm serial on dragonboard with ftrace bootargs

2018-10-18 Thread Joel Fernandes
On Thu, Oct 18, 2018 at 09:17:06AM -0400, Steven Rostedt wrote:
> On Thu, 18 Oct 2018 10:51:18 +0530
> Sai Prakash Ranjan  wrote:
> 
> > > So something else is causing an issue besides just msm_read.
> > > 
> > > Can you do an objdump -dr of the entire vmlinux binary and gzip it and
> > > post it somewhere. Not sure if it would be too big to email. You could
> > > try sending it to me privately. I'd like to see the binary that you are
> > > using.
> > >   
> > 
> > I have sent the objdump and dot config to you privately.
> 
> Thanks. I don't see anything that pops out, but then again, my arm asm
> foo is very rusty (it has been literally decades since I did any arm
> asm). I wonder if it could simply be a timing issue?
> 
> 086eb538 :
> 086eb538:   a9be7bfd stp x29, x30, [sp,#-32]!
> 086eb53c:   910003fd mov x29, sp
> 086eb540:   a90153f3 stp x19, x20, [sp,#16]
> 086eb544:   aa0003f4 mov x20, x0
> 086eb548:   2a0103f3 mov w19, w1
> 086eb54c:   aa1e03e0 mov x0, x30
> 086eb550:   97e6bae4 bl  0809a0e0 <_mcount>
> 
> The above is changed to nop on boot, but then to:
> 
>   bl ftrace_caller
> 
> When ftrace is enabled.
> 
> 086eb554:   8b334280 add x0, x20, w19, uxtw
> 086eb558:   b940 ldr w0, [x0]
> 086eb55c:   a94153f3 ldp x19, x20, [sp,#16]
> 086eb560:   a8c27bfd ldp x29, x30, [sp],#32
> 086eb564:   d65f03c0 ret
> 
> 
> 
> 0809a0e4 :
> 0809a0e4:   a9bf7bfd stp x29, x30, [sp,#-16]!
> 0809a0e8:   910003fd mov x29, sp
> 0809a0ec:   d10013c0 sub x0, x30, #0x4
> 0809a0f0:   f94003a1 ldr x1, [x29]
> 0809a0f4:   f9400421 ldr x1, [x1,#8]
> 0809a0f8:   d1001021 sub x1, x1, #0x4
> 
> 0809a0fc :
> 0809a0fc:   d503201f nop
> 
> The above nop gets patched to:
> 
>   bl ftrace_ops_no_ops
> 
> Which will iterate through all the registered functions.
> 
> 
> 0809a100 :
> 0809a100:   d503201f nop
> 
> The above only gets set when function graph tracer is enabled, which it
> is not in this case.
> 
> 0809a104:   a8c17bfd ldp x29, x30, [sp],#16
> 0809a108:   d65f03c0 ret
> 
> 
> Anyone see any problems here?

This seems sane to me; he says in the other thread that he added 'notrace' to
the msm serial functions (which AIUI should prevent ftrace instrumentation)
and he still sees the issue.

thanks,

 - Joel



linux-next: manual merge of the kvm tree with the tip tree

2018-10-18 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  tools/include/uapi/linux/kvm.h

between commit:

  25fe15e54fe5 ("tools headers uapi: Sync kvm.h copy")

from the tip tree and commit:

  c939989d74e2 ("tools/headers: update kvm.h")

from the kvm tree.

I fixed it up (the latter is a superset of the former) and can carry the
fix as necessary. This is now fixed as far as linux-next is concerned,
but any non trivial conflicts should be mentioned to your upstream
maintainer when your tree is submitted for merging.  You may also want
to consider cooperating with the maintainer of the conflicting tree to
minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell


pgpvBil35GIdb.pgp
Description: OpenPGP digital signature


Re: [PATCH] firmware: coreboot: Fix a missing-check bug

2018-10-18 Thread Samuel Holland
On 10/18/18 10:37, Wenwen Wang wrote:
> In coreboot_table_init(), a for loop is used to copy the entries of the
> coreboot table. For each entry, the header of the entry, which is a
> structure coreboot_table_entry and includes the size of the entry, is
> firstly copied from the IO region 'ptr_entry' to 'entry' through the first
> memcpy_fromio(). Then the 'entry.size' is used to allocate the
> coreboot_device 'device' through kzalloc(). After 'device' is allocated,
> the whole entry, including the header, is then copied to 'device->entry'
> through the second memcpy_fromio(). Obviously, the header of the entry is
> copied twice here. More importantly, no check is enforced after the second
> copy to make sure the two copies obtain the same values. Given that the IO
> region can also be accessed by the device, it is possible that
> 'device->entry.size' is different from 'entry.size' after the second copy,
> especially when the device races to modify the size value between these two
> copies. This can cause undefined behavior of the kernel and introduce
> potential security risk, because 'device->entry.size' is inconsistent with
> the actual size of the entry.

Thanks for the patch.

However, this IO region is not associated with a hardware device. It is a table
in RAM that is only written to by firmware (coreboot) before Linux is ever run.
So there's no device on the other side that could race with the kernel here.

Regards,
Samuel

> This patch rewrites the header of each entry after the second copy, using
> the value acquired in the first copy. Through this way, the above issue can
> be avoided.
> 
> Signed-off-by: Wenwen Wang 
> ---
>  drivers/firmware/google/coreboot_table.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/firmware/google/coreboot_table.c b/drivers/firmware/google/coreboot_table.c
> index 19db570..20fcd54 100644
> --- a/drivers/firmware/google/coreboot_table.c
> +++ b/drivers/firmware/google/coreboot_table.c
> @@ -128,6 +128,7 @@ int coreboot_table_init(struct device *dev, void __iomem *ptr)
>   device->dev.bus = &coreboot_bus_type;
>   device->dev.release = coreboot_device_release;
>   memcpy_fromio(&device->entry, ptr_entry, entry.size);
> + device->entry = entry;
>  
>   ret = device_register(&device->dev);
>   if (ret) {
> 



Re: [PATCH 1/4] Adds -Wshadow=local on KBUILD_HOSTCFLAGS

2018-10-18 Thread Leonardo Bras
On Thu, Oct 18, 2018 at 11:42 PM Masahiro Yamada
 wrote:
> Adding -Wshadow to KBUILD_HOSTCFLAGS emits another warning in Kconfig.
> Of course, it is easy to fix.
For v2, I already replaced '-Wshadow=local' with '-Wshadow' and fixed this
warning.

> But, I just started to think this option is a kind of harsh...

The v2 is almost done. Do you think it will be useful, or should I drop it?

Regards,
Leonardo Bras


Re: [Lkcamp] [PATCH 3/4] kbuild: Removes unnecessary shadowed local variable and optimize testing.

2018-10-18 Thread Masahiro Yamada
On Thu, Oct 18, 2018 at 11:04 AM Leonardo Bras  wrote:
>
> Hello Helen,
>
> On Wed, Oct 17, 2018 at 12:06 AM Helen Koike  wrote:
> >
> > Hi Leonardo,
> >
> > On 10/16/18 9:09 PM, Leonardo Brás wrote:
> > > Removes an unnecessary shadowed local variable (start).
> > > Optimize test of isdigit:
> > > - If isalpha returns true, isdigit will return false, so no need to 
> > > test.
> >
> > This patch does two different things, it should be in two separated patches.
> Sure, no problem.
>
> >
> > >
> > > Signed-off-by: Leonardo Brás 
> > > ---
> > >  scripts/asn1_compiler.c | 7 +++
> > >  1 file changed, 3 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/scripts/asn1_compiler.c b/scripts/asn1_compiler.c
> > > index c146020fc783..08bb6e5fd6ad 100644
> > > --- a/scripts/asn1_compiler.c
> > > +++ b/scripts/asn1_compiler.c
> > > @@ -413,7 +413,7 @@ static void tokenise(char *buffer, char *end)
> > >
> > >   /* Handle string tokens */
> > >   if (isalpha(*p)) {
> > > - const char **dir, *start = p;
> > > + const char **dir;
> > >
> > >   /* Can be a directive, type name or element
> > >* name.  Find the end of the name.
> > > @@ -454,10 +454,9 @@ static void tokenise(char *buffer, char *end)
> > >
> > >   tokens[tix++].token_type = TOKEN_TYPE_NAME;
> > >   continue;
> > > - }
> > > + } else if (isdigit(*p)) {
> > > + /* Handle numbers */
> >
> > Actually you can't do that, p is being altered in the first if statement.
>
> Yeah, you are right. I will remove this logic for v2.


I drop v1 from my tree.

Please send v2.


> >
> > >
> > > - /* Handle numbers */
> > > - if (isdigit(*p)) {
> > >   /* Find the end of the number */
> > >   q = p + 1;
> > >   while (q < nl && (isdigit(*q)))
> > >
> >
> > Regards
> > Helen
>
> Thanks!
>
> Leonardo Brás



-- 
Best Regards
Masahiro Yamada


Re: [PATCH 1/4] Adds -Wshadow=local on KBUILD_HOSTCFLAGS

2018-10-18 Thread Masahiro Yamada
On Fri, Oct 19, 2018 at 1:53 AM Borislav Petkov  wrote:
>
> On Fri, Oct 19, 2018 at 01:39:21AM +0900, Masahiro Yamada wrote:
> > It is not realistic to enable this warning option by default.
>
> I believe the question is whether to enable that warning by default in
> KBUILD_HOSTCFLAGS. Enabling it by default for the target kernel is of
> course, too noisy. That's why it is hidden behind W=2 there.


Sorry, I misunderstood the question.

The difference in noisiness between CC and HOSTCC
just comes from the amount of source code.

Adding -Wshadow to KBUILD_HOSTCFLAGS emits another warning in Kconfig.
Of course, it is easy to fix.
But, I just started to think this option is a kind of harsh...


--
Best Regards
Masahiro Yamada


Re: [PATCH V2 2/5] mm/hugetlb: Distinguish between migratability and movability

2018-10-18 Thread Anshuman Khandual



On 10/19/2018 07:29 AM, Naoya Horiguchi wrote:
> On Fri, Oct 12, 2018 at 09:29:56AM +0530, Anshuman Khandual wrote:
>> During huge page allocation its migratability is checked to determine if
>> it should be placed under movable zones with GFP_HIGHUSER_MOVABLE. But the
>> movability aspect of the huge page could depend on other factors than just
>> migratability. Movability in itself is a distinct property which should not
>> be tied with migratability alone.
>>
>> This differentiates these two and implements an enhanced movability check
>> which also considers huge page size to determine if it is feasible to be
>> placed under a movable zone. At present it just checks for gigantic pages
>> but going forward it can incorporate other enhanced checks.
> 
> (nitpicking...)
> The following code just checks hugepage_migration_supported(), so maybe
> s/Movability/Migratability/ is expected in the comment?
> 
>   static int unmap_and_move_huge_page(...)
>   {
>   ...
>   /*
>* Movability of hugepages depends on architectures and hugepage 
> size.
>* This check is necessary because some callers of hugepage 
> migration
>* like soft offline and memory hotremove don't walk through page
>* tables or check whether the hugepage is pmd-based or not before
>* kicking migration.
>*/
>   if (!hugepage_migration_supported(page_hstate(hpage))) {
> 
Sure, will update this patch only unless other changes are suggested.


Re: [driver-core PATCH v4 4/6] driver core: Probe devices asynchronously instead of the driver

2018-10-18 Thread Bart Van Assche

On 10/18/18 7:20 PM, Alexander Duyck wrote:

I see what you are talking about now. Actually I think this was an
existing issue before my patch even came into play. Basically the code
as it currently stands is device specific in terms of the attach and
release code.

I wonder if we shouldn't have the async_synchronize_full call in
__device_release_driver moved down and into driver_detach before we
even start the for loop. Assuming the driver is no longer associated
with the bus that should flush out all devices so that we can then
pull them out of the devices list at least. I may look at adding an
additional bitflag to the device struct to indicate that it has a
driver attach pending. Then for things like races between any attach
and detach calls the logic becomes pretty straight forward. Attach
will set the bit and provide driver data, detach will clear the bit
and the driver data. If a driver loads in between it should clear the
bit as well.

I'll work on it over the next couple days and hopefully have something
ready for testing/review early next week.


Hi Alex,

How about checking in __driver_attach_async_helper() whether the driver 
pointer is still valid by checking whether bus_for_each_drv(dev->bus, 
...) can still find the driver pointer? That approach requires 
protection with a mutex to avoid races with the driver detach code but 
shouldn't require any new flags in struct device.


Thanks,

Bart.


Re: [PATCH 2/3] uapi: get rid of STATX_ALL

2018-10-18 Thread Andreas Dilger
On Oct 18, 2018, at 7:11 AM, Miklos Szeredi  wrote:
> 
> Constants of the *_ALL type can be actively harmful due to the fact that
> developers will usually fail to consider the possible effects of future
> changes to the definition.
> 
> Remove STATX_ALL from the uapi, while no damage has been done yet.
> 
> We could keep something like this around in the kernel, but there's
> actually no point, since all filesystems should be explicitly checking
> flags that they support and not rely on the VFS masking unknown ones out: a
> flag could be known to the VFS, yet not known to the filesystem (see
> orangefs bug fixed in the previous patch).

What is actually strange is that the vfs_getattr_nosec() code is setting

stat->result_mask = STATX_BASIC_STATS;

in the code before any of the filesystem ->getattr methods are called.
That means it is up to the filesystems to actively *clear* flags from
the result_mask as opposed to only *setting* flags for fields that it
is filling in.  That seems a bit backward to me?

It looks like NFS is _probably_ doing the right thing, though it is still
using the userspace-supplied request_mask as a mask for the bits being
returned,  but the saving grace is that result_mask is STATX_BASIC_STATS
set by vfs_getattr_nosec() AFAICS.

Looking at OCFS2, CIFS, GFS2, they are doing a full inode revalidation
and returning the basic stats without looking at flags, request_mask,
or result_mask at all, so I'd expect they could be more efficient
(i.e. not revalidating the inode and possibly doing an RPC at all if
only some minimal flags are requested, or if AT_STATX_FORCE_SYNC is
not set).

It looks like overlayfs, nfsd, and fuse at least (as statx callers)
are often requesting a small subset of flags (e.g. STATX_INO,
STATX_NLINK, STATX_ATIME | STATX_MTIME, etc.) so they could be more
efficient if the underlying filesystems did only what was asked.

Cheers, Andreas

> 
> Signed-off-by: Miklos Szeredi 
> Cc: David Howells 
> Cc: Michael Kerrisk 
> ---
> fs/stat.c   | 1 -
> include/uapi/linux/stat.h   | 2 +-
> samples/statx/test-statx.c  | 2 +-
> tools/include/uapi/linux/stat.h | 2 +-
> 4 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index f8e6fb2c3657..8d297a279991 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -73,7 +73,6 @@ int vfs_getattr_nosec(const struct path *path, struct kstat 
> *stat,
> 
>   memset(stat, 0, sizeof(*stat));
>   stat->result_mask |= STATX_BASIC_STATS;
> - request_mask &= STATX_ALL;
>   query_flags &= KSTAT_QUERY_FLAGS;
>   if (inode->i_op->getattr)
>   return inode->i_op->getattr(path, stat, request_mask,
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 7b35e98d3c58..370f09d92fa6 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -148,7 +148,7 @@ struct statx {
> #define STATX_BLOCKS		0x00000400U	/* Want/got stx_blocks */
> #define STATX_BASIC_STATS	0x000007ffU	/* The stuff in the normal stat struct */
> #define STATX_BTIME		0x00000800U	/* Want/got stx_btime */
> -#define STATX_ALL		0x00000fffU	/* All currently supported flags */
> +
> #define STATX__RESERVED	0x80000000U	/* Reserved for future struct statx expansion */
> 
> /*
> diff --git a/samples/statx/test-statx.c b/samples/statx/test-statx.c
> index d4d77b09412c..e354048dea3c 100644
> --- a/samples/statx/test-statx.c
> +++ b/samples/statx/test-statx.c
> @@ -211,7 +211,7 @@ int main(int argc, char **argv)
>   struct statx stx;
>   int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;
> 
> - unsigned int mask = STATX_ALL;
> + unsigned int mask = STATX_BASIC_STATS | STATX_BTIME;
> 
>   for (argv++; *argv; argv++) {
>   if (strcmp(*argv, "-F") == 0) {
> diff --git a/tools/include/uapi/linux/stat.h b/tools/include/uapi/linux/stat.h
> index 7b35e98d3c58..370f09d92fa6 100644
> --- a/tools/include/uapi/linux/stat.h
> +++ b/tools/include/uapi/linux/stat.h
> @@ -148,7 +148,7 @@ struct statx {
> #define STATX_BLOCKS  0x0400U /* Want/got stx_blocks */
> #define STATX_BASIC_STATS 0x07ffU /* The stuff in the normal stat 
> struct */
> #define STATX_BTIME   0x0800U /* Want/got stx_btime */
> -#define STATX_ALL0x0fffU /* All currently supported 
> flags */
> +
> #define STATX__RESERVED   0x8000U /* Reserved for future 
> struct statx expansion */
> 
> /*
> --
> 2.14.3
> 


Cheers, Andreas







signature.asc
Description: Message signed with OpenPGP


Re: [PATCH 2/3] uapi: get rid of STATX_ALL

2018-10-18 Thread Andreas Dilger
On Oct 18, 2018, at 7:11 AM, Miklos Szeredi  wrote:
> 
> Constants of the *_ALL type can be actively harmful due to the fact that
> developers will usually fail to consider the possible effects of future
> changes to the definition.
> 
> Remove STATX_ALL from the uapi, while no damage has been done yet.
> 
> We could keep something like this around in the kernel, but there's
> actually no point, since all filesystems should be explicitly checking
> flags that they support and not rely on the VFS masking unknown ones out: a
> flag could be known to the VFS, yet not known to the filesystem (see
> orangefs bug fixed in the previous patch).

What is actually strange is that the vfs_getattr_nosec() code is setting

stat->result_mask = STATX_BASIC_STATS;

in the code before any of the filesystem ->getattr methods are called.
That means it is up to the filesystems to actively *clear* flags from
the result_mask, as opposed to only *setting* flags for the fields they
are filling in.  That seems a bit backward to me?

It looks like NFS is _probably_ doing the right thing, though it is still
using the userspace-supplied request_mask as a mask for the bits being
returned; the saving grace is that result_mask is the STATX_BASIC_STATS
value set by vfs_getattr_nosec(), AFAICS.

Looking at OCFS2, CIFS, GFS2, they are doing a full inode revalidation
and returning the basic stats without looking at flags, request_mask,
or result_mask at all, so I'd expect they could be more efficient
(i.e. not revalidating the inode and possibly doing an RPC at all if
only some minimal flags are requested, or if AT_STATX_FORCE_SYNC is
not set).

It looks like overlayfs, nfsd, and fuse at least (as statx callers)
are often requesting a small subset of flags (e.g. STATX_INO,
STATX_NLINK, STATX_ATIME | STATX_MTIME, etc.) so they could be more
efficient if the underlying filesystems did only what was asked.

Cheers, Andreas

> 
> Signed-off-by: Miklos Szeredi 
> Cc: David Howells 
> Cc: Michael Kerrisk 
> ---
> fs/stat.c   | 1 -
> include/uapi/linux/stat.h   | 2 +-
> samples/statx/test-statx.c  | 2 +-
> tools/include/uapi/linux/stat.h | 2 +-
> 4 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index f8e6fb2c3657..8d297a279991 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -73,7 +73,6 @@ int vfs_getattr_nosec(const struct path *path, struct kstat 
> *stat,
> 
>   memset(stat, 0, sizeof(*stat));
>   stat->result_mask |= STATX_BASIC_STATS;
> - request_mask &= STATX_ALL;
>   query_flags &= KSTAT_QUERY_FLAGS;
>   if (inode->i_op->getattr)
>   return inode->i_op->getattr(path, stat, request_mask,
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 7b35e98d3c58..370f09d92fa6 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -148,7 +148,7 @@ struct statx {
> #define STATX_BLOCKS  0x0400U /* Want/got stx_blocks */
> #define STATX_BASIC_STATS 0x07ffU /* The stuff in the normal stat 
> struct */
> #define STATX_BTIME   0x0800U /* Want/got stx_btime */
> -#define STATX_ALL0x0fffU /* All currently supported 
> flags */
> +
> #define STATX__RESERVED   0x8000U /* Reserved for future 
> struct statx expansion */
> 
> /*
> diff --git a/samples/statx/test-statx.c b/samples/statx/test-statx.c
> index d4d77b09412c..e354048dea3c 100644
> --- a/samples/statx/test-statx.c
> +++ b/samples/statx/test-statx.c
> @@ -211,7 +211,7 @@ int main(int argc, char **argv)
>   struct statx stx;
>   int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;
> 
> - unsigned int mask = STATX_ALL;
> + unsigned int mask = STATX_BASIC_STATS | STATX_BTIME;
> 
>   for (argv++; *argv; argv++) {
>   if (strcmp(*argv, "-F") == 0) {
> diff --git a/tools/include/uapi/linux/stat.h b/tools/include/uapi/linux/stat.h
> index 7b35e98d3c58..370f09d92fa6 100644
> --- a/tools/include/uapi/linux/stat.h
> +++ b/tools/include/uapi/linux/stat.h
> @@ -148,7 +148,7 @@ struct statx {
> #define STATX_BLOCKS  0x0400U /* Want/got stx_blocks */
> #define STATX_BASIC_STATS 0x07ffU /* The stuff in the normal stat 
> struct */
> #define STATX_BTIME   0x0800U /* Want/got stx_btime */
> -#define STATX_ALL0x0fffU /* All currently supported 
> flags */
> +
> #define STATX__RESERVED   0x8000U /* Reserved for future 
> struct statx expansion */
> 
> /*
> --
> 2.14.3
> 


Cheers, Andreas







signature.asc
Description: Message signed with OpenPGP


Re: [driver-core PATCH v4 4/6] driver core: Probe devices asynchronously instead of the driver

2018-10-18 Thread Alexander Duyck
On Thu, Oct 18, 2018 at 1:15 PM Bart Van Assche  wrote:
>
> On Thu, 2018-10-18 at 12:38 -0700, Alexander Duyck wrote:
> > Basically if somebody loads a driver the dev->driver becomes set. If a
> > driver is removed it will clear dev->driver and set driver_data to
> > 0/NULL. That is what I am using as a mutex to track it in conjunction
> > with the device mutex. Basically if somebody attempts to attach a driver
> > before we get there we just exit and don't attempt to load this driver.
>
> I don't think that the above matches your code. __device_attach() does not
> set the dev->driver pointer before scheduling an asynchronous probe. Only
> dev->driver_data gets set before the asynchronous probe is scheduled. Since
> driver_detach() only iterates over devices that are in the per-driver klist
> it will skip all devices for which an asynchronous probe has been scheduled
> but __device_attach_async_helper() has not yet been called. My conclusion
> remains that this patch does not prevent a driver pointer from becoming invalid
> concurrently with __device_attach_async_helper() dereferencing the same
> driver pointer.
>
> Bart.

I see what you are talking about now. Actually I think this was an
existing issue before my patch even came into play. Basically the code
as it currently stands is device-specific in terms of the attach and
release code.

I wonder if we shouldn't have the async_synchronize_full call in
__device_release_driver moved down and into driver_detach before we
even start the for loop. Assuming the driver is no longer associated
with the bus that should flush out all devices so that we can then
pull them out of the devices list at least. I may look at adding an
additional bitflag to the device struct to indicate that it has a
driver attach pending. Then for things like races between any attach
and detach calls, the logic becomes pretty straightforward. Attach
will set the bit and provide driver data, detach will clear the bit
and the driver data. If a driver loads in between it should clear the
bit as well.

I'll work on it over the next couple days and hopefully have something
ready for testing/review early next week.

Thanks.

- Alex



Re: [PATCH V9 18/21] dt-bindings: csky CPU Bindings

2018-10-18 Thread Guo Ren
On Thu, Oct 18, 2018 at 09:31:35AM -0500, Rob Herring wrote:
> On Tue, 16 Oct 2018 10:58:37 +0800, Guo Ren wrote:
> > This patch adds the documentation to describe how to add cpu nodes in
> > dts for SMP.
> > 
> > Signed-off-by: Guo Ren 
> > Cc: Rob Herring 
> > ---
> > Changelog:
> >  - Add compatible.
> >  - Remove status part.
> > ---
> > ---
> >  Documentation/devicetree/bindings/csky/cpus.txt | 73 
> > +
> >  1 file changed, 73 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/csky/cpus.txt
> > 
> 
> Reviewed-by: Rob Herring 

Thx Rob

Best Regards
 Guo Ren



Re: [PATCH 2/2] mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask

2018-10-18 Thread Andrew Morton
On Wed, 26 Sep 2018 16:22:27 +0200 Michal Hocko  wrote:

> > MPOL_PREFERRED is handled by policy_node() before we call 
> > __alloc_pages_nodemask.
> > __GFP_THISNODE is applied only when we are not using
> > __GFP_DIRECT_RECLAIM which is handled in alloc_hugepage_direct_gfpmask
> > now.
> > Lastly MPOL_BIND wasn't handled explicitly but in the end the removed
> > late check would remove __GFP_THISNODE for it as well. So in the end we
> > are doing the same thing unless I miss something
> 
> Forgot to add. One notable exception would be that the previous code
> would allow to hit
>   WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE));
> in policy_node if the requested node (e.g. cpu local one) was outside of
> the mbind nodemask. This is not possible now. We haven't heard about any
> such warning yet so it is unlikely that it happens though.

Perhaps a changelog addition is needed to cover the above?

I assume that David's mbind() concern has gone away.

No acks or reviewed-by's yet?



Re: possible deadlock in __generic_file_fsync

2018-10-18 Thread syzbot

syzbot has found a reproducer for the following crash on:

HEAD commit:fa520c47eaa1 fscache: Fix out of bound read in long cookie..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=130da8ee40
kernel config:  https://syzkaller.appspot.com/x/.config?x=b3f55cb3dfcc6c33
dashboard link: https://syzkaller.appspot.com/bug?extid=5cd33f0e6abe2bb3e397
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1299be0940

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+5cd33f0e6abe2bb3e...@syzkaller.appspotmail.com


==
WARNING: possible circular locking dependency detected
4.19.0-rc8+ #290 Not tainted
--
kworker/0:1/14 is trying to acquire lock:
722efb72 (&sb->s_type->i_mutex_key#10){+.+.}, at: inode_lock  
include/linux/fs.h:738 [inline]
722efb72 (&sb->s_type->i_mutex_key#10){+.+.}, at:  
__generic_file_fsync+0xb5/0x200 fs/libfs.c:981


but task is already holding lock:
0144bfd5 ((work_completion)(&dio->complete_work)){+.+.}, at:  
process_one_work+0xb9a/0x1b90 kernel/workqueue.c:2128


which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 ((work_completion)(&dio->complete_work)){+.+.}:
   process_one_work+0xc0a/0x1b90 kernel/workqueue.c:2129
   worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
   kthread+0x35a/0x420 kernel/kthread.c:246
   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

-> #1 ((wq_completion)"dio/%s"sb->s_id){+.+.}:
   flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
   drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
   destroy_workqueue+0xc6/0x9c0 kernel/workqueue.c:4155
   __alloc_workqueue_key+0xed8/0x1170 kernel/workqueue.c:4138
   sb_init_dio_done_wq+0x37/0x90 fs/direct-io.c:623
   do_blockdev_direct_IO+0x12ea/0x9d70 fs/direct-io.c:1283
   __blockdev_direct_IO+0x9d/0xc6 fs/direct-io.c:1417
   ext4_direct_IO_write fs/ext4/inode.c:3743 [inline]
   ext4_direct_IO+0xae8/0x2230 fs/ext4/inode.c:3870
   generic_file_direct_write+0x275/0x4b0 mm/filemap.c:3042
   __generic_file_write_iter+0x2ff/0x630 mm/filemap.c:3221
   ext4_file_write_iter+0x390/0x1420 fs/ext4/file.c:266
   call_write_iter include/linux/fs.h:1808 [inline]
   aio_write+0x3b1/0x610 fs/aio.c:1561
   io_submit_one+0xaa1/0xf80 fs/aio.c:1835
   __do_sys_io_submit fs/aio.c:1916 [inline]
   __se_sys_io_submit fs/aio.c:1887 [inline]
   __x64_sys_io_submit+0x1b7/0x580 fs/aio.c:1887
   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (&sb->s_type->i_mutex_key#10){+.+.}:
   lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3900
   down_write+0x8a/0x130 kernel/locking/rwsem.c:70
   inode_lock include/linux/fs.h:738 [inline]
   __generic_file_fsync+0xb5/0x200 fs/libfs.c:981
   ext4_sync_file+0xa4f/0x1510 fs/ext4/fsync.c:120
   vfs_fsync_range+0x140/0x220 fs/sync.c:197
   generic_write_sync include/linux/fs.h:2732 [inline]
   dio_complete+0x75c/0x9e0 fs/direct-io.c:329
   dio_aio_complete_work+0x20/0x30 fs/direct-io.c:341
   process_one_work+0xc90/0x1b90 kernel/workqueue.c:2153
   worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
   kthread+0x35a/0x420 kernel/kthread.c:246
   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

other info that might help us debug this:

Chain exists of:
  &sb->s_type->i_mutex_key#10 --> (wq_completion)"dio/%s"sb->s_id -->  
(work_completion)(&dio->complete_work)


 Possible unsafe locking scenario:

   CPU0CPU1
   
  lock((work_completion)(&dio->complete_work));
   lock((wq_completion)"dio/%s"sb->s_id);
   lock((work_completion)(&dio->complete_work));
  lock(&sb->s_type->i_mutex_key#10);

 *** DEADLOCK ***

2 locks held by kworker/0:1/14:
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
__write_once_size include/linux/compiler.h:215 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
atomic64_set include/asm-generic/atomic-instrumented.h:40 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
atomic_long_set include/asm-generic/atomic-long.h:59 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
set_work_data kernel/workqueue.c:617 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
process_one_work+0xb43/0x1b90 


Re: [PATCH 0/9] psi: pressure stall information for CPU, memory, and IO v4

2018-10-18 Thread Andrew Morton
On Tue, 28 Aug 2018 13:22:49 -0400 Johannes Weiner  wrote:

> This version 4 of the PSI series incorporates feedback from Peter and
> fixes two races in the lockless aggregator that Suren found in his
> testing and which caused the sample calculation to sometimes underflow
> and record bogusly large samples; details at the bottom of this email.

We've had very little in the way of review activity for the PSI
patchset.  According to the changelog tags, anyway.




[PATCH V2] perf arm64: Fix generate system call table failed with /tmp mounted with noexec

2018-10-18 Thread Hongxu Jia
When /tmp is mounted with noexec, mksyscalltbl fails.
[snip]
|perf-1.0/tools/perf/arch/arm64/entry/syscalls//mksyscalltbl:
/tmp/create-table-6VGPSt: Permission denied
[snip]

Use the TMPDIR environment variable as the prefix directory for the
temporary file; if it is set, it replaces the default /tmp.

Remove extra slash from `syscalls//mksyscalltbl'

Fixes: 2b5882435606 ("perf arm64: Generate system call table from asm/unistd.h")

Signed-off-by: Hongxu Jia 
---
 tools/perf/arch/arm64/Makefile| 2 +-
 tools/perf/arch/arm64/entry/syscalls/mksyscalltbl | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/arch/arm64/Makefile b/tools/perf/arch/arm64/Makefile
index dbef716..bc2a284 100644
--- a/tools/perf/arch/arm64/Makefile
+++ b/tools/perf/arch/arm64/Makefile
@@ -13,7 +13,7 @@ out:= $(OUTPUT)arch/arm64/include/generated/asm
 header := $(out)/syscalls.c
 incpath := $(srctree)/tools
 sysdef := $(srctree)/tools/arch/arm64/include/uapi/asm/unistd.h
-sysprf := $(srctree)/tools/perf/arch/arm64/entry/syscalls/
+sysprf := $(srctree)/tools/perf/arch/arm64/entry/syscalls
 systbl := $(sysprf)/mksyscalltbl
 
 # Create output directory if not already present
diff --git a/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl 
b/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
index 2dbb8cad..c88fd32 100755
--- a/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
+++ b/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
@@ -23,7 +23,7 @@ create_table_from_c()
 {
local sc nr last_sc
 
-   create_table_exe=`mktemp /tmp/create-table-XX`
+   create_table_exe=`mktemp ${TMPDIR:-/tmp}/create-table-XX`
 
{
 
-- 
2.7.4




Re: [PATCH V2 2/5] mm/hugetlb: Distinguish between migratability and movability

2018-10-18 Thread Naoya Horiguchi
On Fri, Oct 12, 2018 at 09:29:56AM +0530, Anshuman Khandual wrote:
> During huge page allocation it's migratability is checked to determine if
> it should be placed under movable zones with GFP_HIGHUSER_MOVABLE. But the
> movability aspect of the huge page could depend on other factors than just
> migratability. Movability in itself is a distinct property which should not
> be tied with migratability alone.
> 
> This differentiates these two and implements an enhanced movability check
> which also considers huge page size to determine if it is feasible to be
> placed under a movable zone. At present it just checks for gigantic pages
> but going forward it can incorporate other enhanced checks.

(nitpicking...)
The following code just checks hugepage_migration_supported(), so maybe
s/Movability/Migratability/ is expected in the comment?

  static int unmap_and_move_huge_page(...)
  {
  ...
  /*
   * Movability of hugepages depends on architectures and hugepage size.
   * This check is necessary because some callers of hugepage migration
   * like soft offline and memory hotremove don't walk through page
   * tables or check whether the hugepage is pmd-based or not before
   * kicking migration.
   */
  if (!hugepage_migration_supported(page_hstate(hpage))) {

Thanks,
Naoya Horiguchi

> 
> Suggested-by: Michal Hocko 
> Signed-off-by: Anshuman Khandual 
> ---
>  include/linux/hugetlb.h | 30 ++
>  mm/hugetlb.c|  2 +-
>  2 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 9c1b77f..456cb60 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -493,6 +493,31 @@ static inline bool hugepage_migration_supported(struct 
> hstate *h)
>  #endif
>  }
>  
> +/*
> + * Movability check is different as compared to migration check.
> + * It determines whether or not a huge page should be placed on
> + * movable zone or not. Movability of any huge page should be
> + * required only if huge page size is supported for migration.
> + * There wont be any reason for the huge page to be movable if
> + * it is not migratable to start with. Also the size of the huge
> + * page should be large enough to be placed under a movable zone
> + * and still feasible enough to be migratable. Just the presence
> + * in movable zone does not make the migration feasible.
> + *
> + * So even though large huge page sizes like the gigantic ones
> + * are migratable they should not be movable because its not
> + * feasible to migrate them from movable zone.
> + */
> +static inline bool hugepage_movable_supported(struct hstate *h)
> +{
> + if (!hugepage_migration_supported(h))
> + return false;
> +
> + if (hstate_is_gigantic(h))
> + return false;
> + return true;
> +}
> +
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>  struct mm_struct *mm, pte_t *pte)
>  {
> @@ -589,6 +614,11 @@ static inline bool hugepage_migration_supported(struct 
> hstate *h)
>   return false;
>  }
>  
> +static inline bool hugepage_movable_supported(struct hstate *h)
> +{
> + return false;
> +}
> +
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>  struct mm_struct *mm, pte_t *pte)
>  {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3c21775..a5a111d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -919,7 +919,7 @@ static struct page *dequeue_huge_page_nodemask(struct 
> hstate *h, gfp_t gfp_mask,
>  /* Movability of hugepages depends on migration support. */
>  static inline gfp_t htlb_alloc_mask(struct hstate *h)
>  {
> - if (hugepage_migration_supported(h))
> + if (hugepage_movable_supported(h))
>   return GFP_HIGHUSER_MOVABLE;
>   else
>   return GFP_HIGHUSER;
> -- 
> 2.7.4
> 
> 


Re: [PATCH V2 0/5] arm64/mm: Enable HugeTLB migration

2018-10-18 Thread Naoya Horiguchi
On Wed, Oct 17, 2018 at 01:49:17PM +0530, Anshuman Khandual wrote:
> 
> 
> On 10/12/2018 09:29 AM, Anshuman Khandual wrote:
> > This patch series enables HugeTLB migration support for all supported
> > huge page sizes at all levels including contiguous bit implementation.
> > Following HugeTLB migration support matrix has been enabled with this
> > patch series. All permutations have been tested except for the 16GB.
> > 
> >  CONT PTEPMDCONT PMDPUD
> >  ------
> > 4K: 64K 2M 32M 1G
> > 16K: 2M32M  1G
> > 64K: 2M   512M 16G
> > 
> > First the series adds migration support for PUD based huge pages. It
> > then adds a platform specific hook to query an architecture if a
> > given huge page size is supported for migration while also providing
> > a default fallback option preserving the existing semantics which just
> > checks for (PMD|PUD|PGDIR)_SHIFT macros. The last two patches enable
> > HugeTLB migration on arm64 and subscribe to this new platform-specific
> > hook by defining an override.
> > 
> > The second patch differentiates between movability and migratability
> > aspects of huge pages and implements hugepage_movable_supported() which
> > can then be used during allocation to decide whether to place the huge
> > page in movable zone or not.
> > 
> > Changes in V2:
> > 
> > - Added a new patch which differentiates migratability and movability
> >   of huge pages and implements hugepage_movable_supported() function
> >   as suggested by Michal Hocko.
> 
> Hello Andrew/Michal/Mike/Naoya/Catalin,
> 
> Just checking for an update. Does this series look okay?

Looks good to me. So for the series

Reviewed-by: Naoya Horiguchi 


Re: [PATCH V2 2/5] mm/hugetlb: Distinguish between migratability and movability

2018-10-18 Thread Naoya Horiguchi
On Fri, Oct 12, 2018 at 09:29:56AM +0530, Anshuman Khandual wrote:
> During huge page allocation, its migratability is checked to determine if
> it should be placed under movable zones with GFP_HIGHUSER_MOVABLE. But the
> movability aspect of the huge page could depend on other factors than just
> migratability. Movability in itself is a distinct property which should not
> be tied to migratability alone.
> 
> This differentiates the two and implements an enhanced movability check
> which also considers huge page size to determine if it is feasible to be
> placed under a movable zone. At present it just checks for gigantic pages,
> but going forward it can incorporate other enhanced checks.

(nitpicking...)
The following code just checks hugepage_migration_supported(), so maybe
s/Movability/Migratability/ is expected in the comment?

  static int unmap_and_move_huge_page(...)
  {
  ...
  /*
   * Movability of hugepages depends on architectures and hugepage size.
   * This check is necessary because some callers of hugepage migration
   * like soft offline and memory hotremove don't walk through page
   * tables or check whether the hugepage is pmd-based or not before
   * kicking migration.
   */
  if (!hugepage_migration_supported(page_hstate(hpage))) {

Thanks,
Naoya Horiguchi

> 
> Suggested-by: Michal Hocko 
> Signed-off-by: Anshuman Khandual 
> ---
>  include/linux/hugetlb.h | 30 ++
>  mm/hugetlb.c|  2 +-
>  2 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 9c1b77f..456cb60 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -493,6 +493,31 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>  #endif
>  }
>  
> +/*
> + * The movability check is different from the migration check.
> + * It determines whether or not a huge page should be placed in
> + * a movable zone. Movability of any huge page should be required
> + * only if the huge page size is supported for migration. There
> + * won't be any reason for the huge page to be movable if it is
> + * not migratable to start with. Also, the size of the huge page
> + * should be large enough to be placed under a movable zone and
> + * still feasible enough to be migratable. Just the presence in
> + * a movable zone does not make the migration feasible.
> + *
> + * So even though large huge page sizes like the gigantic ones
> + * are migratable, they should not be movable because it's not
> + * feasible to migrate them from a movable zone.
> + */
> +static inline bool hugepage_movable_supported(struct hstate *h)
> +{
> + if (!hugepage_migration_supported(h))
> + return false;
> +
> + if (hstate_is_gigantic(h))
> + return false;
> + return true;
> +}
> +
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>  struct mm_struct *mm, pte_t *pte)
>  {
> @@ -589,6 +614,11 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>   return false;
>  }
>  
> +static inline bool hugepage_movable_supported(struct hstate *h)
> +{
> + return false;
> +}
> +
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>  struct mm_struct *mm, pte_t *pte)
>  {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3c21775..a5a111d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -919,7 +919,7 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
>  /* Movability of hugepages depends on migration support. */
>  static inline gfp_t htlb_alloc_mask(struct hstate *h)
>  {
> - if (hugepage_migration_supported(h))
> + if (hugepage_movable_supported(h))
>   return GFP_HIGHUSER_MOVABLE;
>   else
>   return GFP_HIGHUSER;
> -- 
> 2.7.4
> 
> 


Re: [PATCH v4 3/6] dt-bindings: power: Introduce properties to present the battery OCV capacity table

2018-10-18 Thread Baolin Wang
On 19 October 2018 at 00:51, Rob Herring  wrote:
> On Mon, Oct 15, 2018 at 04:09:22PM +0800, Baolin Wang wrote:
>> Some battery drivers will use the open circuit voltage (OCV) value to look
>> up the corresponding battery capacity percentage at a certain temperature in
>> degrees Celsius. Thus this patch provides some battery properties to present
>> the OCV table temperatures and OCV capacity table values.
>>
>> Suggested-by: Sebastian Reichel 
>> Signed-off-by: Baolin Wang 
>> Reviewed-by: Linus Walleij 
>> ---
>> Changes from v3:
>>  - Split binding into one separate patch.
>>  - Rename ocv-capacity-table-temperatures to ocv-capacity-celsius.
>>  - Add some words to specify the OCV's unit.
>>
>> Changes from v2:
>>  - Use type __be32 to calculate the table length.
>>  - Update error messages.
>>  - Add some helper functions.
>>
>> Changes from v1:
>>  - New patch in v2.
>> ---
>>  .../devicetree/bindings/power/supply/battery.txt   |   15 +++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/power/supply/battery.txt 
>> b/Documentation/devicetree/bindings/power/supply/battery.txt
>> index 938d027..1f70e5d 100644
>> --- a/Documentation/devicetree/bindings/power/supply/battery.txt
>> +++ b/Documentation/devicetree/bindings/power/supply/battery.txt
>> @@ -23,6 +23,17 @@ Optional Properties:
>>   - constant-charge-current-max-microamp: maximum constant input current
>>   - constant-charge-voltage-max-microvolt: maximum constant input voltage
>>   - factory-internal-resistance-micro-ohms: battery factory internal 
>> resistance
>> + - ocv-capacity-table-0: An array providing the battery capacity percent
>> +   with corresponding open circuit voltage (OCV) of the battery, which
>> +   is used to look up battery capacity according to current OCV value.
>> +   And the open circuit voltage unit is microvolt.
>
> The order percent and voltage is still not clear. I'd guess one way
> based on the text here, but the example is the opposite. The description
> here must stand on its own.

Yes, you are right. I will correct the order of ocv-capacity-table-n. Thanks.

>> + - ocv-capacity-table-1: Same as ocv-capacity-table-0
>> + ..
>> + - ocv-capacity-table-n: Same as ocv-capacity-table-0
>> + - ocv-capacity-celsius: An array containing the temperature in degree 
>> Celsius,
>> +   for each of the battery capacity lookup table. The first temperature 
>> value
>> +   specifies the OCV table 0, and the second temperature value specifies the
>> +   OCV table 1, and so on.
>>
>>  Battery properties are named, where possible, for the corresponding
>>  elements in enum power_supply_property, defined in
>> @@ -44,6 +55,10 @@ Example:
>>   constant-charge-current-max-microamp = <90>;
>>   constant-charge-voltage-max-microvolt = <420>;
>>   factory-internal-resistance-micro-ohms = <25>;
>> + ocv-capacity-celsius = <(-10) 0 10>;
>> + ocv-capacity-table-0 = <4185000 100>, <4113000 95>, <4066000 
>> 90>, ...;
>> + ocv-capacity-table-1 = <420 100>, <4185000 95>, <4113000 
>> 90>, ...;
>> + ocv-capacity-table-2 = <425 100>, <420 95>, <4185000 
>> 90>, ...;
>>   };
>>
>>   charger: charger@11 {
>> --
>> 1.7.9.5
>>



-- 
Baolin Wang
Best Regards



Re: [PATCH 3/3] statx: add STATX_ATTRIBUTES flag

2018-10-18 Thread Andreas Dilger
On Oct 18, 2018, at 7:11 AM, Miklos Szeredi  wrote:
> 
> FUSE will want to know if stx_attributes is interesting or not, because
> there's a non-zero cost of retrieving it.
> 
> This is more of a "want" flag, since stx_attributes_mask already indicates
> whether we "got" stx_attributes or not.  So for now we can just deal with
> this flag in the generic code.
> 
> Signed-off-by: Miklos Szeredi 
> Cc: David Howells 
> Cc: Michael Kerrisk 

Reviewed-by: Andreas Dilger 

> ---
> fs/stat.c   | 3 +++
> include/uapi/linux/stat.h   | 1 +
> samples/statx/test-statx.c  | 2 +-
> tools/include/uapi/linux/stat.h | 1 +
> 4 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index 8d297a279991..6bf86d57e2e3 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -535,6 +535,9 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
>   tmp.stx_size = stat->size;
>   tmp.stx_blocks = stat->blocks;
>   tmp.stx_attributes_mask = stat->attributes_mask;
> + /* Having anything in attributes_mask means attributes are valid. */
> + if (tmp.stx_attributes_mask)
> + tmp.stx_mask |= STATX_ATTRIBUTES;
>   tmp.stx_atime.tv_sec = stat->atime.tv_sec;
>   tmp.stx_atime.tv_nsec = stat->atime.tv_nsec;
>   tmp.stx_btime.tv_sec = stat->btime.tv_sec;
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 370f09d92fa6..aef0aba5dfe7 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -148,6 +148,7 @@ struct statx {
> #define STATX_BLOCKS  0x0400U /* Want/got stx_blocks */
> #define STATX_BASIC_STATS 0x07ffU /* The stuff in the normal stat 
> struct */
> #define STATX_BTIME   0x0800U /* Want/got stx_btime */
> +#define STATX_ATTRIBUTES 0x1000U /* Want/got stx_attributes */
> 
> #define STATX__RESERVED   0x8000U /* Reserved for future 
> struct statx expansion */
> 
> diff --git a/samples/statx/test-statx.c b/samples/statx/test-statx.c
> index e354048dea3c..deef9a68ff68 100644
> --- a/samples/statx/test-statx.c
> +++ b/samples/statx/test-statx.c
> @@ -211,7 +211,7 @@ int main(int argc, char **argv)
>   struct statx stx;
>   int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;
> 
> - unsigned int mask = STATX_BASIC_STATS | STATX_BTIME;
> + unsigned int mask = STATX_BASIC_STATS | STATX_BTIME | STATX_ATTRIBUTES;
> 
>   for (argv++; *argv; argv++) {
>   if (strcmp(*argv, "-F") == 0) {
> diff --git a/tools/include/uapi/linux/stat.h b/tools/include/uapi/linux/stat.h
> index 370f09d92fa6..aef0aba5dfe7 100644
> --- a/tools/include/uapi/linux/stat.h
> +++ b/tools/include/uapi/linux/stat.h
> @@ -148,6 +148,7 @@ struct statx {
> #define STATX_BLOCKS  0x0400U /* Want/got stx_blocks */
> #define STATX_BASIC_STATS 0x07ffU /* The stuff in the normal stat 
> struct */
> #define STATX_BTIME   0x0800U /* Want/got stx_btime */
> +#define STATX_ATTRIBUTES 0x1000U /* Want/got stx_attributes */
> 
> #define STATX__RESERVED   0x8000U /* Reserved for future 
> struct statx expansion */
> 
> --
> 2.14.3
> 


Cheers, Andreas









Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-18 Thread Andrew Morton
On Thu, 18 Oct 2018 20:46:21 -0400 Andrea Arcangeli  wrote:

> On Thu, Oct 18, 2018 at 04:16:40PM -0700, Mike Kravetz wrote:
> > I was not sure about this, and expected someone could come up with
> > something better.  It just seems there are filesystems like huegtlbfs,
> > where it makes no sense wasting cycles traversing the filesystem.  So,
> > let's not even try.
> > 
> > Hoping someone can come up with a better method than hard coding as
> > I have done above.
> 
> It's not strictly required after marking the pages dirty though. The
> real fix is the other one? Could we just drop the hardcoding and let
> it run after the real fix is applied?
> 
> The performance of drop_caches doesn't seem critical, especially with
> gigapages. tmpfs doesn't seem to be optimized away from drop_caches
> and the gain would be bigger for tmpfs if THP is not enabled in the
> mount, so I'm not sure if we should worry about hugetlbfs first.

I guess so.  I can't immediately see a clean way of expressing this so
perhaps it would need a new BDI_CAP_NO_BACKING_STORE.  Such a
thing hardly seems worthwhile for drop_caches.

And drop_caches really shouldn't be there anyway.  It's a standing
workaround for ongoing suckage in pagecache and metadata reclaim
behaviour :(




[PATCH 2/2] locking/lockdep: Make global debug_locks* variables read-mostly

2018-10-18 Thread Waiman Long
Make the frequently used lockdep global variable debug_locks read-mostly.
As debug_locks_silent is sometimes used together with debug_locks,
it is also made read-mostly so that the two can sit close together.

With false cacheline sharing, a cacheline contention problem can arise
depending on what gets put into the same cacheline as debug_locks.

Signed-off-by: Waiman Long 
---
 include/linux/debug_locks.h | 4 ++--
 lib/debug_locks.c   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 120225e..257ab3c 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -8,8 +8,8 @@
 
 struct task_struct;
 
-extern int debug_locks;
-extern int debug_locks_silent;
+extern int debug_locks __read_mostly;
+extern int debug_locks_silent __read_mostly;
 
 
 static inline int __debug_locks_off(void)
diff --git a/lib/debug_locks.c b/lib/debug_locks.c
index 124fdf2..ce51749 100644
--- a/lib/debug_locks.c
+++ b/lib/debug_locks.c
@@ -21,7 +21,7 @@
  * that would just muddy the log. So we report the first one and
  * shut up after that.
  */
-int debug_locks = 1;
+int debug_locks __read_mostly = 1;
 EXPORT_SYMBOL_GPL(debug_locks);
 
 /*
@@ -29,7 +29,7 @@
  * 'silent failure': nothing is printed to the console when
  * a locking bug is detected.
  */
-int debug_locks_silent;
+int debug_locks_silent __read_mostly;
 EXPORT_SYMBOL_GPL(debug_locks_silent);
 
 /*
-- 
1.8.3.1



[PATCH 1/2] locking/lockdep: Fix debug_locks off performance problem

2018-10-18 Thread Waiman Long
It was found that when debug_locks was turned off because of a problem
found by the lockdep code, the system performance could drop quite
significantly when the lock_stat code was also configured into the
kernel. For instance, parallel kernel build time on a 4-socket x86-64
server nearly doubled.

Further analysis traced the slowdown back to frequent calls to
debug_locks_off() from the __lock_acquired() function, probably due to
some inconsistent lockdep states with debug_locks off. The
debug_locks_off() function did an unconditional atomic xchg to write a
0 value into debug_locks, which had already been set to 0. This led to
severe contention on the cacheline that held debug_locks. As
debug_locks is referenced in quite a few different places in the
kernel, this greatly slowed down the system.

To prevent that thrashing of the debug_locks cacheline, lock_acquired()
and lock_contended() now check the state of debug_locks before
proceeding. The debug_locks_off() function is also modified to check
debug_locks before calling __debug_locks_off().

Signed-off-by: Waiman Long 
---
 kernel/locking/lockdep.c | 4 ++--
 lib/debug_locks.c| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index be76f47..1efada2 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4066,7 +4066,7 @@ void lock_contended(struct lockdep_map *lock, unsigned long ip)
 {
unsigned long flags;
 
-   if (unlikely(!lock_stat))
+   if (unlikely(!lock_stat || !debug_locks))
return;
 
if (unlikely(current->lockdep_recursion))
@@ -4086,7 +4086,7 @@ void lock_acquired(struct lockdep_map *lock, unsigned long ip)
 {
unsigned long flags;
 
-   if (unlikely(!lock_stat))
+   if (unlikely(!lock_stat || !debug_locks))
return;
 
if (unlikely(current->lockdep_recursion))
diff --git a/lib/debug_locks.c b/lib/debug_locks.c
index 96c4c63..124fdf2 100644
--- a/lib/debug_locks.c
+++ b/lib/debug_locks.c
@@ -37,7 +37,7 @@
  */
 int debug_locks_off(void)
 {
-   if (__debug_locks_off()) {
+   if (debug_locks && __debug_locks_off()) {
if (!debug_locks_silent) {
console_verbose();
return 1;
-- 
1.8.3.1




Re: [PATCH 2/3] uapi: get rid of STATX_ALL

2018-10-18 Thread Andreas Dilger

> On Oct 18, 2018, at 7:15 AM, Florian Weimer  wrote:
> 
> * Miklos Szeredi:
> 
>> #define STATX__RESERVED  0x8000U /* Reserved for future 
>> struct statx expansion */
> 
> What about this?  Isn't it similar to STATX_ALL in the sense that we
> don't know yet what it will mean?

No, this means that this last bit will be used for increasing the size of the
stx_mask field at some point in the future.  IMHO, this is present as a reminder
to any developer who is adding fields in the future not to use the last flag.

Cheers, Andreas










Re: [PATCH 4.9 00/35] 4.9.135-stable review

2018-10-18 Thread Nathan Chancellor
On Thu, Oct 18, 2018 at 07:54:29PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.135 release.
> There are 35 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sat Oct 20 17:54:00 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.135-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.9.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Merged, compiled with -Werror, and installed onto my OnePlus 6.

No initial issues noticed in dmesg or general usage.

Thanks!
Nathan


Re: [PATCH 4.14 00/41] 4.14.78-stable review

2018-10-18 Thread Nathan Chancellor
On Thu, Oct 18, 2018 at 07:54:15PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.78 release.
> There are 41 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sat Oct 20 17:53:55 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.78-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.14.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Merged, compiled, and installed onto my Raspberry Pi.

No initial issues noticed in dmesg or general usage.

Thanks!
Nathan




Re: [PATCH 4.4 00/48] 4.4.162-stable review

2018-10-18 Thread Nathan Chancellor
On Thu, Oct 18, 2018 at 07:54:35PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.162 release.
> There are 48 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sat Oct 20 17:54:03 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.162-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.4.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Merged, compiled with -Werror, and installed onto my Pixel 2 XL.

No initial issues noticed in dmesg or general usage.

Thanks!
Nathan



Re: turbostat-17.06.23 floating point exception

2018-10-18 Thread Solio Sarabia
On Fri, Oct 12, 2018 at 07:03:41PM -0400, Len Brown wrote:
> > Why would the cpu topology report 0 cpus?  I added a debug entry to
> > cpu_usage_stat and /proc/stat showed it as an extra column.  Then
> > fscanf parsing in for_all_cpus() failed, causing the SIGFPE.
> >
> > This is not an issue. Thanks.
> 
> Yes, it is true that turbostat doesn't check for systems with 0 cpus.
> I'm curious how you provoked the kernel to claim that.  If it is
> something others might do, we can add a check for it and gracefully
> exit.

source/tools/power/x86/turbostat/turbostat.c
int for_all_proc_cpus(int (func)(int))
{
retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
^
This fails due to an extra debug entry in /proc/stat
(11 columns in total).  I was measuring time spent in a hot
function and decided to account that time in an extra
cpu_usage_stat field. This was only an experiment, though.

Thanks,
-S.



Re: [BUG -next 20181008] list corruption with "mm/slub: remove useless condition in deactivate_slab"

2018-10-18 Thread Pingfan Liu
On Tue, Oct 16, 2018 at 3:36 PM Heiko Carstens
 wrote:
>
> On Tue, Oct 16, 2018 at 02:29:28PM +0800, Pingfan Liu wrote:
> > > I think it is caused by the union of page->lru and page->next. It can be
> > > fixed by:
> > > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> > > index 3a1a1db..4aa0fb5 100644
> > > --- a/include/linux/slub_def.h
> > > +++ b/include/linux/slub_def.h
> > > @@ -56,6 +56,7 @@ struct kmem_cache_cpu {
> > >  #define slub_set_percpu_partial(c, p)  \
> > >  ({ \
> > > slub_percpu_partial(c) = (p)->next; \
> > > +   p->next = NULL; \
> > >  })
> > >
> > > I will do some test and post the fix.
> > >
> > Please ignore the above comment. After re-checking the code, I am
> > sure that all callers of deactivate_slab() pass c->page, which means
> > that the page should not be on any list. But your test result "list_add
> > double add: new=03d1029ecc08,
> > prev=8ff846d0,next=03d1029ecc08"  indicates that
> > the page (new) is already on a list. I think something else may be
> > wrong that is being masked.
> > I can not reproduce this bug on x86. Could you share your config and
> > cmdline? Also, did you turn on any slub debug options?
>
> You can re-create the config with "make ARCH=s390 debug_defconfig".
>
> Not sure which machine I used to reproduce this but most likely it was
> a machine with these command line options:
>
> dasd=e12d root=/dev/dasda1 userprocess_debug numa_debug sched_debug
> ignore_loglevel sclp_con_drop=1 sclp_con_pages=32 audit=0
> crashkernel=128M ignore_rlimit_data
>
> You can ignore the dasd and sclp* command line options. These are
> s390 specific. The rest should be available on any architecture.
>
Thank you for the info. I can now reproduce the bug, and it is indeed
caused by this commit. In deactivate_slab(), the page is first put on the
full list via add_full(); when the redo condition is then hit, the page must
be taken off again via remove_full(), but this commit erased that code.

Regards,
Pingfan


