Re: [tip:x86/mm] x86/mm: Break out user address space handling

2018-10-18 Thread Ingo Molnar


* Eric W. Biederman  wrote:

> tip-bot for Dave Hansen  writes:
> 
> > Commit-ID:  aa37c51b9421d66f7931c5fdcb9ce80c450974be
> > Gitweb: 
> > https://git.kernel.org/tip/aa37c51b9421d66f7931c5fdcb9ce80c450974be
> > Author: Dave Hansen 
> > AuthorDate: Fri, 28 Sep 2018 09:02:23 -0700
> > Committer:  Peter Zijlstra 
> > CommitDate: Tue, 9 Oct 2018 16:51:15 +0200
> >
> > x86/mm: Break out user address space handling
> >
> > The last patch broke out kernel address space handling into its own
> > helper.  Now, do the same for user address space handling.
> >
> > Cc: x...@kernel.org
> > Cc: Jann Horn 
> > Cc: Sean Christopherson 
> > Cc: Thomas Gleixner 
> > Cc: Andy Lutomirski 
> > Signed-off-by: Dave Hansen 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > Link: http://lkml.kernel.org/r/20180928160223.9c4f6...@viggo.jf.intel.com
> > ---
> >  arch/x86/mm/fault.c | 47 ---
> >  1 file changed, 28 insertions(+), 19 deletions(-)
> >
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index c7e32f453852..0d1f5d39fc63 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -966,6 +966,7 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
> > __bad_area(regs, error_code, address, vma, SEGV_ACCERR);
> >  }
> >  
> > +/* Handle faults in the kernel portion of the address space */
>^^
> I believe you mean the __user__ portion of the address space.
> Given that the call chain is:
> 
> do_user_addr_fault
>handle_mm_fault
>   do_sigbus  

It's both:

  /* Handle faults in the kernel portion of the address space */
  static void
  do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
            u32 *pkey, unsigned int fault)
  {
          struct task_struct *tsk = current;
          int code = BUS_ADRERR;

          /* Kernel mode? Handle exceptions or die: */
          if (!(error_code & X86_PF_USER)) {
                  no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
                  return;
          }

          /* User-space => ok to do another page fault: */
          if (is_prefetch(regs, error_code, address))
                  return;

          tsk->thread.cr2         = address;
          tsk->thread.error_code  = error_code;
          tsk->thread.trap_nr     = X86_TRAP_PF;


Note the X86_PF_USER check: that's what determines whether the fault was 
for user or system mappings.
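
For readers following along, the branch can be sketched outside the kernel. The bit positions below are the architectural x86 page-fault error-code bits; the two helper functions are hypothetical names used only for this illustration and are not kernel APIs.

```c
#include <stdbool.h>

/* x86 page-fault error-code bits, as defined by the architecture. */
#define X86_PF_PROT   (1UL << 0)  /* 0: page not present,   1: protection violation */
#define X86_PF_WRITE  (1UL << 1)  /* 0: read access,        1: write access */
#define X86_PF_USER   (1UL << 2)  /* 0: kernel-mode access, 1: user-mode access */

/* Hypothetical helper: was the faulting access made in user mode? */
bool fault_from_user_mode(unsigned long error_code)
{
	return (error_code & X86_PF_USER) != 0;
}

/*
 * Hypothetical helper mirroring the branch in do_sigbus(): kernel-mode
 * faults take the no_context() path, user-mode faults get a signal.
 */
const char *sigbus_path(unsigned long error_code)
{
	return fault_from_user_mode(error_code) ? "signal" : "no_context";
}
```

The bit records the CPU mode of the access, so a kernel-mode access to a user address still takes the no_context() path in this sketch.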

I agree that the comment is misleading and should be clarified.

Thanks,

Ingo


Re: [RFC v4 PATCH 2/5] mm/__free_one_page: skip merge for order-0 page unless compaction failed

2018-10-18 Thread Aaron Lu
On Thu, Oct 18, 2018 at 12:16:32PM +0100, Mel Gorman wrote:
> On Wed, Oct 17, 2018 at 10:59:04PM +0800, Aaron Lu wrote:
> > > Any particular reason why? I assume it's related to the number of zone
> > > locks with the increased number of zones and the number of threads used
> > > for the test.
> > 
> > I think so too.
> > 
> > The 4 sockets server has 192 CPUs in total while the 2 sockets server
> > has 112 CPUs in total. Assuming only ZONE_NORMAL is used, for the 4
> > sockets server it would be 192/4=48 (CPUs per zone) while for the 2
> > sockets server it is 112/2=56 (CPUs per zone). The test is started with
> > nr_task=nr_cpu so the 2 sockets server ends up having more CPUs
> > consuming one zone.
> > 
> 
> Nice that the prediction is accurate. It brings us to another option --
> breaking up the zone lock by either hash or address space ranges. The
> address space ranges would probably be easier to implement. Where it
> gets hairy is that PFN walkers would need different zone locks. However,
> overall it might be a better option because it's not order-0 specific.

I think the 'address space range' lock is worth a try.
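
A minimal sketch of what picking a lock by address space range could look like, assuming a fixed number of equally sized PFN ranges per zone. Nothing here comes from a posted patch: `NR_RANGE_LOCKS`, `struct zone_span` and `range_lock_index()` are hypothetical names chosen for the illustration.

```c
#include <stddef.h>

/* Hypothetical parameter: number of range locks per zone. */
#define NR_RANGE_LOCKS   16UL

/* Hypothetical, cut-down view of a zone: only its PFN span. */
struct zone_span {
	unsigned long start_pfn;    /* first PFN of the zone */
	unsigned long nr_pages;     /* spanned pages, must be non-zero */
};

/* Map a PFN inside the zone to the index of the range lock covering it. */
unsigned long range_lock_index(const struct zone_span *z, unsigned long pfn)
{
	/* Round up so that NR_RANGE_LOCKS ranges cover the whole zone. */
	unsigned long per_lock = (z->nr_pages + NR_RANGE_LOCKS - 1) / NR_RANGE_LOCKS;

	return (pfn - z->start_pfn) / per_lock;
}
```

A real implementation would presumably also have to align the ranges to the largest buddy block so that a merge never needs two locks, and PFN walkers would have to switch locks when they cross a range boundary, which is the hairy part mentioned above.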

> It would be a lot of legwork because all uses of the zone lock would
> have to be audited to see which ones protect the free lists and which
> ones protect "something else".

Yes a lot of details.

> > > That's important to know. It does reduce the utility of the patch
> > > somewhat but not all arches support THP and THP is not always enabled on
> > > x86.
> > 
> > I always wondered how systems are making use of THP.
> > After all, when a system has been running for a while (days or months), file
> > cache should have consumed a lot of memory and high order pages will become
> > more and more scarce. If an order-9 page can't be reliably allocated, will
> > workloads rely on it?
> > Just a thought.
> > 
> 
> File cache can usually be trivially reclaimed and moved. It's a "how
> long is a piece of string" to determine at what point a system can get
> fragmented and whether that can be prevented. It's somewhat outside the
> scope of this patch but anecdotally I'm looking at a machine with 20 days
> uptime and it still has 2390GB worth of THPs free after a large amount
> of reclaim activity over the system lifetime so fragmentation avoidance
> does work in some cases.

Good to know, thanks.

> 
> > THP is of course pretty neat in that it reduces TLB cost, needs fewer page
> > tables etc. I just wondered if people really rely on it, or keep using it
> > after their system has been up for a long time.
> > 
> 
> If people didn't rely on it then we might as well delete THP and
> declare the whole tmpfs-backed-THP as worthless.
> 
> > > Yes, but note that the concept is still problematic.
> > > isolate_migratepages_block is not guaranteed to find a pageblock with
> > > unmerged buddies in it. If there are pageblocks towards the end of the
> > > zone with unmerged pages, they may never be found. This will be very hard
> > > to detect at runtime because it's heavily dependent on the exact state
> > > of the system.
> > 
> > Quite true.
> > 
> > The intent here though, is not to have compaction merge back all
> > unmerged pages, but to do the merge for these unmerged pages in a
> > piggyback way, i.e. since isolate_migratepages_block() is doing the
> > scan, why don't we let it handle these unmerged pages when it meets
> > them?
> > 
> > If for some reason isolate_migratepages_block() didn't meet a single
> > unmerged page before compaction succeeded, we probably do not need to worry
> > much yet since compaction succeeded anyway.
> > 
> 
> I don't think this is the right way of thinking about it because it's
> possible to have the system split in such a way so that the migration
> scanner only encounters unmovable pages before it meets the free scanner
> where unmerged buddies were in the higher portion of the address space.

Yes it is possible unmerged pages are in the higher portion.

My understanding is, when the two scanners meet, all unmerged pages will
be either used by the free scanner as migrate targets or sent to merge
by the migration scanner.

> 
> You either need to keep unmerged buddies on a separate list or search
> the order-0 free list for merge candidates prior to compaction.
> 
> > > It's needed to form them efficiently but excessive reclaim or writing 3
> > > to drop_caches can also do it. Be careful of tying lazy buddy too
> > > closely to compaction.
> > 
> > That's the current design of this patchset, do you see any immediate
> > problem of this? Is it that you are worried about high-order allocation
> > success rate using this design?
> 
> I've pointed out what I see are the design flaws but yes, in general, I'm
> worried about the high order allocation success rate using this design,
> the reliance on compaction and the fact that the primary motivation is
> when THP is disabled.

When THP is in use, zone lock contention is pretty much nowhere :-)

I'll see what I can get with 'address space 

[tip:x86/urgent] x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels

2018-10-18 Thread tip-bot for Christoph Hellwig
Commit-ID:  485734f3fc77c1eb77ffe138c027b9a4bf0178f3
Gitweb: https://git.kernel.org/tip/485734f3fc77c1eb77ffe138c027b9a4bf0178f3
Author: Christoph Hellwig 
AuthorDate: Sun, 14 Oct 2018 09:52:08 +0200
Committer:  Ingo Molnar 
CommitDate: Fri, 19 Oct 2018 07:49:32 +0200

x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels

We already build the swiotlb code for 32-bit kernels with PAE support,
but the code to actually use swiotlb has only been enabled for 64-bit
kernels for an unknown reason.

Before Linux v4.18 we papered over this fact because the networking code,
the SCSI layer and some random block drivers implemented their own
bounce buffering scheme.

[ mingo: Changelog fixes. ]

Fixes: 21e07dba9fb1 ("scsi: reduce use of block bounce buffers")
Fixes: ab74cfebafa3 ("net: remove the PCI_DMA_BUS_IS_PHYS check in illegal_highdma")
Reported-by: Matthew Whitehead 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Thomas Gleixner 
Tested-by: Matthew Whitehead 
Cc: konrad.w...@oracle.com
Cc: io...@lists.linux-foundation.org
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20181014075208.2715-1-...@lst.de
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/pci-swiotlb.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 661583662430..71c0b01d93b1 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -42,10 +42,8 @@ IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
 int __init pci_swiotlb_detect_4gb(void)
 {
 	/* don't initialize swiotlb if iommu=off (no_iommu=1) */
-#ifdef CONFIG_X86_64
 	if (!no_iommu && max_possible_pfn > MAX_DMA32_PFN)
 		swiotlb = 1;
-#endif
 
 	/*
 	 * If SME is active then swiotlb will be set to 1 so that bounce
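
The condition that the patch makes unconditional can be tried out in user space. This is a sketch under stated assumptions: 4 KiB pages, `MAX_DMA32_PFN` taken to be the first page frame at the 4 GiB boundary, and `want_swiotlb()` as a hypothetical stand-in for the check in pci_swiotlb_detect_4gb().

```c
#include <stdbool.h>

#define PAGE_SHIFT      12                              /* 4 KiB pages, assumed */
#define MAX_DMA32_PFN   (1ULL << (32 - PAGE_SHIFT))     /* first PFN at 4 GiB, assumed */

/* Hypothetical stand-in for the check in pci_swiotlb_detect_4gb(). */
bool want_swiotlb(bool no_iommu, unsigned long long max_possible_pfn)
{
	/* Bounce buffering is only needed when RAM can exist above 4 GiB. */
	return !no_iommu && max_possible_pfn > MAX_DMA32_PFN;
}
```

With these assumptions a machine whose highest possible page frame sits exactly at 4 GiB does not enable swiotlb, while a PAE machine with 8 GiB does unless iommu=off was given.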


Re: [PATCH 1/2] sched/cpufreq: Reorganize the cpufreq files

2018-10-18 Thread kbuild test robot
Hi Daniel,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.19-rc8 next-20181018]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Daniel-Lezcano/sched-cpufreq-Reorganize-the-cpufreq-files/20181019-103851
config: x86_64-randconfig-s2-10191209 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> scripts/Makefile.modbuiltin:26: kernel/sched/cpufreq/Makefile: No such file or directory
   make[4]: *** No rule to make target 'kernel/sched/cpufreq/Makefile'.
   make[4]: Failed to remake makefile 'kernel/sched/cpufreq/Makefile'.

vim +26 scripts/Makefile.modbuiltin

607b30fc Michal Marek 2010-06-10  22  
bc081dd6 Michal Marek 2009-12-07  23  # The filename Kbuild has precedence over Makefile
bc081dd6 Michal Marek 2009-12-07  24  kbuild-dir := $(if $(filter /%,$(src)),$(src),$(srctree)/$(src))
bc081dd6 Michal Marek 2009-12-07  25  kbuild-file := $(if $(wildcard $(kbuild-dir)/Kbuild),$(kbuild-dir)/Kbuild,$(kbuild-dir)/Makefile)
bc081dd6 Michal Marek 2009-12-07 @26  include $(kbuild-file)
bc081dd6 Michal Marek 2009-12-07  27  

:: The code at line 26 was first introduced by commit
:: bc081dd6e9f622c73334dc465359168543ccaabf kbuild: generate modules.builtin

:: TO: Michal Marek 
:: CC: Michal Marek 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCH] Input: synaptics - avoid using uninitialized variable when probing

2018-10-18 Thread Peter Hutterer
On Tue, Oct 16, 2018 at 05:14:43PM -0700, Dmitry Torokhov wrote:
> synaptics_detect() does not check whether sending commands to the
> device succeeds and instead relies on getting unique data from the
> device. Let's seed the entire buffer with zeroes to make sure
> we do not use garbage on the stack that just happens to be 0x47.
> 
> Reported-by: syzbot+13cb3b01d0784e4ff...@syzkaller.appspotmail.com
> Signed-off-by: Dmitry Torokhov 

doh, was just about to send out the same patch.

Reviewed-by: Peter Hutterer 

Cheers,
   Peter

> ---
>  drivers/input/mouse/synaptics.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c
> index 55d33500d55e..5e85f3cca867 100644
> --- a/drivers/input/mouse/synaptics.c
> +++ b/drivers/input/mouse/synaptics.c
> @@ -99,9 +99,7 @@ static int synaptics_mode_cmd(struct psmouse *psmouse, u8 mode)
>  int synaptics_detect(struct psmouse *psmouse, bool set_properties)
>  {
>   struct ps2dev *ps2dev = &psmouse->ps2dev;
> - u8 param[4];
> -
> - param[0] = 0;
> + u8 param[4] = { 0 };
>  
>   ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES);
>   ps2_command(ps2dev, param, PSMOUSE_CMD_SETRES);
> -- 
> 2.19.1.331.ge82ca0e54c-goog
> 
> 
> -- 
> Dmitry
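
The effect of the `{ 0 }` initializer can be shown with a small user-space sketch. `fake_identify()` is a hypothetical stand-in for a PS/2 command that leaves the buffer untouched when it fails; the 0x47 comparison follows the description in the commit message, and the other byte values are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a PS/2 command: on failure buf is not written. */
int fake_identify(uint8_t *buf, bool device_present)
{
	if (!device_present)
		return -1;
	buf[0] = 0x01;
	buf[1] = 0x47;          /* the magic byte the detection relies on */
	buf[2] = 0x18;
	return 0;
}

/* Detection that ignores the command's return value, as described above. */
bool detect(bool device_present)
{
	/* { 0 } zeroes all four elements, not just param[0]. */
	uint8_t param[4] = { 0 };

	fake_identify(param, device_present);
	return param[1] == 0x47;
}
```

With the full zero initialization a failed command yields a deterministic "not detected" result instead of depending on whatever was on the stack.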


[RFC v4 0/2] WhiteEgret LSM module

2018-10-18 Thread Shinya Takumi
WhiteEgret is an LSM that provides simple whitelisting-type
execution control.

An execution-whitelist, simply called whitelist, is a list
of executable components (e.g., applications, libraries)
that are approved to run on a host. The whitelist is used
to decide whether executable components are permitted to
execute or not. This mechanism can stop an execution of
unknown software, so it helps stop the execution of
malicious code and other unauthorized software.

It is important to maintain a whitelist properly according to
the execution environment. Managing whitelists for information
systems is a difficult task because their environments
change frequently. On the other hand, for devices that
keep doing the same tasks for a certain period of time,
the same whitelist can be used for that period once it
is established. Thus whitelisting-type execution control
works best in such execution environments. Examples of such
execution environments include control devices in industrial
control systems.

Although such whitelists do not change often,
it is necessary to change them according to the system life cycle
or each phase of system operation. The whitelist must be
changeable while the system keeps operating, because such
systems often cannot easily be stopped. For example, maintenance
programs may need to be allowed temporarily for maintenance
or troubleshooting purposes while the system is running.

WhiteEgret aims to satisfy the above requirement.
WhiteEgret adopts a model in which the whitelist is managed in user space.
Namely, WhiteEgret assumes that a privileged user manages the whitelist
in user space. This makes it possible to change the whitelist while
the system is running.

Mechanism of WhiteEgret

WhiteEgret requires a user application called the WhiteEgret User
Application (WEUA, for short). WhiteEgret uses the
bprm_check_security hook and the mmap_file hook.
WhiteEgret asks the WEUA whether an executable component caught
by these hooks is permitted to execute.
If the response from the WEUA is "permit", WhiteEgret
continues to process the executable component. If the response
is "not permit", WhiteEgret returns an error and blocks
the execution of the executable component.
The bprm_check_security hook is triggered by the execve system
call, so the execution of almost all executable components is
caught by this hook. However, because shared objects are not
loaded through the execve system call, WhiteEgret uses the mmap_file
hook to catch the memory mapping of a shared object.
WhiteEgret therefore ignores mmap_file hook calls for
non-executable mappings and for executables that go through the
execve system call.

To ask a WEUA for permission, WhiteEgret sends the
absolute path, the inode number and the device number of the
executable component to the WEUA.
The WEUA is then expected to work as follows.
The WEUA checks whether the tuple of the absolute path and/or the inode
number and/or the device number is contained in the whitelist.
If it is, the WEUA compares a hash value of the executable
component indicated by the absolute path (and/or the inode number
and/or device number) with that in the whitelist to see whether
the executable component has changed since the whitelist was
made. The WEUA returns "permit" if both tests pass;
otherwise it returns "not permit".
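
The two-step check expected from a WEUA can be sketched as follows. This is only an illustration, not the sample application from the patch set: the entry layout, the 4-byte digest and the function name `weua_permit()` are hypothetical, and the sketch assumes that all three of path, inode number and device number must match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical whitelist entry; a real WEUA may store this differently. */
struct wl_entry {
	const char *path;           /* absolute path */
	unsigned long ino;          /* inode number */
	unsigned long dev;          /* device number */
	unsigned char hash[4];      /* shortened digest, for illustration only */
};

/* Return true ("permit") only if the tuple is listed and the hash matches. */
bool weua_permit(const struct wl_entry *wl, size_t n,
		 const char *path, unsigned long ino, unsigned long dev,
		 const unsigned char hash[4])
{
	for (size_t i = 0; i < n; i++) {
		/* Test 1: is the (path, inode, device) tuple in the whitelist? */
		if (strcmp(wl[i].path, path) != 0 ||
		    wl[i].ino != ino || wl[i].dev != dev)
			continue;
		/* Test 2: is the file unchanged since the whitelist was made? */
		return memcmp(wl[i].hash, hash, 4) == 0;
	}
	return false;   /* not listed: "not permit" */
}
```

A listed file whose digest no longer matches is rejected just like an unlisted one, which is what lets the whitelist catch a modified binary.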

WhiteEgret v4 can also control script execution. Some
LSM hooks (file_open/file_permission/task_alloc/task_free) are
added. Kernel configs are required to enable the hooks.

Most interpreters open script files in order to execute them. Therefore
WhiteEgret hooks the reading or opening of a file and then
asks the WEUA whether execution of the script is
permitted. WhiteEgret can use either file_open or
file_permission as the hook entry for execution control.

The task_alloc and task_free hook entries are used to control
script execution effectively. Some interpreters fork child
processes to execute script files, so the WEUA manages the process
family of an interpreter through bprm_check_security, task_alloc and
task_free.

To use WhiteEgret

Users have to prepare a whitelist and a WEUA to use WhiteEgret.
A sample WEUA is included in samples/whiteegret/.

To enable WhiteEgret, build the kernel using
normal procedures with CONFIG_SECURITY_WHITEEGRET=y.

Additionally, the SECURITY_WHITEEGRET_INTERPRETER=y option is
required to control script execution,
and the SECURITY_WHITEEGRET_WRITE=y option is required to
detect writes to script files.

The patchset is also available in our github repo:
  https://github.com/whiteegret/whiteegret

---
Changes in v4:
- Add LSM hooks (file_open/file_permission/task_alloc/task_free)
  to control script execution. Kernel configs are required
  to enable the hooks.
- Modify the data struct for kernel space and user space
  communication.

Changes in v3:
- Change to a minor LSM module.

Changes in v2:
- 

[RFC v4 2/2] WhiteEgret: Add an example of user application.

2018-10-18 Thread Shinya Takumi
A user application is required to use WhiteEgret.
This RFC provides a sample user application program.

Usage
  sample-we-user   

Re: [PATCH 1/2] sched/cpufreq: Reorganize the cpufreq files

2018-10-18 Thread kbuild test robot
Hi Daniel,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on v4.19-rc8 next-20181018]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Daniel-Lezcano/sched-cpufreq-Reorganize-the-cpufreq-files/20181019-103851
config: x86_64-randconfig-s1-10191209 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> scripts/Makefile.build:45: kernel/sched/cpufreq/Makefile: No such file or directory
>> make[4]: *** No rule to make target 'kernel/sched/cpufreq/Makefile'.
   make[4]: Failed to remake makefile 'kernel/sched/cpufreq/Makefile'.

vim +45 scripts/Makefile.build

0c53c8e6e Sam Ravnborg   2007-10-14  41  
2a6914703 Sam Ravnborg   2005-07-25  42  # The filename Kbuild has precedence over Makefile
db8c1a7b2 Sam Ravnborg   2005-07-27  43  kbuild-dir := $(if $(filter /%,$(src)),$(src),$(srctree)/$(src))
0c53c8e6e Sam Ravnborg   2007-10-14  44  kbuild-file := $(if $(wildcard $(kbuild-dir)/Kbuild),$(kbuild-dir)/Kbuild,$(kbuild-dir)/Makefile)
0c53c8e6e Sam Ravnborg   2007-10-14 @45  include $(kbuild-file)
^1da177e4 Linus Torvalds 2005-04-16  46  

:: The code at line 45 was first introduced by commit
:: 0c53c8e6eb456cde30f2305421c605713856abc8 kbuild: check for wrong use of CFLAGS

:: TO: Sam Ravnborg 
:: CC: Sam Ravnborg 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




[RFC v4 1/2] WhiteEgret: Add WhiteEgret core functions.

2018-10-18 Thread Shinya Takumi
This RFC provides implementation of WhiteEgret.

Signed-off-by: Shinya Takumi 
---
 security/Kconfig   |   1 +
 security/Makefile  |   2 +
 security/whiteegret/Kconfig|  82 +++
 security/whiteegret/Makefile   |   2 +
 security/whiteegret/init.c | 148 
 security/whiteegret/main.c | 466 +
 security/whiteegret/request.c  |  98 
 security/whiteegret/request.h  |  47 
 security/whiteegret/we.h   |  72 ++
 security/whiteegret/we_fs.c| 269 +
 security/whiteegret/we_fs.h|  23 ++
 security/whiteegret/we_fs_common.h |  60 +
 12 files changed, 1270 insertions(+)
 create mode 100644 security/whiteegret/Kconfig
 create mode 100644 security/whiteegret/Makefile
 create mode 100644 security/whiteegret/init.c
 create mode 100644 security/whiteegret/main.c
 create mode 100644 security/whiteegret/request.c
 create mode 100644 security/whiteegret/request.h
 create mode 100644 security/whiteegret/we.h
 create mode 100644 security/whiteegret/we_fs.c
 create mode 100644 security/whiteegret/we_fs.h
 create mode 100644 security/whiteegret/we_fs_common.h

diff --git a/security/Kconfig b/security/Kconfig
index d9aa521..d656e20 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -236,6 +236,7 @@ source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/loadpin/Kconfig
 source security/yama/Kconfig
+source security/whiteegret/Kconfig
 
 source security/integrity/Kconfig
 
diff --git a/security/Makefile b/security/Makefile
index 4d2d378..2da669c 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -10,6 +10,7 @@ subdir-$(CONFIG_SECURITY_TOMOYO)+= tomoyo
 subdir-$(CONFIG_SECURITY_APPARMOR) += apparmor
 subdir-$(CONFIG_SECURITY_YAMA) += yama
 subdir-$(CONFIG_SECURITY_LOADPIN)  += loadpin
+subdir-$(CONFIG_SECURITY_WHITEEGRET)   += whiteegret
 
 # always enable default capabilities
 obj-y  += commoncap.o
@@ -26,6 +27,7 @@ obj-$(CONFIG_SECURITY_APPARMOR)   += apparmor/
 obj-$(CONFIG_SECURITY_YAMA)+= yama/
 obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/
 obj-$(CONFIG_CGROUP_DEVICE)+= device_cgroup.o
+obj-$(CONFIG_SECURITY_WHITEEGRET)  += whiteegret/
 
 # Object integrity file lists
 subdir-$(CONFIG_INTEGRITY) += integrity
diff --git a/security/whiteegret/Kconfig b/security/whiteegret/Kconfig
new file mode 100644
index 000..e55acc4
--- /dev/null
+++ b/security/whiteegret/Kconfig
@@ -0,0 +1,82 @@
+config SECURITY_WHITEEGRET
+   bool "WhiteEgret support"
+   depends on SECURITY
+   select SECURITYFS
+   default n
+   help
+ This enables the WhiteEgret security module.
+ WhiteEgret provides a whitelisting execution control capability,
+ which helps stop the execution of unauthorized software
+ such as malware.
+ You will also need a user application and an execution whitelist.
+ If you are unsure how to answer this question, answer N.
+
+config SECURITY_WHITEEGRET_INTERPRETER
+   bool "WhiteEgret hook file read and create/exit task for interpreter"
+   depends on SECURITY_WHITEEGRET
+   default n
+   help
+ This adds LSM hook points for controlling interpreters.
+ The target hook points are the file read and task create/exit functions.
+ Select the specific hook points with the config options that depend on
+ SECURITY_WHITEEGRET_INTERPRETER.
+
+config SECURITY_WHITEEGRET_HOOK_FILE_READ
+   bool "WhiteEgret hook file read"
+   depends on SECURITY_WHITEEGRET_INTERPRETER
+   default n
+   help
+ This enables hooking file read. The kernel passes the hook information
+ to WhiteEgret's user application. This application can receive the
+ hook information and control execution of the hooked function.
+
+config SECURITY_WHITEEGRET_HOOK_READ_OPEN
+   bool "WhiteEgret hook open for file read"
+   depends on SECURITY_WHITEEGRET_INTERPRETER
+   default y
+   help
+ This enables hooking file open LSM for reading. The kernel passes the
+ hook information to WhiteEgret's user application. This application
+ can receive the hook information and control execution of the
+ hooked function.
+
+config SECURITY_WHITEEGRET_CHECK_LIVING_TASK
+   bool "WhiteEgret hook creating and exiting task"
+   depends on SECURITY_WHITEEGRET_INTERPRETER
+   default y
+   help
+ This enables hooking create/exit task LSM. The kernel passes the
+ hook information to WhiteEgret's user application. This application
+ can receive the hook information and control execution of the
+ hooked function.
+
+config SECURITY_WHITEEGRET_HOOK_WRITE
+   bool "WhiteEgret hook write"
+   depends on SECURITY_WHITEEGRET
+   select SECURITY_PATH
+   

Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

2018-10-18 Thread Alexei Starovoitov
> 
> >
> > Another example is __BPF_PROG_RUN_ARRAY(), which also uses
> > preempt_enable_no_resched().
> 
> Alexei, I think this code is just wrong.

why 'just wrong' ?

> Do you know why it uses
> preempt_enable_no_resched()?

I don't recall precisely.
We could be preemptible at the point where the macro is called.
I think the goal of no_resched was to avoid adding scheduling points
where they didn't exist before just because a prog ran for a few nsec.
Maybe Daniel or Roman remember.



Re: linux-next: build warning after merge of the scsi tree

2018-10-18 Thread Stephen Rothwell
Hi James,

On Thu, 18 Oct 2018 21:54:03 -0700 James Bottomley 
 wrote:
>
> It's the merge commit ... it was obviously the wrong choice; I'll fix
> it.

OK, thanks.

-- 
Cheers,
Stephen Rothwell




[PATCH] coresight: tmc: Fix bad register address for CLAIM

2018-10-18 Thread Leo Yan
Commit 4d3ebd3658d8 ("coreisght: tmc: Claim device before use") uses
the CLAIM tag to validate whether the device is available; this requires
passing the device base address to access the related registers.

The function tmc_etb_disable_hw() wrongly passes the driver data
pointer as the register base address, so it easily triggers a kernel
warning like the one below:

[   83.579898] WARNING: CPU: 4 PID: 2970 at drivers/hwtracing/coresight/coresight.c:207 coresight_disclaim_device_unlocked+0x44/0x80
[   83.591448] Modules linked in:
[   83.594485] CPU: 4 PID: 2970 Comm: uname Not tainted 4.19.0-rc6-00417-g721b509 #110
[   83.602067] Hardware name: ARM Juno development board (r2) (DT)
[   83.607932] pstate: 8085 (Nzcv daIf -PAN -UAO)
[   83.612681] pc : coresight_disclaim_device_unlocked+0x44/0x80
[   83.618375] lr : coresight_disclaim_device_unlocked+0x44/0x80
[   83.624064] sp : 0fe3ba20
[   83.627347] x29: 0fe3ba20 x28: 80002d430dc0
[   83.632618] x27: 800033177c00 x26: 80002eb44480
[   83.637889] x25: 0001 x24: 800033c72600
[   83.643160] x23: 099b11f8 x22: 099b11c8
[   83.648430] x21: 0002 x20: 800033a90418
[   83.653701] x19: 099b11c8 x18: 
[   83.658971] x17:  x16: 
[   83.664241] x15:  x14: 
[   83.669511] x13:  x12: 
[   83.674782] x11:  x10: 
[   83.680052] x9 :  x8 : 0001
[   83.685322] x7 : 0001 x6 : 800033ebab18
[   83.690593] x5 : 800033ebab18 x4 : 800033e6c698
[   83.695862] x3 : 0001 x2 : 
[   83.701133] x1 :  x0 : 0001
[   83.706404] Call trace:
[   83.708830]  coresight_disclaim_device_unlocked+0x44/0x80
[   83.714180]  coresight_disclaim_device+0x34/0x48
[   83.718756]  tmc_disable_etf_sink+0xc4/0xf0
[   83.722902]  coresight_disable_path_from+0xc8/0x240
[   83.727735]  coresight_disable_path+0x24/0x30
[   83.732053]  etm_event_stop+0x130/0x170
[   83.735854]  etm_event_del+0x24/0x30
[   83.739399]  event_sched_out.isra.51+0xcc/0x1e8
[   83.743887]  group_sched_out.part.53+0x44/0xb0
[   83.748291]  ctx_sched_out+0x298/0x2b8
[   83.752005]  task_ctx_sched_out+0x74/0xa8
[   83.755980]  perf_event_exit_task+0x140/0x418
[   83.760298]  do_exit+0x3f4/0xcf0
[   83.763497]  do_group_exit+0x5c/0xc0
[   83.767041]  __arm64_sys_exit_group+0x24/0x28
[   83.771359]  el0_svc_common+0x110/0x178
[   83.775160]  el0_svc_handler+0x94/0xe8
[   83.778875]  el0_svc+0x8/0xc
[   83.781728] ---[ end trace 02d8d8eac46db9e5 ]---

Fix this bug by using 'drvdata->base' as the
register base address for CLAIM related operations.

Fixes: 4d3ebd3658d8 ("coreisght: tmc: Claim device before use")
Cc: Suzuki Poulose 
Cc: Mathieu Poirier 
Cc: Mike Leach 
Cc: Robert Walker 
Signed-off-by: Leo Yan 
---
 drivers/hwtracing/coresight/coresight-tmc-etf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 53fc83b..5864ac5 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -86,7 +86,7 @@ static void __tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 
 static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
 {
-   coresight_disclaim_device(drvdata);
+   coresight_disclaim_device(drvdata->base);
__tmc_etb_disable_hw(drvdata);
 }
 
-- 
2.7.4
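
Why the wrong argument compiled without complaint can be seen in a small user-space sketch: a function taking an untyped base pointer accepts any object pointer. Everything below is an assumption made for illustration, including the cut-down `struct fake_drvdata`, the helper name, the 0xfa4 claim-tag register offset and the self-hosted claim bit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CORESIGHT_CLAIMCLR      0xfa4           /* claim tag clear/read register, assumed offset */
#define CLAIM_SELF_HOSTED       (1u << 1)       /* assumed self-hosted claim bit */

/* Hypothetical, cut-down driver data: 'base' points at the register block. */
struct fake_drvdata {
	int      unrelated_state;
	uint8_t *base;
};

/*
 * Takes an untyped pointer, like an MMIO base address. Passing a
 * 'struct fake_drvdata *' here would also compile, but the read would
 * then land in unrelated memory instead of the device registers.
 */
bool claimed_self_hosted(const void *base)
{
	uint32_t val;

	memcpy(&val, (const uint8_t *)base + CORESIGHT_CLAIMCLR, sizeof(val));
	return (val & CLAIM_SELF_HOSTED) != 0;
}
```

Called as `claimed_self_hosted(drv.base)` the helper reads the simulated register block, which corresponds to the `drvdata->base` argument used in the fix.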



Re: linux-next: build warning after merge of the scsi tree

2018-10-18 Thread James Bottomley
On Fri, 2018-10-19 at 15:50 +1100, Stephen Rothwell wrote:
> Hi James,
> 
> After merging the scsi tree, today's linux-next build (powerpc
> ppc64_defconfig) produced this warning:
> 
> drivers/scsi/lpfc/lpfc_debugfs.c: In function
> 'lpfc_debugfs_nodelist_open':
> drivers/scsi/lpfc/lpfc_debugfs.c:706:17: warning: 'nrport' may be
> used uninitialized in this function [-Wmaybe-uninitialized]
>switch (nrport->port_state) {
>~~^~~~
> drivers/scsi/lpfc/lpfc_debugfs.c:553:30: note: 'nrport' was declared
> here
>   struct nvme_fc_remote_port *nrport;
>   ^~
> 
> I am not sure where this has come from :-(

It's the merge commit ... it was obviously the wrong choice; I'll fix
it.

James




Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-18 Thread Mike Kravetz
On 10/18/18 6:47 PM, Andrew Morton wrote:
> On Thu, 18 Oct 2018 20:46:21 -0400 Andrea Arcangeli  
> wrote:
> 
>> On Thu, Oct 18, 2018 at 04:16:40PM -0700, Mike Kravetz wrote:
>>> I was not sure about this, and expected someone could come up with
>>> something better.  It just seems there are filesystems like hugetlbfs,
>>> where it makes no sense wasting cycles traversing the filesystem.  So,
>>> let's not even try.
>>>
>>> Hoping someone can come up with a better method than hard coding as
>>> I have done above.
>>
>> It's not strictly required after marking the pages dirty though. The
>> real fix is the other one? Could we just drop the hardcoding and let
>> it run after the real fix is applied?

Yeah.  The other part of the patch is the real fix.  This drop_caches
part is not necessary.

>> The performance of drop_caches doesn't seem critical, especially with
>> gigapages. tmpfs doesn't seem to be optimized away from drop_caches
>> and the gain would be bigger for tmpfs if THP is not enabled in the
>> mount, so I'm not sure if we should worry about hugetlbfs first.
> 
> I guess so.  I can't immediately see a clean way of expressing this so
> perhaps it would need a new BDI_CAP_NO_BACKING_STORE.  Such a
> thing hardly seems worthwhile for drop_caches.
> 
> And drop_caches really shouldn't be there anyway.  It's a standing
> workaround for ongoing suckage in pagecache and metadata reclaim
> behaviour :(

I'm OK with dropping the other part.  It just seemed like there was no
real reason to try and drop_caches for hugetlbfs (and perhaps others).

Andrew, would you like another version?  Or can you just drop the
fs/drop_caches.c part?

-- 
Mike Kravetz


linux-next: build warning after merge of the scsi tree

2018-10-18 Thread Stephen Rothwell
Hi James,

After merging the scsi tree, today's linux-next build (powerpc
ppc64_defconfig) produced this warning:

drivers/scsi/lpfc/lpfc_debugfs.c: In function 'lpfc_debugfs_nodelist_open':
drivers/scsi/lpfc/lpfc_debugfs.c:706:17: warning: 'nrport' may be used 
uninitialized in this function [-Wmaybe-uninitialized]
   switch (nrport->port_state) {
   ~~^~~~
drivers/scsi/lpfc/lpfc_debugfs.c:553:30: note: 'nrport' was declared here
  struct nvme_fc_remote_port *nrport;
  ^~

I am not sure where this has come from :-(

-- 
Cheers,
Stephen Rothwell




Re: [PATCH] tpm: tpm_i2c_nuvoton: use correct command duration for TPM 2.x

2018-10-18 Thread Nayna Jain




On 10/17/2018 10:02 PM, Tomas Winkler wrote:

diff --git a/drivers/char/tpm/tpm_i2c_nuvoton.c b/drivers/char/tpm/tpm_i2c_nuvoton.c
index caa86b19c76d..f74f451baf6a 100644
--- a/drivers/char/tpm/tpm_i2c_nuvoton.c
+++ b/drivers/char/tpm/tpm_i2c_nuvoton.c
@@ -369,6 +369,7 @@ static int i2c_nuvoton_send(struct tpm_chip *chip, u8 *buf, size_t len)
struct device *dev = chip->dev.parent;
struct i2c_client *client = to_i2c_client(dev);
u32 ordinal;
+   unsigned long duration;
size_t count = 0;
int burst_count, bytes2write, retries, rc = -EIO;

@@ -455,10 +456,12 @@ static int i2c_nuvoton_send(struct tpm_chip *chip, u8 *buf, size_t len)
return rc;
}
ordinal = be32_to_cpu(*((__be32 *) (buf + 6)));
-   rc = i2c_nuvoton_wait_for_data_avail(chip,
-tpm_calc_ordinal_duration(chip,
-  ordinal),
->read_queue);
+   if (chip->flags & TPM_CHIP_FLAG_TPM2)
+   duration = tpm2_calc_ordinal_duration(chip, ordinal);
+   else
+   duration = tpm_calc_ordinal_duration(chip, ordinal);
+
+   rc = i2c_nuvoton_wait_for_data_avail(chip, duration, >read_queue);
if (rc) {
dev_err(dev, "%s() timeout command duration\n", __func__);
i2c_nuvoton_ready(chip);


I only have Nuvoton TPM 2.0, tested for that.

Reviewed-by: Nayna Jain 
Tested-by: Nayna Jain  (For TPM 2.0)


Thanks & Regards,
    - Nayna




Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

2018-10-18 Thread Nadav Amit
at 9:29 PM, Andy Lutomirski  wrote:

>> On Oct 18, 2018, at 6:08 PM, Nadav Amit  wrote:
>> 
>> at 10:00 AM, Andy Lutomirski  wrote:
>> 
>>>> On Oct 18, 2018, at 9:47 AM, Nadav Amit  wrote:
>>>> 
>>>> at 8:51 PM, Andy Lutomirski  wrote:
>>>> 
>>>>>> On Wed, Oct 17, 2018 at 8:12 PM Nadav Amit  wrote:
>>>>>> at 6:22 PM, Andy Lutomirski  wrote:
>>>>>> 
>>>>>>>> On Oct 17, 2018, at 5:54 PM, Nadav Amit  wrote:
>>>>>>>> 
>>>>>>>> It is sometimes beneficial to prevent preemption for very few
>>>>>>>> instructions, or prevent preemption for some instructions that precede
>>>>>>>> a branch (this latter case will be introduced in the next patches).
>>>>>>>> 
>>>>>>>> To provide such functionality on x86-64, we use an empty REX-prefix
>>>>>>>> (opcode 0x40) as an indication that preemption is disabled for the
>>>>>>>> following instruction.
>>>>>>> 
>>>>>>> Nifty!
>>>>>>> 
>>>>>>> That being said, I think you have a few bugs. First, you can’t just ignore
>>>>>>> a rescheduling interrupt, as you introduce unbounded latency when this
>>>>>>> happens - you’re effectively emulating preempt_enable_no_resched(), which
>>>>>>> is not a drop-in replacement for preempt_enable(). To fix this, you may
>>>>>>> need to jump to a slow-path trampoline that calls schedule() at the end or
>>>>>>> consider rewinding one instruction instead. Or use TF, which is only a
>>>>>>> little bit terrifying…
>>>>>> 
>>>>>> Yes, I didn’t pay enough attention here. For my use-case, I think that the
>>>>>> easiest solution would be to make synchronize_sched() ignore preemptions
>>>>>> that happen while the prefix is detected. It would slightly change the
>>>>>> meaning of the prefix.
>>>> 
>>>> So thinking about it further, rewinding the instruction seems the easiest
>>>> and most robust solution. I’ll do it.
>>>> 
>>>>>>> You also aren’t accounting for the case where you get an exception that
>>>>>>> is, in turn, preempted.
>>>>>> 
>>>>>> Hmm.. Can you give me an example for such an exception in my use-case? I
>>>>>> cannot think of an exception that might be preempted (assuming #BP, #MC
>>>>>> cannot be preempted).
>>>>> 
>>>>> Look for cond_local_irq_enable().
>>>> 
>>>> I looked at it. Yet, I still don’t see how exceptions might happen in my
>>>> use-case, but having said that - this can be fixed too.
>>> 
>>> I’m not totally certain there’s a case that matters.  But it’s worth checking
>> 
>> I am still checking. But, I wanted to ask you whether the existing code is
>> correct, since it seems to me that others make the same mistake I did, unless
>> I don’t understand the code.
>> 
>> Consider for example do_int3(), and see my inlined comments:
>> 
>> dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
>> {
>>   ...
>>   ist_enter(regs);                // => preempt_disable()
>>   cond_local_irq_enable(regs);    // => assume it enables IRQs
>> 
>>   ...
>>   // resched irq can be delivered here. It will not cause rescheduling
>>   // since preemption is disabled
>> 
>>   cond_local_irq_disable(regs);   // => assume it disables IRQs
>>   ist_exit(regs);                 // => preempt_enable_no_resched()
>> }
>> 
>> At this point resched will not happen for unbounded length of time (unless
>> there is another point when exiting the trap handler that checks if
>> preemption should take place).
> 
> I think it's only a bug in the cases where someone uses extable to fix
> up an int3 (which would be nuts) or that we oops.  But I should still
> fix it.  In the normal case where int3 was in user code, we'll miss
> the reschedule in do_trap(), but we'll reschedule in
> prepare_exit_to_usermode() -> exit_to_usermode_loop().

Thanks for your quick response, and sorry for bothering instead of dealing
with it. Note that do_debug() does something similar to do_int3().

And then there is optimized_callback() that also uses
preempt_enable_no_resched(). I think the original use was correct, but then
a19b2e3d7839 ("kprobes/x86: Remove IRQ disabling from ftrace-based/optimized
kprobes") removed the IRQ disabling, while leaving
preempt_enable_no_resched(). No?



Re: [PATCH 1/2] iio: adc: Add ad7124 support

2018-10-18 Thread kbuild test robot
Hi Stefan,

I love your patch! Yet something to improve:

[auto build test ERROR on iio/togreg]
[also build test ERROR on v4.19-rc8 next-20181018]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Stefan-Popa/iio-adc-Add-ad7124-support/20181019-051737
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio.git togreg
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=arm

All error/warnings (new ones prefixed by >>):

>> drivers/iio//adc/ad7124.c:215:3: error: 'const struct ad_sigma_delta_info' has no member named 'data_reg'
     .data_reg = AD7124_DATA,
      ^~~~
>> drivers/iio//adc/ad7124.c:25:23: warning: excess elements in struct initializer
    #define AD7124_DATA   0x02
                          ^
>> drivers/iio//adc/ad7124.c:215:14: note: in expansion of macro 'AD7124_DATA'
     .data_reg = AD7124_DATA,
                 ^~~
   drivers/iio//adc/ad7124.c:25:23: note: (near initialization for 'ad7124_sigma_delta_info')
    #define AD7124_DATA   0x02
                          ^
>> drivers/iio//adc/ad7124.c:215:14: note: in expansion of macro 'AD7124_DATA'
     .data_reg = AD7124_DATA,
                 ^~~

vim +215 drivers/iio//adc/ad7124.c

20  
21  /* AD7124 registers */
22  #define AD7124_COMMS                0x00
23  #define AD7124_STATUS               0x00
24  #define AD7124_ADC_CONTROL          0x01
  > 25  #define AD7124_DATA             0x02
26  #define AD7124_IO_CONTROL_1         0x03
27  #define AD7124_IO_CONTROL_2         0x04
28  #define AD7124_ID                   0x05
29  #define AD7124_ERROR                0x06
30  #define AD7124_ERROR_EN             0x07
31  #define AD7124_MCLK_COUNT           0x08
32  #define AD7124_CHANNEL(x)           (0x09 + (x))
33  #define AD7124_CONFIG(x)            (0x19 + (x))
34  #define AD7124_FILTER(x)            (0x21 + (x))
35  #define AD7124_OFFSET(x)            (0x29 + (x))
36  #define AD7124_GAIN(x)              (0x31 + (x))
37  
38  /* AD7124_STATUS */
39  #define AD7124_STATUS_POR_FLAG_MSK  BIT(4)
40  
41  /* AD7124_ADC_CONTROL */
42  #define AD7124_ADC_CTRL_PWR_MSK     GENMASK(7, 6)
43  #define AD7124_ADC_CTRL_PWR(x)      FIELD_PREP(AD7124_ADC_CTRL_PWR_MSK, x)
44  #define AD7124_ADC_CTRL_MODE_MSK    GENMASK(5, 2)
45  #define AD7124_ADC_CTRL_MODE(x)     FIELD_PREP(AD7124_ADC_CTRL_MODE_MSK, x)
46  
47  /* AD7124_CHANNEL_X */
48  #define AD7124_CHANNEL_EN_MSK       BIT(15)
49  #define AD7124_CHANNEL_EN(x)        FIELD_PREP(AD7124_CHANNEL_EN_MSK, x)
50  #define AD7124_CHANNEL_SETUP_MSK    GENMASK(14, 12)
51  #define AD7124_CHANNEL_SETUP(x)     FIELD_PREP(AD7124_CHANNEL_SETUP_MSK, x)
52  #define AD7124_CHANNEL_AINP_MSK     GENMASK(9, 5)
53  #define AD7124_CHANNEL_AINP(x)      FIELD_PREP(AD7124_CHANNEL_AINP_MSK, x)
54  #define AD7124_CHANNEL_AINM_MSK     GENMASK(4, 0)
55  #define AD7124_CHANNEL_AINM(x)      FIELD_PREP(AD7124_CHANNEL_AINM_MSK, x)
56  
57  /* AD7124_CONFIG_X */
58  #define AD7124_CONFIG_BIPOLAR_MSK   BIT(11)
59  #define AD7124_CONFIG_BIPOLAR(x)    FIELD_PREP(AD7124_CONFIG_BIPOLAR_MSK, x)
60  #define AD7124_CONFIG_REF_SEL_MSK   GENMASK(4, 3)
61  #define AD7124_CONFIG_REF_SEL(x)    FIELD_PREP(AD7124_CONFIG_REF_SEL_MSK, x)
62  #define AD7124_CONFIG_PGA_MSK       GENMASK(2, 0)
63  #define AD7124_CONFIG_PGA(x)        FIELD_PREP(AD7124_CONFIG_PGA_MSK, x)
64  
65  /* AD7124_FILTER_X */
66  #define AD7124_FILTER_FS_MSK        GENMASK(10, 0)
67  #define AD7124_FILTER_FS(x)         FIELD_PREP(AD7124_FILTER_FS_MSK, x)
68  
69  enum ad7124_ids {
70  ID_AD7124_4,
71  ID_AD7124_8,
72  };
73  
74  enum ad7124_ref_sel {
75  AD7124_REFIN1,
76  AD7124_REFIN2,
77  AD7124_INT_REF,
78  AD7124_AVDD_REF,
79  };
80  
81  enum ad7124_power_mode {
82  AD7124_LOW_POWER,
83  AD7124_MID_POWER,
84  AD7124_FULL_POWER,
85  };
86  
87  static const unsigned int ad7124_gain[8] = {
88  1, 2, 4, 8, 16, 32, 64, 128
89  };
90  
91  static const int ad7124_master_clk_freq_hz[3] = {
92  [AD7124_LOW_POWER] = 76800,
93  [AD7124_MID_POWER] = 153600,
94  [AD7124_FULL_POWER] = 614400,
95  

Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

2018-10-18 Thread Andy Lutomirski
> On Oct 18, 2018, at 6:08 PM, Nadav Amit  wrote:
>
> at 10:00 AM, Andy Lutomirski  wrote:
>
>>
>>
>>> On Oct 18, 2018, at 9:47 AM, Nadav Amit  wrote:
>>>
>>> at 8:51 PM, Andy Lutomirski  wrote:
>>>
>>>>> On Wed, Oct 17, 2018 at 8:12 PM Nadav Amit  wrote:
>>>>> at 6:22 PM, Andy Lutomirski  wrote:
>>>>>
>>>>>>> On Oct 17, 2018, at 5:54 PM, Nadav Amit  wrote:
>>>>>>>
>>>>>>> It is sometimes beneficial to prevent preemption for very few
>>>>>>> instructions, or prevent preemption for some instructions that precede
>>>>>>> a branch (this latter case will be introduced in the next patches).
>>>>>>>
>>>>>>> To provide such functionality on x86-64, we use an empty REX-prefix
>>>>>>> (opcode 0x40) as an indication that preemption is disabled for the
>>>>>>> following instruction.
>>>>>>
>>>>>> Nifty!
>>>>>>
>>>>>> That being said, I think you have a few bugs. First, you can’t just ignore
>>>>>> a rescheduling interrupt, as you introduce unbounded latency when this
>>>>>> happens - you’re effectively emulating preempt_enable_no_resched(), which
>>>>>> is not a drop-in replacement for preempt_enable(). To fix this, you may
>>>>>> need to jump to a slow-path trampoline that calls schedule() at the end or
>>>>>> consider rewinding one instruction instead. Or use TF, which is only a
>>>>>> little bit terrifying…
>>>>>
>>>>> Yes, I didn’t pay enough attention here. For my use-case, I think that the
>>>>> easiest solution would be to make synchronize_sched() ignore preemptions
>>>>> that happen while the prefix is detected. It would slightly change the
>>>>> meaning of the prefix.
>>>
>>> So thinking about it further, rewinding the instruction seems the easiest
>>> and most robust solution. I’ll do it.
>>>
>>>>>> You also aren’t accounting for the case where you get an exception that
>>>>>> is, in turn, preempted.
>>>>>
>>>>> Hmm.. Can you give me an example for such an exception in my use-case? I
>>>>> cannot think of an exception that might be preempted (assuming #BP, #MC
>>>>> cannot be preempted).
>>>>
>>>> Look for cond_local_irq_enable().
>>>
>>> I looked at it. Yet, I still don’t see how exceptions might happen in my
>>> use-case, but having said that - this can be fixed too.
>>
>> I’m not totally certain there’s a case that matters.  But it’s worth checking
>
> I am still checking. But, I wanted to ask you whether the existing code is
> correct, since it seems to me that others make the same mistake I did, unless
> I don’t understand the code.
>
> Consider for example do_int3(), and see my inlined comments:
>
> dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
> {
>   ...
>   ist_enter(regs);                // => preempt_disable()
>   cond_local_irq_enable(regs);    // => assume it enables IRQs
>
>   ...
>   // resched irq can be delivered here. It will not cause rescheduling
>   // since preemption is disabled
>
>   cond_local_irq_disable(regs);   // => assume it disables IRQs
>   ist_exit(regs);                 // => preempt_enable_no_resched()
> }
>
> At this point resched will not happen for unbounded length of time (unless
> there is another point when exiting the trap handler that checks if
> preemption should take place).

I think it's only a bug in the cases where someone uses extable to fix
up an int3 (which would be nuts) or that we oops.  But I should still
fix it.  In the normal case where int3 was in user code, we'll miss
the reschedule in do_trap(), but we'll reschedule in
prepare_exit_to_usermode() -> exit_to_usermode_loop().

>
> Another example is __BPF_PROG_RUN_ARRAY(), which also uses
> preempt_enable_no_resched().

Alexei, I think this code is just wrong. Do you know why it uses
preempt_enable_no_resched()?

Oleg, the code in kernel/signal.c:

preempt_disable();
read_unlock(_lock);
preempt_enable_no_resched();
freezable_schedule();

looks bogus.  I don't get what it's trying to achieve with
preempt_disable(), and I also don't see why no_resched does anything.
Sure, it prevents a reschedule triggered during read_unlock() from
causing a reschedule, but it doesn't prevent an interrupt immediately
after the preempt_enable_no_resched() call from scheduling.

--Andy

>
> Am I missing something?
>
> Thanks,
> Nadav


[PATCH v2 1/2] seq_buf: Make seq_buf_puts() null-terminate the buffer

2018-10-18 Thread Michael Ellerman
Currently seq_buf_puts() will happily create a non null-terminated
string for you in the buffer. This is particularly dangerous if the
buffer is on the stack.

For example:

  char buf[8];
  char secret[] = "secret";
  struct seq_buf s;

  seq_buf_init(&s, buf, sizeof(buf));
  seq_buf_puts(&s, "foo");
  printk("Message is %s\n", buf);

Can result in:

  Message is fooªsecret

We could require all users to memset() their buffer to zero before
use. But that seems likely to be forgotten and lead to bugs.

Instead we can change seq_buf_puts() to always leave the buffer in a
null-terminated state.

The only downside is that this makes the buffer 1 character smaller
for seq_buf_puts(), but that seems like a good trade off.

Acked-by: Kees Cook 
Signed-off-by: Michael Ellerman 
---
 lib/seq_buf.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

v2: Fix NULL/null terminology.

diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 11f2ae0f9099..6aabb609dd87 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -144,9 +144,13 @@ int seq_buf_puts(struct seq_buf *s, const char *str)
 
WARN_ON(s->size == 0);
 
+   /* Add 1 to len for the trailing null byte which must be there */
+   len += 1;
+
if (seq_buf_can_fit(s, len)) {
memcpy(s->buffer + s->len, str, len);
-   s->len += len;
+   /* Don't count the trailing null byte against the capacity */
+   s->len += len - 1;
return 0;
}
seq_buf_set_overflow(s);
-- 
2.17.2
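
For readers who want to try the idea outside the kernel, here is a minimal user-space sketch of the behaviour the patch above aims for. It is only a model under stated assumptions: struct seq_buf_model, model_init() and model_puts() are names made up for illustration and are not kernel API, the fit check is assumed to mirror seq_buf_can_fit(), and overflow handling is reduced to a return value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified user-space stand-in for the kernel's struct seq_buf. */
struct seq_buf_model {
    char *buffer;
    size_t size;    /* capacity of buffer in bytes */
    size_t len;     /* bytes used, not counting the terminator */
};

static void model_init(struct seq_buf_model *s, char *buf, size_t size)
{
    s->buffer = buf;
    s->size = size;
    s->len = 0;
}

/* Returns 0 on success, -1 if the string plus its terminator does not fit. */
static int model_puts(struct seq_buf_model *s, const char *str)
{
    size_t len = strlen(str);

    assert(s->size != 0);   /* stands in for WARN_ON(s->size == 0) */

    /* Copy the trailing null byte too, so the buffer is a valid C string. */
    len += 1;

    if (s->len + len <= s->size) {
        memcpy(s->buffer + s->len, str, len);
        /* Do not count the null byte; the next append overwrites it. */
        s->len += len - 1;
        return 0;
    }
    return -1;
}
```

With an 8-byte buffer pre-filled with junk, model_puts(&s, "foo") leaves the string "foo" in the buffer and s.len at 3, so a later append starts on top of the terminator and the result stays null-terminated.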



[PATCH v2 2/2] seq_buf: Use size_t for len in seq_buf_puts()

2018-10-18 Thread Michael Ellerman
Jann Horn points out that we're using unsigned int for len in
seq_buf_puts(), which could potentially overflow if we're passed a
UINT_MAX sized string.

The rest of the code already uses size_t, so we should also use that
in seq_buf_puts() to avoid any issues.

Suggested-by: Jann Horn 
Signed-off-by: Michael Ellerman 
---
 lib/seq_buf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

v2: New in v2.

diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 6aabb609dd87..bd807f545a9d 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -140,7 +140,7 @@ int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)
  */
 int seq_buf_puts(struct seq_buf *s, const char *str)
 {
-   unsigned int len = strlen(str);
+   size_t len = strlen(str);
 
WARN_ON(s->size == 0);
 
-- 
2.17.2



Re: [PATCH] seq_buf: Make seq_buf_puts() NULL terminate the buffer

2018-10-18 Thread Michael Ellerman
Jann Horn  writes:
> On Wed, Oct 17, 2018 at 2:10 PM Michael Ellerman  wrote:
>> Currently seq_buf_puts() will happily create a non NULL terminated
>> string for you in the buffer. This is particularly dangerous if the
>> buffer is on the stack.
>>
>> For example:
>>
>>   char buf[8];
>>   char secret[] = "secret";
>>   struct seq_buf s;
>>
>>   seq_buf_init(&s, buf, sizeof(buf));
>>   seq_buf_puts(&s, "foo");
>>   printk("Message is %s\n", buf);
>>
>> Can result in:
>>
>>   Message is fooªsecret
>>
>> We could require all users to memset() their buffer to NULL before
>> use. But that seems likely to be forgotten and lead to bugs.
>>
>> Instead we can change seq_buf_puts() to always leave the buffer in a
>> NULL terminated state.
>>
>> The only downside is that this makes the buffer 1 character smaller
>> for seq_buf_puts(), but that seems like a good trade off.
>
> After this, you can also simplify rdt_last_cmd_status_show(), right?

Yes.

We also have a seq_buf_printf() in powerpc code that is printing a fixed
string purely to get NULL termination, so that can become a
seq_buf_puts().

>> diff --git a/lib/seq_buf.c b/lib/seq_buf.c
>> index 11f2ae0f9099..b1570204cde3 100644
>> --- a/lib/seq_buf.c
>> +++ b/lib/seq_buf.c
>> @@ -144,9 +144,13 @@ int seq_buf_puts(struct seq_buf *s, const char *str)
>>
>> WARN_ON(s->size == 0);
>>
>> +   /* Add 1 to len for the trailing NULL which must be there */
>
> Nit: In the comments, I would prefer either "null byte" or "NUL"
> instead of "NULL" when talking about something that is not a pointer.

Hmm yeah I guess. I think of them as being more or less the same thing,
or at least interchangeable, but that's a bit sloppy.

I'll send a v2 with "null byte".

>> +   len += 1;
>
> It looks like you're using an "unsigned int" for the length, meaning
> that this can in theory (e.g. when operating on a string from a big
> vmalloc buffer) overflow. You should be using size_t here.

Yes you're right.

And if len overflows to zero above ..

>> if (seq_buf_can_fit(s, len)) {

This will return true.

>> memcpy(s->buffer + s->len, str, len);
>> -   s->len += len;
>> +   /* Don't count the trailing NULL against the capacity */
>> +   s->len += len - 1;

And then here s->len becomes UINT_MAX.

I think. Which is probably not what we want.

I'll send a patch to switch to size_t in there.

cheers
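
The wrap-around described above can be reproduced in user space. This is only a sketch of the arithmetic, not of seq_buf itself: the helper names are hypothetical, and the size_t check is skipped on platforms where size_t is no wider than unsigned int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Old arithmetic: the length is kept in an unsigned int. */
static unsigned int uint_len_with_nul(unsigned int len)
{
    return len + 1;             /* wraps to 0 when len == UINT_MAX */
}

/* What "s->len += len - 1" would add once len has wrapped to 0. */
static unsigned int uint_advance(unsigned int len_with_nul)
{
    return len_with_nul - 1;    /* 0 - 1 wraps to UINT_MAX */
}

/* New arithmetic: the length is kept in a size_t. */
static size_t size_len_with_nul(size_t len)
{
    return len + 1;
}

/* Self-check of the claims made in the comments above. */
static void check_wraparound(void)
{
    assert(uint_len_with_nul(UINT_MAX) == 0);
    assert(uint_advance(0) == UINT_MAX);
    if (sizeof(size_t) > sizeof(unsigned int))
        assert(size_len_with_nul(UINT_MAX) == (size_t)UINT_MAX + 1);
}
```

A wrapped length of 0 always passes a "does it fit" check, which is why the overflow matters even though the string itself could never fit.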


Re: Crash in msm serial on dragonboard with ftrace bootargs

2018-10-18 Thread Joel Fernandes
On Thu, Oct 18, 2018 at 09:17:06AM -0400, Steven Rostedt wrote:
> On Thu, 18 Oct 2018 10:51:18 +0530
> Sai Prakash Ranjan  wrote:
> 
> > > So something else is causing an issue besides just msm_read.
> > > 
> > > Can you do an objdump -dr of the entire vmlinux binary and gzip it and
> > > post it somewhere. Not sure if it would be too big to email. You could
> > > try sending it to me privately. I'd like to see the binary that you are
> > > using.
> > >   
> > 
> > I have sent the objdump and dot config to you privately.
> 
> Thanks. I don't see anything that pops out, but then again, my arm asm
> foo is very rusty (it has been literally decades since I did any arm
> asm). I wonder if it could simply be a timing issue?
> 
> 086eb538 :
> 086eb538:   a9be7bfd    stp x29, x30, [sp,#-32]!
> 086eb53c:   910003fd    mov x29, sp
> 086eb540:   a90153f3    stp x19, x20, [sp,#16]
> 086eb544:   aa0003f4    mov x20, x0
> 086eb548:   2a0103f3    mov w19, w1
> 086eb54c:   aa1e03e0    mov x0, x30
> 086eb550:   97e6bae4    bl  0809a0e0 <_mcount>
> 
> The above is changed to nop on boot, but then to:
> 
>   bl ftrace_caller
> 
> When ftrace is enabled.
> 
> 086eb554:   8b334280    add x0, x20, w19, uxtw
> 086eb558:   b940    ldr w0, [x0]
> 086eb55c:   a94153f3    ldp x19, x20, [sp,#16]
> 086eb560:   a8c27bfd    ldp x29, x30, [sp],#32
> 086eb564:   d65f03c0    ret
> 
> 
> 
> 0809a0e4 :
> 0809a0e4:   a9bf7bfd    stp x29, x30, [sp,#-16]!
> 0809a0e8:   910003fd    mov x29, sp
> 0809a0ec:   d10013c0    sub x0, x30, #0x4
> 0809a0f0:   f94003a1    ldr x1, [x29]
> 0809a0f4:   f9400421    ldr x1, [x1,#8]
> 0809a0f8:   d1001021    sub x1, x1, #0x4
> 
> 0809a0fc :
> 0809a0fc:   d503201f    nop
> 
> The above nop gets patched to:
> 
>   bl ftrace_ops_no_ops
> 
> Which will iterate through all the registered functions.
> 
> 
> 0809a100 :
> 0809a100:   d503201f    nop
> 
> The above only gets set when function graph tracer is enabled, which it
> is not in this case.
> 
> 0809a104:   a8c17bfd    ldp x29, x30, [sp],#16
> 0809a108:   d65f03c0    ret
> 
> 
> Anyone see any problems here?

This seems sane to me. He says in the other thread that he added 'notrace' to
the msm serial functions (which AIUI should prevent ftrace instrumentation)
and he still sees the issue.

thanks,

 - Joel



linux-next: manual merge of the kvm tree with the tip tree

2018-10-18 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

  tools/include/uapi/linux/kvm.h

between commit:

  25fe15e54fe5 ("tools headers uapi: Sync kvm.h copy")

from the tip tree and commit:

  c939989d74e2 ("tools/headers: update kvm.h")

from the kvm tree.

I fixed it up (the latter is a superset of the former) and can carry the
fix as necessary. This is now fixed as far as linux-next is concerned,
but any non trivial conflicts should be mentioned to your upstream
maintainer when your tree is submitted for merging.  You may also want
to consider cooperating with the maintainer of the conflicting tree to
minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH] firmware: coreboot: Fix a missing-check bug

2018-10-18 Thread Samuel Holland
On 10/18/18 10:37, Wenwen Wang wrote:
> In coreboot_table_init(), a for loop is used to copy the entries of the
> coreboot table. For each entry, the header of the entry, which is a
> structure coreboot_table_entry and includes the size of the entry, is
> firstly copied from the IO region 'ptr_entry' to 'entry' through the first
> memcpy_fromio(). Then the 'entry.size' is used to allocate the
> coreboot_device 'device' through kzalloc(). After 'device' is allocated,
> the whole entry, including the header, is then copied to 'device->entry'
> through the second memcpy_fromio(). Obviously, the header of the entry is
> copied twice here. More importantly, no check is enforced after the second
> copy to make sure the two copies obtain the same values. Given that the IO
> region can also be accessed by the device, it is possible that
> 'device->entry.size' is different from 'entry.size' after the second copy,
> especially when the device race to modify the size value between these two
> copies. This can cause undefined behavior of the kernel and introduce
> potential security risk, because 'device->entry.size' is inconsistent with
> the actual size of the entry.

Thanks for the patch.

However, this IO region is not associated with a hardware device. It is a table
in RAM that is only written to by firmware (coreboot) before Linux is ever run.
So there's no device on the other side that could race with the kernel here.

Regards,
Samuel

> This patch rewrites the header of each entry after the second copy, using
> the value acquired in the first copy. Through this way, the above issue can
> be avoided.
> 
> Signed-off-by: Wenwen Wang 
> ---
>  drivers/firmware/google/coreboot_table.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/firmware/google/coreboot_table.c b/drivers/firmware/google/coreboot_table.c
> index 19db570..20fcd54 100644
> --- a/drivers/firmware/google/coreboot_table.c
> +++ b/drivers/firmware/google/coreboot_table.c
> @@ -128,6 +128,7 @@ int coreboot_table_init(struct device *dev, void __iomem *ptr)
>   device->dev.bus = _bus_type;
>   device->dev.release = coreboot_device_release;
>   memcpy_fromio(>entry, ptr_entry, entry.size);
> + device->entry = entry;
>  
>   ret = device_register(>dev);
>   if (ret) {
> 
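
For readers unfamiliar with the pattern the patch description refers to, here is a user-space sketch of a double fetch and of the approach the patch proposes (re-applying the header obtained by the first read). Everything in it is hypothetical: struct entry_hdr, copy_entry() and the use of memcpy() in place of memcpy_fromio() are stand-ins, and the minimum-size check is an addition of this sketch, not part of the patch. As noted in the reply above, the coreboot table is not expected to change underneath the kernel, so this only illustrates the general idea.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry header: a tag followed by the total entry size. */
struct entry_hdr {
    uint32_t tag;
    uint32_t size;  /* size of the whole entry, header included */
};

/*
 * Copy one entry out of a shared region. The header is read once to
 * learn the size, then the whole entry is read again. Writing the
 * first header back over the copy keeps the size used for allocation
 * and the size stored in the copy consistent.
 * Returns a malloc'ed copy, or NULL on error.
 */
static struct entry_hdr *copy_entry(const void *region)
{
    struct entry_hdr hdr;
    struct entry_hdr *copy;

    assert(region != NULL);

    memcpy(&hdr, region, sizeof(hdr));      /* first fetch: header only */
    if (hdr.size < sizeof(hdr))
        return NULL;                        /* sketch-only sanity check */

    copy = calloc(1, hdr.size);
    if (!copy)
        return NULL;

    memcpy(copy, region, hdr.size);         /* second fetch: whole entry */
    *copy = hdr;                            /* pin header to the first fetch */
    return copy;
}
```

The caller owns the returned copy and releases it with free().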



Re: [PATCH 1/4] Adds -Wshadow=local on KBUILD_HOSTCFLAGS

2018-10-18 Thread Leonardo Bras
On Thu, Oct 18, 2018 at 11:42 PM Masahiro Yamada
 wrote:
> Adding -Wshadow to KBUILD_HOSTCFLAGS emits another warning in Kconfig.
> Of course, it is easy to fix.
For v2, I already replaced '-Wshadow=local' with '-Wshadow' and fixed this
warning.

> But, I just started to think this option is a kind of harsh...

The v2 is almost done. Do you think it will be useful, or should I drop it?

Regards,
Leonardo Bras


Re: [Lkcamp] [PATCH 3/4] kbuild: Removes unnecessary shadowed local variable and optimize testing.

2018-10-18 Thread Masahiro Yamada
On Thu, Oct 18, 2018 at 11:04 AM Leonardo Bras  wrote:
>
> Hello Helen,
>
> On Wed, Oct 17, 2018 at 12:06 AM Helen Koike  wrote:
> >
> > Hi Leonardo,
> >
> > On 10/16/18 9:09 PM, Leonardo Brás wrote:
> > > Removes an unnecessary shadowed local variable (start).
> > > Optimize test of isdigit:
> > > - If isalpha returns true, isdigit will return false, so no need to test.
> >
> > This patch does two different things, it should be in two separated patches.
> Sure, no problem.
>
> >
> > >
> > > Signed-off-by: Leonardo Brás 
> > > ---
> > >  scripts/asn1_compiler.c | 7 +++
> > >  1 file changed, 3 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/scripts/asn1_compiler.c b/scripts/asn1_compiler.c
> > > index c146020fc783..08bb6e5fd6ad 100644
> > > --- a/scripts/asn1_compiler.c
> > > +++ b/scripts/asn1_compiler.c
> > > @@ -413,7 +413,7 @@ static void tokenise(char *buffer, char *end)
> > >
> > >   /* Handle string tokens */
> > >   if (isalpha(*p)) {
> > > - const char **dir, *start = p;
> > > + const char **dir;
> > >
> > >   /* Can be a directive, type name or element
> > >* name.  Find the end of the name.
> > > @@ -454,10 +454,9 @@ static void tokenise(char *buffer, char *end)
> > >
> > >   tokens[tix++].token_type = TOKEN_TYPE_NAME;
> > >   continue;
> > > - }
> > > + } else if (isdigit(*p)) {
> > > + /* Handle numbers */
> >
> > Actually you can't do that, p is being altered in the first if statement.
>
> Yeah, you are right. I will remove this logic for v2.


I drop v1 from my tree.

Please send v2.


> >
> > >
> > > - /* Handle numbers */
> > > - if (isdigit(*p)) {
> > >   /* Find the end of the number */
> > >   q = p + 1;
> > >   while (q < nl && (isdigit(*q)))
> > >
> >
> > Regards
> > Helen
>
> Thanks!
>
> Leonardo Brás



-- 
Best Regards
Masahiro Yamada


Re: [PATCH 1/4] Adds -Wshadow=local on KBUILD_HOSTCFLAGS

2018-10-18 Thread Masahiro Yamada
On Fri, Oct 19, 2018 at 1:53 AM Borislav Petkov  wrote:
>
> On Fri, Oct 19, 2018 at 01:39:21AM +0900, Masahiro Yamada wrote:
> > It is not realistic to enable this warning option by default.
>
> I believe the question is whether to enable that warning by default in
> KBUILD_HOSTCFLAGS. Enabling it by default for the target kernel is of
> course, too noisy. That's why it is hidden behind W=2 there.


Sorry, I misunderstood the question.

The difference in noisiness between CC and HOSTCC
just comes from the amount of source code.

Adding -Wshadow to KBUILD_HOSTCFLAGS emits another warning in Kconfig.
Of course, it is easy to fix.
But, I just started to think this option is kind of harsh...


--
Best Regards
Masahiro Yamada


Re: [PATCH V2 2/5] mm/hugetlb: Distinguish between migratability and movability

2018-10-18 Thread Anshuman Khandual



On 10/19/2018 07:29 AM, Naoya Horiguchi wrote:
> On Fri, Oct 12, 2018 at 09:29:56AM +0530, Anshuman Khandual wrote:
>> During huge page allocation its migratability is checked to determine if
>> it should be placed under movable zones with GFP_HIGHUSER_MOVABLE. But the
>> movability aspect of the huge page could depend on other factors than just
>> migratability. Movability in itself is a distinct property which should not
>> be tied with migratability alone.
>>
>> This differentiates these two and implements an enhanced movability check
>> which also considers huge page size to determine if it is feasible to be
>> placed under a movable zone. At present it just checks for gigantic pages
>> but going forward it can incorporate other enhanced checks.
> 
> (nitpicking...)
> The following code just checks hugepage_migration_supported(), so maybe
> s/Movability/Migratability/ is expected in the comment?
> 
>   static int unmap_and_move_huge_page(...)
>   {
>   ...
>   /*
>    * Movability of hugepages depends on architectures and hugepage size.
>    * This check is necessary because some callers of hugepage migration
>* like soft offline and memory hotremove don't walk through page
>* tables or check whether the hugepage is pmd-based or not before
>* kicking migration.
>*/
>   if (!hugepage_migration_supported(page_hstate(hpage))) {
> 
Sure, will update this patch only unless other changes are suggested.


Re: [driver-core PATCH v4 4/6] driver core: Probe devices asynchronously instead of the driver

2018-10-18 Thread Bart Van Assche

On 10/18/18 7:20 PM, Alexander Duyck wrote:

> I see what you are talking about now. Actually I think this was an
> existing issue before my patch even came into play. Basically the code
> as it currently stands is device specific in terms of the attach and
> release code.
>
> I wonder if we shouldn't have the async_synchronize_full call in
> __device_release_driver moved down and into driver_detach before we
> even start the for loop. Assuming the driver is no longer associated
> with the bus that should flush out all devices so that we can then
> pull them out of the devices list at least. I may look at adding an
> additional bitflag to the device struct to indicate that it has a
> driver attach pending. Then for things like races between any attach
> and detach calls the logic becomes pretty straight forward. Attach
> will set the bit and provide driver data, detach will clear the bit
> and the driver data. If a driver loads in between it should clear the
> bit as well.
>
> I'll work on it over the next couple days and hopefully have something
> ready for testing/review early next week.


Hi Alex,

How about checking in __driver_attach_async_helper() whether the driver 
pointer is still valid by checking whether bus_for_each_drv(dev->bus, 
...) can still find the driver pointer? That approach requires 
protection with a mutex to avoid races with the driver detach code but 
shouldn't require any new flags in struct device.


Thanks,

Bart.


Re: [PATCH 2/3] uapi: get rid of STATX_ALL

2018-10-18 Thread Andreas Dilger
On Oct 18, 2018, at 7:11 AM, Miklos Szeredi  wrote:
> 
> Constants of the *_ALL type can be actively harmful due to the fact that
> developers will usually fail to consider the possible effects of future
> changes to the definition.
> 
> Remove STATX_ALL from the uapi, while no damage has been done yet.
> 
> We could keep something like this around in the kernel, but there's
> actually no point, since all filesystems should be explicitly checking
> flags that they support and not rely on the VFS masking unknown ones out: a
> flag could be known to the VFS, yet not known to the filesystem (see
> orangefs bug fixed in the previous patch).

What is actually strange is that the vfs_getattr_nosec() code is setting

stat->result_mask = STATX_BASIC_STATS;

in the code before any of the filesystem ->getattr methods are called.
That means it is up to the filesystems to actively *clear* flags from
the result_mask as opposed to only *setting* flags for fields that it
is filling in.  That seems a bit backward to me?

It looks like NFS is _probably_ doing the right thing, though it is still
using the userspace-supplied request_mask as a mask for the bits being
returned,  but the saving grace is that result_mask is STATX_BASIC_STATS
set by vfs_getattr_nosec() AFAICS.

Looking at OCFS2, CIFS, GFS2, they are doing a full inode revalidation
and returning the basic stats without looking at flags, request_mask,
or result_mask at all, so I'd expect they could be more efficient
(i.e. not revalidating the inode and possibly doing an RPC at all if
only some minimal flags are requested, or if AT_STATX_FORCE_SYNC is
not set).

It looks like overlayfs, nfsd, and fuse at least (as statx callers)
are often requesting a small subset of flags (e.g. STATX_INO,
STATX_NLINK, STATX_ATIME | STATX_MTIME, etc.) so they could be more
efficient if the underlying filesystems did only what was asked.

Cheers, Andreas

> 
> Signed-off-by: Miklos Szeredi 
> Cc: David Howells 
> Cc: Michael Kerrisk 
> ---
> fs/stat.c   | 1 -
> include/uapi/linux/stat.h   | 2 +-
> samples/statx/test-statx.c  | 2 +-
> tools/include/uapi/linux/stat.h | 2 +-
> 4 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index f8e6fb2c3657..8d297a279991 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -73,7 +73,6 @@ int vfs_getattr_nosec(const struct path *path, struct kstat 
> *stat,
> 
>   memset(stat, 0, sizeof(*stat));
>   stat->result_mask |= STATX_BASIC_STATS;
> - request_mask &= STATX_ALL;
>   query_flags &= KSTAT_QUERY_FLAGS;
>   if (inode->i_op->getattr)
>   return inode->i_op->getattr(path, stat, request_mask,
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 7b35e98d3c58..370f09d92fa6 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -148,7 +148,7 @@ struct statx {
> #define STATX_BLOCKS  0x0400U /* Want/got stx_blocks */
> #define STATX_BASIC_STATS 0x07ffU /* The stuff in the normal stat 
> struct */
> #define STATX_BTIME   0x0800U /* Want/got stx_btime */
> -#define STATX_ALL0x0fffU /* All currently supported 
> flags */
> +
> #define STATX__RESERVED   0x80000000U /* Reserved for future
> struct statx expansion */
> 
> /*
> diff --git a/samples/statx/test-statx.c b/samples/statx/test-statx.c
> index d4d77b09412c..e354048dea3c 100644
> --- a/samples/statx/test-statx.c
> +++ b/samples/statx/test-statx.c
> @@ -211,7 +211,7 @@ int main(int argc, char **argv)
>   struct statx stx;
>   int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;
> 
> - unsigned int mask = STATX_ALL;
> + unsigned int mask = STATX_BASIC_STATS | STATX_BTIME;
> 
>   for (argv++; *argv; argv++) {
>   if (strcmp(*argv, "-F") == 0) {
> diff --git a/tools/include/uapi/linux/stat.h b/tools/include/uapi/linux/stat.h
> index 7b35e98d3c58..370f09d92fa6 100644
> --- a/tools/include/uapi/linux/stat.h
> +++ b/tools/include/uapi/linux/stat.h
> @@ -148,7 +148,7 @@ struct statx {
> #define STATX_BLOCKS  0x0400U /* Want/got stx_blocks */
> #define STATX_BASIC_STATS 0x07ffU /* The stuff in the normal stat 
> struct */
> #define STATX_BTIME   0x0800U /* Want/got stx_btime */
> -#define STATX_ALL0x0fffU /* All currently supported 
> flags */
> +
> #define STATX__RESERVED   0x80000000U /* Reserved for future
> struct statx expansion */
> 
> /*
> --
> 2.14.3
> 




Re: [driver-core PATCH v4 4/6] driver core: Probe devices asynchronously instead of the driver

2018-10-18 Thread Alexander Duyck
On Thu, Oct 18, 2018 at 1:15 PM Bart Van Assche  wrote:
>
> On Thu, 2018-10-18 at 12:38 -0700, Alexander Duyck wrote:
> > Basically if somebody loads a driver the dev->driver becomes set. If a
> > driver is removed it will clear dev->driver and set driver_data to
> > 0/NULL. That is what I am using as a mutex to track it in conjunction
> > with the device mutex. Basically if somebody attempts to attach a driver
> > before we get there we just exit and don't attempt to load this driver.
>
> I don't think that the above matches your code. __device_attach() does not
> set the dev->driver pointer before scheduling an asynchronous probe. Only
> dev->driver_data gets set before the asynchronous probe is scheduled. Since
> driver_detach() only iterates over devices that are in the per-driver klist
> it will skip all devices for which an asynchronous probe has been scheduled
> but __device_attach_async_helper() has not yet been called. My conclusion
> remains that this patch does not prevent a driver pointer from becoming
> invalid concurrently with __device_attach_async_helper() dereferencing the
> same driver pointer.
>
> Bart.

I see what you are talking about now. Actually I think this was an
existing issue before my patch even came into play. Basically the code
as it currently stands is device specific in terms of the attach and
release code.

I wonder if we shouldn't have the async_synchronize_full call in
__device_release_driver moved down and into driver_detach before we
even start the for loop. Assuming the driver is no longer associated
with the bus that should flush out all devices so that we can then
pull them out of the devices list at least. I may look at adding an
additional bitflag to the device struct to indicate that it has a
driver attach pending. Then for things like races between any attach
and detach calls the logic becomes pretty straightforward. Attach
will set the bit and provide driver data, detach will clear the bit
and the driver data. If a driver loads in between it should clear the
bit as well.

I'll work on it over the next couple days and hopefully have something
ready for testing/review early next week.

Thanks.

- Alex
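
The pending-attach bit proposed above could behave roughly like the following userspace model. It is a sketch under stated assumptions, not the eventual patch: `model_device` and all three functions are hypothetical, a pthread mutex stands in for the device mutex, and `driver_data` is used only as the stash for the driver the asynchronous probe should bind.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fields discussed; not the kernel's struct device. */
struct model_device {
	pthread_mutex_t lock;     /* stands in for the device mutex */
	bool attach_pending;      /* the proposed "driver attach pending" bit */
	const void *driver;       /* bound driver, NULL if none */
	const void *driver_data;  /* driver stashed for the async probe */
};

/* Attach path: set the bit and remember which driver to probe later. */
static void schedule_async_attach(struct model_device *dev, const void *drv)
{
	pthread_mutex_lock(&dev->lock);
	dev->attach_pending = true;
	dev->driver_data = drv;
	pthread_mutex_unlock(&dev->lock);
}

/* Detach path: clear the bit and the stashed driver if it is the one going away. */
static void cancel_pending_attach(struct model_device *dev, const void *drv)
{
	pthread_mutex_lock(&dev->lock);
	if (dev->attach_pending && dev->driver_data == drv) {
		dev->attach_pending = false;
		dev->driver_data = NULL;
	}
	pthread_mutex_unlock(&dev->lock);
}

/*
 * Async helper: returns the driver to probe, or NULL when the request
 * was cancelled or another driver was bound in the meantime.
 */
static const void *async_attach_helper(struct model_device *dev)
{
	const void *drv = NULL;

	pthread_mutex_lock(&dev->lock);
	if (dev->attach_pending && dev->driver == NULL) {
		drv = dev->driver_data;
		dev->driver = drv;
	}
	dev->attach_pending = false;
	dev->driver_data = NULL;
	pthread_mutex_unlock(&dev->lock);
	return drv;
}
```

Because the helper re-checks the bit under the same lock that detach takes, it never dereferences a driver pointer that detach has already withdrawn. That is the race Bart describes.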


Re: [PATCH V9 18/21] dt-bindings: csky CPU Bindings

2018-10-18 Thread Guo Ren
On Thu, Oct 18, 2018 at 09:31:35AM -0500, Rob Herring wrote:
> On Tue, 16 Oct 2018 10:58:37 +0800, Guo Ren wrote:
> > This patch adds documentation describing how to add cpu nodes in
> > dts for SMP.
> > 
> > Signed-off-by: Guo Ren 
> > Cc: Rob Herring 
> > ---
> > Changelog:
> >  - Add compatible.
> >  - Remove status part.
> > ---
> > ---
> >  Documentation/devicetree/bindings/csky/cpus.txt | 73 
> > +
> >  1 file changed, 73 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/csky/cpus.txt
> > 
> 
> Reviewed-by: Rob Herring 

Thx Rob

Best Regards
 Guo Ren


Re: [PATCH 2/2] mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask

2018-10-18 Thread Andrew Morton
On Wed, 26 Sep 2018 16:22:27 +0200 Michal Hocko  wrote:

> > MPOL_PREFERRED is handled by policy_node() before we call 
> > __alloc_pages_nodemask.
> > __GFP_THISNODE is applied only when we are not using
> > __GFP_DIRECT_RECLAIM which is handled in alloc_hugepage_direct_gfpmask
> > now.
> > Lastly MPOL_BIND wasn't handled explicitly but in the end the removed
> > late check would remove __GFP_THISNODE for it as well. So in the end we
> > are doing the same thing unless I miss something
> 
> Forgot to add. One notable exception would be that the previous code
> would allow to hit
>   WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE));
> in policy_node if the requested node (e.g. cpu local one) was outside of
> the mbind nodemask. This is not possible now. We haven't heard about any
> such warning yet so it is unlikely that it happens though.

Perhaps a changelog addition is needed to cover the above?

I assume that David's mbind() concern has gone away.

No acks or reviewed-by's yet?


Re: possible deadlock in __generic_file_fsync

2018-10-18 Thread syzbot

syzbot has found a reproducer for the following crash on:

HEAD commit:fa520c47eaa1 fscache: Fix out of bound read in long cookie..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=130da8ee40
kernel config:  https://syzkaller.appspot.com/x/.config?x=b3f55cb3dfcc6c33
dashboard link: https://syzkaller.appspot.com/bug?extid=5cd33f0e6abe2bb3e397
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1299be0940

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+5cd33f0e6abe2bb3e...@syzkaller.appspotmail.com


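======================================================
WARNING: possible circular locking dependency detected
4.19.0-rc8+ #290 Not tainted
------------------------------------------------------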
kworker/0:1/14 is trying to acquire lock:
722efb72 (>s_type->i_mutex_key#10){+.+.}, at: inode_lock  
include/linux/fs.h:738 [inline]
722efb72 (>s_type->i_mutex_key#10){+.+.}, at:  
__generic_file_fsync+0xb5/0x200 fs/libfs.c:981


but task is already holding lock:
0144bfd5 ((work_completion)(>complete_work)){+.+.}, at:  
process_one_work+0xb9a/0x1b90 kernel/workqueue.c:2128


which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 ((work_completion)(>complete_work)){+.+.}:
   process_one_work+0xc0a/0x1b90 kernel/workqueue.c:2129
   worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
   kthread+0x35a/0x420 kernel/kthread.c:246
   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

-> #1 ((wq_completion)"dio/%s"sb->s_id){+.+.}:
   flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
   drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
   destroy_workqueue+0xc6/0x9c0 kernel/workqueue.c:4155
   __alloc_workqueue_key+0xed8/0x1170 kernel/workqueue.c:4138
   sb_init_dio_done_wq+0x37/0x90 fs/direct-io.c:623
   do_blockdev_direct_IO+0x12ea/0x9d70 fs/direct-io.c:1283
   __blockdev_direct_IO+0x9d/0xc6 fs/direct-io.c:1417
   ext4_direct_IO_write fs/ext4/inode.c:3743 [inline]
   ext4_direct_IO+0xae8/0x2230 fs/ext4/inode.c:3870
   generic_file_direct_write+0x275/0x4b0 mm/filemap.c:3042
   __generic_file_write_iter+0x2ff/0x630 mm/filemap.c:3221
   ext4_file_write_iter+0x390/0x1420 fs/ext4/file.c:266
   call_write_iter include/linux/fs.h:1808 [inline]
   aio_write+0x3b1/0x610 fs/aio.c:1561
   io_submit_one+0xaa1/0xf80 fs/aio.c:1835
   __do_sys_io_submit fs/aio.c:1916 [inline]
   __se_sys_io_submit fs/aio.c:1887 [inline]
   __x64_sys_io_submit+0x1b7/0x580 fs/aio.c:1887
   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (>s_type->i_mutex_key#10){+.+.}:
   lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3900
   down_write+0x8a/0x130 kernel/locking/rwsem.c:70
   inode_lock include/linux/fs.h:738 [inline]
   __generic_file_fsync+0xb5/0x200 fs/libfs.c:981
   ext4_sync_file+0xa4f/0x1510 fs/ext4/fsync.c:120
   vfs_fsync_range+0x140/0x220 fs/sync.c:197
   generic_write_sync include/linux/fs.h:2732 [inline]
   dio_complete+0x75c/0x9e0 fs/direct-io.c:329
   dio_aio_complete_work+0x20/0x30 fs/direct-io.c:341
   process_one_work+0xc90/0x1b90 kernel/workqueue.c:2153
   worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
   kthread+0x35a/0x420 kernel/kthread.c:246
   ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

other info that might help us debug this:

Chain exists of:
  >s_type->i_mutex_key#10 --> (wq_completion)"dio/%s"sb->s_id -->  
(work_completion)(>complete_work)


 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(>complete_work));
                               lock((wq_completion)"dio/%s"sb->s_id);
                               lock((work_completion)(>complete_work));
  lock(>s_type->i_mutex_key#10);

 *** DEADLOCK ***

2 locks held by kworker/0:1/14:
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
__write_once_size include/linux/compiler.h:215 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
atomic64_set include/asm-generic/atomic-instrumented.h:40 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
atomic_long_set include/asm-generic/atomic-long.h:59 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
set_work_data kernel/workqueue.c:617 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: 13da2705 ((wq_completion)"dio/%s"sb->s_id){+.+.}, at:  
process_one_work+0xb43/0x1b90 

Re: [PATCH 0/9] psi: pressure stall information for CPU, memory, and IO v4

2018-10-18 Thread Andrew Morton
On Tue, 28 Aug 2018 13:22:49 -0400 Johannes Weiner  wrote:

> This version 4 of the PSI series incorporates feedback from Peter and
> fixes two races in the lockless aggregator that Suren found in his
> testing and which caused the sample calculation to sometimes underflow
> and record bogusly large samples; details at the bottom of this email.

We've had very little in the way of review activity for the PSI
patchset.  According to the changelog tags, anyway.



[PATCH V2] perf arm64: Fix generate system call table failed with /tmp mounted with noexec

2018-10-18 Thread Hongxu Jia
When /tmp is mounted with noexec, mksyscalltbl fails.
[snip]
|perf-1.0/tools/perf/arch/arm64/entry/syscalls//mksyscalltbl:
/tmp/create-table-6VGPSt: Permission denied
[snip]

Add variable TMPDIR as prefix dir of the temporary file, if it is set,
replace default /tmp

Remove extra slash from `syscalls//mksyscalltbl'

Fixes: 2b5882435606 ("perf arm64: Generate system call table from asm/unistd.h")

Signed-off-by: Hongxu Jia 
---
 tools/perf/arch/arm64/Makefile| 2 +-
 tools/perf/arch/arm64/entry/syscalls/mksyscalltbl | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/arch/arm64/Makefile b/tools/perf/arch/arm64/Makefile
index dbef716..bc2a284 100644
--- a/tools/perf/arch/arm64/Makefile
+++ b/tools/perf/arch/arm64/Makefile
@@ -13,7 +13,7 @@ out:= $(OUTPUT)arch/arm64/include/generated/asm
 header := $(out)/syscalls.c
 incpath := $(srctree)/tools
 sysdef := $(srctree)/tools/arch/arm64/include/uapi/asm/unistd.h
-sysprf := $(srctree)/tools/perf/arch/arm64/entry/syscalls/
+sysprf := $(srctree)/tools/perf/arch/arm64/entry/syscalls
 systbl := $(sysprf)/mksyscalltbl
 
 # Create output directory if not already present
diff --git a/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl 
b/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
index 2dbb8cad..c88fd32 100755
--- a/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
+++ b/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
@@ -23,7 +23,7 @@ create_table_from_c()
 {
local sc nr last_sc
 
-   create_table_exe=`mktemp /tmp/create-table-XXXXXX`
+   create_table_exe=`mktemp ${TMPDIR:-/tmp}/create-table-XXXXXX`
 
{
 
-- 
2.7.4
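
The `${TMPDIR:-/tmp}` fallback used in the patch can also be written in C. The sketch below shows only the directory selection and the temporary-file creation. `create_table_tmpfile()` is a hypothetical helper, and it does not execute the file, so it does not exercise the noexec restriction that motivated the patch.

```c
#include <stdio.h>    /* snprintf */
#include <stdlib.h>   /* getenv, mkstemp */

/*
 * Create a temporary file under $TMPDIR, falling back to /tmp when the
 * variable is unset or empty (the same rule as the shell's ${TMPDIR:-/tmp}).
 * On success returns an open file descriptor and leaves the final name
 * in path; returns -1 on failure.
 */
static int create_table_tmpfile(char *path, size_t len)
{
	const char *dir = getenv("TMPDIR");
	int n;

	if (dir == NULL || dir[0] == '\0')
		dir = "/tmp";

	n = snprintf(path, len, "%s/create-table-XXXXXX", dir);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return mkstemp(path);   /* the template must end in six X characters */
}
```

The caller is responsible for closing the descriptor and unlinking the file when done.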



Re: [PATCH V2 2/5] mm/hugetlb: Distinguish between migratability and movability

2018-10-18 Thread Naoya Horiguchi
On Fri, Oct 12, 2018 at 09:29:56AM +0530, Anshuman Khandual wrote:
> During huge page allocation its migratability is checked to determine if
> it should be placed under movable zones with GFP_HIGHUSER_MOVABLE. But the
> movability aspect of the huge page could depend on other factors than just
> migratability. Movability in itself is a distinct property which should not
> be tied with migratability alone.
> 
> This differentiates these two and implements an enhanced movability check
> which also considers huge page size to determine if it is feasible to be
> placed under a movable zone. At present it just checks for gigantic pages
> but going forward it can incorporate other enhanced checks.

(nitpicking...)
The following code just checks hugepage_migration_supported(), so maybe
s/Movability/Migratability/ is expected in the comment?

  static int unmap_and_move_huge_page(...)
  {
  ...
  /*
   * Movability of hugepages depends on architectures and hugepage size.
   * This check is necessary because some callers of hugepage migration
   * like soft offline and memory hotremove don't walk through page
   * tables or check whether the hugepage is pmd-based or not before
   * kicking migration.
   */
  if (!hugepage_migration_supported(page_hstate(hpage))) {

Thanks,
Naoya Horiguchi
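
The distinction the quoted patch draws between "migratable" and "movable" can be reduced to the small model below. It is a sketch only: `model_hstate` and the `model_*` names are hypothetical stand-ins, and the two booleans replace what `hugepage_migration_supported()` and `hstate_is_gigantic()` would report in the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct hstate: only the two properties that matter here. */
struct model_hstate {
	bool migration_supported;  /* result of the migration check */
	bool gigantic;             /* result of the gigantic-page check */
};

enum model_gfp { MODEL_GFP_HIGHUSER, MODEL_GFP_HIGHUSER_MOVABLE };

/* Movable requires migratable, and additionally excludes gigantic pages. */
static bool model_movable_supported(const struct model_hstate *h)
{
	return h->migration_supported && !h->gigantic;
}

/* Allocation mask choice after the patch: keyed on movability, not migratability. */
static enum model_gfp model_htlb_alloc_mask(const struct model_hstate *h)
{
	return model_movable_supported(h) ? MODEL_GFP_HIGHUSER_MOVABLE
					  : MODEL_GFP_HIGHUSER;
}
```

A gigantic page whose size supports migration is the one case where the two checks disagree: it can be migrated, but it is not allocated from a movable zone.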

> 
> Suggested-by: Michal Hocko 
> Signed-off-by: Anshuman Khandual 
> ---
>  include/linux/hugetlb.h | 30 ++
>  mm/hugetlb.c|  2 +-
>  2 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 9c1b77f..456cb60 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -493,6 +493,31 @@ static inline bool hugepage_migration_supported(struct 
> hstate *h)
>  #endif
>  }
>  
> +/*
> + * Movability check is different as compared to migration check.
> + * It determines whether or not a huge page should be placed on
> + * movable zone or not. Movability of any huge page should be
> + * required only if huge page size is supported for migration.
> + * There wont be any reason for the huge page to be movable if
> + * it is not migratable to start with. Also the size of the huge
> + * page should be large enough to be placed under a movable zone
> + * and still feasible enough to be migratable. Just the presence
> + * in movable zone does not make the migration feasible.
> + *
> + * So even though large huge page sizes like the gigantic ones
> + * are migratable they should not be movable because its not
> + * feasible to migrate them from movable zone.
> + */
> +static inline bool hugepage_movable_supported(struct hstate *h)
> +{
> + if (!hugepage_migration_supported(h))
> + return false;
> +
> + if (hstate_is_gigantic(h))
> + return false;
> + return true;
> +}
> +
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>  struct mm_struct *mm, pte_t *pte)
>  {
> @@ -589,6 +614,11 @@ static inline bool hugepage_migration_supported(struct 
> hstate *h)
>   return false;
>  }
>  
> +static inline bool hugepage_movable_supported(struct hstate *h)
> +{
> + return false;
> +}
> +
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>  struct mm_struct *mm, pte_t *pte)
>  {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3c21775..a5a111d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -919,7 +919,7 @@ static struct page *dequeue_huge_page_nodemask(struct 
> hstate *h, gfp_t gfp_mask,
>  /* Movability of hugepages depends on migration support. */
>  static inline gfp_t htlb_alloc_mask(struct hstate *h)
>  {
> - if (hugepage_migration_supported(h))
> + if (hugepage_movable_supported(h))
>   return GFP_HIGHUSER_MOVABLE;
>   else
>   return GFP_HIGHUSER;
> -- 
> 2.7.4
> 
> 


Re: [PATCH V2 0/5] arm64/mm: Enable HugeTLB migration

2018-10-18 Thread Naoya Horiguchi
On Wed, Oct 17, 2018 at 01:49:17PM +0530, Anshuman Khandual wrote:
> 
> 
> On 10/12/2018 09:29 AM, Anshuman Khandual wrote:
> > This patch series enables HugeTLB migration support for all supported
> > huge page sizes at all levels including contiguous bit implementation.
> > Following HugeTLB migration support matrix has been enabled with this
> > patch series. All permutations have been tested except for the 16GB.
> > 
> >       CONT PTE    PMD    CONT PMD    PUD
> >       --------    ---    --------    ---
> > 4K:        64K     2M         32M     1G
> > 16K:        2M    32M          1G
> > 64K:        2M   512M         16G
> > 
> > First the series adds migration support for PUD based huge pages. It
> > then adds a platform specific hook to query an architecture if a
> > given huge page size is supported for migration while also providing
> > a default fallback option preserving the existing semantics which just
> > checks for (PMD|PUD|PGDIR)_SHIFT macros. The last two patches enable
> > HugeTLB migration on arm64 and subscribe to this new platform specific
> > hook by defining an override.
> > 
> > The second patch differentiates between movability and migratability
> > aspects of huge pages and implements hugepage_movable_supported() which
> > can then be used during allocation to decide whether to place the huge
> > page in movable zone or not.
> > 
> > Changes in V2:
> > 
> > - Added a new patch which differentiates migratability and movability
> >   of huge pages and implements hugepage_movable_supported() function
> >   as suggested by Michal Hocko.
> 
> Hello Andrew/Michal/Mike/Naoya/Catalin,
> 
> Just checking for an update. Does this series look okay?

Looks good to me. So for the series

Reviewed-by: Naoya Horiguchi 
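
The sizes in the support matrix quoted above follow from the base page size. A worked sketch, assuming 8-byte page table descriptors (so each table holds page_size / 8 entries) and contiguous-group sizes of 16/16 entries (PTE/PMD) for 4K pages, 128/32 for 16K and 32/32 for 64K. Those group sizes are inputs chosen to match the matrix, and the function name is hypothetical.

```c
#include <stdint.h>

struct hugetlb_sizes {
	uint64_t cont_pte, pmd, cont_pmd, pud;   /* all in bytes */
};

/*
 * page_shift: 12, 14 or 16 for 4K, 16K or 64K base pages.
 * cont_ptes / cont_pmds: number of entries in one contiguous group.
 */
static struct hugetlb_sizes arm64_hugetlb_sizes(unsigned int page_shift,
						unsigned int cont_ptes,
						unsigned int cont_pmds)
{
	uint64_t page = 1ULL << page_shift;
	uint64_t entries = page / 8;          /* descriptors per table */
	struct hugetlb_sizes s;

	s.cont_pte = page * cont_ptes;        /* run of contiguous PTEs */
	s.pmd = page * entries;               /* one PMD entry maps a full PTE table */
	s.cont_pmd = s.pmd * cont_pmds;       /* run of contiguous PMDs */
	s.pud = s.pmd * entries;              /* one PUD entry maps a full PMD table */
	return s;
}
```

For 4K pages this gives 64K, 2M, 32M and 1G, matching the first row. The matrix lists a PUD size only for 4K, so the `pud` value computed for the other two granules has no counterpart there.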


Re: [PATCH v4 3/6] dt-bindings: power: Introduce properties to present the battery OCV capacity table

2018-10-18 Thread Baolin Wang
On 19 October 2018 at 00:51, Rob Herring  wrote:
> On Mon, Oct 15, 2018 at 04:09:22PM +0800, Baolin Wang wrote:
>> Some battery drivers use the open circuit voltage (OCV) value to look up
>> the corresponding battery capacity percent at a given temperature in Celsius.
>> Thus this patch provides some battery properties to present the OCV table
>> temperatures and OCV capacity table values.
>>
>> Suggested-by: Sebastian Reichel 
>> Signed-off-by: Baolin Wang 
>> Reviewed-by: Linus Walleij 
>> ---
>> Changes from v3:
>>  - Split binding into one separate patch.
>>  - Rename ocv-capacity-table-temperatures to ocv-capacity-celsius.
>>  - Add some words to specify the OCV's unit.
>>
>> Changes from v2:
>>  - Use type __be32 to calculate the table length.
>>  - Update error messages.
>>  - Add some helper functions.
>>
>> Changes from v1:
>>  - New patch in v2.
>> ---
>>  .../devicetree/bindings/power/supply/battery.txt   |   15 +++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/power/supply/battery.txt 
>> b/Documentation/devicetree/bindings/power/supply/battery.txt
>> index 938d027..1f70e5d 100644
>> --- a/Documentation/devicetree/bindings/power/supply/battery.txt
>> +++ b/Documentation/devicetree/bindings/power/supply/battery.txt
>> @@ -23,6 +23,17 @@ Optional Properties:
>>   - constant-charge-current-max-microamp: maximum constant input current
>>   - constant-charge-voltage-max-microvolt: maximum constant input voltage
>>   - factory-internal-resistance-micro-ohms: battery factory internal 
>> resistance
>> + - ocv-capacity-table-0: An array providing the battery capacity percent
>> +   with corresponding open circuit voltage (OCV) of the battery, which
>> +   is used to look up battery capacity according to current OCV value.
>> +   And the open circuit voltage unit is microvolt.
>
> The order percent and voltage is still not clear. I'd guess one way
> based on the text here, but the example is the opposite. The description
> here must stand on its own.

Yes, you are right. I will correct the order of ocv-capacity-table-n. Thanks.

>> + - ocv-capacity-table-1: Same as ocv-capacity-table-0
>> + ..
>> + - ocv-capacity-table-n: Same as ocv-capacity-table-0
>> + - ocv-capacity-celsius: An array containing the temperature in degree 
>> Celsius,
>> +   for each of the battery capacity lookup table. The first temperature 
>> value
>> +   specifies the OCV table 0, and the second temperature value specifies the
>> +   OCV table 1, and so on.
>>
>>  Battery properties are named, where possible, for the corresponding
>>  elements in enum power_supply_property, defined in
>> @@ -44,6 +55,10 @@ Example:
>>   constant-charge-current-max-microamp = <90>;
>>   constant-charge-voltage-max-microvolt = <420>;
>>   factory-internal-resistance-micro-ohms = <25>;
>> + ocv-capacity-celsius = <(-10) 0 10>;
>> + ocv-capacity-table-0 = <4185000 100>, <4113000 95>, <4066000 
>> 90>, ...;
>> + ocv-capacity-table-1 = <420 100>, <4185000 95>, <4113000 
>> 90>, ...;
>> + ocv-capacity-table-2 = <425 100>, <420 95>, <4185000 
>> 90>, ...;
>>   };
>>
>>   charger: charger@11 {
>> --
>> 1.7.9.5
>>



-- 
Baolin Wang
Best Regards
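
A lookup over one of the OCV tables from the binding might look like the sketch below. It assumes each pair is ordered as <OCV in microvolts, capacity in percent> and sorted by descending voltage, as in the binding's example. Linear interpolation between neighbouring entries is an assumption of this sketch, and `ocv_entry` and `ocv_to_capacity()` are hypothetical names.

```c
#include <stddef.h>

struct ocv_entry {
	int ocv_uv;     /* open circuit voltage, microvolts */
	int capacity;   /* battery capacity, percent */
};

/*
 * table must be sorted by descending ocv_uv.
 * Returns the capacity for ocv_uv, interpolating linearly between the
 * two surrounding entries and clamping outside the table range.
 * Returns -1 for an empty table.
 */
static int ocv_to_capacity(const struct ocv_entry *table, size_t len, int ocv_uv)
{
	size_t i;

	if (len == 0)
		return -1;
	if (ocv_uv >= table[0].ocv_uv)
		return table[0].capacity;

	for (i = 1; i < len; i++) {
		if (ocv_uv >= table[i].ocv_uv) {
			const struct ocv_entry *hi = &table[i - 1];
			const struct ocv_entry *lo = &table[i];

			/* Integer interpolation; the fraction is truncated. */
			return lo->capacity +
			       (ocv_uv - lo->ocv_uv) * (hi->capacity - lo->capacity) /
			       (hi->ocv_uv - lo->ocv_uv);
		}
	}
	return table[len - 1].capacity;
}
```

Selecting which table to use for the current temperature (via `ocv-capacity-celsius`) would be a separate step before this lookup.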


Re: [PATCH 3/3] statx: add STATX_ATTRIBUTES flag

2018-10-18 Thread Andreas Dilger
On Oct 18, 2018, at 7:11 AM, Miklos Szeredi  wrote:
> 
> FUSE will want to know if stx_attributes is interesting or not, because
> there's a non-zero cost of retrieving it.
> 
> This is more of a "want" flag, since stx_attributes_mask already indicates
> whether we "got" stx_attributes or not.  So for now we can just deal with
> this flag in the generic code.
> 
> Signed-off-by: Miklos Szeredi 
> Cc: David Howells 
> Cc: Michael Kerrisk 

Reviewed-by: Andreas Dilger 

> ---
> fs/stat.c   | 3 +++
> include/uapi/linux/stat.h   | 1 +
> samples/statx/test-statx.c  | 2 +-
> tools/include/uapi/linux/stat.h | 1 +
> 4 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index 8d297a279991..6bf86d57e2e3 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -535,6 +535,9 @@ cp_statx(const struct kstat *stat, struct statx __user 
> *buffer)
>   tmp.stx_size = stat->size;
>   tmp.stx_blocks = stat->blocks;
>   tmp.stx_attributes_mask = stat->attributes_mask;
> + /* Having anything in attributes_mask means attributes are valid. */
> + if (tmp.stx_attributes_mask)
> + tmp.stx_mask |= STATX_ATTRIBUTES;
>   tmp.stx_atime.tv_sec = stat->atime.tv_sec;
>   tmp.stx_atime.tv_nsec = stat->atime.tv_nsec;
>   tmp.stx_btime.tv_sec = stat->btime.tv_sec;
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 370f09d92fa6..aef0aba5dfe7 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -148,6 +148,7 @@ struct statx {
> #define STATX_BLOCKS  0x0400U /* Want/got stx_blocks */
> #define STATX_BASIC_STATS 0x07ffU /* The stuff in the normal stat 
> struct */
> #define STATX_BTIME   0x0800U /* Want/got stx_btime */
> +#define STATX_ATTRIBUTES 0x1000U /* Want/got stx_attributes */
> 
> #define STATX__RESERVED   0x80000000U /* Reserved for future
> struct statx expansion */
> 
> diff --git a/samples/statx/test-statx.c b/samples/statx/test-statx.c
> index e354048dea3c..deef9a68ff68 100644
> --- a/samples/statx/test-statx.c
> +++ b/samples/statx/test-statx.c
> @@ -211,7 +211,7 @@ int main(int argc, char **argv)
>   struct statx stx;
>   int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;
> 
> - unsigned int mask = STATX_BASIC_STATS | STATX_BTIME;
> + unsigned int mask = STATX_BASIC_STATS | STATX_BTIME | STATX_ATTRIBUTES;
> 
>   for (argv++; *argv; argv++) {
>   if (strcmp(*argv, "-F") == 0) {
> diff --git a/tools/include/uapi/linux/stat.h b/tools/include/uapi/linux/stat.h
> index 370f09d92fa6..aef0aba5dfe7 100644
> --- a/tools/include/uapi/linux/stat.h
> +++ b/tools/include/uapi/linux/stat.h
> @@ -148,6 +148,7 @@ struct statx {
> #define STATX_BLOCKS  0x0400U /* Want/got stx_blocks */
> #define STATX_BASIC_STATS 0x07ffU /* The stuff in the normal stat 
> struct */
> #define STATX_BTIME   0x0800U /* Want/got stx_btime */
> +#define STATX_ATTRIBUTES 0x1000U /* Want/got stx_attributes */
> 
> #define STATX__RESERVED   0x80000000U /* Reserved for future
> struct statx expansion */
> 
> --
> 2.14.3
> 


Cheers, Andreas









Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-18 Thread Andrew Morton
On Thu, 18 Oct 2018 20:46:21 -0400 Andrea Arcangeli  wrote:

> On Thu, Oct 18, 2018 at 04:16:40PM -0700, Mike Kravetz wrote:
> > I was not sure about this, and expected someone could come up with
> > something better.  It just seems there are filesystems like hugetlbfs,
> > where it makes no sense wasting cycles traversing the filesystem.  So,
> > let's not even try.
> > 
> > Hoping someone can come up with a better method than hard coding as
> > I have done above.
> 
> It's not strictly required after marking the pages dirty though. The
> real fix is the other one? Could we just drop the hardcoding and let
> it run after the real fix is applied?
> 
> The performance of drop_caches doesn't seem critical, especially with
> gigapages. tmpfs doesn't seem to be optimized away from drop_caches
> and the gain would be bigger for tmpfs if THP is not enabled in the
> mount, so I'm not sure if we should worry about hugetlbfs first.

I guess so.  I can't immediately see a clean way of expressing this so
perhaps it would need a new BDI_CAP_NO_BACKING_STORE.  Such a
thing hardly seems worthwhile for drop_caches.

And drop_caches really shouldn't be there anyway.  It's a standing
workaround for ongoing suckage in pagecache and metadata reclaim
behaviour :(



[PATCH 2/2] locking/lockdep: Make global debug_locks* variables read-mostly

2018-10-18 Thread Waiman Long
Make the frequently used lockdep global variable debug_locks read-mostly.
As debug_locks_silent is sometimes used together with debug_locks,
it is also made read-mostly so that they can be close together.

With false cacheline sharing, cacheline contention problems can happen
depending on what gets put into the same cacheline as debug_locks.

Signed-off-by: Waiman Long 
---
 include/linux/debug_locks.h | 4 ++--
 lib/debug_locks.c   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 120225e..257ab3c 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -8,8 +8,8 @@
 
 struct task_struct;
 
-extern int debug_locks;
-extern int debug_locks_silent;
+extern int debug_locks __read_mostly;
+extern int debug_locks_silent __read_mostly;
 
 
 static inline int __debug_locks_off(void)
diff --git a/lib/debug_locks.c b/lib/debug_locks.c
index 124fdf2..ce51749 100644
--- a/lib/debug_locks.c
+++ b/lib/debug_locks.c
@@ -21,7 +21,7 @@
  * that would just muddy the log. So we report the first one and
  * shut up after that.
  */
-int debug_locks = 1;
+int debug_locks __read_mostly = 1;
 EXPORT_SYMBOL_GPL(debug_locks);
 
 /*
@@ -29,7 +29,7 @@
  * 'silent failure': nothing is printed to the console when
  * a locking bug is detected.
  */
-int debug_locks_silent;
+int debug_locks_silent __read_mostly;
 EXPORT_SYMBOL_GPL(debug_locks_silent);
 
 /*
-- 
1.8.3.1
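
The false-sharing concern behind this patch can be illustrated with a layout sketch. In the kernel `__read_mostly` works by placing variables in a dedicated data section. The model below instead uses C11 `alignas` inside one struct to get the same effect of keeping rarely written flags away from frequently written data. The 64-byte line size, `model_globals` and `hot_counter` are assumptions made for the illustration.

```c
#include <stdalign.h>
#include <stddef.h>

#define MODEL_CACHELINE 64   /* assumed cache line size in bytes */

struct model_globals {
	/* Read on almost every lock operation, written almost never. */
	alignas(MODEL_CACHELINE) int debug_locks;
	int debug_locks_silent;

	/*
	 * Frequently written data starts on its own cache line, so stores
	 * to it do not invalidate the line holding the two flags above.
	 */
	alignas(MODEL_CACHELINE) unsigned long hot_counter;
};
```

Keeping `debug_locks` and `debug_locks_silent` adjacent means a reader that checks both touches a single cache line.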



[PATCH 1/2] locking/lockdep: Fix debug_locks off performance problem

2018-10-18 Thread Waiman Long
It was found that when debug_locks was turned off because of a problem
found by the lockdep code, the system performance could drop quite
significantly when the lock_stat code was also configured into the
kernel. For instance, parallel kernel build time on a 4-socket x86-64
server nearly doubled.

Further analysis into the cause of the slowdown traced back to the
frequent call to debug_locks_off() from the __lock_acquired() function
probably due to some inconsistent lockdep states with debug_locks
off. The debug_locks_off() function did an unconditional atomic xchg
to write a 0 value into debug_locks which had already been set to 0.
This led to severe cacheline contention in the cacheline that held
debug_locks.  As debug_locks is being referenced in quite a few different
places in the kernel, this greatly slows down the system performance.

To prevent that thrashing of the debug_locks cacheline, lock_acquired()
and lock_contended() now check the state of debug_locks before
proceeding. The debug_locks_off() function is also modified to check
debug_locks before calling __debug_locks_off().

Signed-off-by: Waiman Long 
---
 kernel/locking/lockdep.c | 4 ++--
 lib/debug_locks.c| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index be76f47..1efada2 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4066,7 +4066,7 @@ void lock_contended(struct lockdep_map *lock, unsigned 
long ip)
 {
unsigned long flags;
 
-   if (unlikely(!lock_stat))
+   if (unlikely(!lock_stat || !debug_locks))
return;
 
if (unlikely(current->lockdep_recursion))
@@ -4086,7 +4086,7 @@ void lock_acquired(struct lockdep_map *lock, unsigned 
long ip)
 {
unsigned long flags;
 
-   if (unlikely(!lock_stat))
+   if (unlikely(!lock_stat || !debug_locks))
return;
 
if (unlikely(current->lockdep_recursion))
diff --git a/lib/debug_locks.c b/lib/debug_locks.c
index 96c4c63..124fdf2 100644
--- a/lib/debug_locks.c
+++ b/lib/debug_locks.c
@@ -37,7 +37,7 @@
  */
 int debug_locks_off(void)
 {
-   if (__debug_locks_off()) {
+   if (debug_locks && __debug_locks_off()) {
if (!debug_locks_silent) {
console_verbose();
return 1;
-- 
1.8.3.1
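
The change to debug_locks_off() is an instance of the "test before atomic write" pattern. A userspace sketch with C11 atomics follows. `model_debug_locks` and both functions are hypothetical names, and `atomic_exchange()` stands in for the kernel's xchg().

```c
#include <stdatomic.h>

static atomic_int model_debug_locks = 1;

/*
 * Unconditional variant: always performs an atomic write, so the cache
 * line is pulled in exclusive state even when the flag is already 0.
 */
static int model_off_unconditional(void)
{
	return atomic_exchange(&model_debug_locks, 0);
}

/*
 * Checked variant: a plain load first. Once the flag is 0, callers only
 * read the line, which can stay shared between CPUs.
 */
static int model_off_checked(void)
{
	if (!atomic_load_explicit(&model_debug_locks, memory_order_relaxed))
		return 0;
	return atomic_exchange(&model_debug_locks, 0);
}
```

Both variants return the previous value when they clear the flag, so exactly one caller sees a non-zero result. Only the checked variant avoids the repeated writes once the flag is clear.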



Re: [PATCH 2/3] uapi: get rid of STATX_ALL

2018-10-18 Thread Andreas Dilger

> On Oct 18, 2018, at 7:15 AM, Florian Weimer  wrote:
> 
> * Miklos Szeredi:
> 
>> #define STATX__RESERVED  0x80000000U /* Reserved for future
>> struct statx expansion */
> 
> What about this?  Isn't it similar to STATX_ALL in the sense that we
> don't know yet what it will mean?

No, this means that this last bit will be used for increasing the size of the
stx_mask field at some point in the future.  IMHO, this is present as a reminder
to any developer who is adding fields in the future not to use the last flag.

Cheers, Andreas









Re: [PATCH 4.9 00/35] 4.9.135-stable review

2018-10-18 Thread Nathan Chancellor
On Thu, Oct 18, 2018 at 07:54:29PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.9.135 release.
> There are 35 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sat Oct 20 17:54:00 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.135-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.9.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Merged, compiled with -Werror, and installed onto my OnePlus 6.

No initial issues noticed in dmesg or general usage.

Thanks!
Nathan


Re: [PATCH 4.14 00/41] 4.14.78-stable review

2018-10-18 Thread Nathan Chancellor
On Thu, Oct 18, 2018 at 07:54:15PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.78 release.
> There are 41 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sat Oct 20 17:53:55 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.78-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.14.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Merged, compiled, and installed onto my Raspberry Pi.

No initial issues noticed in dmesg or general usage.

Thanks!
Nathan


Re: [PATCH 4.4 00/48] 4.4.162-stable review

2018-10-18 Thread Nathan Chancellor
On Thu, Oct 18, 2018 at 07:54:35PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.4.162 release.
> There are 48 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Sat Oct 20 17:54:03 UTC 2018.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.162-rc1.gz
> or in the git tree and branch at:
>   
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.4.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h
> 

Merged, compiled with -Werror, and installed onto my Pixel 2 XL.

No initial issues noticed in dmesg or general usage.

Thanks!
Nathan


Re: turbostat-17.06.23 floating point exception

2018-10-18 Thread Solio Sarabia
On Fri, Oct 12, 2018 at 07:03:41PM -0400, Len Brown wrote:
> > Why would the cpu topology report 0 cpus?  I added a debug entry to
> > cpu_usage_stat and /proc/stat showed it as an extra column.  Then
> > fscanf parsing in for_all_cpus() failed, causing the SIGFPE.
> >
> > This is not an issue. Thanks.
> 
> Yes, it is true that turbostat doesn't check for systems with 0 cpus.
> I'm curious how you provoked the kernel to claim that.  If it is
> something others might do, we can add a check for it and exit
> gracefully.

source/tools/power/x86/turbostat/turbostat.c
int for_all_proc_cpus(int (func)(int))
{
retval = fscanf(fp, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n");
^
This fails due to an extra debug entry in /proc/stat
(total of 11 columns).  I was measuring time in a hot
function and decided to add this time in an extra
cpu_usage_stat. This was an experiment though.

Thanks,
-S.
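
A minimal userspace sketch of the failure mode described above, assuming fixed-width scanf formats like the one quoted. The function name count_cpu_lines() and the sample input layout are hypothetical; this is not turbostat's code. It only shows that one extra column on the summary line leaves the parser stranded on the leftover field, so no per-CPU line matches afterwards.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Count "cpuN" lines in /proc/stat-style text using fixed-width scanf
 * formats. Returns the number of per-CPU lines parsed, or -1 if the
 * summary line cannot be parsed at all.
 */
int count_cpu_lines(const char *text)
{
	int used = 0;
	int count = 0;

	assert(text != NULL);

	/* Summary line: literal "cpu" plus exactly ten suppressed integers. */
	if (sscanf(text, "cpu %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n%n",
		   &used) != 0 || used == 0)
		return -1;
	text += used;

	for (;;) {
		unsigned int cpu;
		int n = 0;

		/* An 11th column from the previous line makes "cpu" mismatch here. */
		if (sscanf(text, "cpu%u %*d %*d %*d %*d %*d %*d %*d %*d %*d %*d\n%n",
			   &cpu, &n) != 1 || n == 0)
			break;
		count++;
		text += n;
	}
	return count;
}
```

With ten columns per line the sketch counts every per-CPU line; with eleven it counts none, which is consistent with the "0 cpus" symptom discussed in this thread.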


Re: [BUG -next 20181008] list corruption with "mm/slub: remove useless condition in deactivate_slab"

2018-10-18 Thread Pingfan Liu
On Tue, Oct 16, 2018 at 3:36 PM Heiko Carstens
 wrote:
>
> On Tue, Oct 16, 2018 at 02:29:28PM +0800, Pingfan Liu wrote:
> > > I think it is caused by the union of page->lru and page->next. It can be
> > > fixed by:
> > > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> > > index 3a1a1db..4aa0fb5 100644
> > > --- a/include/linux/slub_def.h
> > > +++ b/include/linux/slub_def.h
> > > @@ -56,6 +56,7 @@ struct kmem_cache_cpu {
> > >  #define slub_set_percpu_partial(c, p)  \
> > >  ({ \
> > > slub_percpu_partial(c) = (p)->next; \
> > > +   p->next = NULL; \
> > >  })
> > >
> > > I will do some test and post the fix.
> > >
> > Please ignore the above comment. After re-checking the code, I am
> > sure that all callers of deactivate_slab() pass c->page, which means
> > that the page should not be on any list. But your test result "list_add
> > double add: new=03d1029ecc08,
> > prev=8ff846d0,next=03d1029ecc08" indicates that
> > page(new) is already on a list. I think something else is wrong and
> > was being masked.
> > I cannot reproduce this bug on x86. Could you share your config and
> > cmdline? And did you turn on any SLUB debug options?
>
> You can re-create the config with "make ARCH=s390 debug_defconfig".
>
> Not sure which machine I used to reproduce this but most likely it was
> a machine with these command line options:
>
> dasd=e12d root=/dev/dasda1 userprocess_debug numa_debug sched_debug
> ignore_loglevel sclp_con_drop=1 sclp_con_pages=32 audit=0
> crashkernel=128M ignore_rlimit_data
>
> You can ignore the dasd and sclp* command line options. These are
> s390 specific. The rest should be available on any architecture.
>
Thank you for the info. I can reproduce the bug, and I find that it is
caused by this commit. In deactivate_slab(), the page is first put on the
full list by add_full() and then hits the redo condition, so it must be
taken off again with remove_full(). The commit wrongly removes that code.

Regards,
Pingfan


Re: [PATCH v4] PCI/AER: Enable error reporting for all ports

2018-10-18 Thread Bjorn Helgaas
On Thu, Oct 18, 2018 at 05:03:13PM -0600, Keith Busch wrote:
> On Thu, Oct 18, 2018 at 03:53:58PM -0500, Bjorn Helgaas wrote:
> > Change the AER service driver so it binds to *all* PCIe Ports,
> > including Switch Upstream and Downstream Ports.  Enable AER error
> > reporting for all these Ports, but not for any children.
> 
> I'm looking at this again and think enabling/disabling error
> reporting for ports is the responsibility of the port driver, not
> the AER service.

That's an interesting idea.  Can you expand on this a little more?
Why is it the responsibility of the port driver?

Do you think pci_enable_pcie_error_reporting() shouldn't be part of
the AER service because it updates the Device Control register, which
is in the PCIe Capability, not the AER Capability?

What about pci_aer_clear_device_status(), which clears Device Status,
which is also in the PCIe Capability?

> The following should do the same as this patch, but without making
> AER driver handle non-root ports.  The report enabling/disabling
> functions are already stubbed for '!CONFIG_PCIE_AER' and have checks
> for aer_cap and firmware first.

If we thought we should enable error reporting *always*, regardless of
whether the AER service is enabled, this would make perfect sense to
me, and I might suggest doing it in an even more generic place like
pci_configure_device() or pci_init_capabilities().

But that doesn't seem like where you're headed.  It seems like you
still only want error reporting enabled when CONFIG_PCIEAER=y.  If
that's the case, it seems like doing it in portdrv only obfuscates the
connection with AER.  When CONFIG_PCIEAER is unset, the portdrv code
*looks* like it's doing something but it's really not because of the
#ifdef magic.
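
The "#ifdef magic" concern can be sketched as follows. This is a hedged illustration only: DEMO_CONFIG_AER, struct demo_dev, and both demo_* functions are hypothetical stand-ins, not the kernel's declarations. The point is that the caller reads identically whether or not the option is set, even though the stub variant has no effect.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device type used only for this sketch. */
struct demo_dev {
	int reporting_enabled;
};

#ifdef DEMO_CONFIG_AER
/* Real variant: built only when the (hypothetical) option is defined. */
static inline int demo_enable_error_reporting(struct demo_dev *dev)
{
	dev->reporting_enabled = 1;
	return 0;
}
#else
/* Stub variant: same name and signature, but it changes nothing. */
static inline int demo_enable_error_reporting(struct demo_dev *dev)
{
	(void)dev;
	return -EINVAL;
}
#endif

/* The probe path looks like it enables reporting in both configurations. */
int demo_port_probe(struct demo_dev *dev)
{
	assert(dev != NULL);
	demo_enable_error_reporting(dev);
	return 0;
}
```

Built without DEMO_CONFIG_AER, demo_port_probe() succeeds and leaves reporting_enabled at 0, which is the kind of silent no-op the paragraph above warns about.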

> A real patch for this could even make this remove all the aer
> specific error report enabling, so it'd be a net-loss in code lines.
> :)
> 
> ---
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 0acca3596807..f129a33c8303 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -122,12 +122,13 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>   pm_runtime_put_autosuspend(&dev->dev);
>   pm_runtime_allow(&dev->dev);
>   }
> -
> + pci_enable_pcie_error_reporting(dev);
>   return 0;
>  }
>  
>  static void pcie_portdrv_remove(struct pci_dev *dev)
>  {
> + pci_disable_pcie_error_reporting(dev);
>   if (pci_bridge_d3_possible(dev)) {
>   pm_runtime_forbid(&dev->dev);
>   pm_runtime_get_noresume(&dev->dev);
> --


Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

2018-10-18 Thread Nadav Amit
at 10:00 AM, Andy Lutomirski  wrote:

> 
> 
>> On Oct 18, 2018, at 9:47 AM, Nadav Amit  wrote:
>> 
>> at 8:51 PM, Andy Lutomirski  wrote:
>> 
>>>> On Wed, Oct 17, 2018 at 8:12 PM Nadav Amit  wrote:
>>>> at 6:22 PM, Andy Lutomirski  wrote:
>>>>
>>>>>> On Oct 17, 2018, at 5:54 PM, Nadav Amit  wrote:
>>>>>>
>>>>>> It is sometimes beneficial to prevent preemption for very few
>>>>>> instructions, or prevent preemption for some instructions that precede
>>>>>> a branch (this latter case will be introduced in the next patches).
>>>>>>
>>>>>> To provide such functionality on x86-64, we use an empty REX-prefix
>>>>>> (opcode 0x40) as an indication that preemption is disabled for the
>>>>>> following instruction.
>>>>>
>>>>> Nifty!
>>>>>
>>>>> That being said, I think you have a few bugs. First, you can’t just ignore
>>>>> a rescheduling interrupt, as you introduce unbounded latency when this
>>>>> happens — you’re effectively emulating preempt_enable_no_resched(), which
>>>>> is not a drop-in replacement for preempt_enable(). To fix this, you may
>>>>> need to jump to a slow-path trampoline that calls schedule() at the end or
>>>>> consider rewinding one instruction instead. Or use TF, which is only a
>>>>> little bit terrifying…
>>>>
>>>> Yes, I didn’t pay enough attention here. For my use-case, I think that the
>>>> easiest solution would be to make synchronize_sched() ignore preemptions
>>>> that happen while the prefix is detected. It would slightly change the
>>>> meaning of the prefix.
>> 
>> So thinking about it further, rewinding the instruction seems the easiest
>> and most robust solution. I’ll do it.
>> 
>>>>> You also aren’t accounting for the case where you get an exception that
>>>>> is, in turn, preempted.
>>>>
>>>> Hmm.. Can you give me an example for such an exception in my use-case? I
>>>> cannot think of an exception that might be preempted (assuming #BP, #MC
>>>> cannot be preempted).
>>> 
>>> Look for cond_local_irq_enable().
>> 
>> I looked at it. Yet, I still don’t see how exceptions might happen in my
>> use-case, but having said that - this can be fixed too.
> 
> I’m not totally certain there’s a case that matters.  But it’s worth checking 

I am still checking. But, I wanted to ask you whether the existing code is
correct, since it seems to me that others make the same mistake I did, unless
I don’t understand the code.

Consider for example do_int3(), and see my inlined comments:

dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
{
...
ist_enter(regs);// => preempt_disable()
cond_local_irq_enable(regs);// => assume it enables IRQs

...
// resched irq can be delivered here. It will not cause rescheduling
// since preemption is disabled

cond_local_irq_disable(regs);   // => assume it disables IRQs
ist_exit(regs); // => preempt_enable_no_resched()
}

At this point resched will not happen for an unbounded length of time (unless
there is another point when exiting the trap handler that checks if
preemption should take place).

Another example is __BPF_PROG_RUN_ARRAY(), which also uses
preempt_enable_no_resched().

Am I missing something?

Thanks,
Nadav
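
The difference being asked about can be sketched with a toy, single-CPU model. Everything here is hypothetical (struct toy_cpu and all toy_* functions are illustration names, not kernel code), and it deliberately ignores any later check on the exception-exit path, which is exactly the open question in the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU's preemption state. */
struct toy_cpu {
	int preempt_count;   /* > 0 means preemption is disabled */
	bool need_resched;   /* set when a resched interrupt arrives */
	int schedule_calls;  /* how many times the toy scheduler ran */
};

static void toy_schedule(struct toy_cpu *c)
{
	c->need_resched = false;
	c->schedule_calls++;
}

void toy_preempt_disable(struct toy_cpu *c)
{
	c->preempt_count++;
}

/* A resched interrupt only reschedules if preemption is enabled. */
void toy_resched_irq(struct toy_cpu *c)
{
	c->need_resched = true;
	if (c->preempt_count == 0)
		toy_schedule(c);
}

/* Re-enable preemption and act on a pending reschedule request. */
void toy_preempt_enable(struct toy_cpu *c)
{
	assert(c->preempt_count > 0);
	if (--c->preempt_count == 0 && c->need_resched)
		toy_schedule(c);
}

/* Re-enable preemption but leave any pending request untouched. */
void toy_preempt_enable_no_resched(struct toy_cpu *c)
{
	assert(c->preempt_count > 0);
	c->preempt_count--;
}
```

In this model a resched interrupt that arrives inside the disabled region is serviced immediately by toy_preempt_enable(), but stays pending after toy_preempt_enable_no_resched() until something else checks need_resched.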

[PATCH V2] clk: imx7d: remove UART1 clock setting

2018-10-18 Thread Anson Huang
All i.MX7D dtb files already contain clock assignments for UART1 (the
example below is from imx7d-sdb.dts), so setting the UART1 clock in the
clock driver is NOT necessary. Module clock settings should be done in
the module driver.

&uart1 {
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_uart1>;
assigned-clocks = <&clks IMX7D_UART1_ROOT_SRC>;
assigned-clock-parents = <&clks IMX7D_PLL_SYS_MAIN_240M_CLK>;
status = "okay";
};

Signed-off-by: Anson Huang 
---
 drivers/clk/imx/clk-imx7d.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/clk/imx/clk-imx7d.c b/drivers/clk/imx/clk-imx7d.c
index adb08f6..06c105d 100644
--- a/drivers/clk/imx/clk-imx7d.c
+++ b/drivers/clk/imx/clk-imx7d.c
@@ -886,9 +886,6 @@ static void __init imx7d_clocks_init(struct device_node 
*ccm_node)
/* use old gpt clk setting, gpt1 root clk must be twice as gpt counter 
freq */
clk_set_parent(clks[IMX7D_GPT1_ROOT_SRC], clks[IMX7D_OSC_24M_CLK]);
 
-   /* set uart module clock's parent clock source that must be great then 
80MHz */
-   clk_set_parent(clks[IMX7D_UART1_ROOT_SRC], clks[IMX7D_OSC_24M_CLK]);
-
/* Set clock rate for USBPHY, the USB_PLL at CCM is from USBOTG2 */
clks[IMX7D_USB1_MAIN_480M_CLK] = 
imx_clk_fixed_factor("pll_usb1_main_clk", "osc", 20, 1);
clks[IMX7D_USB_MAIN_480M_CLK] = 
imx_clk_fixed_factor("pll_usb_main_clk", "osc", 20, 1);
-- 
2.7.4



[PATCH] rcu: Benefit from expedited grace period in __wait_rcu_gp

2018-10-18 Thread KarimAllah Ahmed
When expedited grace-period is set, both synchronize_sched and
synchronize_rcu_bh can be optimized to have a significantly lower latency.

Improve wait_rcu_gp handling to also account for expedited grace-period.
The downside is that wait_rcu_gp will not wait anymore for all RCU variants
concurrently when an expedited grace-period is set, however, given the
improved latency it does not really matter.

Cc: Paul E. McKenney 
Cc: Josh Triplett 
Cc: Steven Rostedt 
Cc: Mathieu Desnoyers 
Cc: Lai Jiangshan 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed 
---
 kernel/rcu/update.c | 34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 68fa19a..44b8817 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -392,13 +392,27 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t 
*crcu_array,
might_sleep();
continue;
}
-   init_rcu_head_on_stack(&rs_array[i].head);
-   init_completion(&rs_array[i].completion);
+
for (j = 0; j < i; j++)
if (crcu_array[j] == crcu_array[i])
break;
-   if (j == i)
-   (crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
+   if (j != i)
+   continue;
+
+   if ((crcu_array[i] == call_rcu_sched ||
+crcu_array[i] == call_rcu_bh)
+   && rcu_gp_is_expedited()) {
+   if (crcu_array[i] == call_rcu_sched)
+   synchronize_sched_expedited();
+   else
+   synchronize_rcu_bh_expedited();
+
+   continue;
+   }
+
+   init_rcu_head_on_stack(&rs_array[i].head);
+   init_completion(&rs_array[i].completion);
+   (crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
}
 
/* Wait for all callbacks to be invoked. */
@@ -407,11 +421,19 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t 
*crcu_array,
(crcu_array[i] == call_rcu ||
 crcu_array[i] == call_rcu_bh))
continue;
+
+   if ((crcu_array[i] == call_rcu_sched ||
+crcu_array[i] == call_rcu_bh)
+   && rcu_gp_is_expedited())
+   continue;
+
for (j = 0; j < i; j++)
if (crcu_array[j] == crcu_array[i])
break;
-   if (j == i)
-   wait_for_completion(&rs_array[i].completion);
+   if (j != i)
+   continue;
+
+   wait_for_completion(&rs_array[i].completion);
destroy_rcu_head_on_stack(&rs_array[i].head);
}
 }
-- 
2.7.4
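
The "act only on the first occurrence" loop that this patch restructures can be sketched in userspace as below. The names demo_cb_t, demo_call_unique(), and the two sample callbacks are hypothetical; the sketch covers only the de-duplication step, not RCU or the expedited path.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_cb_t)(void);

/* Sample callbacks that just count their invocations. */
int demo_hits_a, demo_hits_b;
void demo_cb_a(void) { demo_hits_a++; }
void demo_cb_b(void) { demo_hits_b++; }

/*
 * Invoke each distinct callback in cbs[0..n) exactly once, skipping NULL
 * entries and later duplicates. Returns the number of callbacks invoked.
 */
int demo_call_unique(demo_cb_t *cbs, int n)
{
	int i, j, called = 0;

	assert(cbs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!cbs[i])
			continue;
		/* Look for the same function earlier in the array. */
		for (j = 0; j < i; j++)
			if (cbs[j] == cbs[i])
				break;
		if (j != i)
			continue;	/* duplicate of an earlier entry */
		cbs[i]();
		called++;
	}
	return called;
}
```

Turning the "if (j == i) do work" form into an early "if (j != i) continue" keeps the loop body flat, which is what lets the patch slot extra per-entry checks in before the work.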



Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-18 Thread Andrea Arcangeli
On Thu, Oct 18, 2018 at 04:16:40PM -0700, Mike Kravetz wrote:
> I was not sure about this, and expected someone could come up with
> something better.  It just seems there are filesystems like hugetlbfs,
> where it makes no sense wasting cycles traversing the filesystem.  So,
> let's not even try.
> 
> Hoping someone can come up with a better method than hard coding as
> I have done above.

It's not strictly required after marking the pages dirty though. The
real fix is the other one? Could we just drop the hardcoding and let
it run after the real fix is applied?

The performance of drop_caches doesn't seem critical, especially with
gigapages. tmpfs doesn't seem to be optimized away from drop_caches
and the gain would be bigger for tmpfs if THP is not enabled in the
mount, so I'm not sure if we should worry about hugetlbfs first.

Thanks,
Andrea


Re: [PATCH v2 3/3] selftests: ftrace: Add synthetic event syntax testcase

2018-10-18 Thread Steven Rostedt
On Fri, 19 Oct 2018 09:38:18 +0900
Masami Hiramatsu  wrote:

> > Shuah, can you ack this patch, as I want to add it to this release,
> > along with the other changes.  
> 
> Shuah, just a note that this test case must fail unless these fixes are 
> applied.

This is why I wanted to attach them to my pull request :-)

-- Steve


Re: [PATCH v2 3/3] selftests: ftrace: Add synthetic event syntax testcase

2018-10-18 Thread Masami Hiramatsu
On Thu, 18 Oct 2018 16:34:27 -0400
Steven Rostedt  wrote:

> 
> Hi Masami,
> 
> Thanks! I pulled this into my urgent queue and will be running tests on
> it over night.

Thanks!
I hope to pass the test :)

> Shuah, can you ack this patch, as I want to add it to this release,
> along with the other changes.

Shuah, just a note that this test case must fail unless these fixes are applied.

Thank you,



-- 
Masami Hiramatsu 


[LKP] bd00db8b16 [ 48.718069] BUG: spinlock trylock failure on UP on CPU#0, kworker/u2:1/27

2018-10-18 Thread kernel test robot
Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://github.com/bvanassche/linux for-next

commit bd00db8b160b8cb2a61e6321aee5a68c6d823075
Author: Bart Van Assche 
AuthorDate: Tue Oct 16 13:57:06 2018 -0700
Commit: Bart Van Assche 
CommitDate: Wed Oct 17 16:17:43 2018 -0700

drivers/base: Probe devices concurrently if requested by the driver

Instead of probing devices sequentially in the PROBE_PREFER_ASYNCHRONOUS
mode, scan devices concurrently. This helps when the wall clock time for
a single probe is significantly above the CPU time needed for a single
probe, e.g. when scanning SCSI LUNs over a storage network.

Cc: Lee Duncan 
Cc: Hannes Reinecke 
Cc: Luis Chamberlain 
Cc: Johannes Thumshirn 
Cc: Christoph Hellwig 
Cc: Greg Kroah-Hartman 
Cc: Dan Williams 
Signed-off-by: Bart Van Assche 

49e249b0dd  drivers/base: Verify struct device locking requirements at runtime
bd00db8b16  drivers/base: Probe devices concurrently if requested by the driver
9294000d6d  IB/srp: Drain the send queue before destroying a QP
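+----------------------------------------------------+------------+------------+---------------------+
|                                                    | 49e249b0dd | bd00db8b16 | bvanassche/for-next |
+----------------------------------------------------+------------+------------+---------------------+
| boot_successes                                     | 39         | 0          | 12                  |
| boot_failures                                      | 0          | 12         |                     |
| BUG:spinlock_trylock_failure_on_UP_on_CPU          | 0          | 11         |                     |
| BUG:unable_to_handle_kernel                        | 0          | 11         |                     |
| Oops:#[##]                                         | 0          | 11         |                     |
| EIP:__list_add_valid                               | 0          | 11         |                     |
| Kernel_panic-not_syncing:Fatal_exception           | 0          | 11         |                     |
| genirq:Flags_mismatch_irq##(sir_ir)vs.#(serial_ir) | 0          | 1          |                     |
+----------------------------------------------------+------------+------------+---------------------+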

[   42.926637] CE: hpet increased min_delta_ns to 25312 nsec
[   42.926776] CE: hpet increased min_delta_ns to 37968 nsec
[   48.630057] [drm] Initialized vgem 1.0.0 20120112 for virtual device on 
minor 0
[   48.631889] parport_pc 00:04: reported by Plug and Play ACPI
[   48.633342] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[   48.718069] BUG: spinlock trylock failure on UP on CPU#0, kworker/u2:1/27
[   48.719387]  lock: 0xcd89bc44, .magic: c224d69c, .owner: /-1, 
.owner_cpu: 0
[   48.720968] CPU: 0 PID: 27 Comm: kworker/u2:1 Not tainted 
4.19.0-rc1-00171-gbd00db8 #1
[   48.722617] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   48.724360] Workqueue: events_unbound async_run_entry_fn
[   48.725391] Call Trace:
[   48.726152]  dump_stack+0x16/0x26
[   48.726948]  spin_dump+0x66/0xa0
[   48.727729]  do_raw_spin_trylock+0x3c/0x60
[   48.728586]  _raw_spin_lock+0x2f/0x60
[   48.729380]  ? __mutex_lock+0xd9/0x460
[   48.730190]  __mutex_lock+0xd9/0x460
[   48.730974]  ? sched_clock_local+0x15/0x120
[   48.732012]  ? _raw_spin_unlock_irqrestore+0x32/0x70
[   48.733027]  mutex_lock_nested+0x14/0x20
[   48.733904]  ? __driver_probe_device_async+0x1f/0x50
[   48.734889]  __driver_probe_device_async+0x1f/0x50
[   48.735841]  async_run_entry_fn+0x34/0xe0
[   48.736786]  process_one_work+0x275/0x4b0
[   48.737669]  worker_thread+0x279/0x400
[   48.738511]  kthread+0xf9/0x100
[   48.739250]  ? process_one_work+0x4b0/0x4b0
[   48.740117]  ? __kthread_create_on_node+0x130/0x130
[   48.741094]  ret_from_fork+0x19/0x24
[   48.741890] BUG: unable to handle kernel NULL pointer dereference at 
[   48.743166] *pde =  
[   48.743877] Oops:  [#1] PREEMPT PTI
[   48.744741] CPU: 0 PID: 27 Comm: kworker/u2:1 Not tainted 
4.19.0-rc1-00171-gbd00db8 #1
[   48.746368] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[   48.746717] Workqueue: events_unbound async_run_entry_fn
[   48.746717] EIP: __list_add_valid+0x20/0x60
[   48.746717] Code: 90 90 90 90 90 90 90 90 90 90 55 89 e5 56 53 8b 59 04 39 
d3 74 14 51 53 52 68 70 fb e0 c1 e8 68 96 c0 ff 0f 0b 90 8d 74 26 00 <8b> 33 39 
ce 74 12 53 56 51 68 ec fb e0 c1 e8 4e 96 c0 ff 0f 0b 8d
[   48.746717] EAX: cf5e7ec8 EBX:  ECX: cd89bc6c EDX: 
[   48.746717] ESI: cd89bc6c EDI: cd89bc40 EBP: cf5e7e84 ESP: cf5e7e7c
[   48.746717] DS: 007b ES: 007b FS:  GS:  SS: 0068 EFLAGS: 00210246
[   48.746717] CR0: 80050033 CR2:  CR3: 0223c000 CR4: 00140690
[   48.746717] DR0:  DR1:  DR2:  DR3: 
[   

Re: [PATCH] seq_buf: Make seq_buf_puts() NULL terminate the buffer

2018-10-18 Thread Kees Cook
On Wed, Oct 17, 2018 at 5:10 AM, Michael Ellerman  wrote:
> Currently seq_buf_puts() will happily create a non NULL terminated
> string for you in the buffer. This is particularly dangerous if the
> buffer is on the stack.
>
> For example:
>
>   char buf[8];
>   char secret[] = "secret";
>   struct seq_buf s;
>
>   seq_buf_init(&s, buf, sizeof(buf));
>   seq_buf_puts(&s, "foo");
>   printk("Message is %s\n", buf);
>
> Can result in:
>
>   Message is fooªsecret
>
> We could require all users to memset() their buffer to NULL before
> use. But that seems likely to be forgotten and lead to bugs.
>
> Instead we can change seq_buf_puts() to always leave the buffer in a
> NULL terminated state.
>
> The only downside is that this makes the buffer 1 character smaller
> for seq_buf_puts(), but that seems like a good trade off.
>
> Signed-off-by: Michael Ellerman 

Yes, please! :) I prefer keeping the string terminated over needing to
remember to do it later.

Acked-by: Kees Cook 

-Kees

> ---
>  lib/seq_buf.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> I recently merged a patch which actually hit this behaviour. I worked
> around it by using seq_buf_printf(), but it would be good to fix the
> problem at the source.
>
> diff --git a/lib/seq_buf.c b/lib/seq_buf.c
> index 11f2ae0f9099..b1570204cde3 100644
> --- a/lib/seq_buf.c
> +++ b/lib/seq_buf.c
> @@ -144,9 +144,13 @@ int seq_buf_puts(struct seq_buf *s, const char *str)
>
> WARN_ON(s->size == 0);
>
> +   /* Add 1 to len for the trailing NULL which must be there */
> +   len += 1;
> +
> if (seq_buf_can_fit(s, len)) {
> memcpy(s->buffer + s->len, str, len);
> -   s->len += len;
> +   /* Don't count the trailing NULL against the capacity */
> +   s->len += len - 1;
> return 0;
> }
> seq_buf_set_overflow(s);
> --
> 2.17.1
>



-- 
Kees Cook
Pixel Security
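
A minimal userspace sketch of the idea in the quoted patch: copy the string together with its terminator, but advance the length past the text only. struct demo_buf and both demo_buf_* functions are hypothetical stand-ins, not the kernel's seq_buf API, and overflow handling is reduced to returning -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bounded append buffer. */
struct demo_buf {
	char *buffer;
	size_t size;	/* total capacity in bytes */
	size_t len;	/* bytes used, excluding the terminator */
};

void demo_buf_init(struct demo_buf *s, char *buf, size_t size)
{
	assert(buf != NULL && size > 0);
	s->buffer = buf;
	s->size = size;
	s->len = 0;
}

/* Append str and leave the buffer NUL-terminated; -1 if it does not fit. */
int demo_buf_puts(struct demo_buf *s, const char *str)
{
	size_t len = strlen(str) + 1;	/* include the trailing NUL */

	if (s->len + len > s->size)
		return -1;
	memcpy(s->buffer + s->len, str, len);
	s->len += len - 1;		/* the next append overwrites the NUL */
	return 0;
}
```

As in the patch description, the cost is one byte of usable capacity: an 8-byte buffer holds at most 7 characters of text, but it can always be printed safely with "%s".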


Re: [RFC 00/60] Coscheduling for Linux

2018-10-18 Thread Subhra Mazumdar

Hi Jan,

On 9/7/18 2:39 PM, Jan H. Schönherr wrote:

The collective context switch from one coscheduled set of tasks to another
-- while fast -- is not atomic. If a use-case needs the absolute guarantee
that all tasks of the previous set have stopped executing before any task
of the next set starts executing, an additional hand-shake/barrier needs to
be added.


Do you know how long that delay is? That is, what is the overlap time during
which a thread of the new group is already executing on one HT while a thread
of the previous group is still running on the other HT?

Thanks,
Subhra


Re: [GIT PULL] loadpin updates for security-next

2018-10-18 Thread James Morris
On Thu, 18 Oct 2018, Kees Cook wrote:

> Hi James,
> 
> Please pull these loadpin changes for security-next. This is a small
> reporting improvement and the param change needed for the ordering
> series (but since the loadpin change is desired and separable, I'm
> putting it here).
> 
> Thanks!
> 
> -Kees
> 
> The following changes since commit 57361846b52bc686112da6ca5368d11210796804:
> 
>   Linux 4.19-rc2 (2018-09-02 14:37:30 -0700)
> 
> are available in the Git repository at:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git 
> tags/loadpin-security-next
> 

Thanks, pulled to next-loadpin and next-testing.

-- 
James Morris




Re: [PATCH v3] mm: memcontrol: Don't flood OOM messages with no eligible task.

2018-10-18 Thread Tetsuo Handa
Petr Mladek wrote:
> This looks very complex and I see even more problems:
> 
>   + You would need to update the rate limit intervals when
> new console is attached. Note that the ratelimits might
> get initialized early during boot. It might be solvable
> but ...
> 
>   + You might need to update the length of the message
> when the text (code) is updated. It might be hard
> to maintain.

I assumed we calculate the average dynamically, for the amount of
messages printed by an OOM event is highly unstable (depends on
hardware configuration such as number of nodes, number of zones,
and how many processes are there as a candidate for OOM victim).

> 
>   + You would need to take into account also console_level
> and the level of the printed messages

Isn't that counted by call_console_drivers() ?

> 
>   + This approach does not take into account all other
> messages that might be printed by other subsystems.

Yes. And I wonder whether unlimited printk() alone is the cause of RCU
stalls. I think that printk() is serving as an amplifier for any CPU users.
That is, the average speed might not be safe enough to guarantee that RCU
stalls won't occur. Then, there is no safe average value we can use.

> 
> 
> I have just talked with Michal in person. He pointed out
> that we primarily need to know when and if the last printed
> message already reached the console.

I think we can estimate when call_console_drivers() started/completed for
the last time as when and if the last printed message already reached the
console. Sometimes callers might append to the logbuf without waiting for
completion of call_console_drivers(), but the system is already unusable
if majority of ratelimited printk() users hit that race window.

> 
> A solution might be to handle printk and ratelimit together.
> For example:
> 
>+ store log_next_idx of the printed message into
>  the ratelimit structure
> 
>+ eventually store pointer of the ratelimit structure
>  into printk_log
> 
>+ eventually store the timestamp when the message
>  reached the console into the ratelimit structure
> 
> Then the ratelimited printk might take into account whether
> the previous message already reached console and even when.

If printk() becomes asynchronous (e.g. a printk() kernel thread), we would
need to use something like synchronize_srcu() so that the caller waits only
for completion of the messages it cares about.

> 
> 
> Well, this is still rather complex. I am not persuaded that
> it is worth it.
> 
> I suggest to take a breath, stop looking for a perfect solution
> for a bit. The existing ratelimit might be perfectly fine
> in practice. You might always create stress test that would
> fail but the test might be far from reality. Any complex
> solution might bring more problems than it solves.
> 
> Console full of repeated messages need not be a catastrophe
> when it helps to fix the problem and the system is usable
> and need a reboot anyway.

I wish that memcg OOM events do not use printk(). Since memcg OOM is not
out of physical memory, we can dynamically allocate physical memory for
holding memcg OOM messages and let the userspace poll it via some interface.


linux-next: build failure after merge of the net-next tree

2018-10-18 Thread Stephen Rothwell
Hi all,

After merging the net-next tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/infiniband/hw/mlx5/flow.c: In function 
'mlx5_ib_handler_MLX5_IB_METHOD_CREATE_FLOW':
drivers/infiniband/hw/mlx5/flow.c:163:12: error: 'struct mlx5_flow_act' has no 
member named 'has_flow_tag'; did you mean 'flow_tag'?
   flow_act.has_flow_tag = true;
^~~~
flow_tag

Caused by commit

  d5634fee245f ("net/mlx5: Add a no-append flow insertion mode")

interacting with commit

  ba4a41198324 ("RDMA/mlx5: Add support for flow tag to raw create flow")

from the rdma tree.

I have applied the following merge fix patch for today:

From: Stephen Rothwell 
Date: Fri, 19 Oct 2018 11:10:39 +1100
Subject: [PATCH] net/mlx5: fix up for has_flow_tag changing to a flag

Signed-off-by: Stephen Rothwell 
---
 drivers/infiniband/hw/mlx5/flow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/flow.c 
b/drivers/infiniband/hw/mlx5/flow.c
index e57435cb6d96..f86cdcafdafc 100644
--- a/drivers/infiniband/hw/mlx5/flow.c
+++ b/drivers/infiniband/hw/mlx5/flow.c
@@ -160,7 +160,7 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_CREATE_FLOW)(
ret = -EINVAL;
goto err_out;
}
-   flow_act.has_flow_tag = true;
+   flow_act.flags |= FLOW_ACT_HAS_TAG;
}
 
flow_handler = mlx5_ib_raw_fs_rule_add(dev, fs_matcher, &flow_act,
-- 
2.18.0

-- 
Cheers,
Stephen Rothwell




Re: [PATCH] RDMA/hns: Use 64-bit arithmetic instead of 32-bit

2018-10-18 Thread Doug Ledford
On Thu, 2018-10-18 at 14:01 +0300, Leon Romanovsky wrote:
> On Thu, Oct 18, 2018 at 10:02:58AM +0200, Gustavo A. R. Silva wrote:
> > Cast *max_num_sg* to u64 in order to give the compiler complete
> > information about the proper arithmetic to use.
> > 
> > Notice that such variable is used in a context that expects an
> > expression of type u64 (64 bits, unsigned) and the following
> > expression is currently being evaluated using 32-bit
> > arithmetic:
> 
> And what is wrong with that?
> Please fix static analyzer tool instead of fixing proper C code.

Judging by the static analyzer tool's message, I don't see anything
wrong with it.  The code contains a potential unintentional overflow
error.  The author might have been well aware of the overflow and not
cared and in that case this is valid C, but the analyzer has no way of
knowing that, so it flags it for review.  To silence the checker you
could either cast the arithmetic to u64, or cast length to u32.  Either
would clear up the ambiguity.  I guess I'm not seeing why you would
blame the static checker in this case, it did the best it is possible
for it to do.
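
The ambiguity the checker flags can be shown with a small sketch. It assumes both operands are 32-bit unsigned values (the function and parameter names below are illustrative, not taken from the driver) and a platform where unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit int");

/* Multiply in 32 bits, then widen: the high bits are already lost. */
uint64_t demo_len_32bit(uint32_t max_num_sg, uint32_t page_size)
{
	return max_num_sg * page_size;
}

/* Widen one operand first, so the multiply itself is done in 64 bits. */
uint64_t demo_len_64bit(uint32_t max_num_sg, uint32_t page_size)
{
	return (uint64_t)max_num_sg * page_size;
}
```

For inputs such as 0x10000 and 0x10000 the first variant yields 0 while the second yields 0x100000000; for small inputs both agree, which is why the tool can only flag the expression for review rather than prove it wrong.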

> Thanks
> 
> > 
> > length = max_num_sg * page_size;
> > 
> > Addresses-Coverity-ID: 1474517 ("Unintentional integer overflow")
> > Signed-off-by: Gustavo A. R. Silva 
> > ---
> >  drivers/infiniband/hw/hns/hns_roce_mr.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c 
> > b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > index 521ad2a..d479d5e 100644
> > --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > @@ -1219,7 +1219,7 @@ struct ib_mr *hns_roce_alloc_mr(struct ib_pd *pd, 
> > enum ib_mr_type mr_type,
> > int ret;
> > 
> > page_size = 1 << (hr_dev->caps.pbl_buf_pg_sz + PAGE_SHIFT);
> > -   length = max_num_sg * page_size;
> > +   length = (u64)max_num_sg * page_size;
> > 
> > if (mr_type != IB_MR_TYPE_MEM_REG)
> > return ERR_PTR(-EINVAL);
> > --
> > 2.7.4
> > 

-- 
Doug Ledford 
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD





Re: [[PATCH] 8/9] DMA-UART-Driver-for-AST2500

2018-10-18 Thread Benjamin Herrenschmidt
On Thu, 2018-10-18 at 15:25 +0530, Vinod wrote:
> 
> > It's not a dmaengine driver. It's a serial UART driver that happens to
> > use a dedicated DMA engine.
> 
> Then I see no reason for it to use dmaengine APIs. The framework allows
> people to share a controller for many clients, but if you have dedicated
> one then you may use it directly

Well... the engine is shared by a few UARTs, they have dedicated rings
but there's a common set of regs for interrupt handling etc.

That said, I still think it could be contained within a UART driver,
there's little benefit in adding the framework overhead, esp since
these are really weak cores, any overhead will be felt.

Ben.

> > It's unclear whether it should be split into two drivers, or just have
> > the serial driver directly use the dma engine since that engine is
> > dedicated in HW to only work on those UARTs and nothing else...
> > 
> > Cheers,
> > Ben.
> > 
> > 
> > > While doing resubmission please take some time to understand subsystem
> > > tags to use. (hint git log  will tell you)
> > > 
> > > Also series has [[PATCH] 8/9] whereas it should be [PATCH 8/9] please
> > > let git generate that for you (hint git format-patch start..end does a
> > > good job)
> > > 
> > > > @@ -0,0 +1,1594 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + *   drivers/tty/serial/8250/8250_aspeed_uart_dma.c
> > > > + *1. 2018/07/01 Shivah Shankar created
> > > > + *2. 2018/08/25 sudheer.veliseti modified
> > > 
> > > we don't use this log in kernel. I do not see s-o-b by Shivah, that
> > > should be added. I think he should be author and you need to list
> > > changes you did..
> > > 
> 
> 



Re: [PATCH v3] of: overlay: user space synchronization

2018-10-18 Thread Frank Rowand
On 10/18/18 12:32, Rob Herring wrote:
> On Tue, Oct 16, 2018 at 05:34:26PM -0700, frowand.l...@gmail.com wrote:
>> From: Frank Rowand 
>>
>> When an overlay is applied or removed, the live devicetree visible in
>> /proc/device-tree/, aka /sys/firmware/devicetree/base/, reflects the
>> changes.  There is no method for user space to determine whether the
>> live devicetree was modified by overlay actions.
> 
> Because userspace has no way to modify the DT and the ways the kernel 
> can do modifications are limited.
> 
> Do you have an actual need for this outside of testing/development?

I do not know if anyone uses /proc/device-tree for anything outside of
testing/development.  If we believe that there is no other use of
/proc/device-tree we can simply document that there is no expectation
that accessors will see a consistent, unchanging /proc/device-tree.

That would be a much smaller patch.


>> Provide a sysfs file, /sys/firmware/devicetree/tree_version,  to allow
>> user space to determine if the live devicetree has remained unchanged
>> while a series of one or more accesses of /proc/device-tree/ occur.
>>
>> The use of both (1) dynamic devicetree modifications and (2) overlay
>> apply and removal are not supported during the same boot cycle.  Thus
>> non-overlay dynamic modifications are not reflected in the value of
>> tree_version.
> 
> I'd prefer to see some sort of information on overlays exported and user 
> space can check if that changed. IIRC, Pantelis had a series to do that 
> along with a kill switch to prevent further modifications. At least some 
> of that series only had minor issues to fix.

The kill switch addresses a different concern, which was from the security
community.  The kill switch is on my todo list.

I don't remember exactly what info the overlay information export patch
provided.  I'll have to go find it and re-read it.


> Also, shouldn't we get uevents if the tree changes? Maybe that's not 

Yes (off the top of my head).  But a shell script accessing /proc/device-tree
is not going to get uevents.
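
For a user-space program in that position, the check the patch
description has in mind could look roughly like the sketch below:
read the version, do the accesses, read the version again, and retry
if it moved.  This is only an illustration.  The helper names are
hypothetical, and the assumption that tree_version holds a single
unsigned decimal number is mine, not something stated in this thread.

```c
#include <stdio.h>

/* Read an unsigned decimal counter from a file; 0 on success, -1 on error. */
int read_tree_version(const char *path, unsigned long *version)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", version) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Call do_access() (if not NULL) and accept the result only when the
 * version read before and after it is the same.  Retries up to
 * max_tries times.  Returns 0 if one pass saw a stable version,
 * -1 if the version could not be read or never stayed stable.
 */
int access_devicetree_stable(const char *version_path,
			     void (*do_access)(void *ctx), void *ctx,
			     int max_tries)
{
	unsigned long before, after;
	int i;

	for (i = 0; i < max_tries; i++) {
		if (read_tree_version(version_path, &before))
			return -1;
		if (do_access)
			do_access(ctx);
		if (read_tree_version(version_path, &after))
			return -1;
		if (before == after)
			return 0;
	}
	return -1;
}
```

A caller would pass the path of the proposed file,
/sys/firmware/devicetree/tree_version, and a callback that walks
/proc/device-tree.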


> guaranteed, but I'd bet we can't handle cases where we don't get events. 
> A property added to an existing node comes to mind.
> 
> Rob
> 



Re: [PATCH] Input: pm8941-pwrkey - Add pms405 pwrkey support

2018-10-18 Thread Dmitry Torokhov
On Thu, Oct 18, 2018 at 10:54:37AM +0530, Vinod wrote:
> On 19-09-18, 18:49, Bjorn Andersson wrote:
> > From: Vinod Koul 
> > 
> > Update the binding and driver for pms405 pwrkey.
> 
> Rob, Dmitry
> 
> Gentle reminder for this patch...
> 
> > Signed-off-by: Vinod Koul 
> > Signed-off-by: Bjorn Andersson 
> > ---
> >  Documentation/devicetree/bindings/input/qcom,pm8941-pwrkey.txt | 1 +
> >  drivers/input/misc/pm8941-pwrkey.c | 1 +
> >  2 files changed, 2 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/input/qcom,pm8941-pwrkey.txt 
> > b/Documentation/devicetree/bindings/input/qcom,pm8941-pwrkey.txt
> > index 34ab5763f494..736fba3bad54 100644
> > --- a/Documentation/devicetree/bindings/input/qcom,pm8941-pwrkey.txt
> > +++ b/Documentation/devicetree/bindings/input/qcom,pm8941-pwrkey.txt
> > @@ -8,6 +8,7 @@ PROPERTIES
> > Definition: must be one of:
> > "qcom,pm8941-pwrkey"
> > "qcom,pm8941-resin"
> > +   "qcom,pms405-pwrkey"
> >  
> >  - reg:
> > Usage: required
> > diff --git a/drivers/input/misc/pm8941-pwrkey.c 
> > b/drivers/input/misc/pm8941-pwrkey.c
> > index 48153e0ca19a..fccf63263c1c 100644
> > --- a/drivers/input/misc/pm8941-pwrkey.c
> > +++ b/drivers/input/misc/pm8941-pwrkey.c
> > @@ -317,6 +317,7 @@ static const struct pm8941_data resin_data = {
> >  static const struct of_device_id pm8941_pwr_key_id_table[] = {
> > > { .compatible = "qcom,pm8941-pwrkey", .data = &pwrkey_data },
> > > { .compatible = "qcom,pm8941-resin", .data = &resin_data },
> > > +   { .compatible = "qcom,pms405-pwrkey", .data = &pwrkey_data },

I am sure I asked this question before (in context of a different
driver), but why do we need this compatible if we already have
pm8941-pwrkey compatible? Isn't pms405-pwrkey compatible with
pm8941-pwrkey as far as power key block goes? In which cases do we need
new compatibles and when can we reuse existing ones? Rob?

Thanks.

-- 
Dmitry


Re: [PATCH v3] mm: memcontrol: Don't flood OOM messages with no eligible task.

2018-10-18 Thread Sergey Senozhatsky
On (10/18/18 20:58), Tetsuo Handa wrote:
> > 
> > A knob might do.
> > As well as /proc/sys/kernel/printk tweaks, probably. One can even add
> > echo "a b c d" > /proc/sys/kernel/printk to .bashrc and adjust printk
> > console levels on login and rollback to old values in .bash_logout
> > May be.
> 
> That can work for only single login with root user case.
> Not everyone logs into console as root user.

Add sudo ;)

> It is pity that we can't send kernel messages to only selected consoles
> (e.g. all messages are sent to netconsole, but only critical messages are
> sent to local consoles).

OK, that's a fair point. There was a patch from FB, which would allow us
to set a log_level on per-console basis. So the noise goes to heav^W net
console; only critical stuff goes to the serial console (if I recall it
correctly). I'm not sure what happened to that patch, it was a while ago.
I'll try to find that out.

[..]
> That boils down to a "user interaction" problem.
> Not limiting
> 
>   "%s invoked oom-killer: gfp_mask=%#x(%pGg), nodemask=%*pbl, order=%d, 
> oom_score_adj=%hd\n"
>   "Out of memory and no killable processes...\n"
> 
> is very annoying.
> 
> And I really can't understand why Michal thinks "handling this requirement" as
> "make the code more complex than necessary and squash different things 
> together".

Michal is trying very hard to address the problem in a reasonable way.
The problem you are talking about is not MM specific. You can have a
faulty SCSI device, corrupted FS, and so on.

-ss


Re: [PATCH v2 5/5] arm64: dts: qcom: sdm845: Add USB PHY lane two

2018-10-18 Thread Doug Anderson
Hi,

On Thu, Oct 18, 2018 at 2:10 PM Evan Green  wrote:
>
> This change adds the second lane registers for the USB PHY, now that the
> QMP phy bindings have been updated. This way the driver can stop
> reaching beyond its register region to get at the second lane.
>
> Signed-off-by: Evan Green 
> ---
>  arch/arm64/boot/dts/qcom/sdm845.dtsi | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
> b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index 9c72edb678ec..f28c50e93f5a 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -1188,10 +1188,12 @@
>  < GCC_USB3_PHY_PRIM_BCR>;
> reset-names = "phy", "common";
>
> -   usb_1_ssphy: lane@88e9200 {
> +   usb_1_ssphy: lanes@88e9200 {
> reg = <0x88e9200 0x128>,
>   <0x88e9400 0x200>,
>   <0x88e9c00 0x218>,
> + <0x88e9600 0x128>,
> + <0x88e9800 0x200>,
>   <0x88e9a00 0x100>;
> #phy-cells = <0>;
> clocks = < GCC_USB3_PRIM_PHY_PIPE_CLK>;
> @@ -1219,10 +1221,12 @@
>  < GCC_USB3_PHY_SEC_BCR>;
> reset-names = "phy", "common";
>
> -   usb_2_ssphy: lane@88eb200 {
> +   usb_2_ssphy: lanes@88eb200 {
> reg = <0x88eb200 0x128>,
>   <0x88eb400 0x1fc>,
>   <0x88eb800 0x218>,
> + <0x88eb600 0x128>,
> + <0x88eb800 0x1fc>,

This is wrong and needs to be fixed.  The
"qcom,sdm845-qmp-usb3-uni-phy" is not dual-lane and thus shouldn't
have tx2/rx2.  The driver knows "qcom,sdm845-qmp-usb3-uni-phy" is not
dual lane and thus will try to map tx2 as pcs_misc.
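
A rough model of the problem, assuming the driver picks register
regions purely by their position in the reg list: tx, rx, pcs, and then
either pcs_misc (single lane) or tx2, rx2, pcs_misc (dual lane), as in
the usb_1_ssphy hunk above.  The helper names are hypothetical and are
not taken from the driver.

```c
#include <stdint.h>

/* Index the driver would read pcs_misc from, for a given lane count. */
int pcs_misc_index(int dual_lane)
{
	return dual_lane ? 5 : 3;
}

/* First five reg base addresses of usb_2_ssphy as quoted in the patch. */
static const uint32_t usb_2_ssphy_reg[] = {
	0x88eb200, 0x88eb400, 0x88eb800, 0x88eb600, 0x88eb800,
};

/* What a single-lane driver would end up treating as pcs_misc. */
uint32_t single_lane_pcs_misc(void)
{
	return usb_2_ssphy_reg[pcs_misc_index(0)];
}
```

For the single-lane node this returns 0x88eb600, i.e. the region the
patch intended as tx2.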


Also overall I'll continue to point out that this patch needs to land
_after_ the driver fix since the new driver can cope with either style
device tree (at least temporarily) but the old driver can't cope with
the new device tree.  See patch #2 comments for details.

-Doug


Re: [PATCH v2 4/5] arm64: dts: qcom: sdm845: Add UFS nodes for sdm845-mtp

2018-10-18 Thread Doug Anderson
Hi,

On Thu, Oct 18, 2018 at 2:10 PM Evan Green  wrote:
>
> From: Can Guo 
>
> This change enables the UFS host controller and PHY on sdm845-mtp.
>
> Signed-off-by: Can Guo 
> Signed-off-by: Evan Green 
> Reviewed-by: Vivek Gautam 
> ---
>  arch/arm64/boot/dts/qcom/sdm845-mtp.dts | 14 ++
>  1 file changed, 14 insertions(+)

Reviewed-by: Douglas Anderson 


Re: [PATCH v2 3/5] arm64: dts: qcom: sdm845: add UFS controller

2018-10-18 Thread Doug Anderson
Hi,
On Thu, Oct 18, 2018 at 2:10 PM Evan Green  wrote:
>
> This change adds the UFS controller and PHY to SDM845.
>
> Signed-off-by: Evan Green 
> Signed-off-by: Douglas Anderson 
> ---
>  arch/arm64/boot/dts/qcom/sdm845.dtsi | 67 
> 
>  1 file changed, 67 insertions(+)

I'm already in the Signed-off-by since I did some touching of this
patch, so not sure it makes sense to provide my Reviewed-by on this
patch.  ...but I'll at least make the note that it seems good and
matches my understanding.


As mentioned in patch #2 ideally this patch should land after the
driver fix so the old driver doesn't try to use the "tx2" range as
"pcs_misc".

-Doug


Re: [PATCH v2 2/5] phy: qcom-qmp: Utilize fully-specified DT registers

2018-10-18 Thread Doug Anderson
Hi,

On Thu, Oct 18, 2018 at 2:09 PM Evan Green  wrote:
>
> This change utilizes the newly fixed up DT bindings to get the tx2
> and rx2 register regions for the second lane of dual-lane PHYs. Before
> this change, the driver was simply using lane one's register region and
> adding 0x400, which reached well beyond the DT-specified register
> allocation. This would have been a crash were it not for the page size on
> ARM64. Fix the driver not to rely on the magic of virtual memory by using
> the newly specified DT register regions for tx2 and rx2.
>
> In order to support existing device trees, this change also contains a
> fallback mode for when those new register regions don't exist, which
> reverts to the original behavior of overreaching and prints a complaint.
>
> Signed-off-by: Evan Green 
> ---
>  drivers/phy/qualcomm/phy-qcom-qmp.c | 51 
> +++--
>  1 file changed, 38 insertions(+), 13 deletions(-)

Looks good to me.  This seems like the right compromise for moving
forward to fix the problem.

A few notes:

* The only current SoC that uses tx2/rx2 is SDM845 and the support of
that SoC is in its infancy in mainline.  Thus I don't mind that using
existing device trees with this new patch will print a warning and
continue to take advantage of the Linux "bug" that the whole page
could be accessed.  We should fix the old DTS files ASAP and then
remove the nod to backward compatibility in the code.

* This should land _before_ dts patches land.  If dts changes land
without this change then the old code will end up mapping tx2 and
using it as pcs_misc.  ...so either we'll need to delay landing the
device tree patch by one major version or Andy / Kishon will need to
coordinate.
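
The fallback described in the commit message can be modelled in a few
lines.  This is only a sketch: the struct, the function name and the
use of 0 to mean "region not supplied by the device tree" are
assumptions for illustration.  The 0x400 offset is the one mentioned in
the commit message.

```c
#include <stdint.h>
#include <stdio.h>

/* Offset the old driver added to lane one's base to reach lane two. */
#define LANE2_LEGACY_OFFSET 0x400

struct lane_regs {
	uintptr_t tx;	/* lane one TX base */
	uintptr_t tx2;	/* lane two TX base, 0 if the DT does not supply it */
};

/*
 * Prefer the DT-specified tx2 region.  If it is missing, warn and fall
 * back to the old behaviour of reaching past lane one's region.
 */
uintptr_t lane2_tx_base(const struct lane_regs *r)
{
	if (r->tx2)
		return r->tx2;

	fprintf(stderr, "warning: tx2 not in device tree, using tx + 0x%x\n",
		LANE2_LEGACY_OFFSET);
	return r->tx + LANE2_LEGACY_OFFSET;
}
```

With the usb_1_ssphy addresses from the dts patch in this series (tx at
0x88e9200, tx2 at 0x88e9600) both paths give the same base, which is
why old device trees keep working for now.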


Reviewed-by: Douglas Anderson 


Re: [PATCH v2 1/5] dt-bindings: phy-qcom-qmp: Fix register underspecification

2018-10-18 Thread Doug Anderson
Hi,

On Thu, Oct 18, 2018 at 2:09 PM Evan Green  wrote:
>
> This change adds register regions for the second lane of dual-lane nodes.
> This additional specification is needed so that the driver can stop
> reaching beyond the tx and rx register allocations to get at the
> second lane registers in a dual-lane PHY.
>
> While in there, document #clock-cells as optional for PHYs that don't
> provide a pipe clock. Also, document the pcs_misc register region, which
> was being quietly supplied and used.
>
> Signed-off-by: Evan Green 
>
> ---
> This applies atop linux-next 20181018 with the addition of Doug's
> changes [1] and [2].
>
> [1] 
> https://lore.kernel.org/lkml/20181012213632.252346-1-diand...@chromium.org/
> [2] 
> https://lore.kernel.org/lkml/20181012213926.253765-1-diand...@chromium.org/
>
>  .../devicetree/bindings/phy/qcom-qmp-phy.txt   | 73 
> +++---
>  1 file changed, 65 insertions(+), 8 deletions(-)

This all makes sense to me and seems like the right compromise to make.

The only current SoC that uses tx2/rx2 is SDM845 and the support of
that SoC is in its infancy in mainline.  Thus I don't mind that we say
that all 5 registers are "required" even though there is an existing
device tree out there that doesn't include tx2/rx2 for USB.  Currently
patch #2 in this series still makes old device trees "work" (as well
as they used to) but I'm all for that being very temporary code and
that officially tx2/rx2 are not optional.


Reviewed-by: Douglas Anderson 


Re: [PATCH] ata: add Buddha PATA controller driver

2018-10-18 Thread Michael Schmitz

Hi Bartlomiej,

On 19/10/18 01:29, Bartlomiej Zolnierkiewicz wrote:

Add Buddha PATA controller driver. It enables libata support for
the Buddha, Catweasel and X-Surf expansion boards on the Zorro
expansion bus.

Cc: John Paul Adrian Glaubitz 
Cc: Michael Schmitz 
Cc: Geert Uytterhoeven 
Signed-off-by: Bartlomiej Zolnierkiewicz 
---
John, please test if possible..

  drivers/ata/Kconfig   |   12 ++
  drivers/ata/Makefile  |1
  drivers/ata/pata_buddha.c |  257 
++
  3 files changed, 270 insertions(+)

Index: b/drivers/ata/Kconfig
===
--- a/drivers/ata/Kconfig   2018-10-18 14:18:02.766452406 +0200
+++ b/drivers/ata/Kconfig   2018-10-18 14:18:02.766452406 +0200
@@ -965,6 +965,18 @@ config PATA_GAYLE
  
  	  If unsure, say N.
  
+config PATA_BUDDHA

+   tristate "Buddha/Catweasel/X-Surf PATA support"
+   depends on ZORRO
+   help
+ This option enables support for the IDE interfaces
+ on the Buddha, Catweasel and X-Surf expansion boards
+ on the Zorro expansion bus. It supports up to two
+ interfaces on the Buddha, three on the Catweasel and
+ two on the X-Surf.
+
+ If unsure, say N.
+
  config PATA_ISAPNP
tristate "ISA Plug and Play PATA support"
depends on ISAPNP
Index: b/drivers/ata/Makefile
===
--- a/drivers/ata/Makefile  2018-10-18 14:18:02.766452406 +0200
+++ b/drivers/ata/Makefile  2018-10-18 14:18:02.766452406 +0200
@@ -98,6 +98,7 @@ obj-$(CONFIG_PATA_WINBOND)+= pata_sl82c
  obj-$(CONFIG_PATA_CMD640_PCI) += pata_cmd640.o
  obj-$(CONFIG_PATA_FALCON) += pata_falcon.o
  obj-$(CONFIG_PATA_GAYLE)  += pata_gayle.o
+obj-$(CONFIG_PATA_BUDDHA)  += pata_buddha.o
  obj-$(CONFIG_PATA_ISAPNP) += pata_isapnp.o
  obj-$(CONFIG_PATA_IXP4XX_CF)  += pata_ixp4xx_cf.o
  obj-$(CONFIG_PATA_MPIIX)  += pata_mpiix.o
Index: b/drivers/ata/pata_buddha.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/drivers/ata/pata_buddha.c 2018-10-18 14:23:41.054460925 +0200
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Buddha, Catweasel and X-Surf PATA controller driver
+ *
+ * Copyright (c) 2018 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com
+ *
+ * Based on buddha.c:
+ *
+ * Copyright (C) 1997, 2001 by Geert Uytterhoeven and others
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define DRV_NAME "pata_buddha"
+#define DRV_VERSION "0.1.0"
+
+#define BUDDHA_BASE1   0x800
+#define BUDDHA_BASE2   0xa00
+#define BUDDHA_BASE3   0xc00
+#define XSURF_BASE1    0xb000 /* 2.5" interface */
+#define XSURF_BASE2    0xd000 /* 3.5" interface */
+#define BUDDHA_CONTROL 0x11a
+#define BUDDHA_IRQ 0xf00
+#define XSURF_IRQ  0x7e
+#define BUDDHA_IRQ_MR  0xfc0   /* master interrupt enable */
+
+enum {
+   BOARD_BUDDHA = 0,
+   BOARD_CATWEASEL,
+   BOARD_XSURF
+};
+
+static unsigned int buddha_bases[3] __initdata = {
+   BUDDHA_BASE1, BUDDHA_BASE2, BUDDHA_BASE3
+};
+
+static unsigned int xsurf_bases[2] __initdata = {
+   XSURF_BASE1, XSURF_BASE2
+};
+
+static struct scsi_host_template pata_buddha_sht = {
+   ATA_PIO_SHT(DRV_NAME),
+};
+
+/* FIXME: is this needed? */
+static unsigned int pata_buddha_data_xfer(struct ata_queued_cmd *qc,
+unsigned char *buf,
+unsigned int buflen, int rw)
+{
+   struct ata_device *dev = qc->dev;
+   struct ata_port *ap = dev->link->ap;
+   void __iomem *data_addr = ap->ioaddr.data_addr;
+   unsigned int words = buflen >> 1;
+
+   /* Transfer multiple of 2 bytes */
+   if (rw == READ)
+   raw_insw((u16 *)data_addr, (u16 *)buf, words);
+   else
+   raw_outsw((u16 *)data_addr, (u16 *)buf, words);
+
+   /* Transfer trailing byte, if any. */
+   if (unlikely(buflen & 0x01)) {
+   unsigned char pad[2] = { };
+
+   /* Point buf to the tail of buffer */
+   buf += buflen - 1;
+
+   if (rw == READ) {
+   raw_insw((u16 *)data_addr, (u16 *)pad, 1);
+   *buf = pad[0];
+   } else {
+   pad[0] = *buf;
+   raw_outsw((u16 *)data_addr, (u16 *)pad, 1);
+   }
+   words++;
+   }
+
+   return words << 1;
+}
+
+/*
+ * Provide our own set_mode() as we don't want to change anything that has
+ * already been configured..
+ */
+static int pata_buddha_set_mode(struct ata_link *link,
+   struct ata_device **unused)
+{
+   struct ata_device *dev;
+
+   

[PATCH 1/2] dt-bindings: serial: add documentation for the SiFive UART driver

2018-10-18 Thread Paul Walmsley
Add DT binding documentation for the Linux driver for the SiFive
asynchronous serial IP block.  Nothing too exotic.

Cc: linux-ser...@vger.kernel.org
Cc: devicet...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman 
Cc: Rob Herring 
Cc: Mark Rutland 
Cc: Palmer Dabbelt 
Signed-off-by: Paul Walmsley 
Signed-off-by: Paul Walmsley 
---
 .../bindings/serial/sifive-serial.txt | 21 +++
 1 file changed, 21 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/serial/sifive-serial.txt

diff --git a/Documentation/devicetree/bindings/serial/sifive-serial.txt 
b/Documentation/devicetree/bindings/serial/sifive-serial.txt
new file mode 100644
index ..8982338512f5
--- /dev/null
+++ b/Documentation/devicetree/bindings/serial/sifive-serial.txt
@@ -0,0 +1,21 @@
+SiFive asynchronous serial interface (UART)
+
+Required properties:
+
+- compatible: should be "sifive,fu540-c000-uart0" or "sifive,uart0"
+- reg: address and length of the register space
+- interrupt-parent: should contain a phandle pointing to the SoC interrupt
+controller device node that the UART interrupts are connected to
+- interrupts: Should contain the UART interrupt identifier
+- clocks: Should contain a clock identifier for the UART's parent clock
+
+
+Example:
+
+uart0: serial@1001 {
+   compatible = "sifive,uart0";
+   interrupt-parent = <>;
+   interrupts = <80>;
+   reg = <0x0 0x1001 0x0 0x1000>;
+   clocks = < PRCI_CLK_TLCLK>;
+};
-- 
2.19.1



[PATCH 2/2] tty: serial: add driver for the SiFive UART

2018-10-18 Thread Paul Walmsley
Add a serial driver for the SiFive UART, found on SiFive FU540 devices
(among others).

The underlying serial IP block is relatively basic, and currently does
not support serial break detection.  Further information on the IP
block can be found in the documentation and Chisel sources:

https://static.dev.sifive.com/FU540-C000-v1.0.pdf

https://github.com/sifive/sifive-blocks/tree/master/src/main/scala/devices/uart

This driver was written in collaboration with Wesley Terpstra.

Boot-tested on a SiFive HiFive Unleashed A00 board.

Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
Cc: Palmer Dabbelt 
Cc: Wesley Terpstra 
Cc: linux-ser...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paul Walmsley 
Signed-off-by: Paul Walmsley 
---
 drivers/tty/serial/Kconfig   |   24 +
 drivers/tty/serial/Makefile  |1 +
 drivers/tty/serial/sifive.c  | 1070 ++
 include/uapi/linux/serial_core.h |3 +
 4 files changed, 1098 insertions(+)
 create mode 100644 drivers/tty/serial/sifive.c

diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index df8bd0c7b97d..fdd624df75c8 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -1060,6 +1060,30 @@ config SERIAL_OMAP_CONSOLE
  your boot loader about how to pass options to the kernel at
  boot time.)
 
+config SERIAL_SIFIVE
+   tristate "SiFive UART support"
+   depends on OF
+   select SERIAL_CORE
+   help
+ Select this option if you are building a kernel for a device that
+ contains a SiFive UART IP block.  This type of UART is present on
+ SiFive FU540 SoCs, among others.
+
+config SERIAL_SIFIVE_CONSOLE
+   bool "Console on SiFive UART"
+   depends on SERIAL_SIFIVE=y
+   select SERIAL_CORE_CONSOLE
+   help
+ Select this option if you would like to use a SiFive UART as the
+ system console.
+
+ Even if you say Y here, the currently visible virtual console
+ (/dev/tty0) will still be used as the system console by default, but
+ you can alter that using a kernel command line option such as
+ "console=ttySIFx". (Try "man bootparam" or see the documentation of
+ your boot loader about how to pass options to the kernel at
+ boot time.)
+
 config SERIAL_LANTIQ
bool "Lantiq serial driver"
depends on LANTIQ
diff --git a/drivers/tty/serial/Makefile b/drivers/tty/serial/Makefile
index daac675612df..7e906d3c0455 100644
--- a/drivers/tty/serial/Makefile
+++ b/drivers/tty/serial/Makefile
@@ -89,6 +89,7 @@ obj-$(CONFIG_SERIAL_MVEBU_UART)   += mvebu-uart.o
 obj-$(CONFIG_SERIAL_PIC32) += pic32_uart.o
 obj-$(CONFIG_SERIAL_MPS2_UART) += mps2-uart.o
 obj-$(CONFIG_SERIAL_OWL)   += owl-uart.o
+obj-$(CONFIG_SERIAL_SIFIVE)+= sifive.o
 
 # GPIOLIB helpers for modem control lines
 obj-$(CONFIG_SERIAL_MCTRL_GPIO)+= serial_mctrl_gpio.o
diff --git a/drivers/tty/serial/sifive.c b/drivers/tty/serial/sifive.c
new file mode 100644
index ..12232e74c464
--- /dev/null
+++ b/drivers/tty/serial/sifive.c
@@ -0,0 +1,1070 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * SiFive UART driver
+ * Copyright (C) 2018 Paul Walmsley 
+ * Copyright (C) 2018 SiFive
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Based partially on:
+ * - drivers/tty/serial/pxa.c
+ * - drivers/tty/serial/amba-pl011.c
+ * - drivers/tty/serial/uartlite.c
+ * - drivers/tty/serial/omap-serial.c
+ * - drivers/pwm/pwm-sifive.c
+ *
+ * See the following sources for further documentation:
+ * - Chapter 19 "Universal Asynchronous Receiver/Transmitter (UART)" of
+ *   SiFive FE310-G000 v2p3
+ * - The tree/master/src/main/scala/devices/uart directory of
+ *   https://github.com/sifive/sifive-blocks/
+ *
+ * The SiFive UART design is not 8250-compatible.  The following common
+ * features are not supported:
+ * - Word lengths other than 8 bits
+ * - Break handling
+ * - Parity
+ * - Flow control
+ * - Modem signals (DSR, RI, etc.)
+ * On the other hand, the design is free from the baggage of the 8250
+ * programming model.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Register offsets
+ */
+
+/* TXDATA */
+#define SIFIVE_SERIAL_TXDATA_OFFS  0x0
+#define SIFIVE_SERIAL_TXDATA_FULL_SHIFT31
+#define 

[GIT] Sparc

2018-10-18 Thread David Miller


The main bit here is fixing how fallback system calls are handled in
the sparc vDSO.

Unfortunately, I fat fingered the commit and some perf debugging hacks
slipped into the vDSO fix, which I revert in the very next commit.

Please pull, thanks a lot!

The following changes since commit c343db455eb3105f11bb5ac290d77ab2006b0209:

  Merge branch 'parisc-4.19-3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux (2018-10-17 
14:01:00 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git 

for you to fetch changes up to 27faeebd00813cd52a140e8944b6b32729b50e61:

  sparc: Revert unintended perf changes. (2018-10-18 11:32:29 -0700)


Dan Carpenter (1):
  sparc: vDSO: Silence an uninitialized variable warning

David S. Miller (2):
  sparc: Fix syscall fallback bugs in VDSO.
  sparc: Revert unintended perf changes.

 arch/sparc/vdso/vclock_gettime.c | 12 +++-
 arch/sparc/vdso/vma.c|  4 +++-
 2 files changed, 14 insertions(+), 2 deletions(-)


Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-18 Thread Mike Kravetz
On 10/18/18 4:08 PM, Andrew Morton wrote:
> On Wed, 17 Oct 2018 21:10:22 -0700 Mike Kravetz  
> wrote:
> 
>> Some test systems were experiencing negative huge page reserve
>> counts and incorrect file block counts.  This was traced to
>> /proc/sys/vm/drop_caches removing clean pages from hugetlbfs
>> file pagecaches.  When non-hugetlbfs explicit code removes the
>> pages, the appropriate accounting is not performed.
>>
>> This can be recreated as follows:
>>  fallocate -l 2M /dev/hugepages/foo
>>  echo 1 > /proc/sys/vm/drop_caches
>>  fallocate -l 2M /dev/hugepages/foo
>>  grep -i huge /proc/meminfo
>>AnonHugePages: 0 kB
>>ShmemHugePages:0 kB
>>HugePages_Total:2048
>>HugePages_Free: 2047
>>HugePages_Rsvd:18446744073709551615
>>HugePages_Surp:0
>>Hugepagesize:   2048 kB
>>Hugetlb: 4194304 kB
>>  ls -lsh /dev/hugepages/foo
>>4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
>>
>> To address this issue, dirty pages as they are added to pagecache.
>> This can easily be reproduced with fallocate as shown above. Read
>> faulted pages will eventually end up being marked dirty.  But there
>> is a window where they are clean and could be impacted by code such
>> as drop_caches.  So, just dirty them all as they are added to the
>> pagecache.
>>
>> In addition, it makes little sense to even try to drop hugetlbfs
>> pagecache pages, so disable calls to these filesystems in drop_caches
>> code.
>>
>> ...
>>
>> --- a/fs/drop_caches.c
>> +++ b/fs/drop_caches.c
>> @@ -9,6 +9,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include "internal.h"
>>  
>>  /* A global variable is a bit ugly, but it keeps the code simple */
>> @@ -18,6 +19,12 @@ static void drop_pagecache_sb(struct super_block *sb, 
>> void *unused)
>>  {
>>  struct inode *inode, *toput_inode = NULL;
>>  
>> +/*
>> + * It makes no sense to try and drop hugetlbfs page cache pages.
>> + */
>> +if (sb->s_magic == HUGETLBFS_MAGIC)
>> +return;
> 
> Hardcoding hugetlbfs seems wrong here.  There are other filesystems
> where it makes no sense to try to drop pagecache.  ramfs and, e...
> 
> I'm struggling to remember which is the correct thing to test here. 
> BDI_CAP_NO_WRITEBACK should get us there, but doesn't seem quite
> appropriate.

I was not sure about this, and expected someone could come up with
something better.  It just seems there are filesystems like hugetlbfs,
where it makes no sense wasting cycles traversing the filesystem.  So,
let's not even try.

Hoping someone can come up with a better method than hard coding as
I have done above.
-- 
Mike Kravetz


Re: Question about mmap syscall and POSIX standard on mips arch

2018-10-18 Thread Paul Burton
Hi Al,

On Thu, Oct 18, 2018 at 05:32:00AM +0100, Al Viro wrote:
> [mips folks Cc'd]
> 
> On Thu, Oct 18, 2018 at 11:26:02AM +0800, Hongzhi, Song wrote:
> > Hi all,
> > 
> > Ltp has a POSIX testcase about mmap, 24-2.c.
> > 
> > https://github.com/linux-test-project/ltp/blob/e816127e5d8efbff5ae53e9c2292fae22f36838b/testcases/open_posix_testsuite/conformance/interfaces/mmap/24-2.c#L94
> 
> [basically, MAP_FIXED mmap with addr + len > TASK_SIZE fails with
> -EINVAL on mips and -ENOMEM elsewhere]
>  
> > Under POSIX standard, the expected errno should be ENOMEM
> > 
> > when the specific [addr+len] exceeds the bound of memory.
> 
> The mmap() function may fail if:
> 
> [EINVAL]
> The addr argument (if MAP_FIXED was specified) or off is not a multiple
> of the page size as returned by sysconf(), or is considered invalid by
>^^^
> the implementation.
> ^^^
> 
> So that behaviour gets past POSIX.  That part is mostly about the
> things like cache aliasing constraints, etc., but it leaves enough
> space to weasel out.  Said that, this
> 
> [ENOMEM]
> MAP_FIXED was specified, and the range [addr,addr+len) exceeds that allowed
> for the address space of a process; or, if MAP_FIXED was not specified and
> there is insufficient room in the address space to effect the mapping.
> 
> is a lot more specific, so switching to -ENOMEM there might be a good idea,
> especially since on other architectures we do get -ENOMEM in that case,
> AFAICS.

Thanks for the heads up - that does sound like reasonably clear
non-compliance. I'll make a note to put together a patch & test it out,
likely next week, if nobody submits one first.
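
For anyone who wants to see which errno their kernel returns, a small
probe along the following lines should do.  It is a sketch rather than
the LTP test itself: the function name is made up, and the choice of an
address two pages below the top of the pointer range is just an easy
way to get a MAP_FIXED request that no process address space can
satisfy.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Ask for a one-page MAP_FIXED mapping at a page-aligned address far
 * beyond the user address space.  Returns 0 if the mapping
 * unexpectedly succeeded, otherwise the errno reported by mmap().
 */
int probe_fixed_mmap_errno(void)
{
	long page = sysconf(_SC_PAGESIZE);
	/* Two pages below the top of the range, so addr + len does not wrap. */
	uintptr_t high = UINTPTR_MAX - 2 * (uintptr_t)page + 1;
	void *p = mmap((void *)high, (size_t)page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	if (p != MAP_FAILED) {
		munmap(p, (size_t)page);
		return 0;
	}
	return errno;
}
```

Going by the discussion above, the return value would be EINVAL on
mips today and ENOMEM on the other architectures.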

Thanks,
Paul


Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-18 Thread Andrew Morton
On Wed, 17 Oct 2018 21:10:22 -0700 Mike Kravetz  wrote:

> Some test systems were experiencing negative huge page reserve
> counts and incorrect file block counts.  This was traced to
> /proc/sys/vm/drop_caches removing clean pages from hugetlbfs
> file pagecaches.  When non-hugetlbfs explicit code removes the
> pages, the appropriate accounting is not performed.
> 
> This can be recreated as follows:
>  fallocate -l 2M /dev/hugepages/foo
>  echo 1 > /proc/sys/vm/drop_caches
>  fallocate -l 2M /dev/hugepages/foo
>  grep -i huge /proc/meminfo
>AnonHugePages: 0 kB
>ShmemHugePages:0 kB
>HugePages_Total:2048
>HugePages_Free: 2047
>HugePages_Rsvd:18446744073709551615
>HugePages_Surp:0
>Hugepagesize:   2048 kB
>Hugetlb: 4194304 kB
>  ls -lsh /dev/hugepages/foo
>4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
> 
> To address this issue, dirty pages as they are added to pagecache.
> This can easily be reproduced with fallocate as shown above. Read
> faulted pages will eventually end up being marked dirty.  But there
> is a window where they are clean and could be impacted by code such
> as drop_caches.  So, just dirty them all as they are added to the
> pagecache.
> 
> In addition, it makes little sense to even try to drop hugetlbfs
> pagecache pages, so disable calls to these filesystems in drop_caches
> code.
> 
> ...
>
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "internal.h"
>  
>  /* A global variable is a bit ugly, but it keeps the code simple */
> @@ -18,6 +19,12 @@ static void drop_pagecache_sb(struct super_block *sb, void 
> *unused)
>  {
>   struct inode *inode, *toput_inode = NULL;
>  
> + /*
> +  * It makes no sense to try and drop hugetlbfs page cache pages.
> +  */
> + if (sb->s_magic == HUGETLBFS_MAGIC)
> + return;

Hardcoding hugetlbfs seems wrong here.  There are other filesystems
where it makes no sense to try to drop pagecache.  ramfs and, e...

I'm struggling to remember which is the correct thing to test here. 
BDI_CAP_NO_WRITEBACK should get us there, but doesn't seem quite
appropriate.
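
For reference, the HugePages_Rsvd value in the quoted report,
18446744073709551615, is 2^64 - 1: what an unsigned 64-bit counter
holds after being decremented from zero.  The sketch below is a
userspace illustration only, not the hugetlbfs accounting code; the
struct and both helpers are hypothetical names made up for this note,
and it assumes the counter is 64 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a reserve counter; not the kernel's struct. */
struct demo_hstate {
	uint64_t resv_huge_pages;
};

/* Unchecked release: wraps to UINT64_MAX when the counter is already 0. */
static void demo_release_unchecked(struct demo_hstate *h)
{
	assert(h != NULL);
	h->resv_huge_pages--;
}

/* Checked release: returns -1 and leaves the counter alone on underflow. */
static int demo_release_checked(struct demo_hstate *h)
{
	assert(h != NULL);
	if (h->resv_huge_pages == 0)
		return -1;
	h->resv_huge_pages--;
	return 0;
}
```

A single unbalanced release is enough to produce the value shown in
/proc/meminfo, which is consistent with the "negative huge page
reserve counts" described in the changelog.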




Re: [Ksummit-discuss] [PATCH v3 1/3] code-of-conduct: Fix the ambiguity about collecting email addresses

2018-10-18 Thread Frank Rowand
On 10/18/18 12:57, James Bottomley wrote:
> On Thu, 2018-10-18 at 19:49 +, tim.b...@sony.com wrote:
>>> -Original Message-
>>> From: Frank Rowand
>>>
>>> On 10/18/18 07:56, James Bottomley wrote:
 On Wed, 2018-10-17 at 12:53 -0700, Frank Rowand wrote:
> On 10/17/18 12:08, James Bottomley wrote:

 [...]
>>> Trying to understand how you are understanding my comment
>>> vs what
>>> I intended to communicate, it seems to me that you are
>>> focused on
>>> the "where allowed" and I am focused on the "which email
>>> addresses".
>>>
>>> More clear?  Or am I still not communicating well enough?
>>
>> I think the crux of the disagreement is that you think the
>> carve
>> out equates to a permission which is not specific enough and
>> I
>> think it
>
> Nope.  That is a big place where I was not transferring my
> thoughts
> to clear communication.  I agree that what I wrote should have
> been
> written in terms of carve out instead of permission.
>
>
>> doesn't equate to a permission at all, which is why there's
>> no need
>> to make it more explicit.  Is that a fair characterisation?
>
> Nope.  My concern is "which email addresses".

 The idea here was because it's a carve out that doesn't give
 permission
 and because the permission is ruled by the project contribution
 documents, the carve out should be broad enough to cover anything
 they
 might say hence "email addresses not ordinarily collected by the
 project" are still included as unacceptable behaviour.

 Perhaps if you propose the wording you'd like to see it would
 help
 because there still looks to be some subtlety I'm not getting.
>>>
>>>
>>> From the beginning of the thread:
>>>
>>>   > @@ -31,7 +31,7 @@ Examples of unacceptable behavior by
>>> participants
>>> include:
>>>   >  * Trolling, insulting/derogatory comments, and personal or
>>> political
>>> attacks
>>>   >  * Public or private harassment
>>>   >  * Publishing others’ private information, such as a physical
>>> or electronic
>>>   > -  address, without explicit permission
>>>   > +  address not ordinarily collected by the project, without
>>> explicit
>>> permission
>>>   >  * Other conduct which could reasonably be considered
>>> inappropriate in a
>>>   >professional setting
>>>
>>>
>>> Alternative (and I'm sure someone else can probably clean this up a
>>> little bit):
>>>
>>> + address that has been provided in a public space for the project,
>>> without explicit permission
>>
>> This ends up reading like so:
>>
>> 
>> Examples of unacceptable behavior by participants include:
>> ...
>> * Publishing others’ private information, such as a physical or
>> electronic
>> address that has been provided in a public space for the project,
>> without
>> explicit permission.
>> 
>>

>> I think that in context, you want a 'not' in there.  That is:

Yes, thank you.


>> unacceptable behavior includes publishing others' private
>> information... that has *not* been provided in a public space.  So, I
>> think the suggested text needs some fixing, IMHO.
> 
> You beat me to this one.  However, there is another issue that I did
> touch on but perhaps not in this subthread: For those of us who live in
> the US, our addresses (that's physical and sometimes email) are
> actually provided in a public space because they're available in the
> public property records.  That's actually why I chose "not ordinarily
> collected by the project" as opposed to "not previously provided in the
> public space" or an equivalent because doxxing in the US is mostly
> finding this information from public sources and broadcasting it.

That clarification helps a _lot_ in understanding what you have said
previously in this thread.  Thanks.  :-)


>> I looked at this issue upstream, and decided to leave the wording in
>> the CoC itself alone - favoring instead to add a clarifying addition
>> to the upstream CoC FAQ, about some email addresses not being
>> private information.
>>
>> The reason I took that approach, rather than try to change the
>> wording inside the CoC, is that the current wording seems to me to be
>> sufficient. The thing that is unacceptable is publishing private
>> information.  The "such as..." clause is intended to convey examples
>> of the types of thing that might usually be considered private
>> information.  But it is not exhaustive, nor is it necessarily
>> correct, depending on the circumstances.  In particular, email
>> addresses are sometimes private information and sometimes not.
>> In the context of kernel development, many email addresses are not
>> private.
>>
>> I am sympathetic to the argument that we use emails as public
>> information so much in kernel development processes, that it makes
>> sense to omit this or qualify it more.
> 
> I think that's the sense of the people who acked this, yes.  

Re: [PATCH v4] PCI/AER: Enable error reporting for all ports

2018-10-18 Thread Keith Busch
On Thu, Oct 18, 2018 at 03:53:58PM -0500, Bjorn Helgaas wrote:
> Change the AER service driver so it binds to *all* PCIe Ports, including
> Switch Upstream and Downstream Ports.  Enable AER error reporting for all
> these Ports, but not for any children.

I'm looking at this again and think enabling/disabling error reporting for
ports is the responsibility of the port driver, not the AER service.

The following should do the same as this patch, but without making AER
driver handle non-root ports.  The report enabling/disabling functions
are already stubbed for '!CONFIG_PCIE_AER' and have checks for aer_cap
and firmware first.

A real patch for this could even make this remove all the aer specific
error report enabling, so it'd be a net-loss in code lines. :)

---
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 0acca3596807..f129a33c8303 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -122,12 +122,13 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
pm_runtime_put_autosuspend(&dev->dev);
pm_runtime_allow(&dev->dev);
}
-
+   pci_enable_pcie_error_reporting(dev);
return 0;
 }
 
 static void pcie_portdrv_remove(struct pci_dev *dev)
 {
+   pci_disable_pcie_error_reporting(dev);
if (pci_bridge_d3_possible(dev)) {
pm_runtime_forbid(&dev->dev);
pm_runtime_get_noresume(&dev->dev);
--
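
The remark above about the enable/disable helpers being stubbed for
'!CONFIG_PCIE_AER' refers to a common pattern: the header provides a
real implementation when the option is set and an inline no-op
otherwise, so callers need no #ifdef.  A minimal userspace sketch of
that pattern follows.  Every name in it (DEMO_CONFIG_AER, struct
demo_port, the demo_* functions) is hypothetical and chosen for
illustration; it is not the PCI core code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical port state; stands in for the real device structure. */
struct demo_port {
	int has_aer_cap;
	int reporting_enabled;
};

#ifdef DEMO_CONFIG_AER
/* Real helper: only acts when the capability is present. */
static inline int demo_enable_error_reporting(struct demo_port *port)
{
	if (!port->has_aer_cap)
		return -EIO;
	port->reporting_enabled = 1;
	return 0;
}
#else
/* Stub: compiles to nothing useful, so callers stay #ifdef-free. */
static inline int demo_enable_error_reporting(struct demo_port *port)
{
	(void)port;
	return -EINVAL;
}
#endif

/* Probe calls the helper unconditionally and ignores its result. */
static int demo_port_probe(struct demo_port *port)
{
	assert(port != NULL);
	demo_enable_error_reporting(port);
	return 0;
}
```

With the option undefined, probe still succeeds and the port is left
untouched, which is why the sketch diff can add the calls without any
configuration checks of its own.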


[GIT PULL] loadpin updates for security-next

2018-10-18 Thread Kees Cook
Hi James,

Please pull these loadpin changes for security-next. This is a small
reporting improvement and the param change needed for the ordering
series (but since the loadpin change is desired and separable, I'm
putting it here).

Thanks!

-Kees

The following changes since commit 57361846b52bc686112da6ca5368d11210796804:

  Linux 4.19-rc2 (2018-09-02 14:37:30 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/loadpin-security-next

for you to fetch changes up to 13523bef1e2154b6d02836cd0f6c0ffc89b2eae6:

  LoadPin: Rename boot param "enabled" to "enforce" (2018-10-18 15:29:44 -0700)


LoadPin: report improvement and parameter renaming

- Report human-readable device name during init
- Change boot parameter and Kconfig "enabled" to "enforce"


Kees Cook (2):
  LoadPin: Report friendly block device name
  LoadPin: Rename boot param "enabled" to "enforce"

 security/loadpin/Kconfig   |  4 ++--
 security/loadpin/loadpin.c | 26 +++---
 2 files changed, 17 insertions(+), 13 deletions(-)

-- 
Kees Cook
Pixel Security


Re: [PATCH] pstore/ram: Clarify resource reservation labels

2018-10-18 Thread Kees Cook
On Thu, Oct 18, 2018 at 3:33 PM, Dan Williams  wrote:
> On Thu, Oct 18, 2018 at 3:26 PM Kees Cook  wrote:
>>
>> On Thu, Oct 18, 2018 at 3:23 PM, Dan Williams  
>> wrote:
>> > On Thu, Oct 18, 2018 at 3:19 PM Kees Cook  wrote:
>> >>
>> >> On Thu, Oct 18, 2018 at 2:35 PM, Dan Williams  
>> >> wrote:
>> >> > On Thu, Oct 18, 2018 at 1:31 PM Kees Cook  wrote:
>> > [..]
>> >> > I cringe at users picking addresses because someone is going to enable
>> >> > ramoops on top of their persistent memory namespace and wonder why
>> >> > their filesystem got clobbered. Should attempts to specify an explicit
>> >> > ramoops range that intersects EfiPersistentMemory fail by default? The
>> >> > memmap=ss!nn parameter has burned us many times with users picking the
>> >> > wrong address, so I'd be inclined to hide this ramoops sharp edge from
>> >> > them.
>> >>
>> >> Yeah, this is what I'm trying to solve. I'd like ramoops to find the
>> >> address itself, but it has to do it really early, so if I can't have
>> >> nvdimm handle it directly, will having regions already allocated with
>> >> request_mem_region() "get along" with the rest of nvdimm?
>> >
>> > If the filesystem existed on the namespace before the user specified
>> > the ramoops command line then ramoops will clobber the filesystem and
>> > the user will only find out when mount later fails. All the kernel
>> > will say is:
>> >
>> > dev_warn(dev, "could not reserve region %pR\n", res);
>> >
>> > ...from the pmem driver, and then the only way to figure who the
>> > conflict is with is to look at /proc/iomem, but the damage is already
>> > likely done by that point.
>>
>> Yeah, bleh. Okay, well, let's just skip this for now, since ramoops
>> doesn't do _anything_ with pmem now. No need to go crazy right from
>> the start. Instead, let's make it work "normally", and if someone
>> needs it for very early boot, they can manually enter the mem_address.
>>
>> How should I attach a ramoops_probe() call to pmem?
>
> To me this looks like it would be a nvdimm glue driver whose entire
> job is to attach to the namespace, fill out some
> ramoops_platform_data, and then register a "ramoops" platform_device
> for the ramoops driver to find.

That sounds right, yes. I'm happy to help review/test/etc.

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] pstore/ram: Clarify resource reservation labels

2018-10-18 Thread Dan Williams
On Thu, Oct 18, 2018 at 3:43 PM Luck, Tony  wrote:
>
> > If the filesystem existed on the namespace before the user specified
> > the ramoops command line then ramoops will clobber the filesystem and
> > the user will only find out when mount later fails. All the kernel
> > will say is:
> >
> > dev_warn(dev, "could not reserve region %pR\n", res);
> >
> > ...from the pmem driver, and then the only way to figure who the
> > conflict is with is to look at /proc/iomem, but the damage is already
> > likely done by that point.
>
> When you set up a ramoops region in pmem, write a magic header block
> at the start of the area.  Then when pstore/ramoops goes to find the
> region, it checks for the magic header.  Not there? Don't write to the
> region. Problem solved.

That's effectively what this registration proposal will do. However,
with the caveat that the user never gets to write that magic header
without going through the nvdimm infrastructure to set it up, and
assert there is nothing there already.


RE: [PATCH] pstore/ram: Clarify resource reservation labels

2018-10-18 Thread Luck, Tony
> If the filesystem existed on the namespace before the user specified
> the ramoops command line then ramoops will clobber the filesystem and
> the user will only find out when mount later fails. All the kernel
> will say is:
>
> dev_warn(dev, "could not reserve region %pR\n", res);
>
> ...from the pmem driver, and then the only way to figure who the
> conflict is with is to look at /proc/iomem, but the damage is already
> likely done by that point.

When you set up a ramoops region in pmem, write a magic header block
at the start of the area.  Then when pstore/ramoops goes to find the
region, it checks for the magic header.  Not there? Don't write to the
region. Problem solved.

-Tony
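
The magic-header idea above can be sketched in a few lines of
userspace C.  This is an illustration under stated assumptions, not
pstore/ramoops code: the marker string, its length, and all demo_*
function names are hypothetical, and a plain byte array stands in for
the pmem region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC     "DEMOOOPS"	/* hypothetical 8-byte marker */
#define DEMO_MAGIC_LEN 8

/* Setup path: stamp the header at the start of the region. */
static int demo_region_format(uint8_t *region, size_t len)
{
	assert(region != NULL);
	if (len < DEMO_MAGIC_LEN)
		return -1;
	memcpy(region, DEMO_MAGIC, DEMO_MAGIC_LEN);
	return 0;
}

/* Returns nonzero only if the region starts with the marker. */
static int demo_region_is_ours(const uint8_t *region, size_t len)
{
	return len >= DEMO_MAGIC_LEN &&
	       memcmp(region, DEMO_MAGIC, DEMO_MAGIC_LEN) == 0;
}

/* Logging path: refuse to write unless the header is present. */
static int demo_region_log(uint8_t *region, size_t len, const char *msg)
{
	size_t n = strlen(msg);

	if (!demo_region_is_ours(region, len))
		return -1;
	if (n > len - DEMO_MAGIC_LEN)
		n = len - DEMO_MAGIC_LEN;
	memcpy(region + DEMO_MAGIC_LEN, msg, n);
	return (int)n;
}
```

The check only protects data if nothing but the setup path can write
the marker, so the sketch says nothing about who is allowed to call
demo_region_format().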


Re: [PATCH] clk: qcom: Remove unused arrays in SDM845 GCC

2018-10-18 Thread Evan Green
On Thu, Oct 18, 2018 at 3:29 PM Stephen Boyd  wrote:
>
> Quoting Evan Green (2018-10-18 15:03:50)
> > This change removes a parent map and parent name array that
> > appear to be completely unreferenced.
> >
> > Signed-off-by: Evan Green 
> > ---
>
> Applied to clk-next. Can you change your commit text style going forward
> to match Documentation/process/submitting-patches.rst, section 2, for an
> "imperative mood"?
>

Thanks. Will do.


Re: [PATCH] ata: add Buddha PATA controller driver

2018-10-18 Thread Michael Schmitz

Hi Adrian,

module built and loaded fine (no need to build a new kernel for this). 
Can't unload the module however (-EBUSY).


You'll have to reboot elgar to reload the module, I'm afraid.

Cheers,

    Michael



On 19/10/18 01:32, John Paul Adrian Glaubitz wrote:

Hi!

On 10/18/18 2:29 PM, Bartlomiej Zolnierkiewicz wrote:

John, please test if possible..

Yes, will test ASAP. @Michael: Can you build an updated kernel from your
tree for elgar and copy it over. I'll make sure to test it and also
hook up an IDE drive for testing.

Adrian





[PATCH v2 5/7] tty: Fix spacing between kernel-doc sections

2018-10-18 Thread Tobin C. Harding
Documentation/doc-guide/kernel-doc.rst specifies the spacing for
kernel-doc layout; we should try to adhere to it as much as possible.

Fix spacing between sections.

Signed-off-by: Tobin C. Harding 
---
 drivers/tty/n_hdlc.c  | 2 ++
 drivers/tty/serial/serial-tegra.c | 2 --
 drivers/tty/serial/uartlite.c | 5 +
 drivers/tty/tty_buffer.c  | 1 +
 drivers/tty/tty_port.c| 2 --
 5 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c
index 4b7aaf9ee09c..5b0c776a43a2 100644
--- a/drivers/tty/n_hdlc.c
+++ b/drivers/tty/n_hdlc.c
@@ -794,6 +794,7 @@ static int n_hdlc_tty_ioctl(struct tty_struct *tty, struct 
file *file,
  *
  * Determine which operations (read/write) will not block and return info
  * to caller.
+ *
  * Returns a bit mask containing info on which ops will not block.
  */
 static __poll_t n_hdlc_tty_poll(struct tty_struct *tty, struct file *filp,
@@ -918,6 +919,7 @@ static void n_hdlc_buf_put(struct n_hdlc_buf_list *buf_list,
  *
  * Remove and return an HDLC buffer from the head of the specified HDLC buffer
  * list.
+ *
  * Returns a pointer to HDLC buffer if available, otherwise %NULL.
  */
 static struct n_hdlc_buf *n_hdlc_buf_get(struct n_hdlc_buf_list *buf_list)
diff --git a/drivers/tty/serial/serial-tegra.c 
b/drivers/tty/serial/serial-tegra.c
index 872f3b2ed196..cb240a899127 100644
--- a/drivers/tty/serial/serial-tegra.c
+++ b/drivers/tty/serial/serial-tegra.c
@@ -74,7 +74,6 @@
 
 /**
  * tegra_uart_chip_data: SOC specific data.
- *
  * @tx_fifo_full_status: Status flag available for checking tx fifo full.
  * @allow_txfifo_reset_fifo_mode: allow_tx fifo reset with fifo mode or not.
  * Tegra30 does not allow this.
@@ -220,7 +219,6 @@ static void tegra_uart_break_ctl(struct uart_port *u, int 
break_ctl)
 
 /**
  * tegra_uart_wait_cycle_time() - Wait for N UART clock periods
- *
  * @tup:   Tegra serial port data structure.
  * @cycles:Number of clock periods to wait.
  *
diff --git a/drivers/tty/serial/uartlite.c b/drivers/tty/serial/uartlite.c
index 94a91748da40..718d3bbd002d 100644
--- a/drivers/tty/serial/uartlite.c
+++ b/drivers/tty/serial/uartlite.c
@@ -605,7 +605,6 @@ static struct uart_driver ulite_uart_driver = {
 
 /**
  * ulite_assign() - register a uartlite device with the driver
- *
  * @dev: pointer to device structure
  * @id: requested id number.  Pass -1 for automatic port assignment
  * @base: base address of uartlite registers
@@ -687,7 +686,6 @@ static int ulite_assign(struct device *dev, int id, u32 
base, int irq,
 
 /**
  * ulite_release() - register a uartlite device with the driver
- *
  * @dev: pointer to device structure
  */
 static int ulite_release(struct device *dev)
@@ -706,7 +704,6 @@ static int ulite_release(struct device *dev)
 
 /**
  * ulite_suspend() - Stop the device.
- *
  * @dev: handle to the device structure.
  * Return: 0 always.
  */
@@ -722,8 +719,8 @@ static int __maybe_unused ulite_suspend(struct device *dev)
 
 /**
  * ulite_resume() - Resume the device.
- *
  * @dev: handle to the device structure.
+ *
  * Return: 0 on success, errno otherwise.
  */
 static int __maybe_unused ulite_resume(struct device *dev)
diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
index d50dbe06896b..5f05500ee4d0 100644
--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -148,6 +148,7 @@ void tty_buffer_free_all(struct tty_port *port)
  * Allocate a new tty buffer to hold the desired number of characters.
  * We round our buffers off in 256 character chunks to get better
  * allocation behaviour.
+ *
  * Return NULL if out of memory or the allocation would exceed the
  * per device queue
  */
diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c
index 60927a52ec9c..d4dfe29a6348 100644
--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -363,7 +363,6 @@ EXPORT_SYMBOL(tty_port_hangup);
 
 /**
  * tty_port_tty_hangup() - helper to hang up a tty
- *
  * @port: tty port
  * @check_clocal: hang only ttys with CLOCAL unset?
  */
@@ -379,7 +378,6 @@ EXPORT_SYMBOL_GPL(tty_port_tty_hangup);
 
 /**
  * tty_port_tty_wakeup() - helper to wake up a tty
- *
  * @port: tty port
  */
 void tty_port_tty_wakeup(struct tty_port *port)
-- 
2.17.1



[PATCH v2 6/7] tty: Remove newline after function kernel-doc

2018-10-18 Thread Tobin C. Harding
Currently there are many places where a newline is left between the
kernel-doc function comment and the function it is documenting.  There
should not be a newline here.

Remove newline after function kernel-doc.  This patch is whitespace
only.

Signed-off-by: Tobin C. Harding 
---

 drivers/tty/n_gsm.c| 46 --
 drivers/tty/n_tty.c| 33 -
 drivers/tty/pty.c  |  7 -
 drivers/tty/serial/ifx6x60.c   |  5 
 drivers/tty/serial/ioc3_serial.c   |  5 
 drivers/tty/serial/ioc4_serial.c   |  3 --
 drivers/tty/serial/men_z135_uart.c |  1 -
 drivers/tty/serial/sn_console.c|  1 -
 drivers/tty/tty_baudrate.c |  4 ---
 drivers/tty/tty_buffer.c   | 14 -
 drivers/tty/tty_io.c   | 36 ---
 drivers/tty/tty_ioctl.c| 13 -
 drivers/tty/tty_ldisc.c| 19 
 drivers/tty/vt/vt_ioctl.c  |  3 --
 14 files changed, 190 deletions(-)

diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
index e25944ee2766..c65a9afb7cc4 100644
--- a/drivers/tty/n_gsm.c
+++ b/drivers/tty/n_gsm.c
@@ -366,7 +366,6 @@ static const u8 gsm_fcs8[256] = {
  * Update the FCS to include c. Uses the algorithm in the specification
  * notes.
  */
-
 static inline u8 gsm_fcs_add(u8 fcs, u8 c)
 {
return gsm_fcs8[fcs ^ c];
@@ -381,7 +380,6 @@ static inline u8 gsm_fcs_add(u8 fcs, u8 c)
  * Update the FCS to include c. Uses the algorithm in the specification
  * notes.
  */
-
 static inline u8 gsm_fcs_add_block(u8 fcs, u8 *c, int len)
 {
while (len--)
@@ -397,7 +395,6 @@ static inline u8 gsm_fcs_add_block(u8 fcs, u8 *c, int len)
  * Processes one byte of an EA. Updates the passed variable
  * and returns 1 if the EA is now completely read
  */
-
 static int gsm_read_ea(unsigned int *val, u8 c)
 {
/* Add the next 7 bits into the value */
@@ -414,7 +411,6 @@ static int gsm_read_ea(unsigned int *val, u8 c)
  * Returns the correct GSM encoded modem status bits (6 bit field) for
  * the current status of the DLCI and attached tty object
  */
-
 static u8 gsm_encode_modem(const struct gsm_dlci *dlci)
 {
u8 modembits = 0;
@@ -444,7 +440,6 @@ static u8 gsm_encode_modem(const struct gsm_dlci *dlci)
  * Displays a packet in human readable format for debugging purposes. The
  * style is based on amateur radio LAP-B dump display.
  */
-
 static void gsm_print_packet(const char *hdr, int addr, int cr,
u8 control, const u8 *data, int dlen)
 {
@@ -525,7 +520,6 @@ static void gsm_print_packet(const char *hdr, int addr, int 
cr,
  * is doubling and the caller is responsible for handing out
  * suitable sized buffers.
  */
-
 static int gsm_stuff_frame(const u8 *input, u8 *output, int len)
 {
int olen = 0;
@@ -555,7 +549,6 @@ static int gsm_stuff_frame(const u8 *input, u8 *output, int 
len)
  *
  * FIXME: Lock versus data TX path
  */
-
 static void gsm_send(struct gsm_mux *gsm, int addr, int cr, int control)
 {
int len;
@@ -602,7 +595,6 @@ static void gsm_send(struct gsm_mux *gsm, int addr, int cr, 
int control)
  *
  * Format up and transmit a link level response frame.
  */
-
 static inline void gsm_response(struct gsm_mux *gsm, int addr, int control)
 {
gsm_send(gsm, addr, 0, control);
@@ -616,7 +608,6 @@ static inline void gsm_response(struct gsm_mux *gsm, int 
addr, int control)
  *
  * Format up and transmit a link level command frame.
  */
-
 static inline void gsm_command(struct gsm_mux *gsm, int addr, int control)
 {
gsm_send(gsm, addr, 1, control);
@@ -637,7 +628,6 @@ static inline void gsm_command(struct gsm_mux *gsm, int 
addr, int control)
  * at the front for header bytes but that is treated as an implementation
  * detail and not for the high level code to use
  */
-
 static struct gsm_msg *gsm_data_alloc(struct gsm_mux *gsm, u8 addr, int len,
u8 ctrl)
 {
@@ -664,7 +654,6 @@ static struct gsm_msg *gsm_data_alloc(struct gsm_mux *gsm, 
u8 addr, int len,
  *
  * FIXME: lock against link layer control transmissions
  */
-
 static void gsm_data_kick(struct gsm_mux *gsm)
 {
struct gsm_msg *msg, *nmsg;
@@ -715,7 +704,6 @@ static void gsm_data_kick(struct gsm_mux *gsm)
  * out of the mux tty if not already doing so. The Caller must hold
  * the gsm tx lock.
  */
-
 static void __gsm_data_queue(struct gsm_dlci *dlci, struct gsm_msg *msg)
 {
struct gsm_mux *gsm = dlci->gsm;
@@ -766,7 +754,6 @@ static void __gsm_data_queue(struct gsm_dlci *dlci, struct 
gsm_msg *msg)
  * out of the mux tty if not already doing so. Take the
  * the gsm tx lock and dlci lock.
  */
-
 static void gsm_data_queue(struct gsm_dlci *dlci, struct gsm_msg *msg)
 {
unsigned long flags;
@@ -786,7 +773,6 @@ static void gsm_data_queue(struct gsm_dlci *dlci, struct 
gsm_msg *msg)
  *
  * Caller must hold the 

[PATCH v2 4/7] tty: Fix kernel-doc variable typos

2018-10-18 Thread Tobin C. Harding
There are a few typos in the kernel-doc variables [descriptions].  Fix
these by using the correct variable name and adding a description.

Fix kernel-doc variable typos.

Signed-off-by: Tobin C. Harding 
---
 drivers/tty/serial/men_z135_uart.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/men_z135_uart.c 
b/drivers/tty/serial/men_z135_uart.c
index ef89534dd760..784756139935 100644
--- a/drivers/tty/serial/men_z135_uart.c
+++ b/drivers/tty/serial/men_z135_uart.c
@@ -173,7 +173,7 @@ static void men_z135_reg_clr(struct men_z135_port *uart,
 
 /**
  * men_z135_handle_modem_status() - Handle change of modem status
- * @port: The UART port
+ * @uart: The UART port
  *
  * Handle change of modem status register. This is done by reading the "delta"
  * versions of DCD (Data Carrier Detect) and CTS (Clear To Send).
@@ -597,7 +597,7 @@ static void men_z135_stop_rx(struct uart_port *port)
 
 /**
  * men_z135_enable_ms() - Enable Modem Status
- * port:
+ * @port: The UART port
  *
  * Enable Modem Status IRQ.
  */
-- 
2.17.1



[PATCH v2 7/7] tty: Fix section format

2018-10-18 Thread Tobin C. Harding
Currently sections (specifically 'Return:' and 'Context:') are
non-uniform and/or incorrectly labelled.  We should use the format
outlined in Documentation/doc-guide/kernel-doc.rst.  This patch is a bit
of a mash up because text is on occasions mixed up in the current
comments.

 - Use 'Context:' (Replaces 'Locking')

   * Context: Describes whether the function can sleep, what locks it
   *  takes, releases, or expects to be held. It can extend over
   *  multiple lines.

 - Use 'Return:'

   * Return: Describe the return value of foobar.
   *
   * The return value description can also have multiple paragraphs, and
   * should be placed at the end of the comment block.

 - If brief description is too long split it up into brief and longer.

   * A longer description, with more discussion of the function
   * function_name() that might be useful to those using or modifying
   * it. Begins with an empty comment line, and may include additional
   * embedded empty comment lines.

Fix brief description if missing and long description present.

This patch leaves some instances of 'Locking:' still in the code (if I
was not _sure_ of the documented locking or if changing it would make
the comment less legible).
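
As an illustration of the layout described above, here is a small
self-contained function carrying a comment in that format: brief
line, arguments, blank line, longer description, then 'Context:' and
'Return:' at the end, with no blank line between the comment and the
function.  The function and its struct are hypothetical and exist
only to give the comment something to document; they are not taken
from drivers/tty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer type, used only to have something to document. */
struct demo_buf {
	size_t used;
	size_t size;
};

/**
 * demo_buf_reserve() - Reserve space in a buffer.
 * @buf: Buffer to reserve space in.
 * @len: Number of bytes to reserve.
 *
 * Accounts for @len additional bytes in @buf if they fit.  The buffer
 * is left unchanged when the request does not fit.
 *
 * Context: Does not sleep.  Takes no locks; the caller serialises
 *          access to @buf.
 * Return: 0 on success, -1 if @len bytes do not fit in @buf.
 */
static int demo_buf_reserve(struct demo_buf *buf, size_t len)
{
	assert(buf != NULL);
	if (len > buf->size - buf->used)
		return -1;
	buf->used += len;
	return 0;
}
```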

Signed-off-by: Tobin C. Harding 
---
 drivers/tty/hvc/hvc_console.c   |   4 +-
 drivers/tty/hvc/hvc_iucv.c  |  14 +--
 drivers/tty/mips_ejtag_fdc.c|   4 +-
 drivers/tty/n_gsm.c |  37 
 drivers/tty/n_hdlc.c|  14 +--
 drivers/tty/n_tracerouter.c |  14 ++-
 drivers/tty/n_tracesink.c   |  20 ++---
 drivers/tty/n_tty.c | 117 -
 drivers/tty/pty.c   |  19 ++--
 drivers/tty/serial/8250/8250_core.c |  14 +--
 drivers/tty/serial/amba-pl011.c |   2 +-
 drivers/tty/serial/earlycon.c   |   4 +-
 drivers/tty/serial/ifx6x60.c|  12 +--
 drivers/tty/serial/serial_txx9.c|   5 +-
 drivers/tty/serial/xilinx_uartps.c  |  21 +++--
 drivers/tty/tty_audit.c |   9 +-
 drivers/tty/tty_baudrate.c  |  31 +++
 drivers/tty/tty_buffer.c|  26 +++---
 drivers/tty/tty_io.c| 129 +---
 drivers/tty/tty_ioctl.c |  15 ++--
 drivers/tty/tty_jobctrl.c   |  20 ++---
 drivers/tty/tty_ldisc.c |  43 +-
 drivers/tty/tty_port.c  |   8 +-
 drivers/tty/vt/consolemap.c |   4 +-
 drivers/tty/vt/keyboard.c   |   6 +-
 drivers/tty/vt/selection.c  |  10 ++-
 drivers/tty/vt/vt.c |  26 +++---
 27 files changed, 310 insertions(+), 318 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 6d57d0bfff5d..9379de272995 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -569,7 +569,7 @@ static int hvc_write(struct tty_struct *tty, const unsigned 
char *buf, int count
  * The routine shall not be called within an atomic context because it
  * might sleep.
  *
- * Locking:hp->lock
+ * Context: Take hp->lock
  */
 static void hvc_set_winsz(struct work_struct *work)
 {
@@ -784,7 +784,7 @@ EXPORT_SYMBOL_GPL(hvc_poll);
  * Stores the specified window size information in the hvc structure of @hp.
  * The function schedule the tty resize update.
  *
- * Locking:Locking free; the function MUST be called holding hp->lock
+ * Context: Locking free; the function MUST be called holding hp->lock
  */
 void __hvc_resize(struct hvc_struct *hp, struct winsize ws)
 {
diff --git a/drivers/tty/hvc/hvc_iucv.c b/drivers/tty/hvc/hvc_iucv.c
index 2af1e5751bd6..0de673b29b31 100644
--- a/drivers/tty/hvc/hvc_iucv.c
+++ b/drivers/tty/hvc/hvc_iucv.c
@@ -147,7 +147,8 @@ static struct hvc_iucv_private 
*hvc_iucv_get_private(uint32_t num)
  * required for receiving and sending data with IUCV.
  * Note: The total message size arises from the internal buffer size and the
  *  members of the iucv_tty_msg structure.
- * The function returns NULL if memory allocation has failed.
+ *
+ * Return: NULL if memory allocation has failed.
  */
 static struct iucv_tty_buffer *alloc_tty_buffer(size_t size, gfp_t flags)
 {
@@ -210,9 +211,9 @@ static void destroy_tty_buffer_list(struct list_head *list)
  * If all message data has been written, the message is removed from
  * the input queue.
  *
- * The function returns the number of bytes written to the terminal, zero if
- * there are no pending data messages available or if there is no established
- * IUCV path.
+ * Return: The number of bytes written to the terminal, zero if there are no
+ * pending data messages available or if there is no established IUCV path.
+ *
  * If the IUCV path has been severed, then -EPIPE is returned to cause a
  * hang up (that is issued by the HVC layer).
  */
@@ -309,9 +310,8 @@ static int hvc_iucv_write(struct hvc_iucv_private *priv,
  * If an IUCV communication path has been established, pending IUCV messages
  * 

[PATCH v2 1/7] tty: Fix whitespace before tab warnings

2018-10-18 Thread Tobin C. Harding
This patch is whitespace only.

Checkpatch emits a BUNCH of warnings of type

WARNING: please, no spaces at the start of a line
WARNING: please, no space before tabs

While these may seem trivial, they hinder further cleanup efforts:
there are so many that later cleanup touches many of the same lines,
which makes the cleanup patches super noisy when parsed with
checkpatch (although checkpatching this series is prohibitively noisy
even with this patch).

In preparation for doing kernel-docs cleanup, fix all the whitespace
before tab and whitespace at start of line warnings.  We do not fix any
that are obviously there for extraordinary formatting reasons.

Any line in the diff with other obvious formatting errors is also
fixed, excluding kernel-docs lines.

Signed-off-by: Tobin C. Harding 
---

Actually, checkpatch is only noisy on a couple of patches from this
set.  I ran checkpatch individually (instead of on the whole set) on
each patch and fixed things within scope of this set while working on
version 2 of the set.

 drivers/tty/amiserial.c |  43 ---
 drivers/tty/hvc/hvc_console.c   |   2 +-
 drivers/tty/ipwireless/tty.c|   2 +-
 drivers/tty/isicom.c|   4 +-
 drivers/tty/moxa.c  |  30 +++--
 drivers/tty/mxser.c |  10 +-
 drivers/tty/n_hdlc.c|  10 +-
 drivers/tty/rocket.c|   8 +-
 drivers/tty/serdev/core.c   |   2 +-
 drivers/tty/serial/8250/8250_dw.c   |   4 +-
 drivers/tty/serial/amba-pl011.c |   8 +-
 drivers/tty/serial/cpm_uart/cpm_uart_core.c |   8 +-
 drivers/tty/serial/icom.c   |   5 +-
 drivers/tty/serial/ip22zilog.c  |   2 +-
 drivers/tty/serial/jsm/jsm_neo.c|   2 +-
 drivers/tty/serial/mux.c|   6 +-
 drivers/tty/serial/mxs-auart.c  |   2 +-
 drivers/tty/serial/pmac_zilog.c |   4 +-
 drivers/tty/serial/pxa.c|  20 +--
 drivers/tty/serial/serial_txx9.c|   2 +-
 drivers/tty/serial/sn_console.c |  38 +++---
 drivers/tty/serial/sunsab.c |   2 +-
 drivers/tty/serial/sunsu.c  |   4 +-
 drivers/tty/serial/sunzilog.c   |   2 +-
 drivers/tty/serial/ucc_uart.c   | 134 ++--
 drivers/tty/serial/vr41xx_siu.c |   2 +-
 drivers/tty/serial/zs.c |  12 +-
 drivers/tty/sysrq.c |  34 ++---
 drivers/tty/tty_baudrate.c  |   2 +-
 drivers/tty/tty_io.c|  22 ++--
 drivers/tty/tty_ioctl.c |   2 +-
 drivers/tty/tty_jobctrl.c   |   8 +-
 drivers/tty/vt/consolemap.c |   6 +-
 drivers/tty/vt/keyboard.c   |  29 +++--
 drivers/tty/vt/selection.c  |  12 +-
 drivers/tty/vt/vt.c |  40 +++---
 drivers/tty/vt/vt_ioctl.c   |  14 +-
 37 files changed, 271 insertions(+), 266 deletions(-)

diff --git a/drivers/tty/amiserial.c b/drivers/tty/amiserial.c
index 34dead614149..73d4033cef6b 100644
--- a/drivers/tty/amiserial.c
+++ b/drivers/tty/amiserial.c
@@ -18,7 +18,7 @@
  *
  *  Copyright (C) 1991, 1992  Linus Torvalds
  *  Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 
- * 1998, 1999  Theodore Ts'o
+ * 1998, 1999  Theodore Ts'o
  *
  */
 
@@ -26,8 +26,8 @@
  * Serial driver configuration section.  Here are the various options:
  *
  * SERIAL_PARANOIA_CHECK
- * Check the magic number for the async_structure where
- * ever possible.
+ * Check the magic number for the async_structure where
+ * ever possible.
  */
 
 #include 
@@ -104,8 +104,8 @@ struct serial_state {
int ignore_status_mask;
int timeout;
int quot;
-   int IER;/* Interrupt Enable Register */
-   int MCR;/* Modem control register */
+   int IER;/* Interrupt Enable Register */
+   int MCR;/* Modem control register */
int x_char; /* xon/xoff character */
 };
 
@@ -173,7 +173,8 @@ static inline int serial_paranoia_check(struct serial_state *info,
 
 static __inline__ void rtsdtr_ctrl(int bits)
 {
-ciab.pra = ((bits & (SER_RTS | SER_DTR)) ^ (SER_RTS | SER_DTR)) | (ciab.pra & ~(SER_RTS | SER_DTR));
+   ciab.pra = ((bits & (SER_RTS | SER_DTR)) ^ (SER_RTS | SER_DTR)) |
+  (ciab.pra & ~(SER_RTS | SER_DTR));
 }
 
 /*
@@ -243,13 +244,13 @@ static void rs_start(struct tty_struct *tty)
  *
  * and look at the resulting assemble code in serial.s.
  *
- * - Ted Ts'o (ty...@mit.edu), 7-Mar-93
+ *   

[PATCH v2 0/7] drivers: tty: Fix kernel-docs comments

2018-10-18 Thread Tobin C. Harding
Hi,

v2 with the author amended to use the correct _new_ email address
(thanks Geert and Greg) so that the SoB is correct.

Applies on top of tty-next tree.

From v1:

My first patch set using kernel.org email address.  I've been so careful
with this one I'm expecting some epic fail :)

This set is an attempt to clean up the kernel docs comments under
drivers/tty/.  I do not know what time in the kernel dev cycle would be
best for this set.  If now is not the time please do not hesitate to say
so and stop reading here.

I included a couple of whitespace patches to kick off with, in hindsight
these might be nothing more than code churn (if so I can remove them).

I had a real hard time breaking these changes up into separate patches.
This was mainly because I was super conscious that this set is
borderline trivial yet large enough to be a P in the A to review.  I
have reviewed these patches a bunch of times myself and am happy with
the result being as easy to review as I know how.  If there was anything
more I could have done please say.  Patch 3 and 7 are the two that are
most arduous to review.

Apart from the first two whitespace patches only comments are touched by
this set.  The set builds cleanly but does _not_ checkpatch cleanly -
there is a limit to the code churn I was willing to risk posting :)

Extra thanks to anyone who reviews this.  This set is super non-critical
so please only review at your leisure.

thanks,
Tobin.


v2:
 - Check author of each patch, run when required:
   `git commit --amend --author="Tobin C. Harding "`
 - Fix introduced 'space before tab warnings' in patch 3 and 7.
 

Tobin C. Harding (7):
  tty: Fix whitespace before tab warnings
  tty: Remove trailing whitespace
  tty: Partially fix kernel-docs layout
  tty: Fix kernel-doc variable typos
  tty: Fix spacing between kernel-doc sections
  tty: Remove newline after function kernel-doc
  tty: Fix section format

 drivers/tty/amiserial.c |   62 +-
 drivers/tty/cyclades.c  |2 +-
 drivers/tty/ehv_bytechan.c  |2 +-
 drivers/tty/hvc/hvc_console.c   |8 +-
 drivers/tty/hvc/hvc_iucv.c  |   14 +-
 drivers/tty/hvc/hvc_xen.c   |2 +-
 drivers/tty/hvc/hvcs.c  |4 +-
 drivers/tty/ipwireless/tty.c|2 +-
 drivers/tty/isicom.c|4 +-
 drivers/tty/mips_ejtag_fdc.c|4 +-
 drivers/tty/moxa.c  |   38 +-
 drivers/tty/mxser.c |   14 +-
 drivers/tty/n_gsm.c |  754 +++---
 drivers/tty/n_hdlc.c|  208 ++--
 drivers/tty/n_r3964.c   |6 +-
 drivers/tty/n_tracerouter.c |   18 +-
 drivers/tty/n_tracesink.c   |   24 +-
 drivers/tty/n_tty.c |  688 ++---
 drivers/tty/pty.c   |  126 ++-
 drivers/tty/rocket.c|   78 +-
 drivers/tty/serdev/core.c   |2 +-
 drivers/tty/serial/21285.c  |2 +-
 drivers/tty/serial/8250/8250_core.c |   72 +-
 drivers/tty/serial/8250/8250_dw.c   |4 +-
 drivers/tty/serial/8250/8250_port.c |   42 +-
 drivers/tty/serial/amba-pl010.c |2 +-
 drivers/tty/serial/amba-pl011.c |   34 +-
 drivers/tty/serial/cpm_uart/cpm_uart_core.c |8 +-
 drivers/tty/serial/earlycon.c   |   26 +-
 drivers/tty/serial/icom.c   |5 +-
 drivers/tty/serial/ifx6x60.c|  387 ---
 drivers/tty/serial/ioc3_serial.c|   73 +-
 drivers/tty/serial/ioc4_serial.c|   85 +-
 drivers/tty/serial/ip22zilog.c  |2 +-
 drivers/tty/serial/jsm/jsm_neo.c|2 +-
 drivers/tty/serial/kgdb_nmi.c   |2 +-
 drivers/tty/serial/max3100.c|2 +-
 drivers/tty/serial/max310x.c|2 +-
 drivers/tty/serial/men_z135_uart.c  |5 +-
 drivers/tty/serial/mux.c|   62 +-
 drivers/tty/serial/mxs-auart.c  |2 +-
 drivers/tty/serial/pmac_zilog.c |   62 +-
 drivers/tty/serial/pxa.c|   20 +-
 drivers/tty/serial/sa1100.c |2 +-
 drivers/tty/serial/serial-tegra.c   |4 +-
 drivers/tty/serial/serial_core.c|  198 ++--
 drivers/tty/serial/serial_txx9.c|   23 +-
 drivers/tty/serial/sn_console.c |  105 +-
 drivers/tty/serial/sunsab.c |   16 +-
 drivers/tty/serial/sunsu.c  |8 +-
 drivers/tty/serial/sunzilog.c   |   10 +-
 drivers/tty/serial/uartlite.c   |   15 +-
 drivers/tty/serial/ucc_uart.c   |  134 +--
 drivers/tty/serial/vr41xx_siu.c |2 +-
 drivers/tty/serial/xilinx_uartps.c  |   71 +-
 

Re: [PATCH] pstore/ram: Clarify resource reservation labels

2018-10-18 Thread Dan Williams
On Thu, Oct 18, 2018 at 3:26 PM Kees Cook  wrote:
>
> On Thu, Oct 18, 2018 at 3:23 PM, Dan Williams  
> wrote:
> > On Thu, Oct 18, 2018 at 3:19 PM Kees Cook  wrote:
> >>
> >> On Thu, Oct 18, 2018 at 2:35 PM, Dan Williams  
> >> wrote:
> >> > On Thu, Oct 18, 2018 at 1:31 PM Kees Cook  wrote:
> > [..]
> >> > I cringe at users picking addresses because someone is going to enable
> >> > ramoops on top of their persistent memory namespace and wonder why
> >> > their filesystem got clobbered. Should attempts to specify an explicit
> >> > ramoops range that intersects EfiPersistentMemory fail by default? The
> >> > memmap=ss!nn parameter has burned us many times with users picking the
> >> > wrong address, so I'd be inclined to hide this ramoops sharp edge from
> >> > them.
> >>
> >> Yeah, this is what I'm trying to solve. I'd like ramoops to find the
> >> address itself, but it has to do it really early, so if I can't have
> >> nvdimm handle it directly, will having regions already allocated with
> >> request_mem_region() "get along" with the rest of nvdimm?
> >
> > If the filesystem existed on the namespace before the user specified
> > the ramoops command line then ramoops will clobber the filesystem and
> > the user will only find out when mount later fails. All the kernel
> > will say is:
> >
> > dev_warn(dev, "could not reserve region %pR\n", res);
> >
> > ...from the pmem driver, and then the only way to figure out who the
> > conflict is with is to look at /proc/iomem, but the damage is already
> > likely done by that point.
>
> Yeah, bleh. Okay, well, let's just skip this for now, since ramoops
> doesn't do _anything_ with pmem now. No need to go crazy right from
> the start. Instead, let's make it work "normally", and if someone
> needs it for very early boot, they can manually enter the mem_address.
>
> How should I attach a ramoops_probe() call to pmem?

To me this looks like it would be a nvdimm glue driver whose entire
job is to attach to the namespace, fill out some
ramoops_platform_data, and then register a "ramoops" platform_device
for the ramoops driver to find.


Re: [PATCH] clk: qcom: Remove unused arrays in SDM845 GCC

2018-10-18 Thread Stephen Boyd
Quoting Evan Green (2018-10-18 15:03:50)
> This change removes a parent map and parent name array that
> appear to be completely unreferenced.
> 
> Signed-off-by: Evan Green 
> ---

Applied to clk-next. Can you change your commit text style going forward
to match Documentation/process/submitting-patches.rst, section 2, for an
"imperative mood"?



Re: [PATCH] pstore/ram: Clarify resource reservation labels

2018-10-18 Thread Kees Cook
On Thu, Oct 18, 2018 at 3:23 PM, Dan Williams  wrote:
> On Thu, Oct 18, 2018 at 3:19 PM Kees Cook  wrote:
>>
>> On Thu, Oct 18, 2018 at 2:35 PM, Dan Williams  
>> wrote:
>> > On Thu, Oct 18, 2018 at 1:31 PM Kees Cook  wrote:
> [..]
>> > I cringe at users picking addresses because someone is going to enable
>> > ramoops on top of their persistent memory namespace and wonder why
>> > their filesystem got clobbered. Should attempts to specify an explicit
>> > ramoops range that intersects EfiPersistentMemory fail by default? The
>> > memmap=ss!nn parameter has burned us many times with users picking the
>> > wrong address, so I'd be inclined to hide this ramoops sharp edge from
>> > them.
>>
>> Yeah, this is what I'm trying to solve. I'd like ramoops to find the
>> address itself, but it has to do it really early, so if I can't have
>> nvdimm handle it directly, will having regions already allocated with
>> request_mem_region() "get along" with the rest of nvdimm?
>
> If the filesystem existed on the namespace before the user specified
> the ramoops command line then ramoops will clobber the filesystem and
> the user will only find out when mount later fails. All the kernel
> will say is:
>
> dev_warn(dev, "could not reserve region %pR\n", res);
>
> ...from the pmem driver, and then the only way to figure out who the
> conflict is with is to look at /proc/iomem, but the damage is already
> likely done by that point.

Yeah, bleh. Okay, well, let's just skip this for now, since ramoops
doesn't do _anything_ with pmem now. No need to go crazy right from
the start. Instead, let's make it work "normally", and if someone
needs it for very early boot, they can manually enter the mem_address.

How should I attach a ramoops_probe() call to pmem?

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] pstore/ram: Clarify resource reservation labels

2018-10-18 Thread Dan Williams
On Thu, Oct 18, 2018 at 3:19 PM Kees Cook  wrote:
>
> On Thu, Oct 18, 2018 at 2:35 PM, Dan Williams  
> wrote:
> > On Thu, Oct 18, 2018 at 1:31 PM Kees Cook  wrote:
[..]
> > I cringe at users picking addresses because someone is going to enable
> > ramoops on top of their persistent memory namespace and wonder why
> > their filesystem got clobbered. Should attempts to specify an explicit
> > ramoops range that intersects EfiPersistentMemory fail by default? The
> > memmap=ss!nn parameter has burned us many times with users picking the
> > wrong address, so I'd be inclined to hide this ramoops sharp edge from
> > them.
>
> Yeah, this is what I'm trying to solve. I'd like ramoops to find the
> address itself, but it has to do it really early, so if I can't have
> nvdimm handle it directly, will having regions already allocated with
> request_mem_region() "get along" with the rest of nvdimm?

If the filesystem existed on the namespace before the user specified
the ramoops command line then ramoops will clobber the filesystem and
the user will only find out when mount later fails. All the kernel
will say is:

dev_warn(dev, "could not reserve region %pR\n", res);

...from the pmem driver, and then the only way to figure out who the
conflict is with is to look at /proc/iomem, but the damage is already
likely done by that point.
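
The overlap check asked about above (rejecting an explicit ramoops range
that intersects persistent memory) can be modelled in plain userspace C.
This is only a sketch: struct addr_range, ranges_overlap() and
find_pmem_conflict() are hypothetical names made up for illustration, not
kernel APIs, and the persistent-memory ranges are assumed to have been
collected from the firmware memory map already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physical address range covering [start, start + size). */
struct addr_range {
	uint64_t start;
	uint64_t size;
};

/* Last byte covered by a non-empty range (inclusive end avoids start + size wrapping at the top of the address space). */
static uint64_t range_last(const struct addr_range *r)
{
	return r->start + (r->size - 1);
}

/* True when the two ranges share at least one byte; empty ranges never overlap. */
static bool ranges_overlap(const struct addr_range *a,
			   const struct addr_range *b)
{
	if (a->size == 0 || b->size == 0)
		return false;
	return a->start <= range_last(b) && b->start <= range_last(a);
}

/*
 * Return the index of the first persistent-memory range that the requested
 * range intersects, or -1 when the request is clear of all of them.
 */
static int find_pmem_conflict(const struct addr_range *req,
			      const struct addr_range *pmem, size_t n)
{
	assert(req != NULL && (pmem != NULL || n == 0));

	for (size_t i = 0; i < n; i++) {
		if (ranges_overlap(req, &pmem[i]))
			return (int)i;
	}
	return -1;
}
```

A caller would refuse the user-supplied address whenever
find_pmem_conflict() returns something other than -1, which reports the
conflict before anything is written rather than after mount fails.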


Re: [PATCH] pstore/ram: Clarify resource reservation labels

2018-10-18 Thread Kees Cook
On Thu, Oct 18, 2018 at 2:35 PM, Dan Williams  wrote:
> On Thu, Oct 18, 2018 at 1:31 PM Kees Cook  wrote:
>>
>> On Thu, Oct 18, 2018 at 8:33 AM, Dan Williams  
>> wrote:
>> > [ add Ross ]
>>
>> Hi Ross! :)
>>
>> > On Thu, Oct 18, 2018 at 12:15 AM Kees Cook  wrote:
>> >> As for nvdimm specifically, yes, I'd love to get pstore hooked up
>> >> correctly to nvdimm. How do the namespaces work? Right now pstore
>> >> depends on one of platform driver data, device tree specification, or
>> >> manual module parameters.
>> >
>> > From the userspace side we have the ndctl utility to wrap
>> > personalities on top of namespaces. So for example, I envision we
>> > would be able to do:
>> >
>> > ndctl create-namespace --mode=pstore --size=128M
>> >
>> > ...and create a small namespace that will register with the pstore 
>> > sub-system.
>> >
>> > On the kernel side this would involve registering a 'pstore_dev' child
>> > / seed device under each region device. The 'seed-device' sysfs scheme
>> > is described in our documentation [1]. The short summary is ndctl
>> > finds a seed device, assigns a namespace to it, and then binding that
>> > device to a driver causes it to be initialized by the kernel.
>> >
>> > [1]: https://www.kernel.org/doc/Documentation/nvdimm/nvdimm.txt
>>
>> Interesting!
>>
>> Really, this would be a way to configure "ramoops" (the persistent RAM
>> backend to pstore), rather than pstore itself (pstore is just the
>> framework). From reading the ndctl man page it sounds like there isn't
>> a way to store configuration information beyond just size?
>>
>> ramoops will auto-configure itself and fill available space using its
>> default parameters, but it might be nice to have a way to store that
>> somewhere (traditionally it's part of device tree or platform data).
>> ramoops could grow a "header", but normally the regions are very small
>> so I've avoided that.
>>
>> I'm not sure I understand the right way to glue ramoops_probe() to the
>> "seed-device" stuff. (It needs to be probed VERY early to catch early
>> crashes -- ramoops uses postcore_initcall() normally.)
>
> Irk, yeah, that's early. On some configurations we can't delineate
> namespaces until after ACPI has come up. Ideally the address range
> would be reserved and communicated in the memory-map from the BIOS.

Yeah, I'm wondering if I should introduce a mode for ramoops where it
walks the memory regions looking for persistent ram areas, and uses
the first available. Something like "ramoops.mem_address=first
ramoops.mem_size=NNN"
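
A rough userspace sketch of what such a "first" mode might do, assuming
the memory map is available as a simple array. The types and the
pick_first_persistent() helper are hypothetical and only illustrate the
first-fit walk; they are not existing ramoops or kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region types, loosely modelled on a firmware memory map. */
enum region_type { REGION_RAM, REGION_RESERVED, REGION_PERSISTENT };

struct mem_region {
	uint64_t start;
	uint64_t size;
	enum region_type type;
};

/*
 * Walk the map in order and pick the first persistent region that can hold
 * `want` bytes. Returns 0 and stores the region base in *addr on success,
 * or -1 (leaving *addr untouched) when no persistent region is big enough.
 */
static int pick_first_persistent(const struct mem_region *map, size_t n,
				 uint64_t want, uint64_t *addr)
{
	assert(addr != NULL && (map != NULL || n == 0));

	for (size_t i = 0; i < n; i++) {
		if (map[i].type != REGION_PERSISTENT)
			continue;	/* only persistent RAM qualifies */
		if (map[i].size < want)
			continue;	/* too small for the requested size */
		*addr = map[i].start;
		return 0;
	}
	return -1;
}
```

Here `want` plays the role of ramoops.mem_size, and the returned base is
what would otherwise have been typed in by hand as ramoops.mem_address.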

> I cringe at users picking addresses because someone is going to enable
> ramoops on top of their persistent memory namespace and wonder why
> their filesystem got clobbered. Should attempts to specify an explicit
> ramoops range that intersects EfiPersistentMemory fail by default? The
> memmap=ss!nn parameter has burned us many times with users picking the
> wrong address, so I'd be inclined to hide this ramoops sharp edge from
> them.

Yeah, this is what I'm trying to solve. I'd like ramoops to find the
address itself, but it has to do it really early, so if I can't have
nvdimm handle it directly, will having regions already allocated with
request_mem_region() "get along" with the rest of nvdimm?

-Kees

-- 
Kees Cook
Pixel Security

