Re: [PATCH] gpio: uniphier: add UniPhier GPIO controller driver

2017-08-07 Thread Masahiro Yamada
Hi Linus,

2017-08-08 0:37 GMT+09:00 Linus Walleij :
> On Mon, Aug 7, 2017 at 3:50 PM, Masahiro Yamada
>  wrote:
>
>> Adding "interrupts" property in DT causes
>> of_platform_default_populate() to assign virtual IRQ numbers
>> before driver probing.  So it does not work well with IRQ domain hierarchy.
>
> I think I heard some noise about this the week before.
>
>> For pinctrl/stm32/pinctrl-stm32.c,
>> I do not see "interrupts", so it just straight maps the irq numbers.
>
> I think OMAP and DaVinci do something similar too. This is from a recent
> DaVinci patch from Keerthy:
>
> +Example for 66AK2G:
> +
> +gpio0: gpio@2603000 {
> +   compatible = "ti,k2g-gpio", "ti,keystone-gpio";
> +   reg = <0x02603000 0x100>;
> +   gpio-controller;
> +   #gpio-cells = <2>;
> +   interrupts = ,
> +   ,
> +   ,
> +   ,
> +   ,
> +   ,
> +   ,
> +   ,
> +   ;
> +   interrupt-controller;
> +   #interrupt-cells = <2>;
> +   ti,ngpio = <144>;
> +   ti,davinci-gpio-unbanked = <0>;
> +   clocks = <&k2g_clks 0x001b 0x0>;
> +   clock-names = "gpio";
> +};
>
>
> That looks fairly similar.
>

I do not think so.


I do not see .alloc hook in drivers/gpio/gpio-davinci.c
so this driver is unrelated to IRQ domain hierarchy.






-- 
Best Regards
Masahiro Yamada


[PATCH] MAINTAINERS: Update Cavium ThunderX2 entry

2017-08-07 Thread Jayachandran C
Add Robert Richter as the primary maintainer for this platform.

Signed-off-by: Jayachandran C 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 44cb004..f2d8963 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3155,6 +3155,7 @@ S:Supported
 F: drivers/crypto/cavium/cpt/
 
 CAVIUM THUNDERX2 ARM64 SOC
+M: Robert Richter 
 M: Jayachandran C 
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
 S: Maintained
-- 
2.7.4



Re: [PATCH] irqchip: uniphier-aidet: add UniPhier AIDET irqchip driver

2017-08-07 Thread Masahiro Yamada
Hi Marc,


2017-08-07 22:36 GMT+09:00 Marc Zyngier :
> On 07/08/17 12:59, Masahiro Yamada wrote:
>> Hi Marc,
>>
>> Thanks for your comments.
>>
>>
>> 2017-08-07 19:43 GMT+09:00 Marc Zyngier :
>>> On 03/08/17 12:15, Masahiro Yamada wrote:
>>>> UniPhier SoCs contain AIDET (ARM Interrupt Detector).  This is intended
>>>> to provide additional features that are not covered by GIC.  The main
>>>> purpose is to provide a logic inverter to support low level and falling
>>>> edge trigger type for interrupt lines from on-board devices.
>>>>
>>>> Signed-off-by: Masahiro Yamada 


Thanks.  I will send v2 based on your comments.



If possible, could you help me with my GPIO driver?

I implemented a similar IRQ domain hierarchy handling in the driver.

This patch:
http://patchwork.ozlabs.org/patch/797145/


If I am understanding correctly, the IRQ mapping to the parent
must be somehow hard-coded in the .alloc() hook of irq_domain_ops.

I asked a question in the following.
https://lkml.org/lkml/2017/7/6/758


I do not have a solution to get the IRQ info from "interrupts" property,
so I am hard-coding it in the driver.

-- 
Best Regards
Masahiro Yamada


Re: [PATCH RFC v2 3/5] samples/bpf: Fix inline asm issues building samples on arm64

2017-08-07 Thread Joel Fernandes
Hi Dave,

On Mon, Aug 7, 2017 at 11:28 AM, David Miller  wrote:
>
> Please, no.

Sorry you dislike it. I had intentionally marked it as RFC, as it's an
idea I was just toying with, and posted it early to get
feedback.

>
> The amount of hellish hacks we are adding to deal with this is getting
> way out of control.

I agree with you that hellish hacks are being added which is why it
keeps breaking. I think one of the things my series does is to add
back inclusion of asm headers that were previously removed (that is
the worst hellish hack in my opinion that exists in mainline). So in
that respect my patch is an improvement and makes it possible to build
for arm64 platforms (which is currently broken in mainline).

>
> BPF programs MUST have their own set of asm headers, this is the
> only way to get around this issue in the long term.

Wouldn't that break scripts or bpf code that instruments/traces
arch-specific code?

>
> I am also strongly against adding -static to the build.

I can drop -static if you prefer, that's not an issue.

As I understand it, there are no other cleaner alternatives and this
patchset makes the samples work. I would even argue that it's more
functional than previous attempts and fixes something broken in
mainline in a more generic way. If you can provide an example of where
my patchset may not work, I would love to hear it. My whole idea was
to do it in a way that makes future breakage not happen. I don't think
that leaving things broken in this state for extended periods of time
makes sense and IMHO will slow usage of bpf samples on other
platforms.

thanks,

-Joel


[PATCH v3 RESEND] f2fs: support journalled quota

2017-08-07 Thread Chao Yu
This patch enables f2fs to accept quota information through the
mount options:
- {usr,grp,prj}jquota=
- jqfmt=

Then, in the ->mount flow, we can recover the quota files during log replay;
this way, journalled quota can be supported.

Signed-off-by: Chao Yu 
Signed-off-by: Jaegeuk Kim 
---
 Documentation/filesystems/f2fs.txt |   9 +
 fs/f2fs/checkpoint.c   |  26 ++-
 fs/f2fs/f2fs.h |   9 +
 fs/f2fs/recovery.c |  72 +++-
 fs/f2fs/super.c| 327 ++---
 5 files changed, 413 insertions(+), 30 deletions(-)

diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index b8f495a8b67d..deafeff7795b 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -165,6 +165,15 @@ io_bits=%u Set the bit size of write IO requests. It should be set
 usrquota   Enable plain user disk quota accounting.
 grpquota   Enable plain group disk quota accounting.
 prjquota   Enable plain project quota accounting.
+usrjquota=   Appoint specified file and type during mount, so that quota
+grpjquota=   information can be properly updated during recovery flow,
+prjjquota=   : must be in root directory;
+jqfmt= : [vfsold,vfsv0,vfsv1].
+offusrjquota   Turn off user journelled quota.
+offgrpjquota   Turn off group journelled quota.
+offprjjquota   Turn off project journelled quota.
+quota  Enable plain user disk quota accounting.
+noquotaDisable all plain disk quota option.
 
 

 DEBUGFS ENTRIES
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index da5b49183e09..04fe1df052b2 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -588,11 +588,24 @@ static int recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
 int recover_orphan_inodes(struct f2fs_sb_info *sbi)
 {
block_t start_blk, orphan_blocks, i, j;
-   int err;
+   unsigned int s_flags = sbi->sb->s_flags;
+   int err = 0;
 
if (!is_set_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG))
return 0;
 
+   if (s_flags & MS_RDONLY) {
+   f2fs_msg(sbi->sb, KERN_INFO, "orphan cleanup on readonly fs");
+   sbi->sb->s_flags &= ~MS_RDONLY;
+   }
+
+#ifdef CONFIG_QUOTA
+   /* Needed for iput() to work correctly and not trash data */
+   sbi->sb->s_flags |= MS_ACTIVE;
+   /* Turn on quotas so that they are updated correctly */
+   f2fs_enable_quota_files(sbi);
+#endif
+
start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi);
orphan_blocks = __start_sum_addr(sbi) - 1 - __cp_payload(sbi);
 
@@ -608,14 +621,21 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi)
err = recover_orphan_inode(sbi, ino);
if (err) {
f2fs_put_page(page, 1);
-   return err;
+   goto out;
}
}
f2fs_put_page(page, 1);
}
/* clear Orphan Flag */
clear_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG);
-   return 0;
+out:
+#ifdef CONFIG_QUOTA
+   /* Turn quotas off */
+   f2fs_quota_off_umount(sbi->sb);
+#endif
+   sbi->sb->s_flags = s_flags; /* Restore MS_RDONLY status */
+
+   return err;
 }
 
 static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 1bcaa93bfed7..cea329f75068 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -92,6 +92,7 @@ extern char *fault_name[FAULT_MAX];
 #define F2FS_MOUNT_USRQUOTA0x0008
 #define F2FS_MOUNT_GRPQUOTA0x0010
 #define F2FS_MOUNT_PRJQUOTA0x0020
+#define F2FS_MOUNT_QUOTA   0x0040
 
 #define clear_opt(sbi, option) ((sbi)->mount_opt.opt &= ~F2FS_MOUNT_##option)
 #define set_opt(sbi, option)   ((sbi)->mount_opt.opt |= F2FS_MOUNT_##option)
@@ -1121,6 +1122,12 @@ struct f2fs_sb_info {
 #ifdef CONFIG_F2FS_FAULT_INJECTION
struct f2fs_fault_info fault_info;
 #endif
+
+#ifdef CONFIG_QUOTA
+   /* Names of quota files with journalled quota */
+   char *s_qf_names[MAXQUOTAS];
+   int s_jquota_fmt;   /* Format of quota to use */
+#endif
 };
 
 #ifdef CONFIG_F2FS_FAULT_INJECTION
@@ -2433,6 +2440,8 @@ static inline int f2fs_add_link(struct dentry *dentry, struct inode *inode)
  */
 int f2fs_inode_dirtied(struct inode *inode, bool sync);
 void f2fs_inode_synced(struct inode *inode);
+void f2fs_enable_quota_files(struct f2fs_sb_info *sbi);
+void f2fs_quota_off_umount(struct super_block *sb);
 int f2fs_commit_super(struct f2fs_sb_info *sbi, bool recover);
 int f2fs_sync_fs(struct super_block 

Re: [PATCH 2/2 v2] f2fs: introduce gc_urgent mode for background GC

2017-08-07 Thread Jaegeuk Kim
Change log from v1:
 - update Documentation.

This patch adds a sysfs entry to control urgent mode for background GC.
If this is set, background GC thread conducts GC with gc_urgent_sleep_time
all the time.

Signed-off-by: Jaegeuk Kim 
---
 Documentation/ABI/testing/sysfs-fs-f2fs | 12 
 Documentation/filesystems/f2fs.txt  |  9 +
 fs/f2fs/gc.c| 17 +++--
 fs/f2fs/gc.h|  4 
 fs/f2fs/sysfs.c |  9 +
 5 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index c579ce5e0ef5..11b7f4ebea7c 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -139,3 +139,15 @@ Date:  June 2017
 Contact:   "Chao Yu" 
 Description:
 Controls current reserved blocks in system.
+
+What:  /sys/fs/f2fs//gc_urgent
+Date:  August 2017
+Contact:   "Jaegeuk Kim" 
+Description:
+Do background GC agressively
+
+What:  /sys/fs/f2fs//gc_urgent_sleep_time
+Date:  August 2017
+Contact:   "Jaegeuk Kim" 
+Description:
+Controls sleep time of GC urgent mode
diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index b8f495a8b67d..84f36896766c 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -210,6 +210,15 @@ Files in /sys/fs/f2fs/
   gc_idle = 1 will select the Cost Benefit approach
   & setting gc_idle = 2 will select the greedy 
approach.
 
+ gc_urgentThis parameter controls triggering background GCs
+  urgently or not. Setting gc_urgent = 0 [default]
+  makes back to default behavior, while if it is 
set
+  to 1, background thread starts to do GC by given
+  gc_urgent_sleep_time interval.
+
+ gc_urgent_sleep_time This parameter controls sleep time for gc_urgent.
+  500 ms is set by default. See above gc_urgent.
+
  reclaim_segments This parameter controls the number of prefree
   segments to be reclaimed. If the number of 
prefree
  segments is larger than the number of segments
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 620dca443b29..8da7c14a9d29 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -35,9 +35,14 @@ static int gc_thread_func(void *data)
set_freezable();
do {
wait_event_interruptible_timeout(*wq,
-   kthread_should_stop() || freezing(current),
+   kthread_should_stop() || freezing(current) ||
+   gc_th->gc_wake,
msecs_to_jiffies(wait_ms));
 
+   /* give it a try one time */
+   if (gc_th->gc_wake)
+   gc_th->gc_wake = 0;
+
if (try_to_freeze())
continue;
if (kthread_should_stop())
@@ -74,6 +79,11 @@ static int gc_thread_func(void *data)
if (!mutex_trylock(&sbi->gc_mutex))
goto next;
 
+   if (gc_th->gc_urgent) {
+   wait_ms = gc_th->urgent_sleep_time;
+   goto do_gc;
+   }
+
if (!is_idle(sbi)) {
increase_sleep_time(gc_th, &wait_ms);
mutex_unlock(&sbi->gc_mutex);
@@ -84,7 +94,7 @@ static int gc_thread_func(void *data)
decrease_sleep_time(gc_th, &wait_ms);
else
increase_sleep_time(gc_th, &wait_ms);
-
+do_gc:
stat_inc_bggc_count(sbi);
 
/* if return value is not zero, no victim was selected */
@@ -115,11 +125,14 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
goto out;
}
 
+   gc_th->urgent_sleep_time = DEF_GC_THREAD_URGENT_SLEEP_TIME;
gc_th->min_sleep_time = DEF_GC_THREAD_MIN_SLEEP_TIME;
gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
 
gc_th->gc_idle = 0;
+   gc_th->gc_urgent = 0;
+   gc_th->gc_wake= 0;
 
sbi->gc_thread = gc_th;
init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index a993967dcdb9..57a9000ce3af 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -13,6 +13,7 @@
 * whether IO subsystem is idle
 * or not
 */

Re: [PATCH v2 00/13] mpt3sas driver NVMe support:

2017-08-07 Thread Keith Busch
On Mon, Aug 07, 2017 at 08:45:25AM -0700, James Bottomley wrote:
> On Mon, 2017-08-07 at 20:01 +0530, Kashyap Desai wrote:
> > 
> > We have to attempt this use case and see how it behaves. I have not
> > tried this, so not sure if things are really bad or just some tuning
> > may be helpful. I will revert back to you on this.
> > 
> > I understood request as -  We need some udev rules to be working well
> > for *same* NVME drives if it is behind  or native .
> > Example - If user has OS installed on NVME drive which is behind
> >  driver as SCSI disk should be able to boot if he/she hooked
> > same NVME drive which is detected by native  driver (and vice
> > versa.)
> 
> It's not just the udev rules, it's the tools as well; possibly things
> like that nvme-cli toolkit Intel is doing.

It looks like they can make existing nvme tooling work with little
effort if they have the driver implement NVME_IOCTL_ADMIN_COMMAND, and
then have their driver build the MPI NVMe Encapsulated Request from that.


[PATCH v2 0/4] KVM: optimize the kvm_vcpu_on_spin

2017-08-07 Thread Longpeng(Mike)
This is a simple optimization for kvm_vcpu_on_spin, the
main idea is described in patch-1's commit msg.

I did some tests based on the RFC version; the results show
that it can improve the performance slightly.

== Geekbench-3.4.1 ==
VM1:8U,4G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19)
running Geekbench-3.4.1 *10 runs*
VM2/VM3/VM4: configuration is the same as VM1
stress each vcpu usage (as seen by top in the guest) to 40%

The comparison of each testcase's score:
(higher is better)
before  after   improve
Integer
 single 1176.7  1179.0  0.2%
 multi  3459.5  3426.5  -0.9%
Float
 single 1150.5  1150.9  0.0%
 multi  3364.5  3391.9  0.8%
Memory(stream)
 single 1768.7  1773.1  0.2%
 multi  2511.6  2557.2  1.8%
Overall
 single 1284.2  1286.2  0.2%
 multi  3231.4  3238.4  0.2%


== kernbench-0.42 ==
VM1:8U,12G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19)
running "kernbench -n 10"
VM2/VM3/VM4: configuration is the same as VM1
stress each vcpu usage (as seen by top in the guest) to 40%

The comparison of 'Elapsed Time':
(lower is better)
            before  after   improve
load -j4    12.762  12.751  0.1%
load -j32   9.743   8.955   8.1%
load -j     9.688   9.229   4.7%
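
The "improve" column in both tables appears to be the relative change between the "before" and "after" runs. The following is a minimal sketch of that arithmetic, assuming higher is better for the Geekbench scores and lower is better for the elapsed times; the function names are illustrative only and are not part of this series.

```c
#include <stdio.h>

/* Relative gain, in percent, for a score where higher is better. */
double gain_higher_better(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Relative gain, in percent, for a time where lower is better. */
double gain_lower_better(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Print one row in roughly the same layout as the tables above. */
void print_row(const char *name, double before, double after, double gain)
{
	printf("%-12s %8.3f %8.3f %5.1f%%\n", name, before, after, gain);
}
```

For example, `print_row("load -j32", 9.743, 8.955, gain_lower_better(9.743, 8.955))` would reproduce the second kernbench row.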


Physical Machine:
  Architecture:  x86_64
  CPU op-mode(s):32-bit, 64-bit
  Byte Order:Little Endian
  CPU(s):24
  On-line CPU(s) list:   0-23
  Thread(s) per core:2
  Core(s) per socket:6
  Socket(s): 2
  NUMA node(s):  2
  Vendor ID: GenuineIntel
  CPU family:6
  Model: 45
  Model name:Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
  Stepping:  7
  CPU MHz:   2799.902
  BogoMIPS:  5004.67
  Virtualization:VT-x
  L1d cache: 32K
  L1i cache: 32K
  L2 cache:  256K
  L3 cache:  15360K
  NUMA node0 CPU(s): 0-5,12-17
  NUMA node1 CPU(s): 6-11,18-23

---
Changes since V1:
 - split the implementation of s390 & arm. [David]
 - refactor the impls according to the suggestion. [Paolo]

Changes since RFC:
 - only cache result for X86. [David & Cornelia & Paolo]
 - add performance numbers. [David]
 - impls arm/s390. [Christoffer & David]
 - refactor the impls. [me]

---
Longpeng(Mike) (4):
  KVM: add spinlock optimization framework
  KVM: X86: implement the logic for spinlock optimization
  KVM: s390: implements the kvm_arch_vcpu_in_kernel()
  KVM: arm: implements the kvm_arch_vcpu_in_kernel()

 arch/arm/kvm/handle_exit.c  |  2 +-
 arch/arm64/kvm/handle_exit.c|  2 +-
 arch/mips/kvm/mips.c|  6 ++
 arch/powerpc/kvm/powerpc.c  |  6 ++
 arch/s390/kvm/diag.c|  2 +-
 arch/s390/kvm/kvm-s390.c|  6 ++
 arch/x86/include/asm/kvm_host.h |  5 +
 arch/x86/kvm/hyperv.c   |  2 +-
 arch/x86/kvm/svm.c  | 10 +-
 arch/x86/kvm/vmx.c  | 16 +++-
 arch/x86/kvm/x86.c  | 11 +++
 include/linux/kvm_host.h|  3 ++-
 virt/kvm/arm/arm.c  |  5 +
 virt/kvm/kvm_main.c |  4 +++-
 14 files changed, 72 insertions(+), 8 deletions(-)

-- 
1.8.3.1




[PATCH v2 4/4] KVM: arm: implements the kvm_arch_vcpu_in_kernel()

2017-08-07 Thread Longpeng(Mike)
This implements the kvm_arch_vcpu_in_kernel() for ARM.

Signed-off-by: Longpeng(Mike) 
---
 virt/kvm/arm/arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 862f820..b9f68e4 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -418,7 +418,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 {
-   return false;
+   return vcpu_mode_priv(vcpu);
 }
 
 /* Just ensure a guest exit from a particular CPU */
-- 
1.8.3.1




Re: [PATCH v2 0/9] mfd: axp20x: Add basic support for AXP813

2017-08-07 Thread Chen-Yu Tsai
On Wed, Jul 26, 2017 at 4:32 PM, Maxime Ripard
 wrote:
> On Wed, Jul 26, 2017 at 04:28:23PM +0800, Chen-Yu Tsai wrote:
>> Hi everyone,
>>
>> This is v2 of my AXP813 support series. The device tree patches are
>> based on my A83T MMC support series. These will go through the sunxi
>> tree. The dt-binding and mfd patches are based on v4.13-rc1. These
>> will go through Lee's mfd tree.
>>
>> Changes since v1:
>>
>>   - Provided relative path for ac100.txt in dt-bindings/mfd/axp20x.txt
>>
>>   - Added Rob's acks to dt-binding patches
>>
>>   - Added Quentin's "mfd: axp20x: use correct platform device id for
>> many PEK" patch to this series. This patch depends on mfd changes
>> in this series. It is included so Lee can take them together in
>> one go.
>>
>>   - Added Lee's mfd-acks to mfd patches
>>
>>   - Added axp818 compatible with axp813 fallback. The two chips are
>> identical except for the markings. The added compatible matches
>> what is actually on the board, to avoid confusing readers.
>>
>>   - Fixed up device tree patches to mention which board is changed
>>
>>   - Added device tree patches for the H8 homlet
>
> For the whole series,
> Acked-by: Maxime Ripard 

Applied the dts patches for 4.14.

ChenYu


Re: [PATCH 13/18] power: supply: bq24190_charger: Export 5V boost converter as regulator

2017-08-07 Thread Tony Lindgren
* Hans de Goede  [170806 05:37]:
> Register the 5V boost converter as a regulator named
> "regulator-bq24190-usb-vbus". Note the name includes "bq24190" because
> the bq24190 family is also used on ACPI devices where there are no
> device-tree phandles, so regulator_get will fall back to the name and thus
> it must be unique on the system.

Nice, this makes VBUS easy to use for USB PHY drivers :)

Tony


[PATCH 1/3] autofs - fix AT_NO_AUTOMOUNT not being honored

2017-08-07 Thread Ian Kent
The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT
which is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering
of an automount by the call. But LOOKUP_AUTOMOUNT is unconditionally cleared
for all stat family system calls except statx().

stat family system calls have always triggered mount requests for the
negative dentry case in follow_automount() which is intended but prevents
the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.

In order to handle the AT_NO_AUTOMOUNT for both system calls the
negative dentry case in follow_automount() needs to be changed to
return ENOENT when the LOOKUP_AUTOMOUNT flag is clear (and the other
required flags are clear).

AFAICT this change doesn't have any noticeable side effects and may,
in some use cases (although I didn't see it in testing) prevent
unnecessary callbacks to the automount daemon.

It's also possible that a stat family call has been made with a
path that is in the process of being mounted by some other process.
But stat family calls should return the automount state of the path
as it is "now" so it shouldn't wait for mount completion.

This is the same semantic as the positive dentry case already
handled.

Signed-off-by: Ian Kent 
Cc: David Howells 
Cc: Colin Walters 
Cc: Ondrej Holy 
---
 fs/namei.c |   15 ---
 include/linux/fs.h |3 +--
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ddb6a7c2b3d4..1180f9c58093 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1129,9 +1129,18 @@ static int follow_automount(struct path *path, struct nameidata *nd,
 * of the daemon to instantiate them before they can be used.
 */
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
-  LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
-   path->dentry->d_inode)
-   return -EISDIR;
+  LOOKUP_OPEN | LOOKUP_CREATE |
+  LOOKUP_AUTOMOUNT))) {
+   /* Positive dentry that isn't meant to trigger an
+* automount, EISDIR will allow it to be used,
+* otherwise there's no mount here "now" so return
+* ENOENT.
+*/
+   if (path->dentry->d_inode)
+   return -EISDIR;
+   else
+   return -ENOENT;
+   }
 
if (path->dentry->d_sb->s_user_ns != &init_user_ns)
return -EACCES;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6e1fd5d21248..37c96f52e48e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3022,8 +3022,7 @@ static inline int vfs_lstat(const char __user *name, struct kstat *stat)
 static inline int vfs_fstatat(int dfd, const char __user *filename,
  struct kstat *stat, int flags)
 {
-   return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
-stat, STATX_BASIC_STATS);
+   return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
 }
 static inline int vfs_fstat(int fd, struct kstat *stat)
 {
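
The decision this patch changes in follow_automount() can be modelled in isolation. The following is a rough userspace sketch, assuming illustrative flag values (the real LOOKUP_* constants are defined in include/linux/namei.h) and a hypothetical helper name; it is not the kernel code itself.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bits only; not the kernel's LOOKUP_* values. */
#define SK_LOOKUP_PARENT    0x01u
#define SK_LOOKUP_DIRECTORY 0x02u
#define SK_LOOKUP_OPEN      0x04u
#define SK_LOOKUP_CREATE    0x08u
#define SK_LOOKUP_AUTOMOUNT 0x10u

/*
 * Hypothetical model of the early exit in follow_automount():
 * returns 0 when the automount may be triggered, otherwise a
 * negative errno telling the caller how to treat the path.
 */
int sketch_automount_decision(unsigned int flags, bool dentry_positive)
{
	const unsigned int trigger = SK_LOOKUP_PARENT | SK_LOOKUP_DIRECTORY |
				     SK_LOOKUP_OPEN | SK_LOOKUP_CREATE |
				     SK_LOOKUP_AUTOMOUNT;

	if (!(flags & trigger)) {
		/* Positive dentry: usable as a plain directory. */
		if (dentry_positive)
			return -EISDIR;
		/* Negative dentry: nothing is mounted here "now". */
		return -ENOENT;
	}
	return 0;
}
```

With this model, a stat-style lookup that carries none of the trigger flags on a negative dentry yields -ENOENT instead of falling through to a mount request.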



[PATCH 3/3] autofs - make dev ioctl version and ismountpoint user accessible

2017-08-07 Thread Ian Kent
Some of the autofs miscellaneous device ioctls need to be accessible to
user space applications without CAP_SYS_ADMIN to get information about
autofs mounts.

Signed-off-by: Ian Kent 
Cc: Colin Walters 
Cc: Ondrej Holy 
---
 fs/autofs4/dev-ioctl.c  |   12 
 include/uapi/linux/auto_dev-ioctl.h |2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 218a4ecc75cc..ea8b3a1cddd2 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -628,10 +628,6 @@ static int _autofs_dev_ioctl(unsigned int command,
ioctl_fn fn = NULL;
int err = 0;
 
-   /* only root can play with this */
-   if (!capable(CAP_SYS_ADMIN))
-   return -EPERM;
-
cmd_first = _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST);
cmd = _IOC_NR(command);
 
@@ -640,6 +636,14 @@ static int _autofs_dev_ioctl(unsigned int command,
return -ENOTTY;
}
 
+   /* Only root can use ioctls other than AUTOFS_DEV_IOCTL_VERSION_CMD
+* and AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
+*/
+   if (cmd != AUTOFS_DEV_IOCTL_VERSION_CMD &&
+   cmd != AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD &&
+   !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
/* Copy the parameters into kernel space. */
param = copy_dev_ioctl(user);
if (IS_ERR(param))
diff --git a/include/uapi/linux/auto_dev-ioctl.h b/include/uapi/linux/auto_dev-ioctl.h
index 744b3d060968..5558db8e6646 100644
--- a/include/uapi/linux/auto_dev-ioctl.h
+++ b/include/uapi/linux/auto_dev-ioctl.h
@@ -16,7 +16,7 @@
 #define AUTOFS_DEVICE_NAME "autofs"
 
 #define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1
-#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0
+#define AUTOFS_DEV_IOCTL_VERSION_MINOR 1
 
 #define AUTOFS_DEV_IOCTL_SIZE  sizeof(struct autofs_dev_ioctl)
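
The relaxed permission check can be pictured as a small allow-list in front of the capability test. The sketch below is a userspace model under stated assumptions: the enum values and function name are hypothetical stand-ins, and the boolean parameter replaces the kernel's capable(CAP_SYS_ADMIN) call.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command identifiers standing in for the AUTOFS_DEV_IOCTL_*_CMD values. */
enum sketch_cmd {
	SK_VERSION_CMD,
	SK_ISMOUNTPOINT_CMD,
	SK_OTHER_CMD,
};

/*
 * Returns 0 if the command may proceed, -EPERM otherwise.
 * has_cap_sys_admin models the result of capable(CAP_SYS_ADMIN).
 */
int sketch_ioctl_permitted(enum sketch_cmd cmd, bool has_cap_sys_admin)
{
	if (cmd != SK_VERSION_CMD &&
	    cmd != SK_ISMOUNTPOINT_CMD &&
	    !has_cap_sys_admin)
		return -EPERM;
	return 0;
}
```

Only the two query-style commands bypass the capability test; every other command keeps the previous root-only behaviour.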
 



[PATCH] thermal: rockchip: fix error return code in rockchip_thermal_probe()

2017-08-07 Thread Gustavo A. R. Silva
platform_get_irq() returns an error code, but the rockchip_thermal driver
ignores it and always returns -EINVAL. This is not correct and prevents
-EPROBE_DEFER from being propagated properly.

Notice that platform_get_irq() no longer returns 0 on error:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e330b9a6bb35dc7097a4f02cb1ae7b6f96df92af

Print and propagate the return value of platform_get_irq on failure.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/thermal/rockchip_thermal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/rockchip_thermal.c b/drivers/thermal/rockchip_thermal.c
index 4c77965..6ca9747 100644
--- a/drivers/thermal/rockchip_thermal.c
+++ b/drivers/thermal/rockchip_thermal.c
@@ -1068,8 +1068,8 @@ static int rockchip_thermal_probe(struct platform_device *pdev)
 
irq = platform_get_irq(pdev, 0);
if (irq < 0) {
-   dev_err(&pdev->dev, "no irq resource?\n");
-   return -EINVAL;
+   dev_err(&pdev->dev, "no irq resource: %d\n", irq);
+   return irq;
}
 
thermal = devm_kzalloc(&pdev->dev, sizeof(struct rockchip_thermal_data),
-- 
2.5.0
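
To illustrate why the hard-coded -EINVAL matters, here is a small userspace sketch contrasting the two probe behaviours. The helper names are hypothetical, and sketch_get_irq() merely stands in for platform_get_irq(); EPROBE_DEFER is defined locally because it is a kernel-internal errno value that userspace headers do not provide.

```c
#include <errno.h>
#include <stdio.h>

/* Kernel-internal errno value; defined here for the sketch only. */
#define SK_EPROBE_DEFER 517

/* Hypothetical stand-in for platform_get_irq(): returns the simulated result. */
int sketch_get_irq(int simulated_result)
{
	return simulated_result;
}

/* Old behaviour: any failure is flattened to -EINVAL. */
int sketch_probe_old(int simulated_result)
{
	int irq = sketch_get_irq(simulated_result);

	if (irq < 0) {
		fprintf(stderr, "no irq resource?\n");
		return -EINVAL;
	}
	return 0;
}

/* New behaviour: the original error code is reported and propagated. */
int sketch_probe_new(int simulated_result)
{
	int irq = sketch_get_irq(simulated_result);

	if (irq < 0) {
		fprintf(stderr, "no irq resource: %d\n", irq);
		return irq;
	}
	return 0;
}
```

With the old variant, a deferred-probe result is indistinguishable from a genuinely invalid configuration, so the driver core would have no reason to retry the probe later.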



[PATCH v2] perf/core: Avoid context switch overheads

2017-08-07 Thread 石祤
From: "leilei.lin" 

A performance issue is caused by an insufficiently strict check in the
task sched path for tasks that were once attached to a per-task perf_event.

A task allocates task->perf_event_ctxp[ctxn] when perf_event_open is
called on it, and task->perf_event_ctxp[ctxn] is never freed or reset
to NULL afterwards.

__perf_event_task_sched_in()
if (task->perf_event_ctxp[ctxn]) //  here is always true
perf_event_context_sched_in() // operate pmu

A performance overhead of up to 50% was observed under some extreme
test cases. Therefore, add a stricter check on ctx->nr_events:
when ctx->nr_events == 0, there is no need to continue.

Signed-off-by: leilei.lin 
---
 kernel/events/core.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 426c2ff..3d86695 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3180,6 +3180,13 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
return;
 
perf_ctx_lock(cpuctx, ctx);
+   /*
+* We must check ctx->nr_events while holding ctx->lock, such
+* that we serialize against perf_install_in_context().
+*/
+   if (!cpuctx->task_ctx && !ctx->nr_events)
+   goto unlock;
+
perf_pmu_disable(ctx->pmu);
/*
 * We want to keep the following priority order:
@@ -3193,6 +3200,8 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
perf_event_sched_in(cpuctx, ctx, task);
perf_pmu_enable(ctx->pmu);
+
+unlock:
perf_ctx_unlock(cpuctx, ctx);
 }
 
-- 
2.8.4.31.g9ed660f
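
The effect of the added check can be sketched with simplified stand-in structures. Everything below is a userspace model with hypothetical names: the counter replaces the perf_pmu_disable()/perf_pmu_enable() pair, and the locking is only indicated in comments.

```c
#include <stddef.h>

/* Simplified stand-ins; only the fields the check reads are modelled. */
struct sketch_ctx {
	int nr_events;               /* events currently attached to the context */
};

struct sketch_cpuctx {
	struct sketch_ctx *task_ctx; /* task context currently scheduled in, if any */
	int pmu_toggles;             /* counts simulated PMU disable/enable pairs */
};

void sketch_sched_in(struct sketch_cpuctx *cpuctx, struct sketch_ctx *ctx)
{
	if (cpuctx->task_ctx == ctx)
		return;

	/* perf_ctx_lock() would be taken here. */
	if (!cpuctx->task_ctx && !ctx->nr_events)
		goto unlock; /* nothing to schedule: skip the PMU work */

	cpuctx->pmu_toggles++; /* stands in for disable, sched_in, enable */
	if (ctx->nr_events)
		cpuctx->task_ctx = ctx;
unlock:
	return; /* perf_ctx_unlock() would be called here */
}

/* Schedules the same context in twice and reports how often the PMU was touched. */
int sketch_pmu_toggles_for(int nr_events)
{
	struct sketch_ctx ctx = { .nr_events = nr_events };
	struct sketch_cpuctx cpuctx = { .task_ctx = NULL, .pmu_toggles = 0 };

	sketch_sched_in(&cpuctx, &ctx);
	sketch_sched_in(&cpuctx, &ctx);
	return cpuctx.pmu_toggles;
}
```

In this model a context with no events never touches the PMU, which is the saving the patch is after for tasks whose events have all been closed.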



Re: [PATCH] arm64: correct modules range of kernel virtual memory layout

2017-08-07 Thread Miles Chen
On Mon, 2017-08-07 at 15:01 +0100, Will Deacon wrote:
> On Mon, Aug 07, 2017 at 02:18:00PM +0100, Ard Biesheuvel wrote:
> > On 7 August 2017 at 14:16, Will Deacon  wrote:
> > > On Mon, Aug 07, 2017 at 07:04:46PM +0800, Miles Chen wrote:
> > >> The commit f80fb3a3d508 ("arm64: add support for kernel ASLR")
> > >> moved module virtual address to
> > >> [module_alloc_base, module_alloc_base + MODULES_VSIZE).
> > >>
> > >> Display module information of the virtual kernel
> > >> memory layout by using module_alloc_base.
> > >>
> > >> testing output:
> > >> 1) Current implementation:
> > >> Virtual kernel memory layout:
> > >>   modules : 0xff80 - 0xff800800   (   128 MB)
> > >> 2) this patch + KASLR:
> > >> Virtual kernel memory layout:
> > >>   modules : 0xff800056 - 0xff800856   (   128 MB)
> > >> 3) this patch + KASLR and a dummy seed:
> > >> Virtual kernel memory layout:
> > >>   modules : 0xffa7df637000 - 0xffa7e7637000   (   128 MB)
> > >>
> > >> Signed-off-by: Miles Chen 
> > >> ---
> > >>  arch/arm64/mm/init.c | 5 +++--
> > >>  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > Does this mean the modules code in our pt dumper is busted
> > > (arch/arm64/mm/dump.c)? Also, what about KASAN, which uses these addresses
> > > too (in kasan_init)? Should we just remove MODULES_VADDR and MODULES_END
> > > altogether?
> > >
> > 
> > I don't think we need this patch. The 'module' line simply prints the
> > VA region that is reserved for modules. The fact that we end up
> > putting them elsewhere when running randomized does not necessarily
> > mean this line should reflect that.
> 
> I was more concerned by other users of MODULES_VADDR tbh, although I see
> now that we don't randomize the module region if kasan is enabled. Still,
> the kcore code adds the modules region as a separate area (distinct from
> vmalloc) if MODULES_VADDR is defined, the page table dumping code uses
> MODULES_VADDR to identify the module region and I think we'll get false
> positives from is_vmalloc_or_module_addr, which again uses the static
> region.
> 
> So, given that MODULES_VADDR never points at the module area, can't we get
> rid of it?

Agreed. MODULES_VADDR should be phased out. Considering the kernel
modules live somewhere between [VMALLOC_START, VMALLOC_END) now:
(arch/arm64/kernel/module.c:module_alloc). I suggest the following
changes:

1. is_vmalloc_or_module_addr() should return is_vmalloc_addr() directly
2. arch/arm64/mm/dump.c does not need MODULES_VADDR and MODULES_END.
3. kasan uses [module_alloc_base, module_alloc_base + MODULES_VSIZE) to
get the shadow memory? (the kernel modules still live in this range when
kasan is enabled)
4. remove modules line in kernel memory layout
(optional, thanks to Ard's feedback)
5. remove MODULES_VADDR, MODULES_END definition
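
As a rough illustration of the range discussed in this thread, a check against [module_alloc_base, module_alloc_base + MODULES_VSIZE) could look like the sketch below. The function name is hypothetical, and the 128 MB size is an assumption taken from the layout output quoted earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed size of the module region: 128 MB, as in the quoted layout output. */
#define SK_MODULES_VSIZE (128ULL << 20)

/*
 * Hypothetical helper: true if addr lies in
 * [module_alloc_base, module_alloc_base + SK_MODULES_VSIZE).
 * The subtraction form avoids overflow when the base is near the top
 * of the address space.
 */
bool sketch_is_module_addr(uint64_t module_alloc_base, uint64_t addr)
{
	return addr >= module_alloc_base &&
	       addr - module_alloc_base < SK_MODULES_VSIZE;
}
```

A check of this shape follows the randomized base at run time, whereas a comparison against a static MODULES_VADDR would not.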


Miles
> 
> Will




Re: [MD] Crash with 4.12+ kernel and high disk load -- bisected to 4ad23a976413: MD: use per-cpu counter for writes_pending

2017-08-07 Thread Shaohua Li
On Mon, Aug 07, 2017 at 01:20:25PM +0200, Dominik Brodowski wrote:
> Neil, Shaohua,
> 
> following up on David R's bug message: I have observed something similar
> on v4.12.[345] and v4.13-rc4, but not on v4.11. This is a RAID1 (on bare
> metal partitions, /dev/sdaX and /dev/sdbY linked together). In case it
> matters: Further upwards are cryptsetup, a DM volume group, then logical
> volumes, and then filesystems (ext4, but also happened with xfs).
> 
> In a tedious bisect (the bug wasn't as quickly reproducible as I would like,
> but happened when I repeatedly created large lvs and filled them with some
> content, while compiling kernels in parallel), I was able to track this
> down to:
> 
> 
> commit 4ad23a976413aa57fe5ba7a25953dc35ccca5b71
> Author: NeilBrown 
> Date:   Wed Mar 15 14:05:14 2017 +1100
> 
> MD: use per-cpu counter for writes_pending
> 
> The 'writes_pending' counter is used to determine when the
> array is stable so that it can be marked in the superblock
> as "Clean".  Consequently it needs to be updated frequently
> but only checked for zero occasionally.  Recent changes to
> raid5 cause the count to be updated even more often - once
> per 4K rather than once per bio.  This provided
> justification for making the updates more efficient.
> 
> ...
> 
> 
> CC'ing t...@kernel.org, as 4ad23a976413 is the first (and only?) user
> of percpu_ref_switch_to_atomic_sync() introduced in 210f7cdcf088.
> 
> Applying a415c0f10627 on top of 4ad23a976413 does *not* fix the issue, but
> reverting all of a2bfc6753065, a415c0f10627 and 4ad23a976413 seems to fix
> the issue for v4.12.5.

Spent some time checking this one; unfortunately I can't find how that patch
makes rcu stall. The percpu part looks good to me too. Can you double check if
reverting 4ad23a976413aa57 makes the issue go away? When the rcu stall happens,
what is in /sys/block/md/md0/array_state? Please also attach /proc/mdstat. When
you say the mdx_raid1 threads are in 'R' state, can you double check if
/proc/pid/stack is always 0xff?

Thanks,
Shaohua
> In addition, I can provide the following stack traces, which appear in dmesg
> around the time the system becomes more or less unusable, with one or more
> of the md[0123]_raid1 threads in the "R" state.
> 
> ...  ...
> [  142.275244] INFO: rcu_sched self-detected stall on CPU
> [  142.275386]  4-...: (5999 ticks this GP) idle=d8a/141/0 softirq=2404/2404 fqs=2954
> [  142.275441]   (t=6000 jiffies g=645 c=644 q=199031)
> [  142.275490] NMI backtrace for cpu 4
> [  142.275537] CPU: 4 PID: 1164 Comm: md2_raid1 Not tainted 4.12.4 #2
> [  142.275537] Hardware name: MSI MS-7522/MSI X58 Pro (MS-7522)  , BIOS V8.14B8 11/09/2012
> [  142.275640] Call Trace:
> [  142.275683]  
> [  142.275728]  dump_stack+0x4d/0x6a
> [  142.275775]  nmi_cpu_backtrace+0x9b/0xa0
> [  142.275822]  ? irq_force_complete_move+0xf0/0xf0
> [  142.275869]  nmi_trigger_cpumask_backtrace+0x8f/0xc0
> [  142.275918]  arch_trigger_cpumask_backtrace+0x14/0x20
> [  142.275967]  rcu_dump_cpu_stacks+0x8f/0xd9
> [  142.276016]  rcu_check_callbacks+0x62e/0x780
> [  142.276064]  ? acct_account_cputime+0x17/0x20
> [  142.276111]  update_process_times+0x2a/0x50
> [  142.276159]  tick_sched_handle.isra.18+0x2d/0x30
> [  142.276222]  tick_sched_timer+0x38/0x70
> [  142.276283]  __hrtimer_run_queues+0xbe/0x120
> [  142.276345]  hrtimer_interrupt+0xa3/0x190
> [  142.276408]  local_apic_timer_interrupt+0x33/0x60
> [  142.276471]  smp_apic_timer_interrupt+0x33/0x50
> [  142.276534]  apic_timer_interrupt+0x86/0x90
> [  142.276598] RIP: 0010:__wake_up+0x44/0x50
> [  142.276658] RSP: 0018:c9f8fd88 EFLAGS: 0246 ORIG_RAX: 
> ff10
> [  142.276742] RAX: 81a84bc0 RBX: 880235cf8800 RCX: 
> 
> [  142.276809] RDX: 81a84bd8 RSI: 0246 RDI: 
> 81a84bd0
> [  142.276876] RBP: c9f8fd98 R08:  R09: 
> 0001
> [  142.276943] R10:  R11:  R12: 
> 880235cf8800
> [  142.277009] R13: 880235eb2c28 R14: 0001 R15: 
> 
> [  142.277076]  
> [  142.277136]  md_check_recovery+0x30b/0x4a0
> [  142.277199]  raid1d+0x4c/0x810
> [  142.277258]  md_thread+0x11a/0x150
> [  142.277319]  ? md_thread+0x11a/0x150
> [  142.277379]  ? __wake_up_common+0x80/0x80
> [  142.277442]  kthread+0x11a/0x150
> [  142.277502]  ? find_pers+0x70/0x70
> [  142.277562]  ? __kthread_create_on_node+0x140/0x140
> [  142.277625]  ret_from_fork+0x22/0x30
> 
> ... or this one (on v4.12.5):
> [ 1294.560172] INFO: rcu_sched self-detected stall on CPU  
> [ 1294.560285]  2-...: (6000 ticks this GP) idle=f06/141/0 softirq=140681/140681 fqs=2988
> [ 1294.560365]   (t=6001 jiffies g=28666 c=28665 q=129416)
> [ 1294.560426] NMI backtrace for cpu 2
> [ 1294.560483] CPU: 2 PID: 1173 Comm: md3_raid1 Not tainted 4.12.5 #1
> [ 1294.560543] 

[PATCH] spi/bcm63xx: fix error return code in bcm63xx_spi_probe()

2017-08-07 Thread Gustavo A. R. Silva
platform_get_irq() returns an error code, but the spi-bcm63xx driver
ignores it and always returns -ENXIO. This is not correct and
prevents -EPROBE_DEFER from being propagated properly.

Notice that platform_get_irq() no longer returns 0 on error:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e330b9a6bb35dc7097a4f02cb1ae7b6f96df92af

Print and propagate the return value of platform_get_irq on failure.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/spi/spi-bcm63xx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-bcm63xx.c b/drivers/spi/spi-bcm63xx.c
index 84c7356..bfe5754 100644
--- a/drivers/spi/spi-bcm63xx.c
+++ b/drivers/spi/spi-bcm63xx.c
@@ -530,8 +530,8 @@ static int bcm63xx_spi_probe(struct platform_device *pdev)
 
irq = platform_get_irq(pdev, 0);
if (irq < 0) {
-   dev_err(dev, "no irq\n");
-   return -ENXIO;
+   dev_err(dev, "no irq: %d\n", irq);
+   return irq;
}
 
clk = devm_clk_get(dev, "spi");
-- 
2.5.0
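
For illustration only, the difference between the old and new error paths can
be modelled in plain userspace C. fake_get_irq(), probe_old() and probe_new()
are hypothetical stand-ins and not kernel functions; EPROBE_DEFER is not in
userspace errno.h, so it is defined here with the value the kernel uses (517).

```c
#include <errno.h>

/* Kernel-internal errno; 517 is the value from include/linux/errno.h. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for platform_get_irq(): negative errno on failure. */
int fake_get_irq(int simulated_result)
{
	return simulated_result;
}

/* Old behaviour: every failure is collapsed into -ENXIO. */
int probe_old(int simulated_result)
{
	int irq = fake_get_irq(simulated_result);

	if (irq < 0)
		return -ENXIO;
	return 0;
}

/* New behaviour: the original error code reaches the caller unchanged. */
int probe_new(int simulated_result)
{
	int irq = fake_get_irq(simulated_result);

	if (irq < 0)
		return irq;
	return 0;
}
```

With a simulated -EPROBE_DEFER, probe_old() reports -ENXIO, so a caller cannot
tell that a retry would help, while probe_new() passes -EPROBE_DEFER through.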



linux-next: manual merge of the userns tree with the mips tree

2017-08-07 Thread Stephen Rothwell
Hi Eric,

Today's linux-next merge of the userns tree got a conflict in:

  arch/mips/kernel/traps.c

between commit:

  260a789828aa ("MIPS: signal: Remove unreachable code from force_fcr31_sig().")

from the mips tree and commit:

  ea1b75cf9138 ("signal/mips: Document a conflict with SI_USER with SIGFPE")

from the userns tree.

I fixed it up (the former removed the code updated by the latter) and
can carry the fix as necessary. This is now fixed as far as linux-next
is concerned, but any non-trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH] mm: ratelimit PFNs busy info message

2017-08-07 Thread Michael Ellerman
Andrew Morton  writes:

> On Wed,  2 Aug 2017 13:44:57 -0400 Jonathan Toppins  
> wrote:
>
>> The RDMA subsystem can generate several thousand of these messages per
>> second eventually leading to a kernel crash. Ratelimit these messages
>> to prevent this crash.
>
> Well...  why are all these EBUSY's occurring?  It sounds inefficient (at
> least) but if it is expected, normal and unavoidable then perhaps we
> should just remove that message altogether?

We see them on powerpc sometimes when CMA is unable to make large
allocations for the hash table of a KVM guest.

At least in that context they're not useful: CMA will try the
allocation again, and if it really can't allocate then CMA will print
more useful information itself.

So I'd vote for dropping the message and letting the callers decide what
to do.

cheers
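
For reference, the interval/burst style of ratelimiting that the original
patch proposes can be sketched in userspace C as below. This is a simplified
model and not the kernel's implementation; struct ratelimit and
ratelimit_allow() are hypothetical names, and time is passed in explicitly as
an abstract tick count to keep the sketch deterministic.

```c
/* Hypothetical userspace model of an interval/burst rate limiter. */
struct ratelimit {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
	int missed;	/* messages suppressed in the current window */
};

/* Return 1 if a message may be emitted at time 'now', 0 if suppressed. */
int ratelimit_allow(struct ratelimit *rl, long now)
{
	if (now - rl->begin >= rl->interval) {
		/* A new window starts: reset the counters. */
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	rl->missed++;
	return 0;
}
```

With interval = 100 and burst = 2, the first two calls in a window return 1
and every further call returns 0 until the window has elapsed.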


Re: [RFC PATCH 0/1] Add hugetlbfs support to memfd_create()

2017-08-07 Thread Michal Hocko
Hi,
I am one foot out of the office and will be offline for two days, so I
didn't get to review the patch yet, but this is useful information
about the usecase and it should be in the patch directly for
future reference.

On Mon 07-08-17 16:47:51, Mike Kravetz wrote:
> This patch came out of discussions in this e-mail thread [1].
> 
> The Oracle JVM team is developing a new garbage collection model.  This
> new model requires multiple mappings of the same anonymous memory.  One
> straightforward way to accomplish this is with memfd_create.  They can
> use the returned fd to create multiple mappings of the same memory.
> 
> The JVM today has an option to use (static hugetlb) huge pages.  If this
> option is specified, they would like to use the same garbage collection
> model requiring multiple mappings to the same memory.  Using hugetlbfs,
> it is possible to explicitly mount a filesystem and specify file paths
> in order to get an fd that can be used for multiple mappings.  However,
> this introduces additional system admin work and coordination.
> 
> Ideally they would like to get a hugetlbfs fd without requiring explicit
> mounting of a filesystem.   Today, mmap and shmget can make use of
> hugetlbfs without explicitly mounting a filesystem.  The patch adds this
> functionality to hugetlbfs.
> 
> A new flag MFD_HUGETLB is introduced to request a hugetlbfs file.  Like
> other system calls where hugetlb can be requested, the huge page size
> can be encoded in the flags argument if the non-default huge page size
> is desired.  hugetlbfs does not support sealing operations, therefore
> specifying MFD_ALLOW_SEALING with MFD_HUGETLB will result in EINVAL.
> 
> Of course, the memfd_create man page would need updating if this type of
> functionality moves forward.
> 
> [1] https://lkml.org/lkml/2017/7/6/564
> 
> Mike Kravetz (1):
>   mm/shmem: add hugetlbfs support to memfd_create()
> 
>  include/uapi/linux/memfd.h | 24 
>  mm/shmem.c | 37 +++--
>  2 files changed, 55 insertions(+), 6 deletions(-)
> 
> -- 
> 2.7.5

-- 
Michal Hocko
SUSE Labs
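
As a rough illustration of the "multiple mappings of the same anonymous
memory" use case described in the quoted cover letter, the sketch below maps
one memfd twice and checks that both mappings alias the same pages. It uses a
plain memfd (flags = 0), not the MFD_HUGETLB flag proposed by the patch.
memfd_double_map() and the name "demo" are hypothetical; the raw syscall is
used on the assumption that the C library may not provide a memfd_create()
wrapper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Create a memfd of 'len' bytes, map it twice, write through the first
 * mapping and read back through the second.
 * Returns 0 if both mappings see the same data, -1 on any failure.
 */
int memfd_double_map(size_t len)
{
	static const char msg[] = "hello";
	char *a, *b;
	int ret = -1;
	int fd;

	if (len < sizeof(msg))
		return -1;

	fd = (int)syscall(SYS_memfd_create, "demo", 0U);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto out_close;

	a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (a == MAP_FAILED)
		goto out_close;
	b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (b == MAP_FAILED)
		goto out_unmap_a;

	memcpy(a, msg, sizeof(msg));		/* write via the first mapping */
	if (a != b && memcmp(b, msg, sizeof(msg)) == 0)
		ret = 0;			/* visible via the second one */

	munmap(b, len);
out_unmap_a:
	munmap(a, len);
out_close:
	close(fd);
	return ret;
}
```

The two mmap() calls return different addresses backed by the same pages,
which is the property the garbage collector use case relies on.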


[PATCH 4/4] dt-bindings: mt8173-xhci: add generic compatible and rename file

2017-08-07 Thread Chunfeng Yun
The mt8173-xhci.txt actually holds the bindings for all mediatek
SoCs with xHCI controller, so add a generic compatible and change
the name to xhci-mtk.txt to reflect that.

Signed-off-by: Chunfeng Yun 
---
 .../bindings/usb/{mt8173-xhci.txt => xhci-mtk.txt} |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)
 rename Documentation/devicetree/bindings/usb/{mt8173-xhci.txt => xhci-mtk.txt} 
(92%)

diff --git a/Documentation/devicetree/bindings/usb/mt8173-xhci.txt 
b/Documentation/devicetree/bindings/usb/xhci-mtk.txt
similarity index 92%
rename from Documentation/devicetree/bindings/usb/mt8173-xhci.txt
rename to Documentation/devicetree/bindings/usb/xhci-mtk.txt
index 0acfc8a..1ce77c7 100644
--- a/Documentation/devicetree/bindings/usb/mt8173-xhci.txt
+++ b/Documentation/devicetree/bindings/usb/xhci-mtk.txt
@@ -11,7 +11,9 @@ into two parts.
 
 
 Required properties:
- - compatible : should contain "mediatek,mt8173-xhci"
+ - compatible : should be one of
+   "mediatek,mt8173-xhci" (deprecated, use "mediatek,xhci-mtk" instead),
+   "mediatek,xhci-mtk"
  - reg : specifies physical base address and size of the registers
  - reg-names: should be "mac" for xHCI MAC and "ippc" for IP port control
  - interrupts : interrupt used by the controller
@@ -68,10 +70,12 @@ usb30: usb@1127 {
 
 In the case, xhci is added as subnode to mtu3. An example and the DT binding
 details of mtu3 can be found in:
-Documentation/devicetree/bindings/usb/mt8173-mtu3.txt
+Documentation/devicetree/bindings/usb/mtu3.txt
 
 Required properties:
- - compatible : should contain "mediatek,mt8173-xhci"
+ - compatible : should be one of
+   "mediatek,mt8173-xhci" (deprecated, use "mediatek,xhci-mtk" instead),
+   "mediatek,xhci-mtk"
  - reg : specifies physical base address and size of the registers
  - reg-names: should be "mac" for xHCI MAC
  - interrupts : interrupt used by the host controller
-- 
1.7.9.5



[PATCH 3/4] dt-bindings: mt8173-mtu3: add generic compatible and rename file

2017-08-07 Thread Chunfeng Yun
The mt8173-mtu3.txt actually holds the bindings for all mediatek
SoCs with usb3 DRD IP, so add a generic compatible and change the
name to mtu3.txt.

Signed-off-by: Chunfeng Yun 
---
 .../bindings/usb/{mt8173-mtu3.txt => mtu3.txt} |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)
 rename Documentation/devicetree/bindings/usb/{mt8173-mtu3.txt => mtu3.txt} 
(95%)

diff --git a/Documentation/devicetree/bindings/usb/mt8173-mtu3.txt 
b/Documentation/devicetree/bindings/usb/mtu3.txt
similarity index 95%
rename from Documentation/devicetree/bindings/usb/mt8173-mtu3.txt
rename to Documentation/devicetree/bindings/usb/mtu3.txt
index 1d7c3bc..832741d 100644
--- a/Documentation/devicetree/bindings/usb/mt8173-mtu3.txt
+++ b/Documentation/devicetree/bindings/usb/mtu3.txt
@@ -1,7 +1,9 @@
 The device node for Mediatek USB3.0 DRD controller
 
 Required properties:
- - compatible : should be "mediatek,mt8173-mtu3"
+ - compatible : should be one of
+   "mediatek,mt8173-mtu3" (deprecated, use "mediatek,mtu3" instead),
+   "mediatek,mtu3"
  - reg : specifies physical base address and size of the registers
  - reg-names: should be "mac" for device IP and "ippc" for IP port control
  - interrupts : interrupt used by the device IP
@@ -44,7 +46,7 @@ Optional properties:
 Sub-nodes:
 The xhci should be added as subnode to mtu3 as shown in the following example
 if host mode is enabled. The DT binding details of xhci can be found in:
-Documentation/devicetree/bindings/usb/mt8173-xhci.txt
+Documentation/devicetree/bindings/usb/xhci-mtk.txt
 
 Example:
 ssusb: usb@11271000 {
-- 
1.7.9.5



[PATCH 2/4] usb: xhci-mtk: add generic compatible string

2017-08-07 Thread Chunfeng Yun
The xhci-mtk driver is a generic driver for MediaTek xHCI IP. Add
a generic compatible to avoid the confusion of supporting new SoCs
with a compatible that carries the specific SoC name "mt8173".

Signed-off-by: Chunfeng Yun 
---
 drivers/usb/host/xhci-mtk.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
index 67d5dc7..d2934b9 100644
--- a/drivers/usb/host/xhci-mtk.c
+++ b/drivers/usb/host/xhci-mtk.c
@@ -795,6 +795,7 @@ static int __maybe_unused xhci_mtk_resume(struct device 
*dev)
 #ifdef CONFIG_OF
 static const struct of_device_id mtk_xhci_of_match[] = {
{ .compatible = "mediatek,mt8173-xhci"},
+   { .compatible = "mediatek,xhci-mtk"},
{ },
 };
 MODULE_DEVICE_TABLE(of, mtk_xhci_of_match);
-- 
1.7.9.5



[PATCH 1/4] usb: mtu3: add generic compatible string

2017-08-07 Thread Chunfeng Yun
The mtu3 driver is a generic driver for MediaTek usb3 DRD IP. Add
a generic compatible to avoid the confusion of supporting new SoCs
with a compatible that carries the specific SoC name "mt8173".

Signed-off-by: Chunfeng Yun 
---
 drivers/usb/mtu3/mtu3_plat.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/mtu3/mtu3_plat.c b/drivers/usb/mtu3/mtu3_plat.c
index 0d3ebb3..088e3e6 100644
--- a/drivers/usb/mtu3/mtu3_plat.c
+++ b/drivers/usb/mtu3/mtu3_plat.c
@@ -500,6 +500,7 @@ static int __maybe_unused mtu3_resume(struct device *dev)
 
 static const struct of_device_id mtu3_of_match[] = {
{.compatible = "mediatek,mt8173-mtu3",},
+   {.compatible = "mediatek,mtu3",},
{},
 };
 
-- 
1.7.9.5



Re: [PATCH v5 net-next 00/12] bpf: rewrite value tracking in verifier

2017-08-07 Thread Daniel Borkmann

On 08/07/2017 04:21 PM, Edward Cree wrote:

This series simplifies alignment tracking, generalises bounds tracking and
  fixes some bounds-tracking bugs in the BPF verifier.  Pointer arithmetic on
  packet pointers, stack pointers, map value pointers and context pointers has
  been unified, and bounds on these pointers are only checked when the pointer
  is dereferenced.
Operations on pointers which destroy all relation to the original pointer
  (such as multiplies and shifts) are disallowed if !env->allow_ptr_leaks,
  otherwise they convert the pointer to an unknown scalar and feed it to the
  normal scalar arithmetic handling.
Pointer types have been unified with the corresponding adjusted-pointer types
  where those existed (e.g. PTR_TO_MAP_VALUE[_ADJ] or FRAME_PTR vs
  PTR_TO_STACK); similarly, CONST_IMM and UNKNOWN_VALUE have been unified into
  SCALAR_VALUE.
Pointer types (except CONST_PTR_TO_MAP, PTR_TO_MAP_VALUE_OR_NULL and
  PTR_TO_PACKET_END, which do not allow arithmetic) have a 'fixed offset' and
  a 'variable offset'; the former is used when e.g. adding an immediate or a
  known-constant register, as long as it does not overflow.  Otherwise the
  latter is used, and any operation creating a new variable offset creates a
  new 'id' (and, for PTR_TO_PACKET, clears the 'range').
SCALAR_VALUEs use the 'variable offset' fields to track the range of possible
  values; the 'fixed offset' should never be set on a scalar.


Been testing and reviewing the series over the last several days, looks
reasonable to me as far as I can tell. Thanks for all the hard work on
unifying this, Edward!

Acked-by: Daniel Borkmann 


Re: [PATCH] devfreq: add error check for sscanf in userspace governor

2017-08-07 Thread Chanwoo Choi
Hi,

On 2017년 08월 07일 22:06, Santosh Mardi wrote:
> store_freq function of devfreq userspace governor
> executes further, even if error is returned from sscanf,
> this will result in setting up wrong frequency value.
> 
> Add proper error check to bail out if any error is returned.
> 
> Signed-off-by: Santosh Mardi 
> ---
>  drivers/devfreq/governor_userspace.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/devfreq/governor_userspace.c 
> b/drivers/devfreq/governor_userspace.c
> index 77028c2..1d0c9cc 100644
> --- a/drivers/devfreq/governor_userspace.c
> +++ b/drivers/devfreq/governor_userspace.c
> @@ -53,12 +53,15 @@ static ssize_t store_freq(struct device *dev, struct 
> device_attribute *attr,
>   mutex_lock(&devfreq->lock);
>   data = devfreq->data;
>  
> - sscanf(buf, "%lu", &wanted);
> + err = sscanf(buf, "%lu", &wanted);
> + if (err != 1)
> + goto out;
>   data->user_frequency = wanted;
>   data->valid = true;
>   err = update_devfreq(devfreq);
>   if (err == 0)
>   err = count;
> +out:
>   mutex_unlock(&devfreq->lock);
>   return err;
>  }
> 

Looks good to me.
Reviewed-by: Chanwoo Choi 

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics
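
The check being added boils down to testing the conversion count returned by
sscanf() before using the output variable. A minimal userspace sketch of that
idea follows; parse_freq() is a hypothetical helper, and returning -EINVAL on
failure is an assumption of this sketch, not something taken from the patch.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Parse an unsigned long from 'buf'.
 * Returns 0 and stores the value in *out on success, or -EINVAL if no
 * number could be converted; *out is left untouched in that case.
 */
int parse_freq(const char *buf, unsigned long *out)
{
	unsigned long wanted;

	/* sscanf() returns the number of successful conversions. */
	if (sscanf(buf, "%lu", &wanted) != 1)
		return -EINVAL;

	*out = wanted;
	return 0;
}
```

Without the check, a string such as "abc" would leave the local variable
uninitialised and its garbage value would be used as the frequency.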


Re: [PATCH v2 0/4] ipmi: bt-i2c: added IPMI Block Transfer over I2C

2017-08-07 Thread Brendan Higgins
On Sat, Aug 5, 2017 at 3:23 PM, Corey Minyard  wrote:
> On 08/04/2017 08:18 PM, Brendan Higgins wrote:
>>
>> This patchset introduces IPMI Block Transfer over I2C (BT-I2C), which has the
>> same semantics as IPMI Block Transfer except it is done over I2C.
>>
>> For the OpenBMC people, this is based on an RFC:
>> https://lists.ozlabs.org/pipermail/openbmc/2016-September/004505.html
>>
>> The documentation discusses the reason for this in greater detail; suffice it
>> to say SSIF cannot be correctly implemented on some naive I2C devices. There
>> are some additional reasons why we don't like SSIF, but those are again
>> covered in the documentation for all those who are interested.
>
>
> I'm not terribly excited about this.  A few notes:

I was afraid so, alas.

>
> SMBus alerts are fairly broken in Linux right now.  I have a patch to fix
> this at:
> https://github.com/cminyard/linux-ipmi/commit/48136176ce1890f99857c73e0ace5bd8dfb61fbf
> I haven't been able to get much traction getting anyone to care.

Yeah, I have some work I would like to do there as well.

>
> The lack of a NACK could be worked around fairly easily in the current
> driver.  It looks like you are just returning a message too short to be
> valid.  That's easy.  I think it's a rather major deficiency in the hardware
> to not be able to NACK something, but that is what it is.

Right, we actually have multiple pieces of hardware that do not support
NACKing correctly. The most frustrating piece is the Aspeed chip, which
does not provide any facility for arbitrary NACKs to be generated on the
slave side.

>
> What you have is not really BT over I2C.  You have just added a sequence
> number to the IPMI messages and dropped the SMBus command.  Other interfaces
> have sequence numbers, too.  Calling it BT is a little over the top.

Fair point, maybe ISIF (I2C System Interface)? I don't have strong feelings
about the name.

>
> Do you really need the performance required by having multiple outstanding
> messages?  That adds a lot of complexity; if it's unnecessary it's just a
> waste.  The IPMI work on top of interfaces does not really require that much
> performance, it's just reading sensors, FRU data, and such.  Perhaps you have
> a reason, but I fail to see why it's really that big a deal.  The BT
> interface has this ability, but the driver does not take advantage of it and
> nobody has complained.
>

Yes, we do have some platforms which only have IPMI as a standard interface
and we are abusing some OEM commands to do some things that we probably
should not do with IPMI, like firmware updates. Also, we have some
commands which take a really long time to complete (> 1s). Admittedly, this is
abusing IPMI to solve problems which should probably be solved elsewhere;
nevertheless, it is a feature we are actually using. And having an option to use
sequence numbers is definitely nice from our perspective.

We will probably want to improve the block transfer driver at some
point as well.

> And I don't understand the part about OpenBMC making use of sequence
> numbers.  Why does that matter for this interface?  It's the host side that
> would care about that; the host would stick the numbers in and the slave
> would return them.  If you are using sequence numbers in OpenBMC, which
> sounds quite reasonable, I would think it would be a bad idea to trust that
> the host would give you good sequence numbers.
>

I think I illustrated the use case above, but to reiterate, the desire is to
have multiple messages in flight at the same time because some messages take
a long time to service.

> Plus, with multiple outstanding messages, you really need to limit it.  A
> particular BMC may not be able to handle the full 256, and the ability to
> have that many messages outstanding is probably not a good thing.
>

It is going to depend on the BMC of course; nevertheless, I would be willing
to implement a configurable limit.

> If you really need multiple outstanding messages, the host side IPMI message
> handler needs to change to allow that.  It's doable, and I know how, I just
> haven't seen the need.
>

Sure, we would also like SMBus alert support, but I figured it was probably
best to discuss this with you before we go too far down that path.

> I would agree that the multi-part messages in SSIF are a big pain and a lot
> of unnecessary complexity.  I believe it is there to accommodate host
> hardware that is limited.  But SMBus can have 255 byte messages and there's
> no arbitrary limit on I2C.  It is the way of IPMI to support the least common
> denominator.
>
> Your interface will only work on Linux.  Other OSes (unless they choose to
> implement this driver) will be unable to use your BMC.  Of course there's the
> NACK issue, but that's a big difference, and it would probably still work
> with existing drivers on other OSes.

I hope I did not send the message that we are planning 

Re: [PATCH v2 2/4] usb: common: Move u_serial from gadget/function to usb/common

2017-08-07 Thread Lu Baolu
Hi,

On 08/07/2017 04:13 PM, Felipe Balbi wrote:
> Hi,
>
> Lu Baolu  writes:
>> The component u_serial provides a glue layer between TTY layer
>> and a USB gadget device needed to provide a basic serial port
>> functionality. Currently, u_serial sits under gadget/function
>> and depends on CONFIG_USB_GADGET to be compiled and used.
>>
>> Most of the serial gadget devices are based on a UDC (USB device
>> controller) and implemented by making use of the Linux gadget
>> frameworks. But we are facing other implementations as well. One
>> example can be found with xHCI debug capability. The xHCI debug
>> capability implements a serial gadget with hardware and firmware,
>> and provides an interface similar to that of the xHCI host for submitting
>> and reaping the transfer requests.
>>
>> In order to make better use of u_serial when implementing xHCI
>> debug capability in xHCI driver, this patch moves u_serial.c
>> from gadget/function to usb/common, and moves u_serial.h from
>> gadget/function to include/linux/usb.
>>
>> Signed-off-by: Lu Baolu 
> NAK, u_serial uses the gadget API. It's definitely not COMMON.
>

Okay. It seems that I can't use u_serial anyway. I will implement
a new tty glue for my case.

Best regards,
Lu Baolu


Re: [PATCH 3/4] selftests/seccomp: Refactor RET_ERRNO tests

2017-08-07 Thread Tyler Hicks
On 08/02/2017 10:19 PM, Kees Cook wrote:
> This refactors the errno tests (since they all use the same pattern for
> their filter) and adds a RET_DATA field ordering test.
> 
> Signed-off-by: Kees Cook 

This all looks good and is a great idea.

Reviewed-by: Tyler Hicks 

Tyler

> ---
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 95 
> ---
>  1 file changed, 58 insertions(+), 37 deletions(-)
> 
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
> b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index 73f5ea6778ce..ee78a53da5d1 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -136,7 +136,7 @@ TEST(no_new_privs_support)
>   }
>  }
>  
> -/* Tests kernel support by checking for a copy_from_user() fault on * NULL. 
> */
> +/* Tests kernel support by checking for a copy_from_user() fault on NULL. */
>  TEST(mode_filter_support)
>  {
>   long ret;
> @@ -541,26 +541,30 @@ TEST(arg_out_of_range)
>   EXPECT_EQ(EINVAL, errno);
>  }
>  
> +#define ERRNO_FILTER(name, errno)\
> + struct sock_filter _read_filter_##name[] = {\
> + BPF_STMT(BPF_LD|BPF_W|BPF_ABS,  \
> + offsetof(struct seccomp_data, nr)), \
> + BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),   \
> + BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | errno), \
> + BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW), \
> + };  \
> + struct sock_fprog prog_##name = {   \
> + .len = (unsigned short)ARRAY_SIZE(_read_filter_##name), \
> + .filter = _read_filter_##name,  \
> + }
> +
> +/* Make sure basic errno values are correctly passed through a filter. */
>  TEST(ERRNO_valid)
>  {
> - struct sock_filter filter[] = {
> - BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
> - offsetof(struct seccomp_data, nr)),
> - BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
> - BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | E2BIG),
> - BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
> - };
> - struct sock_fprog prog = {
> - .len = (unsigned short)ARRAY_SIZE(filter),
> - .filter = filter,
> - };
> + ERRNO_FILTER(valid, E2BIG);
>   long ret;
>   pid_t parent = getppid();
>  
>   ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>   ASSERT_EQ(0, ret);
>  
> - ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
> + ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_valid);
>   ASSERT_EQ(0, ret);
>  
>   EXPECT_EQ(parent, syscall(__NR_getppid));
> @@ -568,26 +572,17 @@ TEST(ERRNO_valid)
>   EXPECT_EQ(E2BIG, errno);
>  }
>  
> +/* Make sure an errno of zero is correctly handled by the arch code. */
>  TEST(ERRNO_zero)
>  {
> - struct sock_filter filter[] = {
> - BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
> - offsetof(struct seccomp_data, nr)),
> - BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
> - BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | 0),
> - BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
> - };
> - struct sock_fprog prog = {
> - .len = (unsigned short)ARRAY_SIZE(filter),
> - .filter = filter,
> - };
> + ERRNO_FILTER(zero, 0);
>   long ret;
>   pid_t parent = getppid();
>  
>   ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>   ASSERT_EQ(0, ret);
>  
> - ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
> + ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_zero);
>   ASSERT_EQ(0, ret);
>  
>   EXPECT_EQ(parent, syscall(__NR_getppid));
> @@ -595,26 +590,21 @@ TEST(ERRNO_zero)
>   EXPECT_EQ(0, read(0, NULL, 0));
>  }
>  
> +/*
> + * The SECCOMP_RET_DATA mask is 16 bits wide, but errno is smaller.
> + * This tests that the errno value gets capped correctly, fixed by
> + * 580c57f10768 ("seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO").
> + */
>  TEST(ERRNO_capped)
>  {
> - struct sock_filter filter[] = {
> - BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
> - offsetof(struct seccomp_data, nr)),
> - BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
> - BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | 4096),
> - BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
> - };
> - struct sock_fprog prog = {
> - .len = (unsigned short)ARRAY_SIZE(filter),
> - .filter = filter,
> - };
> + ERRNO_FILTER(capped, 4096);
>   long ret;
>   pid_t parent = getppid();
>  
>   ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>   ASSERT_EQ(0, ret);
>  
> - ret = 

[PATCH] dt-bindings: clock: sunxi-ccu: Add compatibles for sun5i CCU driver

2017-08-07 Thread Jonathan Liu
The bindings were not updated when the sun5i CCU driver was added in
commit 5e73761786d6 ("clk: sunxi-ng: Add sun5i CCU driver").

Signed-off-by: Jonathan Liu 
---
 Documentation/devicetree/bindings/clock/sunxi-ccu.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/clock/sunxi-ccu.txt 
b/Documentation/devicetree/bindings/clock/sunxi-ccu.txt
index df9fad58facd..dbe0c1c58ab5 100644
--- a/Documentation/devicetree/bindings/clock/sunxi-ccu.txt
+++ b/Documentation/devicetree/bindings/clock/sunxi-ccu.txt
@@ -3,6 +3,8 @@ Allwinner Clock Control Unit Binding
 
 Required properties :
 - compatible: must contain one of the following compatibles:
+   - "allwinner,sun5i-a10s-ccu"
+   - "allwinner,sun5i-a13-ccu"
- "allwinner,sun6i-a31-ccu"
- "allwinner,sun8i-a23-ccu"
- "allwinner,sun8i-a33-ccu"
@@ -15,6 +17,7 @@ Required properties :
- "allwinner,sun50i-a64-ccu"
- "allwinner,sun50i-a64-r-ccu"
- "allwinner,sun50i-h5-ccu"
+   - "nextthing,gr8-ccu"
 
 - reg: Must contain the registers base address and length
 - clocks: phandle to the oscillators feeding the CCU. Two are needed:
-- 
2.13.2



[PATCH v2 3/4] selftests/seccomp: Refactor RET_ERRNO tests

2017-08-07 Thread Kees Cook
This refactors the errno tests (since they all use the same pattern for
their filter) and adds a RET_DATA field ordering test.

Signed-off-by: Kees Cook 
Reviewed-by: Tyler Hicks 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 95 ---
 1 file changed, 58 insertions(+), 37 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 73f5ea6778ce..ee78a53da5d1 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -136,7 +136,7 @@ TEST(no_new_privs_support)
}
 }
 
-/* Tests kernel support by checking for a copy_from_user() fault on * NULL. */
+/* Tests kernel support by checking for a copy_from_user() fault on NULL. */
 TEST(mode_filter_support)
 {
long ret;
@@ -541,26 +541,30 @@ TEST(arg_out_of_range)
EXPECT_EQ(EINVAL, errno);
 }
 
+#define ERRNO_FILTER(name, errno)  \
+   struct sock_filter _read_filter_##name[] = {\
+   BPF_STMT(BPF_LD|BPF_W|BPF_ABS,  \
+   offsetof(struct seccomp_data, nr)), \
+   BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),   \
+   BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | errno), \
+   BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW), \
+   };  \
+   struct sock_fprog prog_##name = {   \
+   .len = (unsigned short)ARRAY_SIZE(_read_filter_##name), \
+   .filter = _read_filter_##name,  \
+   }
+
+/* Make sure basic errno values are correctly passed through a filter. */
 TEST(ERRNO_valid)
 {
-   struct sock_filter filter[] = {
-   BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
-   offsetof(struct seccomp_data, nr)),
-   BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
-   BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | E2BIG),
-   BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
-   };
-   struct sock_fprog prog = {
-   .len = (unsigned short)ARRAY_SIZE(filter),
-   .filter = filter,
-   };
+   ERRNO_FILTER(valid, E2BIG);
long ret;
pid_t parent = getppid();
 
ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
ASSERT_EQ(0, ret);
 
-   ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+   ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_valid);
ASSERT_EQ(0, ret);
 
EXPECT_EQ(parent, syscall(__NR_getppid));
@@ -568,26 +572,17 @@ TEST(ERRNO_valid)
EXPECT_EQ(E2BIG, errno);
 }
 
+/* Make sure an errno of zero is correctly handled by the arch code. */
 TEST(ERRNO_zero)
 {
-   struct sock_filter filter[] = {
-   BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
-   offsetof(struct seccomp_data, nr)),
-   BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
-   BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | 0),
-   BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
-   };
-   struct sock_fprog prog = {
-   .len = (unsigned short)ARRAY_SIZE(filter),
-   .filter = filter,
-   };
+   ERRNO_FILTER(zero, 0);
long ret;
pid_t parent = getppid();
 
ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
ASSERT_EQ(0, ret);
 
-   ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+   ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_zero);
ASSERT_EQ(0, ret);
 
EXPECT_EQ(parent, syscall(__NR_getppid));
@@ -595,26 +590,21 @@ TEST(ERRNO_zero)
EXPECT_EQ(0, read(0, NULL, 0));
 }
 
+/*
+ * The SECCOMP_RET_DATA mask is 16 bits wide, but errno is smaller.
+ * This tests that the errno value gets capped correctly, fixed by
+ * 580c57f10768 ("seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO").
+ */
 TEST(ERRNO_capped)
 {
-   struct sock_filter filter[] = {
-   BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
-   offsetof(struct seccomp_data, nr)),
-   BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
-   BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | 4096),
-   BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
-   };
-   struct sock_fprog prog = {
-   .len = (unsigned short)ARRAY_SIZE(filter),
-   .filter = filter,
-   };
+   ERRNO_FILTER(capped, 4096);
long ret;
pid_t parent = getppid();
 
ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
ASSERT_EQ(0, ret);
 
-   ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+   ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_capped);
ASSERT_EQ(0, ret);
 
EXPECT_EQ(parent, syscall(__NR_getppid));
@@ -622,6 

linux-next: Signed-off-by missing for commit in the scsi-mkp tree

2017-08-07 Thread Stephen Rothwell
Hi Martin,

Commit

  facfc963ae92 ("scsi: g_NCR5380: Two DTC436 PDMA workarounds")

is missing a Signed-off-by from its author.

-- 
Cheers,
Stephen Rothwell


[RFC v1 2/4] ipmi_bmc: device interface to IPMI BMC framework

2017-08-07 Thread Brendan Higgins
From: Benjamin Fair 

This creates a char device which allows userspace programs to send and
receive IPMI messages. Messages are only routed to userspace if no other
kernel driver can handle them.

Signed-off-by: Benjamin Fair 
Signed-off-by: Brendan Higgins 
---
 drivers/char/ipmi_bmc/Kconfig|   6 +
 drivers/char/ipmi_bmc/Makefile   |   1 +
 drivers/char/ipmi_bmc/ipmi_bmc_devintf.c | 241 +++
 3 files changed, 248 insertions(+)
 create mode 100644 drivers/char/ipmi_bmc/ipmi_bmc_devintf.c

diff --git a/drivers/char/ipmi_bmc/Kconfig b/drivers/char/ipmi_bmc/Kconfig
index b6af38455702..262a17866aa2 100644
--- a/drivers/char/ipmi_bmc/Kconfig
+++ b/drivers/char/ipmi_bmc/Kconfig
@@ -11,6 +11,12 @@ menuconfig IPMI_BMC
 
 if IPMI_BMC
 
+config IPMI_BMC_DEVICE_INTERFACE
+   tristate 'Device interface for BMC-side IPMI'
+   help
+ This provides a file interface to the IPMI BMC core so userland
+ processes may use IPMI.
+
 config IPMI_BMC_BT_I2C
depends on I2C
select I2C_SLAVE
diff --git a/drivers/char/ipmi_bmc/Makefile b/drivers/char/ipmi_bmc/Makefile
index 9c7cd48d899f..ead8abffbd11 100644
--- a/drivers/char/ipmi_bmc/Makefile
+++ b/drivers/char/ipmi_bmc/Makefile
@@ -3,5 +3,6 @@
 #
 
 obj-$(CONFIG_IPMI_BMC) += ipmi_bmc.o
+obj-$(CONFIG_IPMI_BMC_DEVICE_INTERFACE) += ipmi_bmc_devintf.o
 obj-$(CONFIG_IPMI_BMC_BT_I2C) += ipmi_bmc_bt_i2c.o
 obj-$(CONFIG_ASPEED_BT_IPMI_BMC) += ipmi_bmc_bt_aspeed.o
diff --git a/drivers/char/ipmi_bmc/ipmi_bmc_devintf.c 
b/drivers/char/ipmi_bmc/ipmi_bmc_devintf.c
new file mode 100644
index ..2421237ed575
--- /dev/null
+++ b/drivers/char/ipmi_bmc/ipmi_bmc_devintf.c
@@ -0,0 +1,241 @@
+/*
+ * Copyright 2017 Google Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PFX "IPMI BMC devintf: "
+
+#define DEVICE_NAME "ipmi-bt-host"
+
+/* Must be a power of two */
+#define REQUEST_FIFO_SIZE roundup_pow_of_two(BT_MSG_SEQ_MAX)
+
+struct bmc_devintf_data {
+   struct miscdevice   miscdev;
+   struct ipmi_bmc_device  bmc_device;
+   struct ipmi_bmc_ctx *bmc_ctx;
+   wait_queue_head_t   wait_queue;
+   /* FIFO of waiting messages */
+   DECLARE_KFIFO(requests, struct bt_msg, REQUEST_FIFO_SIZE);
+};
+
+static inline struct bmc_devintf_data *file_to_bmc_devintf_data(
+   struct file *file)
+{
+   return container_of(file->private_data, struct bmc_devintf_data,
+   miscdev);
+}
+
+static ssize_t ipmi_bmc_devintf_read(struct file *file, char __user *buf,
+size_t count, loff_t *ppos)
+{
+   struct bmc_devintf_data *devintf_data = file_to_bmc_devintf_data(file);
+   bool non_blocking = file->f_flags & O_NONBLOCK;
+   struct bt_msg msg;
+
+   if (non_blocking && kfifo_is_empty(&devintf_data->requests)) {
+   return -EAGAIN;
+   } else if (!non_blocking) {
+   if (wait_event_interruptible(devintf_data->wait_queue,
+   !kfifo_is_empty(&devintf_data->requests)))
+   return -ERESTARTSYS;
+   }
+
+   /* TODO(benjaminfair): eliminate this extra copy */
+   if (unlikely(!kfifo_get(&devintf_data->requests, &msg))) {
+   pr_err(PFX "Unable to read request from fifo\n");
+   return -EIO;
+   }
+
+   /* TODO(benjaminfair): handle partial reads of a message */
+   if (count > bt_msg_len(&msg))
+   count = bt_msg_len(&msg);
+
+   if (copy_to_user(buf, &msg, count))
+   return -EFAULT;
+
+   return count;
+}
+
+static ssize_t ipmi_bmc_devintf_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+   struct bmc_devintf_data *devintf_data = file_to_bmc_devintf_data(file);
+   bool non_blocking = file->f_flags & O_NONBLOCK;
+   struct bt_msg msg;
+   ssize_t ret = 0;
+
+   if (count > sizeof(struct bt_msg))
+   return -EINVAL;
+
+   if (copy_from_user(&msg, buf, count))
+   return -EFAULT;
+
+   if (count != bt_msg_len(&msg))
+   return -EINVAL;
+
+   ret = ipmi_bmc_send_response(devintf_data->bmc_ctx, &msg);
+
+   /* Try again if blocking is allowed */
+   while (!non_blocking && ret == -EAGAIN) {
+ 

[RFC v1 3/4] ipmi_bmc: bt-i2c: port driver to IPMI BMC framework

2017-08-07 Thread Brendan Higgins
From: Benjamin Fair 

Instead of handling interaction with userspace and providing a file
interface, rely on the IPMI BMC framework to do this. This simplifies
the logic and eliminates duplicate code.

Signed-off-by: Benjamin Fair 
Signed-off-by: Brendan Higgins 
---
 drivers/char/ipmi_bmc/ipmi_bmc_bt_i2c.c | 202 +---
 1 file changed, 28 insertions(+), 174 deletions(-)

diff --git a/drivers/char/ipmi_bmc/ipmi_bmc_bt_i2c.c 
b/drivers/char/ipmi_bmc/ipmi_bmc_bt_i2c.c
index 686b83fa42a4..6665aa9d4300 100644
--- a/drivers/char/ipmi_bmc/ipmi_bmc_bt_i2c.c
+++ b/drivers/char/ipmi_bmc/ipmi_bmc_bt_i2c.c
@@ -14,102 +14,51 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
 
 #define PFX "IPMI BMC BT-I2C: "
 
-/*
- * TODO: This is "bt-host" to match the bt-host driver; however, I think this is
- * unclear in the context of a CPU side driver. Should probably name this
- * and the DEVICE_NAME in bt-host to something like "bt-bmc" or "bt-slave".
- */
-#define DEVICE_NAME"ipmi-bt-host"
-
-static const unsigned long request_queue_max_len = 256;
-
-struct bt_request_elem {
-   struct list_headlist;
-   struct bt_msg   request;
-};
-
 struct bt_i2c_slave {
+   struct ipmi_bmc_bus bus;
struct i2c_client   *client;
-   struct miscdevice   miscdev;
+   struct ipmi_bmc_ctx *bmc_ctx;
struct bt_msg   request;
-   struct list_headrequest_queue;
-   atomic_trequest_queue_len;
struct bt_msg   response;
boolresponse_in_progress;
size_t  msg_idx;
spinlock_t  lock;
-   wait_queue_head_t   wait_queue;
-   struct mutexfile_mutex;
 };
 
-static int receive_bt_request(struct bt_i2c_slave *bt_slave, bool non_blocking,
- struct bt_msg *bt_request)
+static bool bt_i2c_is_response_open(struct ipmi_bmc_bus *bus)
 {
-   int res;
+   struct bt_i2c_slave *bt_slave;
+   bool response_in_progress;
unsigned long flags;
-   struct bt_request_elem *queue_elem;
-
-   if (!non_blocking) {
-try_again:
-   res = wait_event_interruptible(
-   bt_slave->wait_queue,
-   atomic_read(&bt_slave->request_queue_len));
-   if (res)
-   return res;
-   }
 
-   spin_lock_irqsave(&bt_slave->lock, flags);
-   if (!atomic_read(&bt_slave->request_queue_len)) {
-   spin_unlock_irqrestore(&bt_slave->lock, flags);
-   if (non_blocking)
-   return -EAGAIN;
-   goto try_again;
-   }
+   bt_slave = container_of(bus, struct bt_i2c_slave, bus);
 
-   if (list_empty(&bt_slave->request_queue)) {
-   pr_err(PFX "request_queue was empty despite nonzero request_queue_len\n");
-   return -EIO;
-   }
-   queue_elem = list_first_entry(&bt_slave->request_queue,
- struct bt_request_elem, list);
-   memcpy(bt_request, &queue_elem->request, sizeof(*bt_request));
-   list_del(&queue_elem->list);
-   kfree(queue_elem);
-   atomic_dec(&bt_slave->request_queue_len);
+   spin_lock_irqsave(&bt_slave->lock, flags);
+   response_in_progress = bt_slave->response_in_progress;
spin_unlock_irqrestore(&bt_slave->lock, flags);
-   return 0;
+
+   return !response_in_progress;
 }
 
-static int send_bt_response(struct bt_i2c_slave *bt_slave, bool non_blocking,
-   struct bt_msg *bt_response)
+static int bt_i2c_send_response(struct ipmi_bmc_bus *bus,
+   struct bt_msg *bt_response)
 {
-   int res;
+   struct bt_i2c_slave *bt_slave;
unsigned long flags;
 
-   if (!non_blocking) {
-try_again:
-   res = wait_event_interruptible(bt_slave->wait_queue,
-  !bt_slave->response_in_progress);
-   if (res)
-   return res;
-   }
+   bt_slave = container_of(bus, struct bt_i2c_slave, bus);
 
spin_lock_irqsave(&bt_slave->lock, flags);
if (bt_slave->response_in_progress) {
spin_unlock_irqrestore(&bt_slave->lock, flags);
-   if (non_blocking)
-   return -EAGAIN;
-   goto try_again;
+   return -EAGAIN;
}
 
memcpy(&bt_slave->response, bt_response, sizeof(*bt_response));
@@ -118,106 +67,13 @@ static int send_bt_response(struct bt_i2c_slave 
*bt_slave, bool non_blocking,
return 0;
 }
 
-static inline struct bt_i2c_slave *to_bt_i2c_slave(struct file *file)
-{
-   return container_of(file->private_data, struct bt_i2c_slave, miscdev);
-}
-
-static ssize_t bt_read(struct file 

[RFC v1 0/4] ipmi_bmc: framework for IPMI on BMCs

2017-08-07 Thread Brendan Higgins
This introduces a framework for implementing the BMC side of the IPMI protocol,
roughly mirroring the host side OpenIPMI framework; it attempts to abstract away
hardware interfaces, such as Block Transfer interface hardware implementations
from IPMI command handlers.

It does this by implementing the traditional driver model of a bus with devices;
however, in this case a struct ipmi_bmc_bus represents a hardware interface,
where a struct ipmi_bmc_device represents a handler. A handler filters messages
by registering a function which returns whether a given message matches the
handler; it also has the concept of a default handler which is forwarded all
messages which are not matched by some other interface.
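
The routing rule described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct demo_msg`, `struct demo_handler`, `demo_dispatch()` and `demo_match_cmd_01()` are hypothetical names invented for the sketch and are not the API introduced by the patchset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BT message; only what the sketch needs. */
struct demo_msg {
	uint8_t netfn_lun;
	uint8_t cmd;
};

/* A handler claims a message by returning true from match(). */
struct demo_handler {
	bool (*match)(const struct demo_msg *m);
	int id;
};

/*
 * Return the id of the first handler whose match() accepts the message.
 * If no handler matches, fall back to the default handler; return -1
 * when no default handler is registered either.
 */
static int demo_dispatch(const struct demo_handler *handlers, size_t n,
			 const struct demo_handler *default_handler,
			 const struct demo_msg *m)
{
	for (size_t i = 0; i < n; i++)
		if (handlers[i].match(m))
			return handlers[i].id;
	return default_handler ? default_handler->id : -1;
}

/* Example matcher (assumption): claims only messages with cmd == 0x01. */
static bool demo_match_cmd_01(const struct demo_msg *m)
{
	return m->cmd == 0x01;
}
```

In this sketch the default handler is never asked to match; it simply receives whatever nobody else claimed, which is the role the misc device file interface plays in this patchset.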

In this patchset, we introduce an example of a default handler: a misc device
file interface which implements the same interface as the device file
interface used by the Aspeed BT driver.

Currently, OpenBMC handles all IPMI message routing and handling in userland;
the existing drivers simply provide a file interface for the hardware on the
device. In this patchset, we propose a common file interface to be shared by all
IPMI hardware interfaces, but also a framework for implementing handlers at the
kernel level, similar to how the existing OpenIPMI framework supports both
kernel users, as well as misc device file interface.

This patchset depends on the "ipmi: bt-i2c: added IPMI Block Transfer over I2C"
patchset, which can be found here:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1461960.html
However, I can fix this if desired.

Tested on the AST2500 EVB.


Re: linux-next: Signed-off-by missing for commit in the scsi-mkp tree

2017-08-07 Thread Finn Thain
On Tue, 8 Aug 2017, Stephen Rothwell wrote:

> Hi Martin,
> 
> Commit
> 
>   facfc963ae92 ("scsi: g_NCR5380: Two DTC436 PDMA workarounds")
> 
> is missing a Signed-off-by from its author.
> 

Sorry about that. The patch was a joint effort.

Ondrej, would you please send your "Signed-off-by" tag so that Martin can 
amend this commit (if need be).

-- 


[PATCH] mmc: host: omap_hsmmc: Add CMD23 capability to omap_hsmmc driver

2017-08-07 Thread Kishon Vijay Abraham I
omap_hsmmc driver always relied on CMD12 to stop transmission.
However if CMD12 is not issued at the correct timing, the card will
indicate an out-of-range error. With certain cards in some of the
DRA7 based boards, -EIO error is observed. By adding CMD23 capability,
the MMC core will send MMC_SET_BLOCK_COUNT command before
MMC_READ_MULTIPLE_BLOCK/MMC_WRITE_MULTIPLE_BLOCK commands.

commit a04e6bae9e6f12 ("mmc: core: check also R1 response for
stop commands") exposed this bug in omap_hsmmc driver.

Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/mmc/host/omap_hsmmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c
index 04ff3c97a535..2ab4788d021f 100644
--- a/drivers/mmc/host/omap_hsmmc.c
+++ b/drivers/mmc/host/omap_hsmmc.c
@@ -2086,7 +2086,7 @@ static int omap_hsmmc_probe(struct platform_device *pdev)
mmc->max_seg_size = mmc->max_req_size;
 
mmc->caps |= MMC_CAP_MMC_HIGHSPEED | MMC_CAP_SD_HIGHSPEED |
-MMC_CAP_WAIT_WHILE_BUSY | MMC_CAP_ERASE;
+MMC_CAP_WAIT_WHILE_BUSY | MMC_CAP_ERASE | MMC_CAP_CMD23;
 
mmc->caps |= mmc_pdata(host)->caps;
if (mmc->caps & MMC_CAP_8_BIT_DATA)
-- 
2.11.0



[PATCH] ARM: dts: DRA7: Add pcie1 dt node for EP mode

2017-08-07 Thread Kishon Vijay Abraham I
Add pcie1 dt node in order for the controller to operate in
endpoint mode. However since none of the dra7 based boards have
slots configured to operate in endpoint mode, keep EP mode
disabled.

Signed-off-by: Kishon Vijay Abraham I 
---
 arch/arm/boot/dts/am571x-idk.dts|  9 +
 arch/arm/boot/dts/am572x-idk.dts|  7 ++-
 arch/arm/boot/dts/am57xx-beagle-x15-common.dtsi |  7 ++-
 arch/arm/boot/dts/dra7-evm.dts  |  4 
 arch/arm/boot/dts/dra7.dtsi | 23 ++-
 arch/arm/boot/dts/dra72-evm-common.dtsi |  4 
 6 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/arch/arm/boot/dts/am571x-idk.dts b/arch/arm/boot/dts/am571x-idk.dts
index adc70fb091a2..0c0bb4e93f25 100644
--- a/arch/arm/boot/dts/am571x-idk.dts
+++ b/arch/arm/boot/dts/am571x-idk.dts
@@ -96,3 +96,12 @@
status = "okay";
};
 };
+
+&pcie1_rc {
+   status = "okay";
+   gpios = < 23 GPIO_ACTIVE_HIGH>;
+};
+
+&pcie1_ep {
+   gpios = < 23 GPIO_ACTIVE_HIGH>;
+};
diff --git a/arch/arm/boot/dts/am572x-idk.dts b/arch/arm/boot/dts/am572x-idk.dts
index 940fcbe5380b..5ff75004afcf 100644
--- a/arch/arm/boot/dts/am572x-idk.dts
+++ b/arch/arm/boot/dts/am572x-idk.dts
@@ -88,7 +88,12 @@
load-gpios = < 19 GPIO_ACTIVE_LOW>;
 };
 
-&pcie1 {
+&pcie1_rc {
+   status = "okay";
+   gpios = < 23 GPIO_ACTIVE_HIGH>;
+};
+
+&pcie1_ep {
gpios = < 23 GPIO_ACTIVE_HIGH>;
 };
 
diff --git a/arch/arm/boot/dts/am57xx-beagle-x15-common.dtsi 
b/arch/arm/boot/dts/am57xx-beagle-x15-common.dtsi
index fdfe5b16b806..d433a50cd18a 100644
--- a/arch/arm/boot/dts/am57xx-beagle-x15-common.dtsi
+++ b/arch/arm/boot/dts/am57xx-beagle-x15-common.dtsi
@@ -570,7 +570,12 @@
};
 };
 
-&pcie1 {
+&pcie1_rc {
+   status = "ok";
+   gpios = < 8 GPIO_ACTIVE_LOW>;
+};
+
+&pcie1_ep {
gpios = < 8 GPIO_ACTIVE_LOW>;
 };
 
diff --git a/arch/arm/boot/dts/dra7-evm.dts b/arch/arm/boot/dts/dra7-evm.dts
index f47fc4daf062..57bd75909d96 100644
--- a/arch/arm/boot/dts/dra7-evm.dts
+++ b/arch/arm/boot/dts/dra7-evm.dts
@@ -720,3 +720,7 @@
status = "okay";
};
 };
+
+&pcie1_rc {
+   status = "okay";
+};
diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
index 0f0f6f58bd18..e6f2c6a15dc1 100644
--- a/arch/arm/boot/dts/dra7.dtsi
+++ b/arch/arm/boot/dts/dra7.dtsi
@@ -196,6 +196,7 @@
scm_conf1: scm_conf@1c04 {
compatible = "syscon";
reg = <0x1c04 0x0020>;
+   #syscon-cells = <2>;
};
 
scm_conf_pcie: scm_conf@1c24 {
@@ -287,7 +288,11 @@
#address-cells = <1>;
ranges = <0x5100 0x5100 0x3000
  0x00x2000 0x1000>;
-   pcie1: pcie@5100 {
+   /**
+* To enable PCI endpoint mode, disable the pcie1_rc
+* node and enable pcie1_ep mode.
+*/
+   pcie1_rc: pcie@5100 {
compatible = "ti,dra7-pcie";
reg = <0x5100 0x2000>, <0x51002000 0x14c>, 
<0x1000 0x2000>;
reg-names = "rc_dbics", "ti_conf", "config";
@@ -309,12 +314,28 @@
<0 0 0 2 &pcie1_intc 2>,
<0 0 0 3 &pcie1_intc 3>,
<0 0 0 4 &pcie1_intc 4>;
+   status = "disabled";
pcie1_intc: interrupt-controller {
interrupt-controller;
#address-cells = <0>;
#interrupt-cells = <1>;
};
};
+
+   pcie1_ep: pcie_ep@5100 {
+   compatible = "ti,dra7-pcie-ep";
+   reg = <0x5100 0x28>, <0x51002000 0x14c>, 
<0x51001000 0x28>, <0x1000 0x1000>;
+   reg-names = "ep_dbics", "ti_conf", "ep_dbics2", 
"addr_space";
+   interrupts = <0 232 0x4>;
+   num-lanes = <1>;
+   num-ib-windows = <4>;
+   num-ob-windows = <16>;
+   ti,hwmods = "pcie1";
+   phys = <_phy>;
+   phy-names = "pcie-phy0";
+   ti,syscon-unaligned-access = <&scm_conf1 0x14 2>;
+   status = "disabled";
+   };
};
 
axi@1 {
diff --git 

Re: [PATCH v4] printk: Add monotonic, boottime, and realtime timestamps

2017-08-07 Thread Sergey Senozhatsky
On (08/07/17 11:52), Prarit Bhargava wrote:
[..]
> +/**
> + * enum printk_time_type - Timestamp types for printk() messages.
> + * @PRINTK_TIME_DISABLE: No time stamp.
> + * @PRINTK_TIME_LOCAL: Local hardware clock timestamp.
> + * @PRINTK_TIME_BOOT: Boottime clock timestamp.
> + * @PRINTK_TIME_MONO: Monotonic clock timestamp.
> + * @PRINTK_TIME_REAL: Realtime clock timestamp.  On 32-bit
> + * systems selecting the real clock printk timestamp may lead to unlikely
> + * situations where a timestamp is wrong because the real time offset is read
> + * without the protection of a sequence lock in the call to 
> ktime_get_log_ts()
> + * in printk_get_ts() below.
> + */
> +enum printk_time_type {
> + PRINTK_TIME_DISABLE = 0,
> + PRINTK_TIME_LOCAL = 1,
> + PRINTK_TIME_BOOT = 2,
> + PRINTK_TIME_MONO = 3,
> + PRINTK_TIME_REAL = 4,
> +};

maybe call the entire thing 'timestamp sources' or something?

[..]
> + if (strlen(param) == 1) {
> + /* Preserve legacy boolean settings */
> + if (!strcmp("0", param) || !strcmp("n", param) ||
> + !strcmp("N", param))
> + _printk_time = PRINTK_TIME_DISABLE;
> + if (!strcmp("1", param) || !strcmp("y", param) ||
> + !strcmp("Y", param))
> + _printk_time = PRINTK_TIME_LOCAL;
> + }
> + if (_printk_time == -1) {
> + for (stamp = 0; stamp <= 4; stamp++) {
> + if (!strncmp(printk_time_str[stamp], param,
> +  strlen(param))) {
> + _printk_time = stamp;
> + break;
> + }
> + }
> + }

you can use match_string() here.
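
For illustration, an exact-match table lookup of the kind suggested here could look like the userspace sketch below. `demo_lookup_name()` is a hypothetical stand-in written for this example, and the strings in `demo_time_names[]` are assumptions, since the actual contents of printk_time_str[] are not quoted in this mail.

```c
#include <stddef.h>
#include <string.h>

/* Assumed names; the real printk_time_str[] contents are not shown here. */
static const char * const demo_time_names[] = {
	"disable", "local", "boot", "monotonic", "realtime",
};

/*
 * Return the index of 'name' in 'table' (exact string match),
 * or -1 if it is not present.
 */
static int demo_lookup_name(const char * const *table, size_t n,
			    const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (table[i] && strcmp(table[i], name) == 0)
			return (int)i;
	return -1;
}
```

Note that an exact-match lookup like this would no longer accept abbreviated values, whereas the quoted loop compares only strlen(param) characters.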

> + if (_printk_time == -1) {
> + pr_warn("printk: invalid timestamp value %s\n", param);
> + return -EINVAL;
> + }

`invalid timestamp value' is confusing.


> + } else if ((printk_time_setting != _printk_time) &&
> +(_printk_time != 0)) {
> + pr_warn("printk: timestamp can only be set to 0(disabled) or 
> %s\n",
> + printk_time_str[printk_time_setting]);

ditto.


> + return -EINVAL;
> + }
> +
> + printk_time = _printk_time;
> + pr_info("printk: timestamp set to %s\n", printk_time_str[printk_time]);

ditto.


[..]

> +static u64 printk_get_ts(void)
> +{
> + u64 mono, offset_real;
> +
> + if (printk_time <= PRINTK_TIME_LOCAL)
> + return local_clock();
> +
> + if (printk_time == PRINTK_TIME_BOOT)
> + return ktime_get_boot_log_ts();
> +
> + mono = ktime_get_real_log_ts(&offset_real);
> +
> + if (printk_time == PRINTK_TIME_MONO)
> + return mono;
> +
> + return mono + offset_real;
> +}

this looks hard...

> +static int printk_time;
> +static int printk_time_setting;

how about s/printk_time_setting/printk_time_source/? or something similar?

-ss


Re: [PATCH v8 14/14] lockdep: Crossrelease feature documentation

2017-08-07 Thread kbuild test robot
Hi Byungchul,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc4 next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> ERROR: "lookup_page_ext" [net/sunrpc/sunrpc.ko] undefined!
>> ERROR: "lookup_page_ext" [mm/zsmalloc.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/xfs/xfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/ufs/ufs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/udf/udf.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/ubifs/ubifs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/sysv/sysv.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/squashfs/squashfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/romfs/romfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/reiserfs/reiserfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/orangefs/orangefs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/ocfs2/ocfs2.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/ntfs/ntfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/nilfs2/nilfs2.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/nfs/nfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/ncpfs/ncpfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/minix/minix.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/jfs/jfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/jffs2/jffs2.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/jbd2/jbd2.ko] undefined!

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




[RFC][PATCH] timer: Add function-change canary

2017-08-07 Thread Kees Cook
This introduces canaries to struct timer_list in an effort to protect the
function callback pointer from getting rewritten during stack or heap
overflow attacks. The struct timer_list has become a recent target for
security flaw exploitation because it includes the "data" argument in
the structure, along with the function callback. This provides attackers
with a ROP-like primitive for performing limited kernel function calls
without needing all the prerequisites to stage a ROP attack.

Recent examples of exploits using struct timer_list attacks:

http://www.openwall.com/lists/oss-security/2016/12/06/1
(https://www.exploit-db.com/exploits/40871/)

https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
(https://www.exploit-db.com/exploits/41458/)

Timers normally have their callback functions initialized either via
the setup_timer_*() macros or manually before calls to add_timer(). The
per-timer canary gets set in either case, and then checked at timer
expiration time before calling the function.
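
The idea can be shown with a small userspace sketch. The names here (`struct demo_timer`, `demo_timer_set_func()`, `demo_timer_func_ok()`) are hypothetical, and the check function is an assumption about what the expiration-time test might look like; it is not copied from the patch below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature of a timer: canary, callback, and its argument. */
struct demo_timer {
	uintptr_t canary;
	void (*function)(unsigned long);
	unsigned long data;
};

/*
 * Record the callback and derive the canary from the callback address,
 * the object's own address, and a secret value.
 * (Casting a function pointer to uintptr_t assumes a conventional
 * flat-address platform.)
 */
static void demo_timer_set_func(struct demo_timer *t,
				void (*fn)(unsigned long), uintptr_t secret)
{
	t->function = fn;
	t->canary = (uintptr_t)fn ^ (uintptr_t)t ^ secret;
}

/* True if the callback pointer still agrees with the recorded canary. */
static bool demo_timer_func_ok(const struct demo_timer *t, uintptr_t secret)
{
	return t->canary == ((uintptr_t)t->function ^ (uintptr_t)t ^ secret);
}

/* Two distinct callbacks used to simulate an overwritten pointer. */
static unsigned long demo_hits;
static void demo_cb_a(unsigned long data) { demo_hits += data; }
static void demo_cb_b(unsigned long data) { demo_hits -= data; }
```

Because the object's address is mixed in, a canary copied from one timer does not validate for another timer, and an attacker who overwrites only the function pointer fails the check unless the secret is also known.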

Signed-off-by: Kees Cook 
---
 include/linux/timer.h |  6 --
 kernel/time/timer.c   | 45 -
 2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index e6789b8757d5..9aac0da9d2ec 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -16,6 +16,7 @@ struct timer_list {
 */
struct hlist_node   entry;
unsigned long   expires;
+   unsigned long   canary;
void(*function)(unsigned long);
unsigned long   data;
u32 flags;
@@ -91,6 +92,7 @@ struct timer_list {
 
 void init_timer_key(struct timer_list *timer, unsigned int flags,
const char *name, struct lock_class_key *key);
+void init_timer_func(struct timer_list *timer, void (*func)(unsigned long));
 
 #ifdef CONFIG_DEBUG_OBJECTS_TIMERS
 extern void init_timer_on_stack_key(struct timer_list *timer,
@@ -140,14 +142,14 @@ static inline void init_timer_on_stack_key(struct 
timer_list *timer,
 #define __setup_timer(_timer, _fn, _data, _flags)  \
do {\
__init_timer((_timer), (_flags));   \
-   (_timer)->function = (_fn); \
+   init_timer_func((_timer), (_fn));   \
(_timer)->data = (_data);   \
} while (0)
 
 #define __setup_timer_on_stack(_timer, _fn, _data, _flags) \
do {\
__init_timer_on_stack((_timer), (_flags));  \
-   (_timer)->function = (_fn); \
+   init_timer_func((_timer), (_fn));   \
(_timer)->data = (_data);   \
} while (0)
 
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 71ce3f4eead3..bc8ae8ef9106 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1060,6 +1061,34 @@ int mod_timer(struct timer_list *timer, unsigned long 
expires)
 }
 EXPORT_SYMBOL(mod_timer);
 
+static DEFINE_MUTEX(timer_canary_mutex);
+static unsigned long timer_canary __ro_after_init;
+
+/**
+ * init_timer_func - set the function used for the timer
+ * @timer: the timer to be updated
+ * @func: the function to be called by the timer
+ *
+ * This should only be called once per timer creation to set the function.
+ * Normally used via the setup_timer_*() macros or add_timer().
+ */
+void init_timer_func(struct timer_list *timer, void (*func)(unsigned long))
+{
+   /* Initialize the global timer canary on first use. */
+   if (!timer_canary) {
+   mutex_lock(&timer_canary_mutex);
+   if (!timer_canary)
+   timer_canary = get_random_long();
+   mutex_unlock(&timer_canary_mutex);
+   }
+
+   /* Record timer canary for this timer function. */
+   timer->function = func;
+   timer->canary = (unsigned long)timer->function ^
+   (unsigned long)timer ^ timer_canary;
+}
+EXPORT_SYMBOL(init_timer_func);
+
 /**
  * add_timer - start a timer
  * @timer: the timer to be added
@@ -1077,6 +1106,7 @@ EXPORT_SYMBOL(mod_timer);
 void add_timer(struct timer_list *timer)
 {
BUG_ON(timer_pending(timer));
+   init_timer_func(timer, timer->function);
mod_timer(timer, timer->expires);
 }
 EXPORT_SYMBOL(add_timer);
@@ -1095,6 +1125,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
 
BUG_ON(timer_pending(timer) || !timer->function);
 
+   init_timer_func(timer, timer->function);
new_base = 

Re: [PATCH 1/4] seccomp: Provide matching filter for introspection

2017-08-07 Thread Tyler Hicks
On 08/07/2017 08:03 PM, Tyler Hicks wrote:
> On 08/02/2017 10:19 PM, Kees Cook wrote:
>> Both the upcoming logging improvements and changes to RET_KILL will need
>> to know which filter a given seccomp return value originated from. In
>> order to delay logic processing of result until after the seccomp loop,
>> this adds a single pointer assignment on matches. This will allow both
>> log and RET_KILL logic to work off the filter rather than doing more
>> expensive tests inside the time-critical run_filters loop.
>>
>> Running tight cycles of getpid() with filters attached shows no measurable
>> difference in speed.
>>
>> Suggested-by: Tyler Hicks 
>> Signed-off-by: Kees Cook 
>> ---
>>  kernel/seccomp.c | 11 ---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index 98b59b5db90b..8bdcf01379e4 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -171,10 +171,12 @@ static int seccomp_check_filter(struct sock_filter 
>> *filter, unsigned int flen)
>>  /**
>>   * seccomp_run_filters - evaluates all seccomp filters against @sd
>>   * @sd: optional seccomp data to be passed to filters
>> + * @match: stores struct seccomp_filter that resulted in the return value

Thinking just a bit more about this patch, can you document that @match
may be NULL upon return?
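
To make the NULL case concrete, here is a userspace sketch of the selection loop. The `demo_` names and the fixed per-filter verdicts are assumptions made for the example; the constants mirror the seccomp return values but are redefined locally so the sketch is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_RET_ACTION 0x7fff0000U /* mask selecting the action bits */
#define DEMO_RET_KILL   0x00000000U
#define DEMO_RET_ERRNO  0x00050000U
#define DEMO_RET_ALLOW  0x7fff0000U

/* Hypothetical filter: a fixed verdict plus a link to the older filter. */
struct demo_filter {
	uint32_t verdict;
	const struct demo_filter *prev;
};

/*
 * Walk the filter chain and keep the verdict with the lowest action value.
 * *match is written only when a filter beats the current result, so it
 * keeps the caller's initial value if every filter returns ALLOW.
 */
static uint32_t demo_run_filters(const struct demo_filter *f,
				 const struct demo_filter **match)
{
	uint32_t ret = DEMO_RET_ALLOW;

	for (; f; f = f->prev) {
		uint32_t cur = f->verdict;

		if ((cur & DEMO_RET_ACTION) < (ret & DEMO_RET_ACTION)) {
			ret = cur;
			*match = f;
		}
	}
	return ret;
}
```

With a caller that initializes its pointer to NULL, a chain of ALLOW-only filters returns with that pointer still NULL, which is the case worth documenting.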

Tyler

>>   *
>>   * Returns valid seccomp BPF response codes.
>>   */
>> -static u32 seccomp_run_filters(const struct seccomp_data *sd)
>> +static u32 seccomp_run_filters(const struct seccomp_data *sd,
>> +   struct seccomp_filter **match)
>>  {
>>  struct seccomp_data sd_local;
>>  u32 ret = SECCOMP_RET_ALLOW;
> 
> My version of this patch initialized *match to f here. The reason I did
> that is because if BPF_PROG_RUN() returns RET_ALLOW for all
> filters, I didn't want *match to remain NULL when seccomp_run_filters()
> returns. FILTER_FLAG_LOG nor FILTER_FLAG_KILL_PROCESS would be affected
> by this because they don't care about RET_ALLOW actions but there could
> conceivably be a filter flag in the future that cares about RET_ALLOW
> and not initializing *match to the first filter could result in a latent
> bug for that filter flag.
> 
> I'm fine with not adding the initialization since this is a hot path and
> it doesn't help any of the currently existing/planned filter flags but I
> wanted to at least mention it.
> 
> Reviewed-by: Tyler Hicks 
> 
> Tyler
> 
>> @@ -198,8 +200,10 @@ static u32 seccomp_run_filters(const struct 
>> seccomp_data *sd)
>>  for (; f; f = f->prev) {
>>  u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
>>  
>> -if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
>> +if ((cur_ret & SECCOMP_RET_ACTION) < (ret & 
>> SECCOMP_RET_ACTION)) {
>>  ret = cur_ret;
>> +*match = f;
>> +}
>>  }
>>  return ret;
>>  }
>> @@ -566,6 +570,7 @@ static int __seccomp_filter(int this_syscall, const 
>> struct seccomp_data *sd,
>>  const bool recheck_after_trace)
>>  {
>>  u32 filter_ret, action;
>> +struct seccomp_filter *match = NULL;
>>  int data;
>>  
>>  /*
>> @@ -574,7 +579,7 @@ static int __seccomp_filter(int this_syscall, const 
>> struct seccomp_data *sd,
>>   */
>>  rmb();
>>  
>> -filter_ret = seccomp_run_filters(sd);
>> +filter_ret = seccomp_run_filters(sd, &match);
>>  data = filter_ret & SECCOMP_RET_DATA;
>>  action = filter_ret & SECCOMP_RET_ACTION;
>>  
>>
> 
> 






[PATCH 2/2] f2fs: introduce gc_urgent mode for background GC

2017-08-07 Thread Jaegeuk Kim
This patch adds a sysfs entry to control urgent mode for background GC.
If this is set, background GC thread conducts GC with gc_urgent_sleep_time
all the time.
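
As a rough sketch of the intended behaviour (not the patch itself), the decision made on each wakeup can be modelled as below. `struct demo_gc_thread` and `demo_should_gc()` are hypothetical names, and the non-urgent path is deliberately simplified: the real thread also adjusts its sleep time there, which is omitted.

```c
#include <stdbool.h>

/* Hypothetical subset of the GC thread state used by this sketch. */
struct demo_gc_thread {
	unsigned int urgent_sleep_time; /* milliseconds */
	bool gc_urgent;
};

/*
 * Decide whether this wakeup should run GC. In urgent mode the idle check
 * is skipped and the next sleep is forced to urgent_sleep_time. Otherwise
 * GC runs only when I/O is idle, and *wait_ms is left untouched here.
 */
static bool demo_should_gc(const struct demo_gc_thread *gc, bool io_idle,
			   unsigned int *wait_ms)
{
	if (gc->gc_urgent) {
		*wait_ms = gc->urgent_sleep_time;
		return true;
	}
	return io_idle;
}
```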

Signed-off-by: Jaegeuk Kim 
---
 Documentation/ABI/testing/sysfs-fs-f2fs | 12 
 fs/f2fs/gc.c| 17 +++--
 fs/f2fs/gc.h|  4 
 fs/f2fs/sysfs.c |  9 +
 4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs 
b/Documentation/ABI/testing/sysfs-fs-f2fs
index c579ce5e0ef5..11b7f4ebea7c 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -139,3 +139,15 @@ Date:  June 2017
 Contact:   "Chao Yu" 
 Description:
 Controls current reserved blocks in system.
+
+What:  /sys/fs/f2fs//gc_urgent
+Date:  August 2017
+Contact:   "Jaegeuk Kim" 
+Description:
+Do background GC aggressively
+
+What:  /sys/fs/f2fs//gc_urgent_sleep_time
+Date:  August 2017
+Contact:   "Jaegeuk Kim" 
+Description:
+Controls sleep time of GC urgent mode
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 620dca443b29..8da7c14a9d29 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -35,9 +35,14 @@ static int gc_thread_func(void *data)
set_freezable();
do {
wait_event_interruptible_timeout(*wq,
-   kthread_should_stop() || freezing(current),
+   kthread_should_stop() || freezing(current) ||
+   gc_th->gc_wake,
msecs_to_jiffies(wait_ms));
 
+   /* give it a try one time */
+   if (gc_th->gc_wake)
+   gc_th->gc_wake = 0;
+
if (try_to_freeze())
continue;
if (kthread_should_stop())
@@ -74,6 +79,11 @@ static int gc_thread_func(void *data)
if (!mutex_trylock(&sbi->gc_mutex))
goto next;
 
+   if (gc_th->gc_urgent) {
+   wait_ms = gc_th->urgent_sleep_time;
+   goto do_gc;
+   }
+
if (!is_idle(sbi)) {
increase_sleep_time(gc_th, &wait_ms);
mutex_unlock(&sbi->gc_mutex);
@@ -84,7 +94,7 @@ static int gc_thread_func(void *data)
decrease_sleep_time(gc_th, &wait_ms);
else
increase_sleep_time(gc_th, &wait_ms);
-
+do_gc:
stat_inc_bggc_count(sbi);
 
/* if return value is not zero, no victim was selected */
@@ -115,11 +125,14 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
goto out;
}
 
+   gc_th->urgent_sleep_time = DEF_GC_THREAD_URGENT_SLEEP_TIME;
gc_th->min_sleep_time = DEF_GC_THREAD_MIN_SLEEP_TIME;
gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
 
gc_th->gc_idle = 0;
+   gc_th->gc_urgent = 0;
+   gc_th->gc_wake= 0;
 
sbi->gc_thread = gc_th;
init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index a993967dcdb9..57a9000ce3af 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -13,6 +13,7 @@
 * whether IO subsystem is idle
 * or not
 */
+#define DEF_GC_THREAD_URGENT_SLEEP_TIME500 /* 500 ms */
 #define DEF_GC_THREAD_MIN_SLEEP_TIME   3   /* milliseconds */
 #define DEF_GC_THREAD_MAX_SLEEP_TIME   6
 #define DEF_GC_THREAD_NOGC_SLEEP_TIME  30  /* wait 5 min */
@@ -27,12 +28,15 @@ struct f2fs_gc_kthread {
wait_queue_head_t gc_wait_queue_head;
 
/* for gc sleep time */
+   unsigned int urgent_sleep_time;
unsigned int min_sleep_time;
unsigned int max_sleep_time;
unsigned int no_gc_sleep_time;
 
/* for changing gc mode */
unsigned int gc_idle;
+   unsigned int gc_urgent;
+   unsigned int gc_wake;
 };
 
 struct gc_inode_list {
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 4e5a95e9e666..b769a3d776de 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -161,6 +161,10 @@ static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
 
if (!strcmp(a->attr.name, "iostat_enable") && *ui == 0)
f2fs_reset_iostat(sbi);
+   if (!strcmp(a->attr.name, "gc_urgent") && t == 1 && sbi->gc_thread) {
+   sbi->gc_thread->gc_wake = 1;
+   wake_up_interruptible_all(>gc_thread->gc_wait_queue_head);
+   }
 
return count;
 }
@@ -240,10 +244,13 @@ static struct f2fs_attr f2fs_attr_##_name = { 

[PATCH 1/2] f2fs: use IPU for cold files

2017-08-07 Thread Jaegeuk Kim
We expect the data of cold files to be written sequentially, but sometimes
small parts of it are updated later, which incurs fragmentation.
Let's avoid that by using in-place updates (IPU) for cold files.
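
The check order this patch produces can be sketched as a small userspace
model. This is only an illustration: struct file_model and its fields are
hypothetical stand-ins for test_opt(sbi, LFS), file_is_cold(inode) and the
F2FS_IPU_FORCE policy bit, and the remaining policy bits are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state the real helper inspects. */
struct file_model {
	bool lfs_mode;	/* models test_opt(sbi, LFS) */
	bool is_cold;	/* models file_is_cold(inode) */
	bool force_ipu;	/* models the F2FS_IPU_FORCE policy bit */
};

/* Mirrors the order of checks after the patch: LFS first, then cold files. */
static bool wants_inplace_update(const struct file_model *f)
{
	if (f->lfs_mode)
		return false;	/* LFS mode never updates in place */
	if (f->is_cold)
		return true;	/* cold file: overwrite to avoid fragmentation */
	return f->force_ipu;	/* other policy bits omitted in this model */
}
```

In this model a cold file gets an in-place update regardless of the policy
bits, unless LFS mode is active.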

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/segment.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 6b871b492fd5..7f700e54b77d 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -577,6 +577,10 @@ static inline bool need_inplace_update_policy(struct inode *inode,
if (test_opt(sbi, LFS))
return false;
 
+   /* if this is cold file, we should overwrite to avoid fragmentation */
+   if (file_is_cold(inode))
+   return true;
+
if (policy & (0x1 << F2FS_IPU_FORCE))
return true;
if (policy & (0x1 << F2FS_IPU_SSR) && need_SSR(sbi))
-- 
2.13.0.rc1.294.g07d810a77f-goog



[PATCH v2 4/4] selftests/seccomp: Test thread vs process killing

2017-08-07 Thread Kees Cook
SECCOMP_RET_KILL is supposed to kill the current thread (and userspace
depends on this), so test for this, distinct from killing the entire
process. This also tests killing the entire process with the new
SECCOMP_FILTER_FLAG_KILL_PROCESS flag. (This also moves a bunch of
defines up earlier in the file to use them earlier.)
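
The parent-side check these tests rely on can be sketched without seccomp
at all. This is a minimal sketch under stated assumptions: run_child() and
enum child_fate are hypothetical names, exit code 42 mirrors the value the
test uses for "only the thread died", and raise(SIGKILL) merely stands in
for a process-wide kill (the real test uses a seccomp filter instead).

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum child_fate { CHILD_SURVIVED, CHILD_KILLED, CHILD_OTHER };

/*
 * Fork a child that either exits with 42 (the process outlived the
 * offending thread) or dies from a signal (the whole process was killed),
 * then classify the result from the waitpid() status in the parent.
 */
static enum child_fate run_child(bool whole_process_dies)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return CHILD_OTHER;
	if (pid == 0) {
		if (whole_process_dies)
			raise(SIGKILL);	/* stand-in for a process-wide kill */
		_exit(42);
	}
	if (waitpid(pid, &status, 0) != pid)
		return CHILD_OTHER;
	if (WIFEXITED(status) && WEXITSTATUS(status) == 42)
		return CHILD_SURVIVED;
	if (WIFSIGNALED(status))
		return CHILD_KILLED;
	return CHILD_OTHER;
}
```

A normal exit with status 42 means only the thread went away; a signaled
status means the entire process was taken down.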

Signed-off-by: Kees Cook 
Reviewed-by: Tyler Hicks 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 185 --
 1 file changed, 144 insertions(+), 41 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index ee78a53da5d1..d0a9bebf21f3 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -87,6 +87,51 @@ struct seccomp_data {
 };
 #endif
 
+#ifndef __NR_seccomp
+# if defined(__i386__)
+#  define __NR_seccomp 354
+# elif defined(__x86_64__)
+#  define __NR_seccomp 317
+# elif defined(__arm__)
+#  define __NR_seccomp 383
+# elif defined(__aarch64__)
+#  define __NR_seccomp 277
+# elif defined(__hppa__)
+#  define __NR_seccomp 338
+# elif defined(__powerpc__)
+#  define __NR_seccomp 358
+# elif defined(__s390__)
+#  define __NR_seccomp 348
+# else
+#  warning "seccomp syscall number unknown for this architecture"
+#  define __NR_seccomp 0x
+# endif
+#endif
+
+#ifndef SECCOMP_SET_MODE_STRICT
+#define SECCOMP_SET_MODE_STRICT 0
+#endif
+
+#ifndef SECCOMP_SET_MODE_FILTER
+#define SECCOMP_SET_MODE_FILTER 1
+#endif
+
+#ifndef SECCOMP_FILTER_FLAG_TSYNC
+#define SECCOMP_FILTER_FLAG_TSYNC 1
+#endif
+
+#ifndef SECCOMP_FILTER_FLAG_KILL_PROCESS
+#define SECCOMP_FILTER_FLAG_KILL_PROCESS 2
+#endif
+
+#ifndef seccomp
+int seccomp(unsigned int op, unsigned int flags, void *args)
+{
+   errno = 0;
+   return syscall(__NR_seccomp, op, flags, args);
+}
+#endif
+
 #if __BYTE_ORDER == __LITTLE_ENDIAN
 #define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
 #elif __BYTE_ORDER == __BIG_ENDIAN
@@ -520,6 +565,105 @@ TEST_SIGNAL(KILL_one_arg_six, SIGSYS)
close(fd);
 }
 
+/* This is a thread task to die via seccomp filter violation. */
+void *kill_thread(void *data)
+{
+   bool die = (bool)data;
+
+   if (die) {
+   prctl(PR_GET_SECCOMP, 0, 0, 0, 0);
+   return (void *)SIBLING_EXIT_FAILURE;
+   }
+
+   return (void *)SIBLING_EXIT_UNKILLED;
+}
+
+/* Prepare a thread that will kill itself or both of us. */
+void kill_thread_or_group(struct __test_metadata *_metadata, bool kill_process)
+{
+   pthread_t thread;
+   void *status;
+   unsigned int flags;
+   /* Kill only when calling __NR_prctl. */
+   struct sock_filter filter[] = {
+   BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+   offsetof(struct seccomp_data, nr)),
+   BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_prctl, 0, 1),
+   BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL),
+   BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+   };
+   struct sock_fprog prog = {
+   .len = (unsigned short)ARRAY_SIZE(filter),
+   .filter = filter,
+   };
+
+   ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+   TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+   }
+
+   flags = kill_process ? SECCOMP_FILTER_FLAG_KILL_PROCESS : 0;
+   ASSERT_EQ(0, seccomp(SECCOMP_SET_MODE_FILTER, flags, )) {
+   if (kill_process)
+   TH_LOG("Kernel does not support SECCOMP_FILTER_FLAG_KILL_PROCESS");
+   else
+   TH_LOG("Kernel does not support seccomp syscall");
+   }
+
+   /* Start a thread that will exit immediately. */
+   ASSERT_EQ(0, pthread_create(, NULL, kill_thread, (void *)false));
+   ASSERT_EQ(0, pthread_join(thread, ));
+   ASSERT_EQ(SIBLING_EXIT_UNKILLED, (unsigned long)status);
+
+   /* Start a thread that will die immediately. */
+   ASSERT_EQ(0, pthread_create(, NULL, kill_thread, (void *)true));
+   ASSERT_EQ(0, pthread_join(thread, ));
+   ASSERT_NE(SIBLING_EXIT_FAILURE, (unsigned long)status);
+
+   /*
+* If we get here, only the spawned thread died. Let the parent know
+* this entire process (all threads including this one) didn't die.
+*/
+   exit(42);
+}
+
+TEST(KILL_thread)
+{
+   int status;
+   pid_t child_pid;
+
+   child_pid = fork();
+   ASSERT_LE(0, child_pid);
+   if (child_pid == 0) {
+   kill_thread_or_group(_metadata, false);
+   _exit(38);
+   }
+
+   ASSERT_EQ(child_pid, waitpid(child_pid, , 0));
+
+   /* If only the thread was killed, we'll see exit 42. */
+   ASSERT_TRUE(WIFEXITED(status));
+   ASSERT_EQ(42, WEXITSTATUS(status));
+}
+
+TEST(KILL_process)
+{
+   int status;
+   pid_t child_pid;
+
+   child_pid = fork();
+   ASSERT_LE(0, child_pid);
+  

[PATCH v2 0/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

2017-08-07 Thread Kees Cook
This series is the result of Fabricio and I going around a few times
on possible solutions for finding a way to enhance RET_KILL to kill
the entire process (the thread group). There are a lot of ways this could
be done, but I wanted something that felt cleanest. As it happens, Tyler's
recent patch series for logging improvement also needs to know a little bit
more during filter runs, and the solution for both is to pass back
the matched filter. This lets us examine it here for RET_KILL and
in the future for logging changes.

The filter passing is patch 1, the new flag for RET_KILL is patch 2.
Some test refactoring is in patch 3 for the RET_DATA ordering, and
patch 4 is the test for the new RET_KILL flag.

Please take a look!

Thanks,

-Kees

v2:
- moved kill_process bool into struct padding gap (tyhicks)
- improved comments/docs in various places for clarity (tyhicks)
- use ASSERT_TRUE() for WIFEXITED and WIFSIGNALED (tyhicks)
- adding Reviewed-bys from tyhicks



[PATCH v2 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

2017-08-07 Thread Kees Cook
Right now, SECCOMP_RET_KILL kills the current thread. There have been
a few requests for RET_KILL to kill the entire process (the thread
group), but since seccomp's u32 return values are ABI, and ordered by
lowest value, with RET_KILL as 0, there isn't a trivial way to provide
an even smaller value that would mean the more restrictive action of
killing the thread group.

Instead, create a filter flag that indicates that a RET_KILL from this
filter must kill the process rather than the thread. This can be set
(and not cleared) via the new SECCOMP_FILTER_FLAG_KILL_PROCESS flag.

Pros:
 - the logic for the filter action is contained in the filter.
 - userspace can detect support for the feature since earlier kernels
   will reject the new flag.
Cons:
 - depends on adding an assignment to the seccomp_run_filters() loop
   (previous patch).
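
The two properties listed above (the decision lives in the filter, and older
kernels reject the unknown flag) can be sketched as a userspace model. All
MODEL_* names, struct attach_model and both functions are hypothetical; the
flag values are the ones proposed in this patch, and passing the supported
mask as a parameter is only a way to imitate an older kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as proposed in this patch (model names, not the uapi ones). */
#define MODEL_FLAG_TSYNC	1U
#define MODEL_FLAG_KILL_PROCESS	2U

struct attach_model {
	bool kill_process;	/* models seccomp_filter.kill_process */
};

/* Models flag validation: bits outside the supported mask give -EINVAL. */
static int model_attach(struct attach_model *f, unsigned int flags,
			unsigned int supported_mask)
{
	if (flags & ~supported_mask)
		return -EINVAL;
	f->kill_process = (flags & MODEL_FLAG_KILL_PROCESS) != 0;
	return 0;
}

/* Models the RET_KILL decision: a NULL match fails closed. */
static bool model_kills_process(const struct attach_model *match)
{
	return !match || match->kill_process;
}
```

With a mask that only knows TSYNC, attaching with the new flag fails, which
is how userspace could detect that the feature is missing.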

Alternatives to this approach with pros/cons:

- Use a new test during seccomp_run_filters() that treats the RET_DATA
  mask of a RET_KILL action as special. If a new bit is set in the data,
  then treat the return value as -1 (lower than 0).
  Pros:
   - the logic for the filter action is contained in the filter.
  Cons:
   - added complexity to time-sensitive seccomp_run_filters() loop.
   - there isn't a trivial way for userspace to detect if the kernel
 supports the feature (earlier kernels will silently ignore the
 RET_DATA and only kill the thread).

- Have SECCOMP_FILTER_FLAG_KILL_PROCESS attach to the seccomp struct
  rather than the filter.
  Pros:
   - no change needed to seccomp_run_filters() loop.
  Cons:
   - the change in behavior technically originates external to the
 filter, which allows for later filters to "enhance" a previously
 applied filter's RET_KILL to kill the entire process, which may
 be unexpected.

Signed-off-by: Kees Cook 
---
 include/linux/seccomp.h  |  3 ++-
 include/uapi/linux/seccomp.h |  3 ++-
 kernel/seccomp.c | 22 +-
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index ecc296c137cd..59d001ba655c 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -3,7 +3,8 @@
 
 #include 
 
-#define SECCOMP_FILTER_FLAG_MASK   (SECCOMP_FILTER_FLAG_TSYNC)
+#define SECCOMP_FILTER_FLAG_MASK   (SECCOMP_FILTER_FLAG_TSYNC | \
+SECCOMP_FILTER_FLAG_KILL_PROCESS)
 
 #ifdef CONFIG_SECCOMP
 
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a43ff1e..4b75d8c297b6 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -15,7 +15,8 @@
 #define SECCOMP_SET_MODE_FILTER 1
 
 /* Valid flags for SECCOMP_SET_MODE_FILTER */
-#define SECCOMP_FILTER_FLAG_TSYNC  1
+#define SECCOMP_FILTER_FLAG_TSYNC  1
+#define SECCOMP_FILTER_FLAG_KILL_PROCESS   2
 
 /*
  * All BPF programs must return a 32-bit value.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 1f3347fc2605..297f8bfc3b72 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -44,6 +44,7 @@
  * is only needed for handling filters shared across tasks.
  * @prev: points to a previously installed, or inherited, filter
  * @prog: the BPF program to evaluate
+ * @kill_process: if true, RET_KILL will kill process rather than thread.
  *
  * seccomp_filter objects are organized in a tree linked via the @prev
  * pointer.  For any task, it appears to be a singly-linked list starting
@@ -57,6 +58,7 @@
  */
 struct seccomp_filter {
refcount_t usage;
+   bool kill_process;
struct seccomp_filter *prev;
struct bpf_prog *prog;
 };
@@ -450,6 +452,10 @@ static long seccomp_attach_filter(unsigned int flags,
return ret;
}
 
+   /* Set process-killing flag, if present. */
+   if (flags & SECCOMP_FILTER_FLAG_KILL_PROCESS)
+   filter->kill_process = true;
+
/*
 * If there is an existing filter, make it the prev and don't drop its
 * task reference.
@@ -665,7 +671,21 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
seccomp_init_siginfo(, this_syscall, data);
do_coredump();
}
-   do_exit(SIGSYS);
+   /*
+* The only way match can be NULL here is if something
+* went very wrong in seccomp_run_filters() (e.g. a NULL
+* filter list in struct seccomp) and the return action
+* falls back to failing closed. In this case, take the
+* strongest possible action.
+*
+* If we get here with match->kill_process set, we need
+* to kill the entire thread group. Otherwise, kill only
+* the offending thread.
+*/
+   if (!match || match->kill_process)
+ 

[PATCH v2 1/4] seccomp: Provide matching filter for introspection

2017-08-07 Thread Kees Cook
Both the upcoming logging improvements and changes to RET_KILL will need
to know which filter a given seccomp return value originated from. In
order to delay logic processing of result until after the seccomp loop,
this adds a single pointer assignment on matches. This will allow both
log and RET_KILL logic to work off the filter rather than doing more
expensive tests inside the time-critical run_filters loop.

Running tight cycles of getpid() with filters attached shows no measurable
difference in speed.
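
The "single pointer assignment on matches" can be sketched as a userspace
model of the loop. This is an illustration only: struct chain_filter and
model_run_filters() are hypothetical names, each filter simply carries the
value its BPF program would return, and the MODEL_RET_* constants mirror
the uapi values for KILL, TRAP, ALLOW and the action mask.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_RET_KILL		0x00000000U
#define MODEL_RET_TRAP		0x00030000U
#define MODEL_RET_ALLOW		0x7fff0000U
#define MODEL_RET_ACTION	0x7fff0000U

struct chain_filter {
	struct chain_filter *prev;	/* previously installed filter */
	uint32_t ret;			/* value this filter's program returns */
};

/*
 * Walk the chain; the lowest action value wins. *match is written only
 * when a filter lowers the result, so it stays untouched for ALLOW.
 */
static uint32_t model_run_filters(struct chain_filter *f,
				  struct chain_filter **match)
{
	uint32_t ret = MODEL_RET_ALLOW;

	for (; f; f = f->prev) {
		uint32_t cur_ret = f->ret;

		if ((cur_ret & MODEL_RET_ACTION) < (ret & MODEL_RET_ACTION)) {
			ret = cur_ret;
			*match = f;
		}
	}
	return ret;
}
```

The caller initializes its match pointer to NULL, so after the loop it either
points at the filter responsible for the result or is still NULL.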

Suggested-by: Tyler Hicks 
Signed-off-by: Kees Cook 
Reviewed-by: Tyler Hicks 
---
 kernel/seccomp.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 98b59b5db90b..1f3347fc2605 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -171,10 +171,14 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
 /**
  * seccomp_run_filters - evaluates all seccomp filters against @sd
  * @sd: optional seccomp data to be passed to filters
+ * @match: stores struct seccomp_filter that resulted in the return value,
+ * unless filter returned SECCOMP_RET_ALLOW, in which case it will
+ * be unchanged.
  *
  * Returns valid seccomp BPF response codes.
  */
-static u32 seccomp_run_filters(const struct seccomp_data *sd)
+static u32 seccomp_run_filters(const struct seccomp_data *sd,
+  struct seccomp_filter **match)
 {
struct seccomp_data sd_local;
u32 ret = SECCOMP_RET_ALLOW;
@@ -198,8 +202,10 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd)
for (; f; f = f->prev) {
u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
 
-   if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
+   if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION)) {
ret = cur_ret;
+   *match = f;
+   }
}
return ret;
 }
@@ -566,6 +572,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
const bool recheck_after_trace)
 {
u32 filter_ret, action;
+   struct seccomp_filter *match = NULL;
int data;
 
/*
@@ -574,7 +581,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 */
rmb();
 
-   filter_ret = seccomp_run_filters(sd);
+   filter_ret = seccomp_run_filters(sd, );
data = filter_ret & SECCOMP_RET_DATA;
action = filter_ret & SECCOMP_RET_ACTION;
 
@@ -638,6 +645,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
return 0;
 
case SECCOMP_RET_ALLOW:
+   /*
+* Note that the "match" filter will always be NULL for
+* this action since SECCOMP_RET_ALLOW is the starting
+* state in seccomp_run_filters().
+*/
return 0;
 
case SECCOMP_RET_KILL:
-- 
2.7.4



linux-next: manual merge of the rdma tree with Linus' tree

2017-08-07 Thread Stephen Rothwell
Hi Doug,

Today's linux-next merge of the rdma tree got a conflict in:

  drivers/net/ethernet/mellanox/mlx5/core/main.c

between commit:

  eeb66cdb6826 ("net/mlx5: Separate between E-Switch and MPFS")

from Linus' tree and commit:

  c85023e153e3 ("IB/mlx5: Add raw ethernet local loopback support")

from the rdma tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/mellanox/mlx5/core/main.c
index 6dbd637b4e66,3cec683fd70f..
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@@ -1155,6 -1168,16 +1168,12 @@@ static int mlx5_load_one(struct mlx5_co
goto err_fs;
}
  
+   err = mlx5_core_set_hca_defaults(dev);
+   if (err) {
+   dev_err(>dev, "Failed to set hca defaults\n");
+   goto err_fs;
+   }
+ 
 -#ifdef CONFIG_MLX5_CORE_EN
 -  mlx5_eswitch_attach(dev->priv.eswitch);
 -#endif
 -
err = mlx5_sriov_attach(dev);
if (err) {
dev_err(>dev, "sriov init failed %d\n", err);


Re: block/ps3vram: Delete an error message for a failed memory allocation in ps3vram_cache_init()

2017-08-07 Thread Michael Ellerman
SF Markus Elfring  writes:

 I didn't consider one would be triggered by the kzalloc failure.
>>>
>>> Do you reconsider any special system settings for further
>>> software evolution then?
>> 
>> Sorry, I don't quite understand your question.
>
> Do you try to configure the Linux error reporting to any special needs?
>
>
>> I think your original patch is OK,
>
> How does this feedback fit to the initial response “Not Applicable”?
> https://patchwork.ozlabs.org/patch/798575/

That comes from me, and means "I can't apply this patch", because it's
not a powerpc patch.

Looking at the maintainers output though maybe that is meant to go via
the powerpc tree.

cheers


[PATCH urgent] x86/asm/64: Clear AC on NMI entries

2017-08-07 Thread Andy Lutomirski
This closes a hole in our SMAP implementation.

This patch comes from grsecurity.  Good catch!

Cc: sta...@vger.kernel.org
Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_64.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d271fb79248f..6d078b89a5e8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1211,6 +1211,8 @@ ENTRY(nmi)
 * other IST entries.
 */
 
+   ASM_CLAC
+
/* Use %rdx as our temp variable throughout */
pushq   %rdx
 
-- 
2.13.3



Re: Possible race in pc87413_wdt.ko

2017-08-07 Thread Guenter Roeck

On 08/07/2017 06:22 AM, Anton Volkov wrote:

Hello.

While searching for races in the Linux kernel I've come across 
"drivers/watchdog/pc87413_wdt.ko" module. Here is a question that I came up 
with while analyzing results. Lines are given using the info from Linux v4.12.

Consider the following case:

Thread 1:                         Thread 2:
pc87413_init
misc_register(_miscdev)
-> pc87413_get_swc_base_addr      pc87413_open
                                  -> pc87413_refresh
                                     -> pc87413_swc_bank3
   swc_base_addr = ...
   (pc87413_wdt.c: line 133)      (pc87413_wdt.c: line 146)

So in this case preemptive registration of the device leads to a possibility of 
race between the initialization process and a callback to the registered device.

Is this race feasible from your point of view? And if it is, is it possible to 
move the device registration a bit further down in the pc87413_init function?



Yes, the race is feasible, and it is possible to move the device registration
function (though the preferred solution would be to convert the driver to use
the watchdog subsystem). The code looks pretty bad as written.

Just not sure if it is worth bothering about it. I suspect no one is using that
driver anymore (the datasheet is from 2001). Might as well just declare it
obsolete and wait for someone to scream.

Guenter


Re: [PATCH -mm] mm: Clear to access sub-page last when clearing huge page

2017-08-07 Thread Huang, Ying
Mike Kravetz  writes:

> On 08/07/2017 12:21 AM, Huang, Ying wrote:
>> From: Huang Ying 
>> 
>> Huge page helps to reduce TLB miss rate, but it has higher cache
>> footprint, sometimes this may cause some issue.  For example, when
>> clearing huge page on x86_64 platform, the cache footprint is 2M.  But
>> on a Xeon E5 v3 2699 CPU, there are 18 cores, 36 threads, and only 45M
>> LLC (last level cache).  That is, in average, there are 2.5M LLC for
>> each core and 1.25M LLC for each thread.  If the cache pressure is
>> heavy when clearing the huge page, and we clear the huge page from the
>> begin to the end, it is possible that the begin of huge page is
>> evicted from the cache after we finishing clearing the end of the huge
>> page.  And it is possible for the application to access the begin of
>> the huge page after clearing the huge page.
>> 
>> To help the above situation, in this patch, when we clear a huge page,
>> the order to clear sub-pages is changed.  In quite some situation, we
>> can get the address that the application will access after we clear
>> the huge page, for example, in a page fault handler.  Instead of
>> clearing the huge page from begin to end, we will clear the sub-pages
>> farthest from the sub-page to access first, and clear the
>> sub-page to access last.  This will make the sub-page to access most
>> cache-hot and sub-pages around it more cache-hot too.  If we cannot
>> know the address the application will access, the begin of the huge
>> page is assumed to be the address the application will access.
>> 
>> With this patch, the throughput increases ~28.3% in vm-scalability
>> anon-w-seq test case with 72 processes on a 2 socket Xeon E5 v3 2699
>> system (36 cores, 72 threads).  The test case creates 72 processes,
>> each process mmap a big anonymous memory area and writes to it from
>> the begin to the end.  For each process, other processes could be seen
>> as other workload which generates heavy cache pressure.  At the same
>> time, the cache miss rate reduced from ~33.4% to ~31.7%, the
>> IPC (instruction per cycle) increased from 0.56 to 0.74, and the time
>> spent in user space is reduced ~7.9%
>> 
>> Thanks to Andi Kleen for proposing to use the address to access to
>> determine the order of sub-pages to clear.
>> 
>> The hugetlbfs access address could be improved, will do that in
>> another patch.
>
> hugetlb_fault masks off the actual faulting address with,
> address &= huge_page_mask(h);
> before calling hugetlb_no_page.
>
> But, we could pass down the actual (unmasked) address to take advantage
> of this optimization for hugetlb faults as well.  hugetlb_fault is the
> only caller of hugetlb_no_page, so this should be pretty straight forward.
>
> Were you thinking of additional improvements?

No.  I am thinking of something like this.  If the basic idea is
accepted, I plan to add better support like this for hugetlbfs in
another patch.
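
The clearing order described in the quoted patch description (farthest
sub-pages first, the sub-page to access last) can be sketched as follows.
This is a simplified illustration, not the patch's actual implementation:
clear_order() is a hypothetical helper that only computes the order of
sub-page indices and assumes nr_subpages >= 1 and target < nr_subpages.

```c
#include <stddef.h>

/*
 * Fill order[0..nr_subpages-1] with sub-page indices, farthest from
 * target first and target itself last. Two cursors close in on target
 * from both ends; the one that is farther away is emitted next.
 */
static void clear_order(size_t nr_subpages, size_t target, size_t *order)
{
	size_t lo = 0, hi = nr_subpages - 1, n = 0;

	while (lo < hi) {
		if (target - lo >= hi - target)
			order[n++] = lo++;	/* low end is at least as far */
		else
			order[n++] = hi--;	/* high end is farther */
	}
	order[n] = lo;	/* here lo == hi == target */
}
```

For 8 sub-pages and target 2 this yields 7, 6, 5, 0, 4, 1, 3, 2, so the
sub-pages next to the target are cleared just before it.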

Best Regards,
Huang, Ying


Re: [PATCH 0/6] In-kernel QMI handling

2017-08-07 Thread Bjorn Andersson
On Mon 07 Aug 12:19 PDT 2017, Marcel Holtmann wrote:

> Hi Bjorn,
> 
> >>> This series starts by moving the common definitions of the QMUX
> >>> protocol to the
> >>> uapi header, as they are shared with clients - both in kernel and
> >>> userspace.
> >>> 
> >>> This series then introduces in-kernel helper functions for aiding the
> >>> handling
> >>> of QMI encoded messages in the kernel. QMI encoding is a wire-format
> >>> used in
> >>> exchanging messages between the majority of QRTR clients and
> >>> services.
> >> 
> >> This raises a few red-flags for me.
> > 
> > I'm glad it does. In discussions with the responsible team within
> > Qualcomm I've highlighted a number of concerns about enabling this
> > support in the kernel. Together we're continuously looking into what
> > should be pushed out to user space, and trying to not introduce
> > unnecessary new users.
> > 
> >> So far, we've kept almost everything QMI related in userspace and
> >> handled all QMI control-channel messages from libraries like libqmi or
> >> uqmi via the cdc-wdm driver and the "rmnet" interface via the qmi_wwan
> >> driver.  The kernel drivers just serve as the transport.
> >> 
> > 
> > The path that was taken to support the MSM-style devices was to
> > implement net/qrtr, which exposes a socket interface to abstract the
> > physical transports (QMUX or IPCROUTER in Qualcomm terminology).
> > 
> > As I share your view on letting the kernel handle the transportation only,
> > the task of keeping track of registered services (service id -> node and
> > port mapping) was done in a user space process and so far we've only
> > ever had to deal with QMI encoded messages in various user space tools.
> 
> I think that the transport and multiplexing can be in the kernel as
> long as it is done as proper subsystem. Similar to Phonet or CAIF.
> Meaning it should have a well defined socket interface that can be
> easily used from userspace, but also a clean in-kernel interface
> handling.
> 

In a mobile Qualcomm device there are a few different components involved
here: message routing, QMUX protocol and QMI-encoding.

The downstream Qualcomm kernel implements the first two in the
IPCROUTER; upstream this is split between the kernel net/qrtr and a user
space service-register implementing the QMUX protocol for knowing where
services are located.

The common encoding of messages passed between endpoints of the message
routing is QMI, which is left entirely to each client.

> If Qualcomm is supportive of this effort and is willing to actually
> assist and/or open some of the specs or interface descriptions, then
> this is a good thing. Service registration and cleanup is really done
> best in the kernel. Same applies to multiplexing. Trying to do
> multiplexing in userspace is always cumbersome and leads to overhead
> that is of no gain. For example within oFono, we had to force
> everything to go via oFono since it was the only sane way of handling
> it. Other approaches were error prone and full of race conditions. You
> need a central entity that can clean up.
> 

The current upstream solution depends on a collaboration between
net/qrtr and the user space service register for figuring out whom to
send messages to. After that muxing et al is handled by the socket
interface and service registry does not need to be involved.

Qualcomm is very supportive of this solution and we're collaborating on
transitioning "downstream" to use this implementation.

> For the definition of an UAPI to share some code, I am actually not
> sure that is such a good idea. For example the QMI code in oFono
> follows a way simpler approach. And I am not convinced that all the
> macros are actually beneficial. For example, the whole netlink macros
> are pretty cumbersome. Adding some Documentation/qmi.txt on how the
> wire format looks like and what is expected seems to be a way better
> approach.
> 

The socket interface provided by the kernel expects some knowledge of
the QMUX protocol, for service management. The majority of this
knowledge is already public, but I agree that it would be good to gather
this in a document. The common data structure for the control message is
what I've put in the uapi, as this is used by anyone dealing with
control messages.

When it comes to the QMI-encoded messages these are application
specific, just like e.g. protobuf definitions are application specific.

As the core infrastructure is becoming available upstream and boards
like the DB410c and DB820c aim to be supported by open solutions we will
have a natural place to discuss publication of at least some of the
application level protocols.

Regards,
Bjorn


[PATCH] spi: xlp: fix error return code in xlp_spi_probe()

2017-08-07 Thread Gustavo A. R. Silva
platform_get_irq() returns an error code, but the spi-xlp driver ignores
it and always returns -EINVAL. This is not correct and prevents
-EPROBE_DEFER from being propagated properly.

Notice that platform_get_irq() no longer returns 0 on error:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e330b9a6bb35dc7097a4f02cb1ae7b6f96df92af

Print and propagate the return value of platform_get_irq on failure.
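
The difference can be sketched in a small userspace model. The function
names are hypothetical, and MODEL_EPROBE_DEFER is defined locally because
the kernel-internal EPROBE_DEFER value (517) is not exported by the
userspace <errno.h>.

```c
#include <errno.h>

/* Kernel-internal value; not available from userspace <errno.h>. */
#define MODEL_EPROBE_DEFER 517

/* Old behaviour: every failure is flattened to -EINVAL. */
static int probe_flattening(int irq)
{
	if (irq < 0)
		return -EINVAL;
	return 0;
}

/* New behaviour: the negative value from the IRQ lookup is passed on. */
static int probe_propagating(int irq)
{
	if (irq < 0)
		return irq;
	return 0;
}
```

With the flattening variant the caller can no longer tell a deferred probe
from a real configuration error, so the probe would never be retried.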

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/spi/spi-xlp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-xlp.c b/drivers/spi/spi-xlp.c
index 80cb4d6..74a01b0 100644
--- a/drivers/spi/spi-xlp.c
+++ b/drivers/spi/spi-xlp.c
@@ -393,8 +393,8 @@ static int xlp_spi_probe(struct platform_device *pdev)
 
irq = platform_get_irq(pdev, 0);
if (irq < 0) {
-   dev_err(>dev, "no IRQ resource found\n");
-   return -EINVAL;
+   dev_err(>dev, "no IRQ resource found: %d\n", irq);
+   return irq;
}
err = devm_request_irq(>dev, irq, xlp_spi_interrupt, 0,
pdev->name, xspi);
-- 
2.5.0



[PATCH 1/2] PCI: iproc: Implement PCI hotplug support

2017-08-07 Thread Oza Pawandeep
This patch implements PCI hotplug support for iproc family chipsets.

iProc-based SoCs (e.g. Stingray) do not have an integrated hotplug
controller. Hence, the standard PCI hotplug framework hooks (e.g.
controlled power up/down of a slot) cannot be used.

The mechanism that e.g. Stingray has adopted for PCI hotplug is as follows:
the PCI present lines are inputs to GPIOs, depending on the type of
connector (x2, x4, x8).

A GPIO array needs to be present if hotplug is supported.
The HW implementation is SoC/board specific, and it also depends on how
the add-in card is designed (e.g. how many present pins are implemented).

If an x8 card is connected, then all 3 present pins might go low, or
at least one pin goes low.
If an x4 card is connected, then 2 present pins might go low, or at
least one pin goes low.

The implementation essentially takes care of the following:
> Initializing the hotplug irq thread.
> Detecting the endpoint device based on link state.
> Handling PERST and detecting the plugged devices.
> Ordered hot plug-out, where the user is expected
  to write 1 to /sys/bus/pci/devices//remove
> Handling spurious interrupts.
> Handling multiple interrupts and making sure that the card is
  enumerated only once.
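
The "enumerated only once" behaviour from the list above can be sketched as
a small state model. This is an assumption-laden illustration: struct
hp_model, enum hp_action and hp_event() are hypothetical names, link_up
stands for the combined LTSSM and link checks, and the link-down path is
reported as HP_UNKNOWN rather than guessed.

```c
#include <stdbool.h>

enum hp_action { HP_ENUMERATE, HP_IGNORE, HP_UNKNOWN };

struct hp_model {
	bool ep_is_present;	/* models pcie->ep_is_present */
};

/* Decide what one hotplug interrupt should do, given the link state. */
static enum hp_action hp_event(struct hp_model *hp, bool link_up)
{
	if (link_up && !hp->ep_is_present) {
		hp->ep_is_present = true;	/* remember the enumeration */
		return HP_ENUMERATE;
	}
	if (link_up && hp->ep_is_present)
		return HP_IGNORE;	/* spurious or repeated interrupt */
	return HP_UNKNOWN;		/* link-down handling not modelled */
}
```

A second interrupt with the link still up is ignored, so repeated or
spurious interrupts do not trigger another bus rescan.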

Signed-off-by: Oza Pawandeep 
Reviewed-by: Ray Jui 

diff --git a/drivers/pci/host/pcie-iproc-platform.c b/drivers/pci/host/pcie-iproc-platform.c
index 9512960..e51bbd2 100644
--- a/drivers/pci/host/pcie-iproc-platform.c
+++ b/drivers/pci/host/pcie-iproc-platform.c
@@ -89,6 +89,9 @@ static int iproc_pcie_pltfm_probe(struct platform_device *pdev)
pcie->need_ob_cfg = true;
}
 
+   if (of_property_read_bool(np, "brcm,pci-hotplug"))
+   pcie->enable_hotplug = true;
+
/* PHY use is optional */
pcie->phy = devm_phy_get(dev, "pcie-phy");
if (IS_ERR(pcie->phy)) {
diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
index ee40651..c6d1add 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pcie-iproc.h"
 
@@ -65,6 +66,17 @@
 #define PCIE_DL_ACTIVE_SHIFT 2
 #define PCIE_DL_ACTIVE   BIT(PCIE_DL_ACTIVE_SHIFT)
 
+#define CFG_RC_LTSSM 0x1cf8
+#define CFG_RC_PHY_CTL   0x1804
+#define CFG_RC_LTSSM_TIMEOUT 1000
+#define CFG_RC_LTSSM_STATE_MASK  0xff
+#define CFG_RC_LTSSM_STATE_L1 0x1
+
+#define CFG_RC_CLR_LTSSM_HIST_SHIFT  29
+#define CFG_RC_CLR_LTSSM_HIST_MASK   BIT(CFG_RC_CLR_LTSSM_HIST_SHIFT)
+#define CFG_RC_CLR_RECOV_HIST_SHIFT  31
+#define CFG_RC_CLR_RECOV_HIST_MASK   BIT(CFG_RC_CLR_RECOV_HIST_SHIFT)
+
 #define APB_ERR_EN_SHIFT 0
 #define APB_ERR_EN   BIT(APB_ERR_EN_SHIFT)
 
@@ -1306,12 +1318,106 @@ static int iproc_pcie_rev_init(struct iproc_pcie *pcie)
return 0;
 }
 
+static bool iproc_pci_hp_check_ltssm(struct iproc_pcie *pcie)
+{
+   struct pci_bus *bus = pcie->root_bus;
+   u32 val, timeout = CFG_RC_LTSSM_TIMEOUT;
+
+   /* Clear LTSSM history. */
+   pci_bus_read_config_dword(pcie->root_bus, 0,
+ CFG_RC_PHY_CTL, );
+   pci_bus_write_config_dword(bus, 0, CFG_RC_PHY_CTL,
+  val | CFG_RC_CLR_RECOV_HIST_MASK |
+  CFG_RC_CLR_LTSSM_HIST_MASK);
+   /* write back the original value. */
+   pci_bus_write_config_dword(bus, 0, CFG_RC_PHY_CTL, val);
+
+   do {
+   pci_bus_read_config_dword(pcie->root_bus, 0,
+ CFG_RC_LTSSM, );
+   /* check link state to see if link moved to L1 state. */
+   if ((val & CFG_RC_LTSSM_STATE_MASK) ==
+CFG_RC_LTSSM_STATE_L1)
+   return true;
+   timeout--;
+   usleep_range(500, 1000);
+   } while (timeout);
+
+   return false;
+}
+
+static irqreturn_t iproc_pci_hotplug_thread(int irq, void *data)
+{
+   struct iproc_pcie *pcie = data;
+   struct pci_bus *bus = pcie->root_bus, *child;
+   bool link_status;
+
+   iproc_pcie_perst_ctrl(pcie, true);
+   iproc_pcie_perst_ctrl(pcie, false);
+
+   link_status = iproc_pci_hp_check_ltssm(pcie);
+
+   if (link_status &&
+   !iproc_pcie_check_link(pcie, bus) &&
+   !pcie->ep_is_present) {
+   pci_rescan_bus(bus);
+   list_for_each_entry(child, >children, node)
+   pcie_bus_configure_settings(child);
+   pcie->ep_is_present = true;
+   dev_info(pcie->dev,
+"PCI Hotplug: \n");
+   } else if (link_status && pcie->ep_is_present)
+   /*
+* ep_is_present makes sure enumeration is done only once.
+* So it can handle spurious interrupts, and also if we
+

[PATCH 0/2] PCI hotplug feature

2017-08-07 Thread Oza Pawandeep
These patches bring in PCI hotplug support for iproc family chipsets.

It includes a DT binding documentation update and the implementation in
the iproc PCIe RC driver.

This patch set is made on top of the following patches.
[PATCH v6 2/2] PCI: iproc: add device shutdown for PCI RC
[PATCH v6 1/2] PCI: iproc: Retry request when CRS returned from EP

Oza Pawandeep (2):
  PCI: iproc: Implement PCI hotplug support
  PCI: iproc: Add optional brcm,pci-hotplug

 .../devicetree/bindings/pci/brcm,iproc-pcie.txt|  23 
 drivers/pci/host/pcie-iproc-platform.c |   3 +
 drivers/pci/host/pcie-iproc.c  | 130 -
 drivers/pci/host/pcie-iproc.h  |   7 ++
 4 files changed, 157 insertions(+), 6 deletions(-)

-- 
1.9.1



[PATCH 2/2] PCI: iproc: Add optional brcm,pci-hotplug

2017-08-07 Thread Oza Pawandeep
Add description for optional device tree property
'brcm,pci-hotplug' for PCI hotplug feature.

Signed-off-by: Oza Pawandeep 
Reviewed-by: Ray Jui 

diff --git a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt 
b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
index b8e48b4..a3bad24 100644
--- a/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
+++ b/Documentation/devicetree/bindings/pci/brcm,iproc-pcie.txt
@@ -72,6 +72,29 @@ Optional properties:
 - brcm,pcie-msi-inten: Needs to be present for some older iProc platforms that
 require the interrupt enable registers to be set explicitly to enable MSI
 
+Optional properties:
+- brcm,pci-hotplug: PCI hotplug feature is supported.
+
+If the brcm,pci-hotplug property is present, the following properties become
+effective:
+
+- brcm,prsnt-gpio: Array of gpios, needs to be present if Hotplug is supported.
+
+PCI hotplug implementation is SOC/Board specific, and also it depends on
+how add-in card is designed (e.g. how many present pins are implemented).
+
+If x8 card is connected, then it might be possible that all the
+3 present pins could go low, or at least one pin goes low.
+
+If x4 card is connected, then it might be possible that 2 present
+pins go low, or at least one pin goes low.
+
+Example:
+brcm,prsnt-gpio: < 32 1>, < 33 1>;
+This is x4 connector: monitoring max 2 present lines.
+brcm,prsnt-gpio: < 32 1>, < 33 1>, < 34 1>;
+This is x8 connector: monitoring max 3 present lines.
+
 Example:
pcie0: pcie@18012000 {
compatible = "brcm,iproc-pcie";
-- 
1.9.1
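
As a rough illustration of the presence rule described in the binding text
above (a card counts as present when at least one of the monitored present
lines goes low), here is a minimal userspace sketch in C. The helper name
card_present() and the convention that a sampled level of 0 means "low" are
assumptions made for this example; they are not taken from the patch or from
the iProc driver.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): decide whether an add-in
 * card is present from already-sampled present-line levels.
 * Assumption: the lines are active low, so a level of 0 means the card
 * pulled that line low.
 */
bool card_present(const int *levels, size_t nlines)
{
	size_t i;

	for (i = 0; i < nlines; i++) {
		if (levels[i] == 0)
			return true;	/* at least one present pin is low */
	}
	return false;			/* no line low: treat slot as empty */
}
```

With two monitored lines (the x4 connector case) or three (the x8 case), the
same loop applies; only the length of the array changes.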



[PATCH v2] scheduler: enhancement to show_state_filter

2017-08-07 Thread Yafang Shao
Sometimes we want to dump only the tasks in TASK_RUNNING, instead of
dumping all tasks.
For example, when the loadavg is high, we want to dump the tasks in
TASK_RUNNING and TASK_UNINTERRUPTIBLE, which contribute to the system
load. But usually most tasks are in the sleep state, so dumping them
fills almost all of the kernel log buffer, or even overflows it, and
the useful messages get lost. We could enlarge the kernel log buffer,
but that is not a good solution.

So I made this change to make show_state_filter() more flexible, so
that we can dump the tasks in TASK_RUNNING specifically.

Signed-off-by: Yafang Shao 
---
 drivers/tty/sysrq.c | 2 +-
 include/linux/sched.h   | 1 +
 include/linux/sched/debug.h | 6 --
 kernel/sched/core.c | 7 ---
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 3ffc1ce..86db51b 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -291,7 +291,7 @@ static void sysrq_handle_showstate(int key)
 
 static void sysrq_handle_showstate_blocked(int key)
 {
-   show_state_filter(TASK_UNINTERRUPTIBLE);
+   show_state_filter(TASK_UNINTERRUPTIBLE << 1);
 }
 static struct sysrq_key_op sysrq_showstate_blocked_op = {
.handler= sysrq_handle_showstate_blocked,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8337e2d..318f149 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -82,6 +82,7 @@
 #define TASK_NOLOAD1024
 #define TASK_NEW   2048
 #define TASK_STATE_MAX 4096
+#define TASK_ALL_BITS  ((TASK_STATE_MAX << 1) - 1)
 
 #define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWPNn"
 
diff --git a/include/linux/sched/debug.h b/include/linux/sched/debug.h
index e0eaee5..c844689 100644
--- a/include/linux/sched/debug.h
+++ b/include/linux/sched/debug.h
@@ -1,6 +1,8 @@
 #ifndef _LINUX_SCHED_DEBUG_H
 #define _LINUX_SCHED_DEBUG_H
 
+#include 
+
 /*
  * Various scheduler/task debugging interfaces:
  */
@@ -10,13 +12,13 @@
 extern void dump_cpu_task(int cpu);
 
 /*
- * Only dump TASK_* tasks. (0 for all tasks)
+ * Only dump TASK_* tasks. (TASK_ALL_BITS for all tasks)
  */
 extern void show_state_filter(unsigned long state_filter);
 
 static inline void show_state(void)
 {
-   show_state_filter(0);
+   show_state_filter(TASK_ALL_BITS);
 }
 
 struct pt_regs;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0869b20..f9b9529 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5161,19 +5161,20 @@ void show_state_filter(unsigned long state_filter)
 */
touch_nmi_watchdog();
touch_all_softlockup_watchdogs();
-   if (!state_filter || (p->state & state_filter))
+   /* in case we want to set TASK_RUNNING specifically */
+   if ((p->state != TASK_RUNNING ? p->state << 1 : 1) & state_filter)
sched_show_task(p);
}
 
 #ifdef CONFIG_SCHED_DEBUG
-   if (!state_filter)
+   if (state_filter == TASK_ALL_BITS)
sysrq_sched_debug_show();
 #endif
rcu_read_unlock();
/*
 * Only show locks if all tasks are dumped:
 */
-   if (!state_filter)
+   if (state_filter == TASK_ALL_BITS)
debug_show_all_locks();
 }
 
-- 
1.8.3.1
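
To make the filter encoding in the patch above easier to follow: since
TASK_RUNNING is 0, it cannot be selected with a plain bit mask, so the patch
shifts every non-zero state left by one and reserves bit 0 of the filter for
TASK_RUNNING. The following userspace sketch mirrors that expression. The
helper names state_to_filter_bits() and task_matches() are made up for this
example, and only three state values are reproduced here.

```c
/* State values as in include/linux/sched.h at the time of the patch. */
#define TASK_RUNNING		0
#define TASK_INTERRUPTIBLE	1
#define TASK_UNINTERRUPTIBLE	2
#define TASK_STATE_MAX		4096
/* From the patch: every shifted state bit plus bit 0 for TASK_RUNNING. */
#define TASK_ALL_BITS		((TASK_STATE_MAX << 1) - 1)

/* Hypothetical helper: same expression as in show_state_filter(). */
unsigned long state_to_filter_bits(unsigned long state)
{
	return state != TASK_RUNNING ? state << 1 : 1;
}

/* Hypothetical helper: would a task in @state be dumped for @state_filter? */
int task_matches(unsigned long state, unsigned long state_filter)
{
	return (state_to_filter_bits(state) & state_filter) != 0;
}
```

Under this encoding a caller passes 1 to select only running tasks,
TASK_UNINTERRUPTIBLE << 1 for blocked tasks (as the sysrq hunk does), and
TASK_ALL_BITS for everything.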



[PATCH] mmc: wmt-sdmmc: Handle return value of clk_prepare_enable

2017-08-07 Thread Arvind Yadav
clk_prepare_enable() can fail here and we must check its return value.

Signed-off-by: Arvind Yadav 
---
 drivers/mmc/host/wmt-sdmmc.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/wmt-sdmmc.c b/drivers/mmc/host/wmt-sdmmc.c
index 21ebba8..e64f930 100644
--- a/drivers/mmc/host/wmt-sdmmc.c
+++ b/drivers/mmc/host/wmt-sdmmc.c
@@ -856,7 +856,9 @@ static int wmt_mci_probe(struct platform_device *pdev)
goto fail5;
}
 
-   clk_prepare_enable(priv->clk_sdmmc);
+   ret = clk_prepare_enable(priv->clk_sdmmc);
+   if (ret)
+   goto fail6;
 
/* configure the controller to a known 'ready' state */
wmt_reset_hardware(mmc);
@@ -866,6 +868,8 @@ static int wmt_mci_probe(struct platform_device *pdev)
dev_info(&pdev->dev, "WMT SDHC Controller initialized\n");
 
return 0;
+fail6:
+   clk_put(priv->clk_sdmmc);
 fail5:
free_irq(dma_irq, priv);
 fail4:
-- 
1.9.1
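
The pattern the patch above applies (check the return value of the enable
step and, on failure, release what was acquired before it) can be sketched in
plain userspace C. Everything below is a stand-in: struct fake_clk,
fake_clk_prepare_enable(), fake_clk_put() and fake_probe() are hypothetical
names for this example and are not the kernel clk API.

```c
#include <errno.h>

/* Hypothetical stand-in for a clock handle. */
struct fake_clk {
	int got;		/* 1 while the handle is held */
	int enabled;		/* 1 once the clock is running */
	int fail_enable;	/* test knob: force the enable step to fail */
};

/* Stand-in for an enable call that can fail. */
int fake_clk_prepare_enable(struct fake_clk *clk)
{
	if (clk->fail_enable)
		return -EIO;
	clk->enabled = 1;
	return 0;
}

/* Stand-in for releasing the handle. */
void fake_clk_put(struct fake_clk *clk)
{
	clk->got = 0;
}

int fake_probe(struct fake_clk *clk)
{
	int ret;

	clk->got = 1;		/* handle acquired earlier in probe */

	ret = fake_clk_prepare_enable(clk);
	if (ret)
		goto err_put;	/* undo only what was set up so far */

	return 0;

err_put:
	fake_clk_put(clk);
	return ret;
}
```

The point of the label ordering is that each failure jumps to the label that
undoes exactly the steps completed so far, which is what the added fail6
label does for the clock handle.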



Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-07 Thread kbuild test robot
Hi Byungchul,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc4 next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617
config: cris-allmodconfig (attached as .config)
compiler: cris-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=cris 

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/pm.h:29:0,
from include/linux/device.h:25,
from include/linux/pci.h:30,
from drivers/usb/host/ehci-hcd.c:24:
   include/linux/completion.h:32:27: error: field 'map' has incomplete type
 struct lockdep_map_cross map;
  ^~~
   In file included from include/linux/spinlock_types.h:18:0,
from include/linux/spinlock.h:81,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/linux/stat.h:18,
from include/linux/module.h:10,
from drivers/usb/host/ehci-hcd.c:23:
   drivers/usb/host/ehci-hub.c: In function 'ehset_single_step_set_feature':
>> include/linux/lockdep.h:578:4: error: field name not in record or union 
>> initializer
 { .map.name = (_name), .map.key = (void *)(_key), \
   ^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITIALIZER(work)
  ^~
   include/linux/completion.h:106:43: note: in expansion of macro 
'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
  ^~
   drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 
'DECLARE_COMPLETION_ONSTACK'
 DECLARE_COMPLETION_ONSTACK(done);
 ^~
   include/linux/lockdep.h:578:4: note: (near initialization for 'done.map')
 { .map.name = (_name), .map.key = (void *)(_key), \
   ^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITIALIZER(work)
  ^~
   include/linux/completion.h:106:43: note: in expansion of macro 
'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
  ^~
   drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 
'DECLARE_COMPLETION_ONSTACK'
 DECLARE_COMPLETION_ONSTACK(done);
 ^~
   include/linux/lockdep.h:578:25: error: field name not in record or union 
initializer
 { .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITIALIZER(work)
  ^~
   include/linux/completion.h:106:43: note: in expansion of macro 
'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
  ^~
   drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 
'DECLARE_COMPLETION_ONSTACK'
 DECLARE_COMPLETION_ONSTACK(done);
 ^~
   include/linux/lockdep.h:578:25: note: (near initialization for 'done.map')
 { .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITI

Re: [PATCH 1/4] seccomp: Provide matching filter for introspection

2017-08-07 Thread Tyler Hicks
On 08/02/2017 10:19 PM, Kees Cook wrote:
> Both the upcoming logging improvements and changes to RET_KILL will need
> to know which filter a given seccomp return value originated from. In
> order to delay logic processing of result until after the seccomp loop,
> this adds a single pointer assignment on matches. This will allow both
> log and RET_KILL logic to work off the filter rather than doing more
> expensive tests inside the time-critical run_filters loop.
> 
> Running tight cycles of getpid() with filters attached shows no measurable
> difference in speed.
> 
> Suggested-by: Tyler Hicks 
> Signed-off-by: Kees Cook 
> ---
>  kernel/seccomp.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 98b59b5db90b..8bdcf01379e4 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -171,10 +171,12 @@ static int seccomp_check_filter(struct sock_filter 
> *filter, unsigned int flen)
>  /**
>   * seccomp_run_filters - evaluates all seccomp filters against @sd
>   * @sd: optional seccomp data to be passed to filters
> + * @match: stores struct seccomp_filter that resulted in the return value
>   *
>   * Returns valid seccomp BPF response codes.
>   */
> -static u32 seccomp_run_filters(const struct seccomp_data *sd)
> +static u32 seccomp_run_filters(const struct seccomp_data *sd,
> +struct seccomp_filter **match)
>  {
>   struct seccomp_data sd_local;
>   u32 ret = SECCOMP_RET_ALLOW;

My version of this patch initialized *match to f here. The reason I did
that is because if BPF_PROG_RUN() returns RET_ALLOW for all
filters, I didn't want *match to remain NULL when seccomp_run_filters()
returns. Neither FILTER_FLAG_LOG nor FILTER_FLAG_KILL_PROCESS would be affected
by this because they don't care about RET_ALLOW actions but there could
conceivably be a filter flag in the future that cares about RET_ALLOW
and not initializing *match to the first filter could result in a latent
bug for that filter flag.

I'm fine with not adding the initialization since this is a hot path and
it doesn't help any of the currently existing/planned filter flags but I
wanted to at least mention it.

Reviewed-by: Tyler Hicks 

Tyler

> @@ -198,8 +200,10 @@ static u32 seccomp_run_filters(const struct seccomp_data 
> *sd)
>   for (; f; f = f->prev) {
>   u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
>  
> - if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
> + if ((cur_ret & SECCOMP_RET_ACTION) < (ret & 
> SECCOMP_RET_ACTION)) {
>   ret = cur_ret;
> + *match = f;
> + }
>   }
>   return ret;
>  }
> @@ -566,6 +570,7 @@ static int __seccomp_filter(int this_syscall, const 
> struct seccomp_data *sd,
>   const bool recheck_after_trace)
>  {
>   u32 filter_ret, action;
> + struct seccomp_filter *match = NULL;
>   int data;
>  
>   /*
> @@ -574,7 +579,7 @@ static int __seccomp_filter(int this_syscall, const 
> struct seccomp_data *sd,
>*/
>   rmb();
>  
> - filter_ret = seccomp_run_filters(sd);
> + filter_ret = seccomp_run_filters(sd, &match);
>   data = filter_ret & SECCOMP_RET_DATA;
>   action = filter_ret & SECCOMP_RET_ACTION;
>  
> 




signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/4] seccomp: Provide matching filter for introspection

2017-08-07 Thread Kees Cook
On Mon, Aug 7, 2017 at 6:03 PM, Tyler Hicks  wrote:
>> -static u32 seccomp_run_filters(const struct seccomp_data *sd)
>> +static u32 seccomp_run_filters(const struct seccomp_data *sd,
>> +struct seccomp_filter **match)
>>  {
>>   struct seccomp_data sd_local;
>>   u32 ret = SECCOMP_RET_ALLOW;
>
> My version of this patch initialized *match to f here. The reason I did
> that is because if BPF_PROG_RUN() returns RET_ALLOW for all
> filters, I didn't want *match to remain NULL when seccomp_run_filters()
> returns. Neither FILTER_FLAG_LOG nor FILTER_FLAG_KILL_PROCESS would be affected
> by this because they don't care about RET_ALLOW actions but there could
> conceivably be a filter flag in the future that cares about RET_ALLOW
> and not initializing *match to the first filter could result in a latent
> bug for that filter flag.

Very true, yes. I did intentionally adjust this because I wanted to
keep the hot path as untouched as possible.

> I'm fine with not adding the initialization since this is a hot path and
> it doesn't help any of the currently existing/planned filter flags but I
> wanted to at least mention it.

Yeah, and while I doubt I'll want to ever check "match" for RET_ALLOW,
I'll add a big comment there to explain it.
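
For anyone following along, the behaviour being discussed can be reproduced
with a small userspace model of the loop. This is only a sketch: struct
filter carries a fixed verdict instead of a BPF program, run_filters() is a
simplified stand-in for seccomp_run_filters(), and the constants are the
values from the uapi header of this era.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from include/uapi/linux/seccomp.h at the time of this thread. */
#define SECCOMP_RET_KILL	0x00000000U
#define SECCOMP_RET_ERRNO	0x00050000U
#define SECCOMP_RET_ALLOW	0x7fff0000U
#define SECCOMP_RET_ACTION	0x7fff0000U

/* Simplified stand-in for struct seccomp_filter. */
struct filter {
	struct filter *prev;
	uint32_t verdict;	/* what BPF_PROG_RUN() would have returned */
};

/* Model of the loop in the patch: the lowest action value wins. */
uint32_t run_filters(struct filter *f, struct filter **match)
{
	uint32_t ret = SECCOMP_RET_ALLOW;

	for (; f; f = f->prev) {
		uint32_t cur_ret = f->verdict;

		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION)) {
			ret = cur_ret;
			*match = f;	/* only assigned on a stricter result */
		}
	}
	return ret;
}
```

If every filter returns RET_ALLOW, the comparison is never true and *match
keeps whatever the caller put there (NULL in __seccomp_filter()), which is
the case the comment needs to spell out.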

> Reviewed-by: Tyler Hicks 

Thanks!

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

2017-08-07 Thread Kees Cook
On Mon, Aug 7, 2017 at 6:23 PM, Tyler Hicks  wrote:
> On 08/02/2017 10:19 PM, Kees Cook wrote:
>> Right now, SECCOMP_RET_KILL kills the current thread. There have been
>> a few requests for RET_KILL to kill the entire process (the thread
>> group), but since seccomp's u32 return values are ABI, and ordered by
>> lowest value, with RET_KILL as 0, there isn't a trivial way to provide
>> an even smaller value that would mean the more restrictive action of
>> killing the thread group.
>>
>> Instead, create a filter flag that indicates that a RET_KILL from this
>> filter must kill the process rather than the thread. This can be set
>> (and not cleared) via the new SECCOMP_FILTER_FLAG_KILL_PROCESS flag.
>>
>> Pros:
>>  - the logic for the filter action is contained in the filter.
>>  - userspace can detect support for the feature since earlier kernels
>>will reject the new flag.
>> Cons:
>>  - depends on adding an assignment to the seccomp_run_filters() loop
>>(previous patch).
>>
>> Alternatives to this approach with pros/cons:
>>
>> - Use a new test during seccomp_run_filters() that treats the RET_DATA
>>   mask of a RET_KILL action as special. If a new bit is set in the data,
>>   then treat the return value as -1 (lower than 0).
>>   Pros:
>>- the logic for the filter action is contained in the filter.
>>   Cons:
>>- added complexity to time-sensitive seccomp_run_filters() loop.
>>- there isn't a trivial way for userspace to detect if the kernel
>>  supports the feature (earlier kernels will silently ignore the
>>  RET_DATA and only kill the thread).
>
> I prefer using a filter flag over a special RET_DATA mask but, for
> completeness, I wanted to mention that SECCOMP_GET_ACTION_AVAIL
> operation could be extended to validate special RET_DATA masks. However,
> I don't think that is a clean design.
>
>> - Have SECCOMP_FILTER_FLAG_KILL_PROCESS attach to the seccomp struct
>>   rather than the filter.
>>   Pros:
>>- no change needed to seccomp_run_filters() loop.
>>   Cons:
>>- the change in behavior technically originates external to the
>>  filter, which allows for later filters to "enhance" a previously
>>  applied filter's RET_KILL to kill the entire process, which may
>>  be unexpected.
>>
>> Signed-off-by: Kees Cook 
>> ---
>>  include/linux/seccomp.h  |  3 ++-
>>  include/uapi/linux/seccomp.h |  3 ++-
>>  kernel/seccomp.c | 12 +++-
>>  3 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> index ecc296c137cd..59d001ba655c 100644
>> --- a/include/linux/seccomp.h
>> +++ b/include/linux/seccomp.h
>> @@ -3,7 +3,8 @@
>>
>>  #include 
>>
>> -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC)
>> +#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
>> +  SECCOMP_FILTER_FLAG_KILL_PROCESS)
>>
>>  #ifdef CONFIG_SECCOMP
>>
>> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
>> index 0f238a43ff1e..4b75d8c297b6 100644
>> --- a/include/uapi/linux/seccomp.h
>> +++ b/include/uapi/linux/seccomp.h
>> @@ -15,7 +15,8 @@
>>  #define SECCOMP_SET_MODE_FILTER  1
>>
>>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>> -#define SECCOMP_FILTER_FLAG_TSYNC1
>> +#define SECCOMP_FILTER_FLAG_TSYNC1
>> +#define SECCOMP_FILTER_FLAG_KILL_PROCESS 2
>>
>>  /*
>>   * All BPF programs must return a 32-bit value.
>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> index 8bdcf01379e4..931eb9cbd093 100644
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -44,6 +44,7 @@
>>   * is only needed for handling filters shared across tasks.
>>   * @prev: points to a previously installed, or inherited, filter
>>   * @prog: the BPF program to evaluate
>> + * @kill_process: if true, RET_KILL will kill process rather than thread.
>>   *
>>   * seccomp_filter objects are organized in a tree linked via the @prev
>>   * pointer.  For any task, it appears to be a singly-linked list starting
>> @@ -59,6 +60,7 @@ struct seccomp_filter {
>>   refcount_t usage;
>>   struct seccomp_filter *prev;
>>   struct bpf_prog *prog;
>> + bool kill_process;
>
> Just a reminder to move bool up to be the 2nd member of the struct for
> improved struct packing. (You already said you were going to do this while
> you were reviewing my logging patches)

Thanks! Yeah, done now.
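
To show why the member order matters, here is a small userspace sketch with
stand-in types (unsigned int in place of refcount_t, void * in place of the
kernel pointer types). The struct and function names are made up for this
example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Layout from the patch as posted: bool after the two pointers. */
struct filter_bool_last {
	unsigned int usage;	/* stand-in for refcount_t */
	void *prev;
	void *prog;
	bool kill_process;	/* followed by tail padding */
};

/* Layout after the move: bool shares the padding after 'usage'. */
struct filter_bool_second {
	unsigned int usage;	/* stand-in for refcount_t */
	bool kill_process;
	void *prev;
	void *prog;
};

/* Bytes saved by the reordering on the current ABI (may be 0). */
size_t packing_savings(void)
{
	return sizeof(struct filter_bool_last) -
	       sizeof(struct filter_bool_second);
}
```

On a typical LP64 ABI the first layout is 32 bytes and the second is 24,
because the bool fits into the hole that otherwise pads 'usage' up to pointer
alignment. On ABIs with 4-byte pointers the two layouts can end up the same
size.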

>>  };
>>
>>  /* Limit any path through the tree to 256KB worth of instructions. */
>> @@ -448,6 +450,10 @@ static long seccomp_attach_filter(unsigned int flags,
>>   return ret;
>>   }
>>
>> + /* Set process-killing flag, if present. */
>> + if (flags & SECCOMP_FILTER_FLAG_KILL_PROCESS)
>> + filter->kill_process = true;
>> +
>>   /*
>>* If there is an existing filter, make it the prev and don't drop its
>>* task 

Re: [PATCH] media: i2c: OV5647: gate clock lane before stream on

2017-08-07 Thread Jacob Chen
Hi all,

2017-08-07 22:48 GMT+08:00 Luis Oliveira :
> Hi all,
>
> I'm new here, I got to be Maintainer of this driver by the old Maintainer
> recommendation. Still getting the hang of it :)
>
> On 07-Aug-17 13:26, Philipp Zabel wrote:
>> Hi Jacob,
>>
>> On Mon, 2017-08-07 at 19:06 +0800, Jacob Chen wrote:
>> [...]
>> --- a/drivers/media/i2c/ov5647.c
>> +++ b/drivers/media/i2c/ov5647.c
>> @@ -253,6 +253,10 @@ static int ov5647_stream_on(struct v4l2_subdev *sd)
>>  {
>> int ret;
>>
>> +   ret = ov5647_write(sd, 0x4800, 0x04);
>> +   if (ret < 0)
>> +   return ret;
>> +
>>
>> So this clears BIT(1) (force clock lane to low power mode) and BIT(5)
>> (gate clock lane while idle) that were set by ov5647_stream_off() during
>> __sensor_init() due to the change below.
>>
>> Is there a reason, btw, that this driver is full of magic register
>> addresses and values? A few #defines would make this a lot more
>> readable.
>>
>
> From what I can see, I agree that a few register name definitions could be added.
>
>> ret = ov5647_write(sd, 0x4202, 0x00);
>> if (ret < 0)
>> return ret;
>> @@ -264,6 +268,10 @@ static int ov5647_stream_off(struct v4l2_subdev *sd)
>>  {
>> int ret;
>>
>> +   ret = ov5647_write(sd, 0x4800, 0x25);
>> +   if (ret < 0)
>> +   return ret;
>> +
>> ret = ov5647_write(sd, 0x4202, 0x0f);
>> if (ret < 0)
>> return ret;
>> @@ -320,7 +328,7 @@ static int __sensor_init(struct v4l2_subdev *sd)
>> return ret;
>> }
>>
>> -   return ov5647_write(sd, 0x4800, 0x04);
>> +   return ov5647_stream_off(sd);
>>
>> I see now that BIT(2) (keep bus in LP-11 while idle) is and was always
>> set. So the change is that initially, additionally to LP-11 mode, the
>> clock lane is gated and forced into low power mode, as well?
>>
>
> This is my interpretation as well.
>

BIT(0) is not necessary; I just saw that many drivers set it together with BIT(5).

>>  }
>>
>>  static int ov5647_sensor_power(struct v4l2_subdev *sd, int on)
>> --
>> 2.7.4
>>
>
> Can anyone comment on it?
>
> I saw there is a same discussion in  
> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_9569031_=DwICaQ=DPL6_X_6JkXFx7AXWqB0tg=eMn12aiiNuIDjtRi5xEzC7tWJkpra2vl_XYFVvfxIGE=eortcRXje2uLyZNI_-Uw3Ur_z24tb-e4pZfom7WhdE0=6sLc76bhjR0IdaA3ArZ7F7slgtcyGz8pDTzAF_CBLno=
> There is a comment in i.MX CSI2 driver.
> "
> Configure MIPI Camera Sensor to put all Tx lanes in LP-11 state.
> This must be carried out by the MIPI sensor's s_power(ON) subdev
> op.
> "
> That's what this patch do, sensor driver should make sure that clock
> lanes are in stop state while not streaming.

 This is not the same, as far as I can tell. BIT(5) is just clock lane
 gating, as you describe above. To put the bus into LP-11 state, BIT(2)
 needs to be set.

>>>
>>> Yeah, but I suspect that the clock lane is not in LP-11 when continuous clock
>>> mode is enabled.
>
> I think that by spec it shouldn't go to stop state in continuous clock mode.
>
>>
>> If indeed LP-11 state is not achieved while the sensor is idle, as long
>> as BIT(5) is cleared, I think this patch is correct.
>>
>> regards
>> Philipp
>>
>
> As far as I understand, bit[5] set to 1 will force clock lane to be gated (in
> other words it will be forced to be in LP-11 if there are no packets to
> transmit). But also LP-11 must not be achieved with the BIT(5) cleared (free
> running mode)?
>
> Sorry if I misunderstood something.
>

I did some experiments.
I don't have instruments to measure this, so I just observed it through the PHY registers.

If BIT(5) is cleared in "ov5647_sensor_power" and nothing is done about it
in "ov5647_stream_on",
the PHY doesn't get a SoT from the sensor in "ov5647_stream_on" and it keeps its
clock lane in LP mode.


If BIT(5) is set in "ov5647_sensor_power" and cleared in "ov5647_stream_on",
the PHY gets a SoT and the clock lane enters HS mode.


So I'm pretty sure that LP-11 is not achieved with BIT(5) cleared.
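
Regarding the earlier request for #defines instead of magic numbers, a
possible shape is sketched below. The macro names are hypothetical (the
driver currently has none), and the bit descriptions are only what has been
said in this thread, not datasheet wording. The two values decompose as
0x04 = BIT(2) and 0x25 = BIT(5) | BIT(2) | BIT(0).

```c
#define BIT(n)	(1U << (n))

/* Hypothetical names; the driver in this thread uses bare numbers. */
#define OV5647_REG_MIPI_CTRL00		0x4800
#define MIPI_CTRL00_CLOCK_LANE_GATE	BIT(5)	/* gate clock lane while idle */
#define MIPI_CTRL00_BUS_IDLE		BIT(2)	/* keep bus in LP-11 while idle */
#define MIPI_CTRL00_BIT0		BIT(0)	/* set together with BIT(5) in other drivers */

/* Value written to 0x4800 in ov5647_stream_on() by the patch. */
unsigned int mipi_ctrl00_stream_on(void)
{
	return MIPI_CTRL00_BUS_IDLE;
}

/* Value written to 0x4800 in ov5647_stream_off() by the patch. */
unsigned int mipi_ctrl00_stream_off(void)
{
	return MIPI_CTRL00_CLOCK_LANE_GATE | MIPI_CTRL00_BUS_IDLE |
	       MIPI_CTRL00_BIT0;
}
```

With names like these, the stream-on path would read as "clear the clock
lane gate, keep LP-11 on idle" rather than as two unrelated constants.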

> regards,
> Luis
>


Re: [lkp-robot] [mm] 7674270022: will-it-scale.per_process_ops -19.3% regression

2017-08-07 Thread Minchan Kim
Hi,

On Tue, Aug 08, 2017 at 09:19:23AM +0800, kernel test robot wrote:
> 
> Greeting,
> 
> FYI, we noticed a -19.3% regression of will-it-scale.per_process_ops due to 
> commit:
> 
> 
> commit: 76742700225cad9df49f05399381ac3f1ec3dc60 ("mm: fix 
> MADV_[FREE|DONTNEED] TLB flush miss problem")
> url: 
> https://github.com/0day-ci/linux/commits/Nadav-Amit/mm-migrate-prevent-racy-access-to-tlb_flush_pending/20170802-205715
> 
> 
> in testcase: will-it-scale
> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 
> 64G memory
> with following parameters:
> 
>   nr_task: 16
>   mode: process
>   test: brk1
>   cpufreq_governor: performance
> 
> test-description: Will It Scale takes a testcase and runs it from 1 through 
> to n parallel copies to see if the testcase will scale. It builds both a 
> process and threads based test in order to see any differences between the 
> two.
> test-url: https://github.com/antonblanchard/will-it-scale

Thanks for the report.
Could you explain what kinds of workload you are testing?

Does it calls frequently madvise(MADV_DONTNEED) in parallel on multiple
threads?


Re: [PATCH v2 1/5] mfd: mt6397: create irq mappings in mfd core driver

2017-08-07 Thread Dmitry Torokhov
On Mon, Aug 07, 2017 at 11:32:44PM +0200, Alexandre Belloni wrote:
> On 07/08/2017 at 09:57:41 +0800, Chen Zhong wrote:
> > The core driver should create and manage irq mappings instead of
> > leaf drivers. This patch passes the irq domain to
> > devm_mfd_add_devices(), which then creates the mappings for irq
> > resources automatically, and removes the irq mapping in the rtc driver
> > since this is now done in the core driver.
> > 
> > Signed-off-by: Chen Zhong 
> 
> For the RTC part:
> Acked-by: Alexandre Belloni 
> 
> > ---
> >  drivers/mfd/mt6397-core.c |4 ++--
> >  drivers/rtc/rtc-mt6397.c  |2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/mfd/mt6397-core.c b/drivers/mfd/mt6397-core.c
> > index 04a601f..6546d7f 100644
> > --- a/drivers/mfd/mt6397-core.c
> > +++ b/drivers/mfd/mt6397-core.c
> > @@ -289,7 +289,7 @@ static int mt6397_probe(struct platform_device *pdev)
> >  
> > ret = devm_mfd_add_devices(&pdev->dev, -1, mt6323_devs,
> >ARRAY_SIZE(mt6323_devs), NULL,
> > -  0, NULL);
> > +  0, pmic->irq_domain);
> > break;
> >  
> > case MT6397_CID_CODE:
> > @@ -304,7 +304,7 @@ static int mt6397_probe(struct platform_device *pdev)
> >  
> > ret = devm_mfd_add_devices(&pdev->dev, -1, mt6397_devs,
> >ARRAY_SIZE(mt6397_devs), NULL,
> > -  0, NULL);
> > +  0, pmic->irq_domain);
> > break;
> >  
> > default:
> > diff --git a/drivers/rtc/rtc-mt6397.c b/drivers/rtc/rtc-mt6397.c
> > index 1a61fa5..22c52f7 100644
> > --- a/drivers/rtc/rtc-mt6397.c
> > +++ b/drivers/rtc/rtc-mt6397.c
> > @@ -323,7 +323,7 @@ static int mtk_rtc_probe(struct platform_device *pdev)
> > rtc->addr_base = res->start;
> >  
> > res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
> > -   rtc->irq = irq_create_mapping(mt6397_chip->irq_domain, res->start);
> > +   rtc->irq = res->start;

Why not

rtc->irq = platform_get_irq(pdev, 0);
if (rtc->irq < 0)
return rtc->irq;
?

This way you propagate the error properly.

Thanks.

-- 
Dmitry


Re: [PATCH 2/2] f2fs: introduce gc_urgent mode for background GC

2017-08-07 Thread Chao Yu
Hi Jaegeuk,

On 2017/8/8 9:42, Jaegeuk Kim wrote:
> This patch adds a sysfs entry to control urgent mode for background GC.
> If this is set, background GC thread conducts GC with gc_urgent_sleep_time
> all the time.

Good idea.

If we want to add more GC policies, the current approach is not easy to
extend, and the number of sysfs nodes keeps growing, which is not friendly to
users. So I'd like to suggest adding only /sys/fs/f2fs//gc_policy,
exposing the original policy as normal_mode, and then introducing urgent_mode and
reusing gc_min_sleep_time as gc_urgent_sleep_time in this patch.

e.g.

enum gc_policy {
GC_NORMAL,
GC_URGENT,
};

If we want to turn on urgent_mode, we could:
echo 1 > /sys/fs/f2fs//gc_policy
echo 1000 > /sys/fs/f2fs//gc_min_sleep_time

What do you think?

Thanks,

> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  Documentation/ABI/testing/sysfs-fs-f2fs | 12 
>  fs/f2fs/gc.c| 17 +++--
>  fs/f2fs/gc.h|  4 
>  fs/f2fs/sysfs.c |  9 +
>  4 files changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs 
> b/Documentation/ABI/testing/sysfs-fs-f2fs
> index c579ce5e0ef5..11b7f4ebea7c 100644
> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> @@ -139,3 +139,15 @@ Date:June 2017
>  Contact: "Chao Yu" 
>  Description:
>Controls current reserved blocks in system.
> +
> +What:/sys/fs/f2fs//gc_urgent
> +Date:August 2017
> +Contact: "Jaegeuk Kim" 
> +Description:
> +  Do background GC aggressively
> +
> +What:/sys/fs/f2fs//gc_urgent_sleep_time
> +Date:August 2017
> +Contact: "Jaegeuk Kim" 
> +Description:
> +  Controls sleep time of GC urgent mode
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 620dca443b29..8da7c14a9d29 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -35,9 +35,14 @@ static int gc_thread_func(void *data)
>   set_freezable();
>   do {
>   wait_event_interruptible_timeout(*wq,
> - kthread_should_stop() || freezing(current),
> + kthread_should_stop() || freezing(current) ||
> + gc_th->gc_wake,
>   msecs_to_jiffies(wait_ms));
>  
> + /* give it a try one time */
> + if (gc_th->gc_wake)
> + gc_th->gc_wake = 0;
> +
>   if (try_to_freeze())
>   continue;
>   if (kthread_should_stop())
> @@ -74,6 +79,11 @@ static int gc_thread_func(void *data)
>   if (!mutex_trylock(&sbi->gc_mutex))
>   goto next;
>  
> + if (gc_th->gc_urgent) {
> + wait_ms = gc_th->urgent_sleep_time;
> + goto do_gc;
> + }
> +
>   if (!is_idle(sbi)) {
>   increase_sleep_time(gc_th, &wait_ms);
>   mutex_unlock(&sbi->gc_mutex);
> @@ -84,7 +94,7 @@ static int gc_thread_func(void *data)
>   decrease_sleep_time(gc_th, &wait_ms);
>   else
>   increase_sleep_time(gc_th, &wait_ms);
> -
> +do_gc:
>   stat_inc_bggc_count(sbi);
>  
>   /* if return value is not zero, no victim was selected */
> @@ -115,11 +125,14 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
>   goto out;
>   }
>  
> + gc_th->urgent_sleep_time = DEF_GC_THREAD_URGENT_SLEEP_TIME;
>   gc_th->min_sleep_time = DEF_GC_THREAD_MIN_SLEEP_TIME;
>   gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
>   gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
>  
>   gc_th->gc_idle = 0;
> + gc_th->gc_urgent = 0;
> + gc_th->gc_wake= 0;
>  
>   sbi->gc_thread = gc_th;
>   init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
> diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
> index a993967dcdb9..57a9000ce3af 100644
> --- a/fs/f2fs/gc.h
> +++ b/fs/f2fs/gc.h
> @@ -13,6 +13,7 @@
>* whether IO subsystem is idle
>* or not
>*/
> +#define DEF_GC_THREAD_URGENT_SLEEP_TIME  500 /* 500 ms */
>  #define DEF_GC_THREAD_MIN_SLEEP_TIME 3   /* milliseconds */
>  #define DEF_GC_THREAD_MAX_SLEEP_TIME 6
>  #define DEF_GC_THREAD_NOGC_SLEEP_TIME30  /* wait 5 min */
> @@ -27,12 +28,15 @@ struct f2fs_gc_kthread {
>   wait_queue_head_t gc_wait_queue_head;
>  
>   /* for gc sleep time */
> + unsigned int urgent_sleep_time;
>   unsigned int min_sleep_time;
>   unsigned int max_sleep_time;
>   unsigned int no_gc_sleep_time;
>  
>   /* 

Re: [PATCH 2/2] f2fs: introduce gc_urgent mode for background GC

2017-08-07 Thread Jaegeuk Kim
Hi Chao,

On 08/08, Chao Yu wrote:
> Hi Jaegeuk,
> 
> On 2017/8/8 9:42, Jaegeuk Kim wrote:
> > This patch adds a sysfs entry to control urgent mode for background GC.
> > If this is set, background GC thread conducts GC with gc_urgent_sleep_time
> > all the time.
> 
> Good idea.
> 
> If we want to add more GC policies, the current approach is not easy to
> extend, and the number of sysfs nodes keeps growing, which is not friendly to
> users. So I'd like to suggest adding only /sys/fs/f2fs//gc_policy,
> exposing the original policy as normal_mode, and then introducing urgent_mode and
> reusing gc_min_sleep_time as gc_urgent_sleep_time in this patch.
> 
> e.g.
> 
> enum gc_policy {
>   GC_NORMAL,
>   GC_URGENT,
> };
> 
> If we want to turn on urgent_mode, we could:
> echo 1 > /sys/fs/f2fs//gc_policy
> echo 1000 > /sys/fs/f2fs//gc_min_sleep_time

I want to keep the previous gc_min_sleep_time, so that the user can go back
to the normal state seamlessly.
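
A minimal userspace sketch of that point, assuming a simplified stand-in
for the GC thread tunables (struct gc_tunables and next_wait_ms() are
hypothetical names, not f2fs code): because urgent mode has its own field,
turning gc_urgent on and off never overwrites min_sleep_time.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the GC thread tunables. */
struct gc_tunables {
	unsigned int urgent_sleep_time;	/* ms, used only in urgent mode */
	unsigned int min_sleep_time;	/* ms, lower bound in normal mode */
	bool gc_urgent;
};

/*
 * Pick the next wait time. Urgent mode overrides the normal value but
 * never modifies it, so clearing gc_urgent restores the old behaviour.
 */
static unsigned int next_wait_ms(const struct gc_tunables *t,
				 unsigned int normal_wait_ms)
{
	if (t->gc_urgent)
		return t->urgent_sleep_time;
	if (normal_wait_ms < t->min_sleep_time)
		return t->min_sleep_time;
	return normal_wait_ms;
}
```

With a shared field, going back to normal mode would require the user to
remember and rewrite the old minimum by hand.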

Thanks,

> 
> What do you think?
> 
> Thanks,
> 
> > 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  Documentation/ABI/testing/sysfs-fs-f2fs | 12 
> >  fs/f2fs/gc.c| 17 +++--
> >  fs/f2fs/gc.h|  4 
> >  fs/f2fs/sysfs.c |  9 +
> >  4 files changed, 40 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
> > index c579ce5e0ef5..11b7f4ebea7c 100644
> > --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> > +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> > @@ -139,3 +139,15 @@ Date:  June 2017
> >  Contact:   "Chao Yu" 
> >  Description:
> >  Controls current reserved blocks in system.
> > +
> > +What:  /sys/fs/f2fs//gc_urgent
> > +Date:  August 2017
> > +Contact:   "Jaegeuk Kim" 
> > +Description:
> > +Do background GC aggressively
> > +
> > +What:  /sys/fs/f2fs//gc_urgent_sleep_time
> > +Date:  August 2017
> > +Contact:   "Jaegeuk Kim" 
> > +Description:
> > +Controls sleep time of GC urgent mode
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index 620dca443b29..8da7c14a9d29 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -35,9 +35,14 @@ static int gc_thread_func(void *data)
> > set_freezable();
> > do {
> > wait_event_interruptible_timeout(*wq,
> > -   kthread_should_stop() || freezing(current),
> > +   kthread_should_stop() || freezing(current) ||
> > +   gc_th->gc_wake,
> > msecs_to_jiffies(wait_ms));
> >  
> > +   /* give it a try one time */
> > +   if (gc_th->gc_wake)
> > +   gc_th->gc_wake = 0;
> > +
> > if (try_to_freeze())
> > continue;
> > if (kthread_should_stop())
> > @@ -74,6 +79,11 @@ static int gc_thread_func(void *data)
> > if (!mutex_trylock(&sbi->gc_mutex))
> > goto next;
> >  
> > +   if (gc_th->gc_urgent) {
> > +   wait_ms = gc_th->urgent_sleep_time;
> > +   goto do_gc;
> > +   }
> > +
> > if (!is_idle(sbi)) {
> > increase_sleep_time(gc_th, &wait_ms);
> > mutex_unlock(&sbi->gc_mutex);
> > @@ -84,7 +94,7 @@ static int gc_thread_func(void *data)
> > decrease_sleep_time(gc_th, &wait_ms);
> > else
> > increase_sleep_time(gc_th, &wait_ms);
> > -
> > +do_gc:
> > stat_inc_bggc_count(sbi);
> >  
> > /* if return value is not zero, no victim was selected */
> > @@ -115,11 +125,14 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
> > goto out;
> > }
> >  
> > +   gc_th->urgent_sleep_time = DEF_GC_THREAD_URGENT_SLEEP_TIME;
> > gc_th->min_sleep_time = DEF_GC_THREAD_MIN_SLEEP_TIME;
> > gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
> > gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
> >  
> > gc_th->gc_idle = 0;
> > +   gc_th->gc_urgent = 0;
> > +   gc_th->gc_wake= 0;
> >  
> > sbi->gc_thread = gc_th;
> > init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
> > diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
> > index a993967dcdb9..57a9000ce3af 100644
> > --- a/fs/f2fs/gc.h
> > +++ b/fs/f2fs/gc.h
> > @@ -13,6 +13,7 @@
> >  * whether IO subsystem is idle
> >  * or not
> >  */
> > +#define DEF_GC_THREAD_URGENT_SLEEP_TIME500 /* 500 ms */
> >  #define DEF_GC_THREAD_MIN_SLEEP_TIME   3   /* milliseconds */
> >  #define DEF_GC_THREAD_MAX_SLEEP_TIME   6
> >  #define DEF_GC_THREAD_NOGC_SLEEP_TIME  30  /* wait 5 min */
> 

[PATCH] f2fs: fix some cases with reserved_blocks

2017-08-07 Thread Yunlong Song
Signed-off-by: Yunlong Song 
---
 fs/f2fs/recovery.c | 3 ++-
 fs/f2fs/super.c| 9 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index a3d0261..e288319 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -51,7 +51,8 @@ bool space_for_roll_forward(struct f2fs_sb_info *sbi)
 {
s64 nalloc = percpu_counter_sum_positive(&sbi->alloc_valid_block_count);
 
-   if (sbi->last_valid_block_count + nalloc > sbi->user_block_count)
+   if (sbi->last_valid_block_count + nalloc +
+   sbi->reserved_blocks > sbi->user_block_count)
return false;
return true;
 }
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 4c1bdcb..c644bf5 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -946,6 +946,7 @@ static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf)
u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
block_t total_count, user_block_count, start_count, ovp_count;
u64 avail_node_count;
+   block_t avail_user_block_count;
 
total_count = le64_to_cpu(sbi->raw_super->block_count);
user_block_count = sbi->user_block_count;
@@ -953,16 +954,16 @@ static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf)
ovp_count = SM_I(sbi)->ovp_segments << sbi->log_blocks_per_seg;
buf->f_type = F2FS_SUPER_MAGIC;
buf->f_bsize = sbi->blocksize;
+   avail_user_block_count = user_block_count - sbi->reserved_blocks;
 
buf->f_blocks = total_count - start_count;
buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;
-   buf->f_bavail = user_block_count - valid_user_blocks(sbi) -
-   sbi->reserved_blocks;
+   buf->f_bavail = avail_user_block_count - valid_user_blocks(sbi);
 
avail_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;
 
-   if (avail_node_count > user_block_count) {
-   buf->f_files = user_block_count;
+   if (avail_node_count > avail_user_block_count) {
+   buf->f_files = avail_user_block_count;
buf->f_ffree = buf->f_bavail;
} else {
buf->f_files = avail_node_count;
-- 
1.8.5.2



Re: [PATCH 3.18 00/50] 3.18.64-stable review

2017-08-07 Thread Guenter Roeck

On 08/07/2017 12:34 PM, Greg Kroah-Hartman wrote:

On Sat, Aug 05, 2017 at 12:11:19PM -0700, Guenter Roeck wrote:

On 08/05/2017 08:43 AM, Greg Kroah-Hartman wrote:

On Sat, Aug 05, 2017 at 08:02:17AM +0200, Willy Tarreau wrote:

On Sat, Aug 05, 2017 at 07:55:11AM +0200, Willy Tarreau wrote:

On Fri, Aug 04, 2017 at 07:51:07PM -0700, Greg Kroah-Hartman wrote:

On Fri, Aug 04, 2017 at 07:46:57PM -0700, Greg Kroah-Hartman wrote:

On Fri, Aug 04, 2017 at 06:43:50PM -0700, Guenter Roeck wrote:

On 08/04/2017 04:15 PM, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 3.18.64 release.
There are 50 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun Aug  6 23:15:34 UTC 2017.
Anything received after that time might be too late.



Preliminary:

Lots of

lib/string.c:31:32: fatal error: asm/word-at-a-time.h

affecting several architectures.

alpha:

lib/string.c:217:4: error: implicit declaration of function 'zero_bytemask'


Hm, I think I need to add c753bf34c94e ("word-at-a-time.h: support
zero_bytemask() on alpha and tile"), right?  Any other arches failing?


Hm, that doesn't work, do we care about tile? :)

Let me see how deep this hole is, I just wanted to get strscpy into 3.18
to fix a bug...


I suspect you'll need this one which came as part of the strscpy() series
between 4.2 and 4.3 (though I have not tested) :

commit a6e2f029ae34f41adb6ae3812c32c5d326e1abd2
Author: Chris Metcalf 
Date:   Wed Apr 29 12:48:40 2015 -0400

  Make asm/word-at-a-time.h available on all architectures
  Added the x86 implementation of word-at-a-time to the
  generic version, which previously only supported big-endian.
  (...)


OK I just applied it on top of 3.18.64-rc1 and it allowed me to build mips
which previously broke. It will not apply as-is, you'll need to drop the
change for arch/nios2/include/asm/Kbuild, and after that it's OK.


Thanks for that, I've now queued that patch up.



Better, but there are still some errors.

powerpc:
lib/string.c: In function 'strscpy':
lib/string.c:217:4: error: implicit declaration of function 'zero_bytemask'

tile:
arch/tile/gxio/mpipe.c:46:15: error: conflicting types for 'strscpy'
include/linux/string.h:29:22: note: previous declaration of 'strscpy' was here

Missing patches:

7a5692e6e533 ("arch/powerpc: provide zero_bytemask() for big-endian")
30059d494a72 ("tile: use global strscpy() rather than private copy")


Thanks for these, I'll queue them up.  And do a -rc2 in a few days as
this was a mess...



Getting there. With v3.18.63-62-gc7d9ae0:

Build results:
total: 136 pass: 136 fail: 0
Qemu test results:
total: 111 pass: 111 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter


[PATCH] usb: imx21-hcd: fix error return code in imx21_probe()

2017-08-07 Thread Gustavo A. R. Silva
platform_get_irq() returns an error code, but the imx21-hcd driver
ignores it and always returns -ENXIO. This is not correct, and
prevents -EPROBE_DEFER from being propagated properly.

Notice that platform_get_irq() no longer returns 0 on error:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e330b9a6bb35dc7097a4f02cb1ae7b6f96df92af

Print an error message and propagate the return value of
platform_get_irq() on failure.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/host/imx21-hcd.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/imx21-hcd.c b/drivers/usb/host/imx21-hcd.c
index f542045..e25d72e 100644
--- a/drivers/usb/host/imx21-hcd.c
+++ b/drivers/usb/host/imx21-hcd.c
@@ -1849,8 +1849,10 @@ static int imx21_probe(struct platform_device *pdev)
if (!res)
return -ENODEV;
irq = platform_get_irq(pdev, 0);
-   if (irq < 0)
-   return -ENXIO;
+   if (irq < 0) {
+   dev_err(&pdev->dev, "Failed to get IRQ: %d\n", irq);
+   return irq;
+   }
 
hcd = usb_create_hcd(&imx21_hc_driver,
&pdev->dev, dev_name(&pdev->dev));
-- 
2.5.0
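
The effect of the change above can be sketched in plain userspace C. This
is only an illustration: fake_get_irq(), probe_old() and probe_new() are
hypothetical stand-ins, and EPROBE_DEFER is defined locally because the
kernel-internal value is not exported through the userspace <errno.h>.

```c
#include <errno.h>

/* Kernel-internal errno value; defined here only for the illustration. */
#define EPROBE_DEFER 517

/* Hypothetical stand-in for platform_get_irq(): IRQ number or negative errno. */
static int fake_get_irq(int simulated_result)
{
	return simulated_result;
}

/* Old behaviour: every failure is collapsed into -ENXIO. */
static int probe_old(int simulated_result)
{
	int irq = fake_get_irq(simulated_result);

	if (irq < 0)
		return -ENXIO;
	return 0;
}

/* New behaviour: the original error code reaches the caller. */
static int probe_new(int simulated_result)
{
	int irq = fake_get_irq(simulated_result);

	if (irq < 0)
		return irq;
	return 0;
}
```

With the old pattern a caller cannot tell a deferred probe from a missing
IRQ; with the new one, -EPROBE_DEFER is passed through unchanged.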



Re: [RESEND PATCH] bcache: Don't reinvent the wheel but use existing llist API

2017-08-07 Thread Byungchul Park
On Mon, Aug 07, 2017 at 06:18:35PM +0800, Coly Li wrote:
> On 2017/8/7 4:38 PM, Byungchul Park wrote:
> > Although llist provides proper APIs, they are not used. Make them used.
> > 
> > Signed-off-by: Byungchul Park
> 
> Only have a question about why not using llist_for_each_entry(), it's

Hello,

The reason is to keep the original logic unchanged. The existing loop
already behaves like the safe version with respect to removal.

> still OK with llist_for_each_entry_safe(). The rest looks good to me.
> 
> Acked-by: Coly Li 
> 
> > ---
> >  drivers/md/bcache/closure.c | 17 +++--
> >  1 file changed, 3 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/md/bcache/closure.c b/drivers/md/bcache/closure.c
> > index 864e673..1841d03 100644
> > --- a/drivers/md/bcache/closure.c
> > +++ b/drivers/md/bcache/closure.c
> > @@ -64,27 +64,16 @@ void closure_put(struct closure *cl)
> >  void __closure_wake_up(struct closure_waitlist *wait_list)
> >  {
> > struct llist_node *list;
> > -   struct closure *cl;
> > +   struct closure *cl, *t;
> > struct llist_node *reverse = NULL;
> >  
> > list = llist_del_all(&wait_list->list);
> >  
> > /* We first reverse the list to preserve FIFO ordering and fairness */
> > -
> > -   while (list) {
> > -   struct llist_node *t = list;
> > -   list = llist_next(list);
> > -
> > -   t->next = reverse;
> > -   reverse = t;
> > -   }
> > +   reverse = llist_reverse_order(list);
> >  
> > /* Then do the wakeups */
> > -
> > -   while (reverse) {
> > -   cl = container_of(reverse, struct closure, list);
> > -   reverse = llist_next(reverse);
> > -
> > +   llist_for_each_entry_safe(cl, t, reverse, list) {
> 
> Just wondering why not using llist_for_each_entry(), or you use the
> _safe version on purpose ?

If I use llist_for_each_entry(), then it would change the original
behavior. Is it ok?

Thank you,
Byungchul
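
For readers following the thread, here is a small userspace sketch of the
two steps discussed above: reversing a singly linked list and walking it
in the "safe" style. struct node and the helper names are hypothetical and
only model the idea; they are not the kernel llist API.

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
	struct node *next;
	int id;
};

/* Push a new node at the head, as a lock-less list add would (LIFO order). */
static struct node *push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->id = id;
	n->next = head;
	return n;
}

/* Reverse the list in place so the oldest entry comes first (FIFO order). */
static struct node *reverse_order(struct node *head)
{
	struct node *rev = NULL;

	while (head) {
		struct node *tmp = head;

		head = head->next;
		tmp->next = rev;
		rev = tmp;
	}
	return rev;
}

static void release_node(struct node *n)
{
	free(n);
}

/*
 * "Safe" walk: the next pointer is read before fn() runs, so fn() may
 * free or requeue the current entry. Returns the number of entries seen.
 */
static int for_each_safe(struct node *head, void (*fn)(struct node *))
{
	struct node *pos, *next;
	int visited = 0;

	for (pos = head; pos; pos = next) {
		next = pos->next;	/* read before fn() can invalidate pos */
		fn(pos);
		visited++;
	}
	return visited;
}
```

A non-safe walk would read pos->next after fn(pos) returned, which is a
use-after-free in this sketch as soon as fn() releases the entry.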



[PATCH v5 1/1] usb:host:xhci support option to disable the xHCI USB2 HW LPM

2017-08-07 Thread Thang Q. Nguyen
XHCI specification 1.1 does not require xHCI-compliant controllers
to always enable hardware USB2 LPM. However, the current xHCI
driver always enables it when it sees HLC=1.
This patch adds an option that lets users disable USB2 hardware LPM
via a DT/ACPI attribute.
This option is needed when a user wants to disable the feature,
for example because the USB2 HW LPM of their xHCI controller is
broken.

Signed-off-by: Tung Nguyen 
Signed-off-by: Thang Q. Nguyen 
Acked-by: Rob Herring 
---
Changes since v4:
 - When HW LPM is optionally disabled, explicitly disable HLE, RWE, ...
 - Update codes to work with kernel 4.13-rc4
 - Add Acked-By from Rob Herring 
Changes since v3:
 - Bypass updating LPM parameters when HW LPM is optionally disabled.
Changes since v2:
 - Change code to disable HW LPM as an option for user which
   is set via ACPI/DT.
Changes since v1:
 - Update DT/ACPI attribute and corresponding codes from HLE to LPM to
   be consistent with other attribute names.
---
 Documentation/devicetree/bindings/usb/usb-xhci.txt |1 +
 drivers/usb/host/xhci-plat.c   |3 +++
 drivers/usb/host/xhci.c|2 +-
 drivers/usb/host/xhci.h|1 +
 4 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/Documentation/devicetree/bindings/usb/usb-xhci.txt b/Documentation/devicetree/bindings/usb/usb-xhci.txt
index 2d80b60..ae6e484 100644
--- a/Documentation/devicetree/bindings/usb/usb-xhci.txt
+++ b/Documentation/devicetree/bindings/usb/usb-xhci.txt
@@ -26,6 +26,7 @@ Required properties:
 
 Optional properties:
   - clocks: reference to a clock
+  - usb2-lpm-disable: indicate if we don't want to enable USB2 HW LPM
   - usb3-lpm-capable: determines if platform is USB3 LPM capable
   - quirk-broken-port-ped: set if the controller has broken port disable mechanism
 
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index c04144b..9028fb5 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -267,6 +267,9 @@ static int xhci_plat_probe(struct platform_device *pdev)
goto disable_clk;
}
 
+   if (device_property_read_bool(sysdev, "usb2-lpm-disable"))
+   xhci->quirks |= XHCI_HW_LPM_DISABLE;
+
if (device_property_read_bool(sysdev, "usb3-lpm-capable"))
xhci->quirks |= XHCI_LPM_SUPPORT;
 
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index b2ff1ff..3a8e75f 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -4087,7 +4087,7 @@ static int xhci_set_usb2_hardware_lpm(struct usb_hcd *hcd,
xhci_dbg(xhci, "%s port %d USB2 hardware LPM\n",
enable ? "enable" : "disable", port_num + 1);
 
-   if (enable) {
+   if (enable && !(xhci->quirks & XHCI_HW_LPM_DISABLE)) {
/* Host supports BESL timeout instead of HIRD */
if (udev->usb2_hw_lpm_besl_capable) {
/* if device doesn't have a preferred BESL value use a
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index e3e9352..5d89c51 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1821,6 +1821,7 @@ struct xhci_hcd {
 #define XHCI_LIMIT_ENDPOINT_INTERVAL_7 (1 << 26)
 #define XHCI_U2_DISABLE_WAKE   (1 << 27)
 #define XHCI_ASMEDIA_MODIFY_FLOWCONTROL (1 << 28)
+#define XHCI_HW_LPM_DISABLE (1 << 29)
 
unsigned int num_active_eps;
unsigned int limit_active_eps;
-- 
1.7.1
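
The gating added to xhci_set_usb2_hardware_lpm() reduces to a single
bit test. A minimal sketch, assuming a hypothetical helper name
(hw_lpm_allowed() is not part of the driver) and taking the bit position
from the patch:

```c
#include <stdbool.h>

#define XHCI_HW_LPM_DISABLE (1u << 29)	/* bit position as in the patch */

/*
 * Hypothetical helper: hardware LPM is programmed only when the caller
 * asks for it and the platform has not set the disable quirk.
 */
static bool hw_lpm_allowed(unsigned int quirks, bool enable)
{
	return enable && !(quirks & XHCI_HW_LPM_DISABLE);
}
```

When the quirk is set, an enable request falls through to the same branch
as a disable request.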



[PATCH] mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups

2017-08-07 Thread Dan Williams
devm_memremap_pages() records mapped ranges in pgmap_radix with an entry
per section's worth of memory (128MB).  The key for each of those
entries is a section number.

This leads to false positives when devm_memremap_pages() is passed a
section-unaligned range as lookups in the misalignment fail to return
NULL. We can close this hole by using the pfn as the key for entries in
the tree.  The number of entries required to describe a remapped range
is reduced by leveraging multi-order entries.

In practice this approach usually yields just one entry in the tree if
the size and starting address are of the same power-of-2 alignment.
Previously we always needed nr_entries = mapping_size / 128MB.
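
As a rough userspace model of that entry count (an illustration only:
it works on raw pfn numbers instead of a struct resource, relies on the
GCC/Clang builtins __builtin_clzl() and __builtin_ctzl(), and
count_entries() is a hypothetical helper):

```c
#include <limits.h>

/* Largest power of two that is <= n; n must be non-zero. */
static unsigned long rounddown_pow2(unsigned long n)
{
	return 1UL << (sizeof(n) * CHAR_BIT - 1 - __builtin_clzl(n));
}

/*
 * Order of the largest naturally aligned power-of-2 chunk that starts at
 * start_pfn + pgoff and still fits in the range. ULONG_MAX means "done".
 */
static unsigned long order_at(unsigned long start_pfn, unsigned long nr_pages,
			      unsigned long pgoff)
{
	unsigned long mask;

	if (pgoff == nr_pages)
		return ULONG_MAX;

	/* Lowest set bit limits the chunk by alignment and by remaining size. */
	mask = (start_pfn + pgoff) | rounddown_pow2(nr_pages - pgoff);
	return (unsigned long)__builtin_ctzl(mask);
}

/* Number of multi-order entries needed to cover the whole range. */
static unsigned long count_entries(unsigned long start_pfn,
				   unsigned long nr_pages)
{
	unsigned long pgoff = 0, order, n = 0;

	while ((order = order_at(start_pfn, nr_pages, pgoff)) != ULONG_MAX) {
		pgoff += 1UL << order;
		n++;
	}
	return n;
}
```

For a range of 0x8000 pages starting at pfn 0x8000 this yields a single
order-15 entry, while a 7-page range starting at pfn 1 needs three
entries (orders 0, 1 and 2).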

Link: https://lists.01.org/pipermail/linux-nvdimm/2016-August/00.html
Reported-by: Toshi Kani 
Cc: Andrew Morton 
Cc: Matthew Wilcox 
Signed-off-by: Dan Williams 
---
Hi Andrew,

This is an optimization for devm_memremap_pages() that has been idling
in my local tree for a while. At one point Matthew had proposed an
official radix tree api to allow filling a range of a radix similar to
what the foreach_order_pgoff() loop is performing. Perhaps this prompts
Matthew to dust that proposal off, but this patch otherwise minimizes
the amount of entries we need in the pgmap_radix.

This patch is against latest -next.


 kernel/memremap.c |   52 ++--
 mm/Kconfig|1 +
 2 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 9afdc434fb49..066e73c2fcc9 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -194,18 +194,41 @@ struct page_map {
struct vmem_altmap altmap;
 };
 
-static void pgmap_radix_release(struct resource *res)
+static unsigned long order_at(struct resource *res, unsigned long pgoff)
 {
-   resource_size_t key, align_start, align_size, align_end;
+   unsigned long phys_pgoff = PHYS_PFN(res->start) + pgoff;
+   unsigned long nr_pages, mask;
 
-   align_start = res->start & ~(SECTION_SIZE - 1);
-   align_size = ALIGN(resource_size(res), SECTION_SIZE);
-   align_end = align_start + align_size - 1;
+   nr_pages = PHYS_PFN(resource_size(res));
+   if (nr_pages == pgoff)
+   return ULONG_MAX;
+
+   /*
+* What is the largest aligned power-of-2 range available from
+* this resource pgoff to the end of the resource range,
+* considering the alignment of the current pgoff?
+*/
+   mask = phys_pgoff | rounddown_pow_of_two(nr_pages - pgoff);
+   if (!mask)
+   return ULONG_MAX;
+
+   return find_first_bit(&mask, BITS_PER_LONG);
+}
+
+#define foreach_order_pgoff(res, order, pgoff) \
+   for (pgoff = 0, order = order_at((res), pgoff); order < ULONG_MAX; \
+   pgoff += 1UL << order, order = order_at((res), pgoff))
+
+static void pgmap_radix_release(struct resource *res)
+{
+   unsigned long pgoff, order;
 
mutex_lock(&pgmap_lock);
-   for (key = res->start; key <= res->end; key += SECTION_SIZE)
-   radix_tree_delete(&pgmap_radix, key >> PA_SECTION_SHIFT);
+   foreach_order_pgoff(res, order, pgoff)
+   radix_tree_delete(&pgmap_radix, PHYS_PFN(res->start) + pgoff);
mutex_unlock(&pgmap_lock);
+
+   synchronize_rcu();
 }
 
 static unsigned long pfn_first(struct page_map *page_map)
@@ -268,7 +291,7 @@ struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
 
WARN_ON_ONCE(!rcu_read_lock_held());
 
-   page_map = radix_tree_lookup(&pgmap_radix, phys >> PA_SECTION_SHIFT);
+   page_map = radix_tree_lookup(&pgmap_radix, PHYS_PFN(phys));
return page_map ? &page_map->pgmap : NULL;
 }
 
@@ -293,12 +316,12 @@ struct dev_pagemap *find_dev_pagemap(resource_size_t phys)
 void *devm_memremap_pages(struct device *dev, struct resource *res,
struct percpu_ref *ref, struct vmem_altmap *altmap)
 {
-   resource_size_t key, align_start, align_size, align_end;
+   resource_size_t align_start, align_size, align_end;
+   unsigned long pfn, pgoff, order;
pgprot_t pgprot = PAGE_KERNEL;
struct dev_pagemap *pgmap;
struct page_map *page_map;
int error, nid, is_ram;
-   unsigned long pfn;
 
align_start = res->start & ~(SECTION_SIZE - 1);
align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
@@ -337,11 +360,12 @@ void *devm_memremap_pages(struct device *dev, struct resource *res,
mutex_lock(&pgmap_lock);
error = 0;
align_end = align_start + align_size - 1;
-   for (key = align_start; key <= align_end; key += SECTION_SIZE) {
+
+   foreach_order_pgoff(res, order, pgoff) {
struct dev_pagemap *dup;
 
rcu_read_lock();
-   dup = find_dev_pagemap(key);
+   dup = find_dev_pagemap(res->start + PFN_PHYS(pgoff));
  

[PATCHv3] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-07 Thread Hoeun Ryu
 Commit 0ee5941 (x86/panic: replace smp_send_stop() with kdump friendly
version in panic path) introduced crash_smp_send_stop(), a weak function
that architecture code can override to fix the side effect caused by
commit f06e515 (kernel/panic.c: add "crash_kexec_post_notifiers"
option).

 ARM architecture uses the weak version function and the problem is that
the weak function simply calls smp_send_stop() which makes other CPUs
offline and takes away the chance to save crash information for nonpanic
CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
option is enabled.

 Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in
the function is useless because all nonpanic CPUs are already offline by
smp_send_stop() in this case and smp_call_function() only works against
online CPUs.

 The result is that /proc/vmcore is not available, with the error messages
"Warning: Zero PT_NOTE entries found" and "Kdump: vmcore not initialized".

 crash_smp_send_stop() is implemented for ARM architecture to fix this
problem and the function (strong symbol version) saves crash information
for nonpanic CPUs using smp_call_function() and machine_crash_shutdown()
tries to save crash information for nonpanic CPUs only when
crash_kexec_post_notifiers kernel option is disabled.

 We might be able to implement the function like arm64 or x86 using a
dedicated IPI (let's say IPI_CPU_CRASH_STOP), but we cannot do so because
of the lack of IPI slots. Please see commit e7273ff4
(ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI).

Signed-off-by: Hoeun Ryu 
---
 v3:
   - remove 'WARN_ON(num_online_cpus() > 1)' in machine_crash_shutdown().
 it's a false check for the case when crash_kexec_post_notifiers
 kernel option is disabled.
 v2:
   - calling crash_smp_send_stop() in machine_crash_shutdown() for the case
 when crash_kexec_post_notifiers kernel option is disabled.
   - fix commit messages for it.

 arch/arm/kernel/machine_kexec.c | 40 +---
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index fe1419e..82ef7c7 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -94,6 +94,34 @@ void machine_crash_nonpanic_core(void *unused)
cpu_relax();
 }
 
+void crash_smp_send_stop(void)
+{
+   static int cpus_stopped;
+   unsigned long msecs;
+
+   /*
+* This function can be called twice in panic path, but obviously
+* we execute this only once.
+*/
+   if (cpus_stopped)
+   return;
+
+   cpus_stopped = 1;
+
+   if (num_online_cpus() == 1)
+   return;
+
+   atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+   smp_call_function(machine_crash_nonpanic_core, NULL, false);
+   msecs = 1000; /* Wait at most a second for the other cpus to stop */
+   while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+   mdelay(1);
+   msecs--;
+   }
+   if (atomic_read(&waiting_for_crash_ipi) > 0)
+   pr_warn("Non-crashing CPUs did not react to IPI\n");
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
unsigned int i;
@@ -119,19 +147,9 @@ static void machine_kexec_mask_interrupts(void)
 
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-   unsigned long msecs;
-
local_irq_disable();
 
-   atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-   smp_call_function(machine_crash_nonpanic_core, NULL, false);
-   msecs = 1000; /* Wait at most a second for the other cpus to stop */
-   while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
-   mdelay(1);
-   msecs--;
-   }
-   if (atomic_read(&waiting_for_crash_ipi) > 0)
-   pr_warn("Non-crashing CPUs did not react to IPI\n");
+   crash_smp_send_stop();
 
crash_save_cpu(regs, smp_processor_id());
machine_kexec_mask_interrupts();
-- 
2.7.4
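
The control flow of crash_smp_send_stop() in the patch above (a run-once
guard followed by a bounded wait) can be modelled in userspace as a rough
sketch. Everything here is hypothetical: stop_others_once() and its
ack_per_tick parameter simulate the other CPUs acknowledging the IPI,
since no real IPIs exist in this model.

```c
static int cpus_stopped;	/* mirrors the static guard in the patch */

/*
 * Hypothetical model: 'online_cpus - 1' CPUs must acknowledge, and
 * 'ack_per_tick' of them do so on every polling tick.
 * Returns -1 if the call was skipped because it already ran, otherwise
 * the number of CPUs still pending after at most 'max_ticks' ticks.
 */
static int stop_others_once(int online_cpus, int ack_per_tick, int max_ticks)
{
	int pending;

	if (cpus_stopped)
		return -1;	/* second call in the panic path: nothing to do */
	cpus_stopped = 1;

	if (online_cpus == 1)
		return 0;	/* no other CPU to stop */

	pending = online_cpus - 1;
	while (pending > 0 && max_ticks) {
		pending -= ack_per_tick < pending ? ack_per_tick : pending;
		max_ticks--;
	}
	return pending;		/* non-zero means some CPUs never reacted */
}
```

A non-zero return corresponds to the "Non-crashing CPUs did not react to
IPI" warning in the patch.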



Re: [PATCH 4/4] selftests/seccomp: Test thread vs process killing

2017-08-07 Thread Kees Cook
On Mon, Aug 7, 2017 at 6:29 PM, Tyler Hicks  wrote:
>> + /* Only the thread died. Let parent know this thread didn't die. */
>
> This read a little odd to me. How about, "Only the created thread died.
> Let parent know this creating thread didn't die."?

Sounds good. I've updated this to be more descriptive.

>> + ASSERT_EQ(1, WIFEXITED(status));
>
> This is probably nitpicky but, after reading the wait(2) man page, I
> feel like this should be ASSERT_TRUE(WIFEXITED(status)) instead of
> comparing to 1. There's no documented guarantee that 1 will be returned.

That's a fair point. I've updated this and WIFSIGNALED now, thanks!
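
A minimal sketch of the boolean-style check, assuming a POSIX system;
child_exited_with() is a hypothetical helper and not part of the selftest:

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with 'code' and report whether it exited normally. */
static bool child_exited_with(int code)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return false;
	if (pid == 0)
		_exit(code);
	if (waitpid(pid, &status, 0) != pid)
		return false;

	/*
	 * WIFEXITED() is only documented to be nonzero on normal exit, so it
	 * is used as a truth value here instead of being compared with 1.
	 */
	return WIFEXITED(status) && WEXITSTATUS(status) == code;
}
```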

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH] serio: PS2 gpio bit banging driver for the serio bus

2017-08-07 Thread Dmitry Torokhov
On Mon, Aug 07, 2017 at 11:03:53AM +0200, Linus Walleij wrote:
> On Tue, Aug 1, 2017 at 12:24 AM, Danilo Krummrich
>  wrote:
> 
> > +config SERIO_GPIO_PS2
> > +   tristate "GPIO PS/2 bit banging driver"
> > +   help
> > + Say Y here if you want PS/2 bit banging support via GPIO.
> > +
> > + To compile this driver as a module, choose M here: the
> > + module will be called gpio-ps2.
> > +
> > + If you are unsure, say N.
> 
> As mentioned
> 
> depends on GPIOLIB
> depends on OF

I do not think this driver has to depend on OF. It should use gpiod and
generic device properties.

Thanks.

-- 
Dmitry


Re: [PATCH v3] xen: get rid of paravirt op adjust_exception_frame

2017-08-07 Thread Andy Lutomirski
On Mon, Aug 7, 2017 at 1:56 PM, Boris Ostrovsky
 wrote:
>
>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>> index 811e4ddb3f37..a3dcd83187ce 100644
>> --- a/arch/x86/xen/enlighten_pv.c
>> +++ b/arch/x86/xen/enlighten_pv.c
>> @@ -579,6 +579,71 @@ static void xen_write_ldt_entry(struct desc_struct *dt, int entrynum,
>>   preempt_enable();
>>  }
>>
>> +#ifdef CONFIG_X86_64
>> +static struct {
>> + void (*orig)(void);
>> + void (*xen)(void);
>> + bool ist_okay;
>> + bool handle;
>> +} trap_array[] = {
>> + { debug, xen_xendebug, true, true },
>> + { int3, xen_xenint3, true, true },
>> + { double_fault, xen_double_fault, true, false },
>
> Is it really worth adding 'handle' member to the structure because of a
> single special case? We don't expect to ever have another such vector.
>
> (TBH, I think current implementation of cvt_gate_to_trap() is clearer,
> even if it is not as general as what is in this patch. I know that Andy
> disagrees).

I have no real opinion either way.  I just think it's nicer to put it
in cvt_gate_to_trap() instead of the traps.c setup code.

--Andy


[PATCH v2] x86/xen/64: Rearrange the SYSCALL entries

2017-08-07 Thread Andy Lutomirski
Xen's raw SYSCALL entries are much less weird than native.  Rather
than fudging them to look like native entries, use the Xen-provided
stack frame directly.

This lets us eliminate entry_SYSCALL_64_after_swapgs and two uses of
the SWAPGS_UNSAFE_STACK paravirt hook.  The SYSENTER code would
benefit from similar treatment.

This makes one change to the native code path: the compat
instruction that clears the high 32 bits of %rax is moved slightly
later.  I'd be surprised if this affects performance at all.

Signed-off-by: Andy Lutomirski 
---

Changes from v1 (which I never actually emailed):
 - Fix zero-extension in the compat case.

 arch/x86/entry/entry_64.S|  9 ++---
 arch/x86/entry/entry_64_compat.S |  7 +++
 arch/x86/xen/xen-asm_64.S| 23 +--
 3 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index aa58155187c5..7cee92cf807f 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -142,14 +142,8 @@ ENTRY(entry_SYSCALL_64)
 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
 * it is too small to ever cause noticeable irq latency.
 */
-   SWAPGS_UNSAFE_STACK
-   /*
-* A hypervisor implementation might want to use a label
-* after the swapgs, so that it can do the swapgs
-* for the guest and jump here on syscall.
-*/
-GLOBAL(entry_SYSCALL_64_after_swapgs)
 
+   swapgs
movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
@@ -161,6 +155,7 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
pushq   %r11 /* pt_regs->flags */
pushq   $__USER_CS  /* pt_regs->cs */
pushq   %rcx /* pt_regs->ip */
+GLOBAL(entry_SYSCALL_64_after_hwframe)
pushq   %rax /* pt_regs->orig_ax */
pushq   %rdi /* pt_regs->di */
pushq   %rsi /* pt_regs->si */
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index e1721dafbcb1..5314d7b8e5ad 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -183,21 +183,20 @@ ENDPROC(entry_SYSENTER_compat)
  */
 ENTRY(entry_SYSCALL_compat)
/* Interrupts are off on entry. */
-   SWAPGS_UNSAFE_STACK
+   swapgs
 
/* Stash user ESP and switch to the kernel stack. */
movl %esp, %r8d
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
-   /* Zero-extending 32-bit regs, do not remove */
-   movl %eax, %eax
-
/* Construct struct pt_regs on stack */
pushq   $__USER32_DS /* pt_regs->ss */
pushq   %r8 /* pt_regs->sp */
pushq   %r11 /* pt_regs->flags */
pushq   $__USER32_CS /* pt_regs->cs */
pushq   %rcx /* pt_regs->ip */
+GLOBAL(entry_SYSCALL_compat_after_hwframe)
+   movl %eax, %eax  /* discard orig_ax high bits */
pushq   %rax /* pt_regs->orig_ax */
pushq   %rdi /* pt_regs->di */
pushq   %rsi /* pt_regs->si */
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index c3df43141e70..a8a4f4c460a6 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -82,34 +82,29 @@ RELOC(xen_sysret64, 1b+1)
  * rip
  * r11
  * rsp->rcx
- *
- * In all the entrypoints, we undo all that to make it look like a
- * CPU-generated syscall/sysenter and jump to the normal entrypoint.
  */
 
-.macro undo_xen_syscall
-   mov 0*8(%rsp), %rcx
-   mov 1*8(%rsp), %r11
-   mov 5*8(%rsp), %rsp
-.endm
-
 /* Normal 64-bit system call target */
 ENTRY(xen_syscall_target)
-   undo_xen_syscall
-   jmp entry_SYSCALL_64_after_swapgs
+   popq %rcx
+   popq %r11
+   jmp entry_SYSCALL_64_after_hwframe
 ENDPROC(xen_syscall_target)
 
 #ifdef CONFIG_IA32_EMULATION
 
 /* 32-bit compat syscall target */
 ENTRY(xen_syscall32_target)
-   undo_xen_syscall
-   jmp entry_SYSCALL_compat
+   popq %rcx
+   popq %r11
+   jmp entry_SYSCALL_compat_after_hwframe
 ENDPROC(xen_syscall32_target)
 
 /* 32-bit compat sysenter target */
 ENTRY(xen_sysenter_target)
-   undo_xen_syscall
+   mov 0*8(%rsp), %rcx
+   mov 1*8(%rsp), %r11
+   mov 5*8(%rsp), %rsp
jmp entry_SYSENTER_compat
 ENDPROC(xen_sysenter_target)
 
-- 
2.13.3



Re: [lkp-robot] [mm] 7674270022: will-it-scale.per_process_ops -19.3% regression

2017-08-07 Thread Nadav Amit
Minchan Kim  wrote:

> Hi,
> 
> On Tue, Aug 08, 2017 at 09:19:23AM +0800, kernel test robot wrote:
>> Greeting,
>> 
>> FYI, we noticed a -19.3% regression of will-it-scale.per_process_ops due to 
>> commit:
>> 
>> 
>> commit: 76742700225cad9df49f05399381ac3f1ec3dc60 ("mm: fix 
>> MADV_[FREE|DONTNEED] TLB flush miss problem")
>> url: 
>> https://github.com/0day-ci/linux/commits/Nadav-Amit/mm-migrate-prevent-racy-access-to-tlb_flush_pending/20170802-205715
>> 
>> 
>> in testcase: will-it-scale
>> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 
>> 64G memory
>> with following parameters:
>> 
>>  nr_task: 16
>>  mode: process
>>  test: brk1
>>  cpufreq_governor: performance
>> 
>> test-description: Will It Scale takes a testcase and runs it from 1 through 
>> to n parallel copies to see if the testcase will scale. It builds both a 
>> process and threads based test in order to see any differences between the 
>> two.
>> test-url: https://github.com/antonblanchard/will-it-scale
> 
> Thanks for the report.
> Could you explain what kinds of workload you are testing?
> 
> Does it calls frequently madvise(MADV_DONTNEED) in parallel on multiple
> threads?

According to the description it is "testcase:brk increase/decrease of one
page". According to the mode it spawns multiple processes, not threads.

Since a single page is unmapped each time, and the iTLB-loads increase
dramatically, I would suspect that for some reason a full TLB flush is
caused during do_munmap().

If I find some free time, I’ll try to profile the workload - but feel free
to beat me to it.

Nadav 



[PATCH] spi/bcm63xx-hspi: fix error return code in bcm63xx_hsspi_probe()

2017-08-07 Thread Gustavo A. R. Silva
platform_get_irq() returns an error code, but the spi-bcm63xx-hsspi
driver ignores it and always returns -ENXIO. This is not correct, and
prevents -EPROBE_DEFER from being propagated properly.

Notice that platform_get_irq() no longer returns 0 on error:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e330b9a6bb35dc7097a4f02cb1ae7b6f96df92af

Print and propagate the return value of platform_get_irq on failure.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/spi/spi-bcm63xx-hsspi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-bcm63xx-hsspi.c b/drivers/spi/spi-bcm63xx-hsspi.c
index 475a790..cbcba61 100644
--- a/drivers/spi/spi-bcm63xx-hsspi.c
+++ b/drivers/spi/spi-bcm63xx-hsspi.c
@@ -338,8 +338,8 @@ static int bcm63xx_hsspi_probe(struct platform_device *pdev)
 
irq = platform_get_irq(pdev, 0);
if (irq < 0) {
-   dev_err(dev, "no irq\n");
-   return -ENXIO;
+   dev_err(dev, "no irq: %d\n", irq);
+   return irq;
}
 
res_mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-- 
2.5.0
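
The effect of the change can be modelled in userspace as follows. This is a sketch under stated assumptions: fake_get_irq(), probe_old() and probe_new() are hypothetical stand-ins for platform_get_irq() and the probe function, and EPROBE_DEFER is defined locally because it is a kernel-internal value.

```c
#include <errno.h>
#include <stdio.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal; absent from userspace errno.h */
#endif

/* Hypothetical stand-in for platform_get_irq(): IRQ number or -errno. */
static int fake_get_irq(int simulated)
{
	return simulated;
}

/* Old behaviour: every failure collapses into -ENXIO. */
static int probe_old(int simulated)
{
	int irq = fake_get_irq(simulated);

	if (irq < 0)
		return -ENXIO;
	return 0;
}

/* Patched behaviour: report and return the original error code. */
static int probe_new(int simulated)
{
	int irq = fake_get_irq(simulated);

	if (irq < 0) {
		fprintf(stderr, "no irq: %d\n", irq);
		return irq;
	}
	return 0;
}
```

With the old code a caller cannot tell "try again later" from "no such device"; with the new code the deferral request reaches the caller unchanged.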



Re: [PATCH] arm64: correct modules range of kernel virtual memory layout

2017-08-07 Thread Miles Chen
On Tue, 2017-08-08 at 12:44 +0800, Miles Chen wrote:
> On Mon, 2017-08-07 at 15:01 +0100, Will Deacon wrote:
> > On Mon, Aug 07, 2017 at 02:18:00PM +0100, Ard Biesheuvel wrote:
> > > On 7 August 2017 at 14:16, Will Deacon  wrote:
> > > > On Mon, Aug 07, 2017 at 07:04:46PM +0800, Miles Chen wrote:
> > > >> The commit f80fb3a3d508 ("arm64: add support for kernel ASLR")
> > > >> moved module virtual address to
> > > >> [module_alloc_base, module_alloc_base + MODULES_VSIZE).
> > > >>
> > > >> Display module information of the virtual kernel
> > > >> memory layout by using module_alloc_base.
> > > >>
> > > >> testing output:
> > > >> 1) Current implementation:
> > > >> Virtual kernel memory layout:
> > > >>   modules : 0xff80 - 0xff800800   (   128 MB)
> > > >> 2) this patch + KASLR:
> > > >> Virtual kernel memory layout:
> > > >>   modules : 0xff800056 - 0xff800856   (   128 MB)
> > > >> 3) this patch + KASLR and a dummy seed:
> > > >> Virtual kernel memory layout:
> > > >>   modules : 0xffa7df637000 - 0xffa7e7637000   (   128 MB)
> > > >>
> > > >> Signed-off-by: Miles Chen 
> > > >> ---
> > > >>  arch/arm64/mm/init.c | 5 +++--
> > > >>  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > Does this mean the modules code in our pt dumper is busted
> > > > (arch/arm64/mm/dump.c)? Also, what about KASAN, which uses these 
> > > > addresses
> > > > too (in kasan_init)? Should we just remove MODULES_VADDR and MODULES_END
> > > > altogether?
> > > >
> > > 
> > > I don't think we need this patch. The 'module' line simply prints the
> > > VA region that is reserved for modules. The fact that we end up
> > > putting them elsewhere when running randomized does not necessarily
> > > mean this line should reflect that.
> > 
> > I was more concerned by other users of MODULES_VADDR tbh, although I see
> > now that we don't randomize the module region if kasan is enabled. Still,
> > the kcore code adds the modules region as a separate area (distinct from
> > vmalloc) if MODULES_VADDR is defined, the page table dumping code uses
> > MODULES_VADDR to identify the module region and I think we'll get false
> > positives from is_vmalloc_or_module_addr, which again uses the static
> > region.
> > 
> > So, given that MODULES_VADDR never points at the module area, can't we get
> > rid of it?
> 
> Agreed. MODULES_VADDR should be phased out. Considering the kernel
> modules live somewhere between [VMALLOC_START, VMALLOC_END) now:
> (arch/arm64/kernel/module.c:module_alloc). I suggest the following
> changes:
> 
> 1. is_vmalloc_or_module_addr() should return is_vmalloc_addr() directly
> 2. arch/arm64/mm/dump.c does not need MODULES_VADDR and MODULES_END.
> 3. kasan uses [module_alloc_base, module_alloc_base + MODULES_VSIZE) to
> get the shadow memory? (the kernel modules still live in this range when
> kasan is enabled)
> 4. remove modules line in kernel memory layout
> (optional, thanks for Ard's feedback)
> 5. remove MODULES_VADDR, MODULES_END definition

I was wrong about this. is_vmalloc_or_module_addr() is defined
in mm/vmalloc and it uses MODULES_VADDR and MODULES_END.
Maybe it is better to give MODULES_VADDR and MODULES_END
proper values, not remove them.

> Miles
> > 
> > Will
> 
> 




Re: [RESEND PATCH] bcache: Don't reinvent the wheel but use existing llist API

2017-08-07 Thread Coly Li
On 2017/8/8 下午12:12, Byungchul Park wrote:
> On Mon, Aug 07, 2017 at 06:18:35PM +0800, Coly Li wrote:
>> On 2017/8/7 下午4:38, Byungchul Park wrote:
>>> Although llist provides proper APIs, they are not used. Make them used.
>>>
>>> Signed-off-by: Byungchul Park
>> Only have a question about why not using llist_for_each_entry(), it's
> 
> Hello,
> 
> The reason is to keep the original logic unchanged. The logic already
> does as if it's the safe version against removal.
> 
>> still OK with llist_for_each_entry_safe(). The rest looks good to me.
>>
>> Acked-by: Coly Li 
>>
>>> ---
>>>  drivers/md/bcache/closure.c | 17 +++--
>>>  1 file changed, 3 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/md/bcache/closure.c b/drivers/md/bcache/closure.c
>>> index 864e673..1841d03 100644
>>> --- a/drivers/md/bcache/closure.c
>>> +++ b/drivers/md/bcache/closure.c
>>> @@ -64,27 +64,16 @@ void closure_put(struct closure *cl)
>>>  void __closure_wake_up(struct closure_waitlist *wait_list)
>>>  {
>>> struct llist_node *list;
>>> -   struct closure *cl;
>>> +   struct closure *cl, *t;
>>> struct llist_node *reverse = NULL;
>>>  
>>> list = llist_del_all(_list->list);
>>>  
>>> /* We first reverse the list to preserve FIFO ordering and fairness */
>>> -
>>> -   while (list) {
>>> -   struct llist_node *t = list;
>>> -   list = llist_next(list);
>>> -
>>> -   t->next = reverse;
>>> -   reverse = t;
>>> -   }
>>> +   reverse = llist_reverse_order(list);
>>>  
>>> /* Then do the wakeups */
>>> -
>>> -   while (reverse) {
>>> -   cl = container_of(reverse, struct closure, list);
>>> -   reverse = llist_next(reverse);
>>> -
>>> +   llist_for_each_entry_safe(cl, t, reverse, list) {
>>
>> Just wondering why not using llist_for_each_entry(), or you use the
>> _safe version on purpose ?
> 
> If I use llist_for_each_entry(), then it would change the original
> behavior. Is it ok?
> 

I feel llist_for_each_entry() keeps the original behavior, and variable
't' can be removed. Anyway, either llist_for_each_entry() or
llist_for_each_entry_safe() works correctly and well here. Any one you
use is OK to me, thanks for your informative reply :-)



-- 
Coly Li
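
The two steps discussed in this thread (reverse the list to restore FIFO order, then walk it while the callback may disturb the current node) can be modelled in userspace like this. It is only a sketch: struct node, reverse_order(), for_each_safe() and record_and_clobber() are hypothetical names, not the kernel llist API.

```c
#include <stddef.h>

struct node {
	struct node *next;
	int id;
};

/* Reverse a NULL-terminated singly linked list; models llist_reverse_order(). */
static struct node *reverse_order(struct node *head)
{
	struct node *new_head = NULL;

	while (head) {
		struct node *tmp = head;

		head = head->next;
		tmp->next = new_head;
		new_head = tmp;
	}
	return new_head;
}

/*
 * Walk the list, fetching the successor before calling the callback so
 * the callback may reuse or free the node; this is the property the
 * _safe iterator provides. Returns the number of nodes visited.
 */
static int for_each_safe(struct node *head, void (*fn)(struct node *))
{
	struct node *pos, *next;
	int n = 0;

	for (pos = head; pos; pos = next) {
		next = pos->next;
		fn(pos);
		n++;
	}
	return n;
}

static int seen[8];
static int nseen;

/* Example callback: record the id, then clobber the link as a wakeup might. */
static void record_and_clobber(struct node *n)
{
	if (nseen < 8)
		seen[nseen++] = n->id;
	n->next = NULL;
}
```

If the callback never touches the node's link, a plain iterator that reads pos->next after the callback visits the same nodes, which is why either variant was considered acceptable above.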


[PATCH] memory: mtk-smi: Handle return value of clk_prepare_enable

2017-08-07 Thread Arvind Yadav
clk_prepare_enable() can fail here and we must check its return value.

Signed-off-by: Arvind Yadav 
---
 drivers/memory/mtk-smi.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index 4afbc41..edf36f0 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -321,6 +321,7 @@ static int mtk_smi_common_probe(struct platform_device *pdev)
struct resource *res;
const struct of_device_id *of_id;
enum mtk_smi_gen smi_gen;
+   int ret;
 
if (!dev->pm_domain)
return -EPROBE_DEFER;
@@ -359,7 +360,9 @@ static int mtk_smi_common_probe(struct platform_device *pdev)
if (IS_ERR(common->clk_async))
return PTR_ERR(common->clk_async);
 
-   clk_prepare_enable(common->clk_async);
+   ret = clk_prepare_enable(common->clk_async);
+   if (ret)
+   return ret;
}
pm_runtime_enable(dev);
platform_set_drvdata(pdev, common);
-- 
1.9.1



Re: linux-next: manual merge of the userns tree with the mips tree

2017-08-07 Thread Ralf Baechle
On Tue, Aug 08, 2017 at 03:10:04PM +1000, Stephen Rothwell wrote:

(Maciej added to cc.)

> Hi Eric,
> 
> Today's linux-next merge of the userns tree got a conflict in:
> 
>   arch/mips/kernel/traps.c
> 
> between commit:
> 
>   260a789828aa ("MIPS: signal: Remove unreachable code from 
> force_fcr31_sig().")
> 
> from the mips tree and commit:
> 
>   ea1b75cf9138 ("signal/mips: Document a conflict with SI_USER with SIGFPE")
> 
> from the userns tree.
> 
> I fixed it up (the former removed the code updated by the latter) and
> can carry the fix as necessary. This is now fixed as far as linux-next
> is concerned, but any non trivial conflicts should be mentioned to your
> upstream maintainer when your tree is submitted for merging.  You may
> also want to consider cooperating with the maintainer of the conflicting
> tree to minimise any particularly complex conflicts.

Eric,

after yesterday's emails on the topic I think commit ea1b75cf9138 ("signal/
mips: Document a conflict with SI_USER with SIGFPE") should be dropped.

  Ralf


[PATCH] mmc: mxcmmc: Handle return value of clk_prepare_enable

2017-08-07 Thread Arvind Yadav
clk_prepare_enable() can fail here and we must check its return value.

Signed-off-by: Arvind Yadav 
---
 drivers/mmc/host/mxcmmc.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/mmc/host/mxcmmc.c b/drivers/mmc/host/mxcmmc.c
index fb3ca82..c016820 100644
--- a/drivers/mmc/host/mxcmmc.c
+++ b/drivers/mmc/host/mxcmmc.c
@@ -1098,8 +1098,13 @@ static int mxcmci_probe(struct platform_device *pdev)
goto out_free;
}
 
-   clk_prepare_enable(host->clk_per);
-   clk_prepare_enable(host->clk_ipg);
+   ret = clk_prepare_enable(host->clk_per);
+   if (ret)
+   goto out_free;
+
+   ret = clk_prepare_enable(host->clk_ipg);
+   if (ret)
+   goto out_clk_per_put;
 
mxcmci_softreset(host);
 
@@ -1168,8 +1173,9 @@ static int mxcmci_probe(struct platform_device *pdev)
dma_release_channel(host->dma);
 
 out_clk_put:
-   clk_disable_unprepare(host->clk_per);
clk_disable_unprepare(host->clk_ipg);
+out_clk_per_put:
+   clk_disable_unprepare(host->clk_per);
 
 out_free:
mmc_free_host(mmc);
@@ -1212,10 +1218,17 @@ static int __maybe_unused mxcmci_resume(struct device *dev)
 {
struct mmc_host *mmc = dev_get_drvdata(dev);
struct mxcmci_host *host = mmc_priv(mmc);
+   int ret;
 
-   clk_prepare_enable(host->clk_per);
-   clk_prepare_enable(host->clk_ipg);
-   return 0;
+   ret = clk_prepare_enable(host->clk_per);
+   if (ret)
+   return ret;
+
+   ret = clk_prepare_enable(host->clk_ipg);
+   if (ret)
+   clk_disable_unprepare(host->clk_per);
+
+   return ret;
 }
 
 static SIMPLE_DEV_PM_OPS(mxcmci_pm_ops, mxcmci_suspend, mxcmci_resume);
-- 
1.9.1
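
The resume path above follows an "enable both or neither" pattern. A userspace sketch of that pattern, assuming a hypothetical struct fake_clk in place of the real clock API, might look like this:

```c
/* Hypothetical stand-in for a clock; 'fail' forces the enable to fail. */
struct fake_clk {
	int enabled;
	int fail;
};

static int fake_clk_enable(struct fake_clk *clk)
{
	if (clk->fail)
		return -5;	/* arbitrary negative error, like -EIO */
	clk->enabled = 1;
	return 0;
}

static void fake_clk_disable(struct fake_clk *clk)
{
	clk->enabled = 0;
}

/* Enable both clocks, or neither: undo 'per' if 'ipg' cannot be enabled. */
static int enable_both(struct fake_clk *per, struct fake_clk *ipg)
{
	int ret;

	ret = fake_clk_enable(per);
	if (ret)
		return ret;

	ret = fake_clk_enable(ipg);
	if (ret)
		fake_clk_disable(per);

	return ret;
}
```

The probe path does the same thing with goto labels, which is why the patch reorders the disable calls so that they unwind in the reverse order of the enables.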



RE: [PATCH 2/2] gpio: 74x164: handling enable-gpios

2017-08-07 Thread Peng Fan
> > +   chip->enable_gpio = devm_gpiod_get(>dev, "enable",
> GPIOD_OUT_LOW);
> > +   if (IS_ERR(chip->enable_gpio)) {
> > +   dev_dbg(>dev, "No enable-gpios property\n");
> > +   chip->enable_gpio = NULL;
> 
> Also, the error handling here is not correct as it will never propagate
> EPROBE_DEFER.
> 
> I will submit my version of the patch if you don't mind.

That's ok if you have a better patch.

Regards,
Peng.


Re: [PATCH 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

2017-08-07 Thread Tyler Hicks
On 08/02/2017 10:19 PM, Kees Cook wrote:
> Right now, SECCOMP_RET_KILL kills the current thread. There have been
> a few requests for RET_KILL to kill the entire process (the thread
> group), but since seccomp's u32 return values are ABI, and ordered by
> lowest value, with RET_KILL as 0, there isn't a trivial way to provide
> an even smaller value that would mean the more restrictive action of
> killing the thread group.
> 
> Instead, create a filter flag that indicates that a RET_KILL from this
> filter must kill the process rather than the thread. This can be set
> (and not cleared) via the new SECCOMP_FILTER_FLAG_KILL_PROCESS flag.
> 
> Pros:
>  - the logic for the filter action is contained in the filter.
>  - userspace can detect support for the feature since earlier kernels
>will reject the new flag.
> Cons:
>  - depends on adding an assignment to the seccomp_run_filters() loop
>(previous patch).
> 
> Alternatives to this approach with pros/cons:
> 
> - Use a new test during seccomp_run_filters() that treats the RET_DATA
>   mask of a RET_KILL action as special. If a new bit is set in the data,
>   then treat the return value as -1 (lower than 0).
>   Pros:
>- the logic for the filter action is contained in the filter.
>   Cons:
>- added complexity to time-sensitive seccomp_run_filters() loop.
>- there isn't a trivial way for userspace to detect if the kernel
>  supports the feature (earlier kernels will silently ignore the
>  RET_DATA and only kill the thread).

I prefer using a filter flag over a special RET_DATA mask but, for
completeness, I wanted to mention that SECCOMP_GET_ACTION_AVAIL
operation could be extended to validate special RET_DATA masks. However,
I don't think that is a clean design.
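
To make the ordering constraint concrete, here is a simplified userspace model of how the lowest action value wins and how the matching filter is remembered. It is a sketch only: struct model_filter and run_filters() are hypothetical, the RET_* names are local copies of the uapi action values as they stood at the time of this thread, and real filters are BPF programs rather than fixed verdicts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the action values from include/uapi/linux/seccomp.h. */
#define RET_KILL	0x00000000U
#define RET_TRAP	0x00030000U
#define RET_ERRNO	0x00050000U
#define RET_ALLOW	0x7fff0000U
#define RET_ACTION	0x7fff0000U	/* mask selecting the action bits */

/* Simplified model of a filter: a fixed verdict plus the proposed flag. */
struct model_filter {
	uint32_t verdict;
	bool kill_process;
};

/*
 * Return the most restrictive (numerically lowest) action among all
 * filters and report which filter produced it through *match.
 * *match stays NULL when no filter is stricter than RET_ALLOW.
 */
static uint32_t run_filters(const struct model_filter *f, size_t n,
			    const struct model_filter **match)
{
	uint32_t ret = RET_ALLOW;
	size_t i;

	*match = NULL;
	for (i = 0; i < n; i++) {
		if ((f[i].verdict & RET_ACTION) < (ret & RET_ACTION)) {
			ret = f[i].verdict;
			*match = &f[i];
		}
	}
	return ret;
}
```

Since RET_KILL is already 0 in an unsigned comparison, nothing can sort below it, so the process-wide behaviour has to come from a property of the matched filter instead of from a new return value.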

> - Have SECCOMP_FILTER_FLAG_KILL_PROCESS attach to the seccomp struct
>   rather than the filter.
>   Pros:
>- no change needed to seccomp_run_filters() loop.
>   Cons:
>- the change in behavior technically originates external to the
>  filter, which allows for later filters to "enhance" a previously
>  applied filter's RET_KILL to kill the entire process, which may
>  be unexpected.
> 
> Signed-off-by: Kees Cook 
> ---
>  include/linux/seccomp.h  |  3 ++-
>  include/uapi/linux/seccomp.h |  3 ++-
>  kernel/seccomp.c | 12 +++-
>  3 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index ecc296c137cd..59d001ba655c 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -3,7 +3,8 @@
>  
>  #include 
>  
> -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC)
> +#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
> +  SECCOMP_FILTER_FLAG_KILL_PROCESS)
>  
>  #ifdef CONFIG_SECCOMP
>  
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 0f238a43ff1e..4b75d8c297b6 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -15,7 +15,8 @@
>  #define SECCOMP_SET_MODE_FILTER  1
>  
>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
> -#define SECCOMP_FILTER_FLAG_TSYNC1
> +#define SECCOMP_FILTER_FLAG_TSYNC1
> +#define SECCOMP_FILTER_FLAG_KILL_PROCESS 2
>  
>  /*
>   * All BPF programs must return a 32-bit value.
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 8bdcf01379e4..931eb9cbd093 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -44,6 +44,7 @@
>   * is only needed for handling filters shared across tasks.
>   * @prev: points to a previously installed, or inherited, filter
>   * @prog: the BPF program to evaluate
> + * @kill_process: if true, RET_KILL will kill process rather than thread.
>   *
>   * seccomp_filter objects are organized in a tree linked via the @prev
>   * pointer.  For any task, it appears to be a singly-linked list starting
> @@ -59,6 +60,7 @@ struct seccomp_filter {
>   refcount_t usage;
>   struct seccomp_filter *prev;
>   struct bpf_prog *prog;
> + bool kill_process;

Just a reminder to move the bool up to be the 2nd member of the struct for
improved struct packing. (You already said you were going to do this while
you were reviewing my logging patches)
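
A quick way to see the packing difference is to compare the two layouts in userspace. This is a sketch with assumptions: int stands in for refcount_t, void * for the two pointer members, and the struct names are made up for the comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Layout as posted: the bool trails two pointers. */
struct filter_bool_last {
	int usage;		/* stands in for refcount_t */
	void *prev;
	void *prog;
	bool kill_process;
};

/* Suggested layout: the bool sits in the padding after 'usage'. */
struct filter_bool_second {
	int usage;
	bool kill_process;
	void *prev;
	void *prog;
};

/* Bytes saved by the reordering on the current ABI (may be 0). */
static size_t bytes_saved(void)
{
	return sizeof(struct filter_bool_last) -
	       sizeof(struct filter_bool_second);
}
```

On typical 64-bit ABIs the second layout should be one pointer-sized slot smaller; on 32-bit ABIs the two usually come out the same size.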

>  };
>  
>  /* Limit any path through the tree to 256KB worth of instructions. */
> @@ -448,6 +450,10 @@ static long seccomp_attach_filter(unsigned int flags,
>   return ret;
>   }
>  
> + /* Set process-killing flag, if present. */
> + if (flags & SECCOMP_FILTER_FLAG_KILL_PROCESS)
> + filter->kill_process = true;
> +
>   /*
>* If there is an existing filter, make it the prev and don't drop its
>* task reference.
> @@ -658,7 +664,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>   seccomp_init_siginfo(, this_syscall, data);
>   

Re: Suspend-resume failure on Intel Eagle Lake Core2Duo

2017-08-07 Thread Masahiro Yamada
Hi Marc,

2017-08-07 17:17 GMT+09:00 Marc Zyngier :
> On 07/08/17 05:45, Masahiro Yamada wrote:
>> Hi Marc,
>>
>>
>> 2017-08-03 22:30 GMT+09:00 Marc Zyngier :
>>> On 03/08/17 13:52, Masahiro Yamada wrote:
>>>> Hi Marc,
>>>>
>>>> 2017-08-03 17:41 GMT+09:00 Marc Zyngier :
>>>>> Hi Masahiro,
>>>>>
>>>>> On 03/08/17 08:32, Masahiro Yamada wrote:
>>>>>> Hi.
>>>>>>
>>>>>> 2017-08-01 0:55 GMT+09:00 Thomas Gleixner :
>>>>>>> On Mon, 31 Jul 2017, Tomi Sarvela wrote:
>>>>>>>> On 31/07/17 18:06, Thomas Gleixner wrote:
>>>>>>>>> Can you please remove the patch. And try the following:
>>>>>>>>>
>>>>>>>>> # echo N > /sys/module/printk/parameters/console_suspend
>>>>>>>>>
>>>>>>>>> # echo mem > /sys/power/state
>>>>>>>>>
>>>>>>>>> and log the output of the serial console. That way we might get a clue
>>>>>>>>> where it gets stuck.
>>>>>>>>
>>>>>>>> I'm afraid it hangs right away. No response from SSH, no output to serial.
>>>>>>>
>>>>>>> What means hangs right away? Is there no output at all on the serial
>>>>>>> console? Or does it just stop at some point?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> tglx
>>>>>>>
>>>>>>
>>>>>> Sorry for jumping in.
>>>>>> Finally, I found this thread.
>>>>>>
>>>>>>
>>>>>> My environment is completely different (ARM64 board),
>>>>>> I am also suffering from a hibernation problem
>>>>>> since this commit.
>>>>>>
>>>>>>
>>>>>> I get no response on the serial console
>>>>>> after "Restarting tasks ... done." log message.
>>>>>>
>>>>>>
>>>>>> By reverting bf22ff45bed6 ("genirq: Avoid unnecessary low level
>>>>>> irq function calls", I can get hibernation working again.
>>>>>>
>>>>>>
>>>>>> SW info:
>>>>>> defconfig:  arch/arm64/configs/defconfig
>>>>>> DT   :  arch/arm64/boot/dts/socionext/uniphier-ld20-ref.dts
>>>>>> PSCI :  ARM Trusted Firmware
>>>>>>
>>>>>>
>>>>>> SoC info:
>>>>>> CPU  :  Cortex-A72 * 2 + Cortex-A53 * 2
>>>>>> irqchip  :  GICv3 (drivers/irq/irq-gic-v3.c)
>>>>>
>>>>> Let me take an educated guess: It feels like your firmware doesn't
>>>>> save/restore the GIC context across suspend/resume. Is that something
>>>>> you could check, assuming you have access to the firmware source code?
>>>>
>>>> Thanks for your comments.
>>>>
>>>>
>>>> I do not know much about the manner of preserving GICv3 context.
>>>>
>>>> I can see this patch  (rejected?) :
>>>> https://patchwork.kernel.org/patch/9343061/
>>>>
>>>>
>>>> Is it something that should be completely cared by firmware
>>>> instead of kernel?
>>>
>>> That was definitely the intention, but it looks like something that ATF
>>> has only started supporting very recently:
>>>
>>> https://github.com/ARM-software/arm-trusted-firmware/pull/1047
>>>
>>>> ARM Trusted Firmware (https://github.com/ARM-software/arm-trusted-firmware)
>>>> is open source software, and I pushed my platform code to the upstream.
>>>>
>>>> So, yes, I (and everybody) can have access to the firmware source code.
>>>>
>>>>
>>>> I am not sure how ATF saves the context during hibernation, though.
>>>
>>> See the above link. Is there any chance of you trying this into your
>>> firmware?
>>>
>>> Thanks,
>>
>> Thanks for the pointer.
>>
>>
>> Yes.  I will try that once GIC-v3 context save/restore is supported in ATF.
>>
>> I think that will basically work for suspend-to-ram
>> because all contexts including both non-secure and secure worlds will
>> be retained in the main memory.
>>
>> However, I still do not understand how the context is preserved during
>> the hibernation (suspend-to-disk).
>>
>>
>> If my understanding is correct, hibernation on Linux works as follows:
>>
>> [1] Freeze all tasks
>> [2] CPU_OFF for non-boot CPUs
>> [3] Create a hibernation image
>> [4] CPU_ON for non-boot CPUs
>> [5] Write the hibernation image to the disk (=swap area)
>> [6] SYSTEM_OFF
>>
>>
>> IIUC, [5] only writes the context Linux takes care of (only non-secure).
>>
>> If so, where and how does the firmware write the GIC-v3 context
>> to the disk?
>
> Gah, I completely missed the fact that you were talking about suspend to
> disk, sorry about that.
>
> It is likely that some driver doesn't restore its state properly. Is
> there any chance that you could pinpoint which device creates the issue?
>

I use eMMC to store the hibernation image, but
I do not think eMMC driver is the cause of the issue.

I guess the cause of the issue is GIC-v3 context is lost.


I am not an expert in this, so I will ask the ATF community
about how ATF can support suspend-to-disk.


-- 
Best Regards
Masahiro Yamada


Re: [PATCH 4/4] selftests/seccomp: Test thread vs process killing

2017-08-07 Thread Tyler Hicks
On 08/02/2017 10:19 PM, Kees Cook wrote:
> SECCOMP_RET_KILL is supposed to kill the current thread (and userspace
> depends on this), so test for this, distinct from killing the entire
> process. This also tests killing the entire process with the new
> SECCOMP_FILTER_FLAG_KILL_PROCESS flag. (This also moves a bunch of
> defines up earlier in the file to use them earlier.)
> 
> Signed-off-by: Kees Cook 
> ---
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 182 
> --
>  1 file changed, 141 insertions(+), 41 deletions(-)
> 
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index ee78a53da5d1..68b9faf23ca6 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -87,6 +87,51 @@ struct seccomp_data {
>  };
>  #endif
>  
> +#ifndef __NR_seccomp
> +# if defined(__i386__)
> +#  define __NR_seccomp 354
> +# elif defined(__x86_64__)
> +#  define __NR_seccomp 317
> +# elif defined(__arm__)
> +#  define __NR_seccomp 383
> +# elif defined(__aarch64__)
> +#  define __NR_seccomp 277
> +# elif defined(__hppa__)
> +#  define __NR_seccomp 338
> +# elif defined(__powerpc__)
> +#  define __NR_seccomp 358
> +# elif defined(__s390__)
> +#  define __NR_seccomp 348
> +# else
> +#  warning "seccomp syscall number unknown for this architecture"
> +#  define __NR_seccomp 0x
> +# endif
> +#endif
> +
> +#ifndef SECCOMP_SET_MODE_STRICT
> +#define SECCOMP_SET_MODE_STRICT 0
> +#endif
> +
> +#ifndef SECCOMP_SET_MODE_FILTER
> +#define SECCOMP_SET_MODE_FILTER 1
> +#endif
> +
> +#ifndef SECCOMP_FILTER_FLAG_TSYNC
> +#define SECCOMP_FILTER_FLAG_TSYNC 1
> +#endif
> +
> +#ifndef SECCOMP_FILTER_FLAG_KILL_PROCESS
> +#define SECCOMP_FILTER_FLAG_KILL_PROCESS 2
> +#endif
> +
> +#ifndef seccomp
> +int seccomp(unsigned int op, unsigned int flags, void *args)
> +{
> + errno = 0;
> + return syscall(__NR_seccomp, op, flags, args);
> +}
> +#endif
> +
>  #if __BYTE_ORDER == __LITTLE_ENDIAN
>  #define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
>  #elif __BYTE_ORDER == __BIG_ENDIAN
> @@ -520,6 +565,102 @@ TEST_SIGNAL(KILL_one_arg_six, SIGSYS)
>   close(fd);
>  }
>  
> +/* This is a thread task to die via seccomp filter violation. */
> +void *kill_thread(void *data)
> +{
> + bool die = (bool)data;
> +
> + if (die) {
> + prctl(PR_GET_SECCOMP, 0, 0, 0, 0);
> + return (void *)SIBLING_EXIT_FAILURE;
> + }
> +
> + return (void *)SIBLING_EXIT_UNKILLED;
> +}
> +
> +/* Prepare a thread that will kill itself or both of us. */
> +void kill_thread_or_group(struct __test_metadata *_metadata, bool kill_process)
> +{
> + pthread_t thread;
> + void *status;
> + unsigned int flags;
> + /* Kill only when calling __NR_prctl. */
> + struct sock_filter filter[] = {
> + BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
> + offsetof(struct seccomp_data, nr)),
> + BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_prctl, 0, 1),
> + BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL),
> + BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
> + };
> + struct sock_fprog prog = {
> + .len = (unsigned short)ARRAY_SIZE(filter),
> + .filter = filter,
> + };
> +
> + ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
> + TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
> + }
> +
> + flags = kill_process ? SECCOMP_FILTER_FLAG_KILL_PROCESS : 0;
> + ASSERT_EQ(0, seccomp(SECCOMP_SET_MODE_FILTER, flags, )) {
> + if (kill_process)
> + TH_LOG("Kernel does not support 
> SECCOMP_FILTER_FLAG_KILL_PROCESS");
> + else
> + TH_LOG("Kernel does not support seccomp syscall");
> + }
> +
> + /* Start a thread that will exit immediately. */
> + ASSERT_EQ(0, pthread_create(, NULL, kill_thread, (void *)false));
> + ASSERT_EQ(0, pthread_join(thread, ));
> + ASSERT_EQ(SIBLING_EXIT_UNKILLED, (unsigned long)status);
> +
> + /* Start a thread that will die immediately. */
> + ASSERT_EQ(0, pthread_create(, NULL, kill_thread, (void *)true));
> + ASSERT_EQ(0, pthread_join(thread, ));
> + ASSERT_NE(SIBLING_EXIT_FAILURE, (unsigned long)status);
> +
> + /* Only the thread died. Let parent know this thread didn't die. */

This reads a little odd to me. How about, "Only the created thread died.
Let parent know that this creating thread didn't die."?

> + exit(42);
> +}
> +
> +TEST(KILL_thread)
> +{
> + int status;
> + pid_t child_pid;
> +
> + child_pid = fork();
> + ASSERT_LE(0, child_pid);
> + if (child_pid == 0) {
> + kill_thread_or_group(_metadata, false);
> + _exit(38);
> + }
> +
> + ASSERT_EQ(child_pid, waitpid(child_pid, , 0));
> +
> + /* If only the thread was killed, we'll see exit 42. */
> + 

Re: [PATCH v2 2/3] usb: chipidea: Hook into mux framework to toggle usb switch

2017-08-07 Thread Stephen Boyd
Quoting Peter Rosin (2017-07-31 03:33:22)
> On 2017-07-14 23:40, Stephen Boyd wrote:
> > @@ -1964,16 +1965,26 @@ void ci_hdrc_gadget_destroy(struct ci_hdrc *ci)
> >  
> >  static int udc_id_switch_for_device(struct ci_hdrc *ci)
> >  {
> > + int ret = 0;
> > +
> >   if (ci->is_otg)
> >   /* Clear and enable BSV irq */
> >   hw_write_otgsc(ci, OTGSC_BSVIS | OTGSC_BSVIE,
> >   OTGSC_BSVIS | OTGSC_BSVIE);
> >  
> > - return 0;
> > + if (!ci_otg_is_fsm_mode(ci))
> > + ret = mux_control_select(ci->platdata->usb_switch, 0);
> > +
> > + if (ci->is_otg && ret)
> > + hw_write_otgsc(ci, OTGSC_BSVIE | OTGSC_BSVIS, OTGSC_BSVIS);
> > +
> > + return ret;
> >  }
> >  
> >  static void udc_id_switch_for_host(struct ci_hdrc *ci)
> >  {
> > + mux_control_deselect(ci->platdata->usb_switch);
> > +
> 
> This looks broken. You conditionally lock the mux and you unconditionally
> unlock it. Quoting the mux_control_deselect doc:
> 
>  * It is required that a single call is made to mux_control_deselect() for
>  * each and every successful call made to either of mux_control_select() or
>  * mux_control_try_select().
> 
> Think of the mux as a semaphore with a max count of one. If you lock it,
> you have to unlock it when you're done. If you didn't lock it, you have
> zero business unlocking it. If you try to lock it but fail, you also have
> no business unlocking it. Just like a semaphore.

Good catch. I've added an if (!ci_otg_is_fsm_mode()) check here.

> 
> Another point: I do not know if udc_id_switch_for_host is *only* called
> when there has been a prior call to udc_id_switch_for_device that
> succeeded or how this works exactly. But this all looks fragile. Using
> mux_control_select and mux_control_deselect *must* be done in pairs.
> If you want a mux to be locked for "a while", such as in this case, you
> have to make sure you stay within the rules. There is no room for half
> measures, or you will likely cause a deadlock when something unexpected
> happens.
> 

Can you elaborate? Is it bad that we keep it "locked" while we're in
host or device mode? It looked like we paired the start/stop ops with
each other so that the mux is properly managed across these ops. My
testing hasn't shown a problem, but maybe there's some corner case
you're thinking of? I'll double check the code.
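
One way to keep the calls balanced is to remember whether the select actually succeeded. The following userspace model is a sketch with assumptions: struct model_mux, struct model_ctrl and the 'selected' flag are hypothetical, and unlike mux_control_select() the model fails instead of blocking when the mux is already held.

```c
#include <stdbool.h>

/* Hypothetical model of a mux that behaves like a semaphore with count 1. */
struct model_mux {
	int held;	/* 0 or 1; anything else means unbalanced calls */
};

static int model_select(struct model_mux *mux)
{
	if (mux->held)
		return -16;	/* busy, like -EBUSY */
	mux->held = 1;
	return 0;
}

static void model_deselect(struct model_mux *mux)
{
	mux->held--;
}

/* Caller state remembering whether our select succeeded. */
struct model_ctrl {
	struct model_mux *mux;
	bool fsm_mode;		/* stands in for ci_otg_is_fsm_mode() */
	bool selected;
};

static int switch_for_device(struct model_ctrl *c)
{
	int ret = 0;

	if (!c->fsm_mode) {
		ret = model_select(c->mux);
		c->selected = (ret == 0);
	}
	return ret;
}

static void switch_for_host(struct model_ctrl *c)
{
	/* Deselect only if a matching select succeeded earlier. */
	if (c->selected) {
		model_deselect(c->mux);
		c->selected = false;
	}
}
```

Tracking the state this way also covers the case where the host switch runs without a prior successful device switch, which a bare mode check would not catch.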


Re: [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-07 Thread Bjorn Helgaas
On Mon, Aug 07, 2017 at 02:14:48PM -0700, David Miller wrote:
> From: Ding Tianhong 
> Date: Mon, 7 Aug 2017 12:13:17 +0800
> 
> > Hi David:
> > 
> > I think networking tree merge it is a better choice, as it mainly used to tell the NIC
> > drivers how to use the Relaxed Ordering Attribute, and later we need send patch to enable
> > RO for ixgbe driver base on this patch. But I am not sure whether Bjorn has some of his own
> > view. :)
> > 
> > Hi Bjorn:
> > 
> > Could you help review this patch or give some feedback ?
> 
> I'm still waiting on this...
> 
> Bjorn?

I was on vacation Friday-today, but I'll look at this series this week.


Re: [PATCH v2 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

2017-08-07 Thread Tyler Hicks
On 08/07/2017 08:59 PM, Kees Cook wrote:
> Right now, SECCOMP_RET_KILL kills the current thread. There have been
> a few requests for RET_KILL to kill the entire process (the thread
> group), but since seccomp's u32 return values are ABI, and ordered by
> lowest value, with RET_KILL as 0, there isn't a trivial way to provide
> an even smaller value that would mean the more restrictive action of
> killing the thread group.
> 
> Instead, create a filter flag that indicates that a RET_KILL from this
> filter must kill the process rather than the thread. This can be set
> (and not cleared) via the new SECCOMP_FILTER_FLAG_KILL_PROCESS flag.
> 
> Pros:
>  - the logic for the filter action is contained in the filter.
>  - userspace can detect support for the feature since earlier kernels
>will reject the new flag.
> Cons:
>  - depends on adding an assignment to the seccomp_run_filters() loop
>(previous patch).
> 
> Alternatives to this approach with pros/cons:
> 
> - Use a new test during seccomp_run_filters() that treats the RET_DATA
>   mask of a RET_KILL action as special. If a new bit is set in the data,
>   then treat the return value as -1 (lower than 0).
>   Pros:
>- the logic for the filter action is contained in the filter.
>   Cons:
>- added complexity to time-sensitive seccomp_run_filters() loop.
>- there isn't a trivial way for userspace to detect if the kernel
>  supports the feature (earlier kernels will silently ignore the
>  RET_DATA and only kill the thread).
> 
> - Have SECCOMP_FILTER_FLAG_KILL_PROCESS attach to the seccomp struct
>   rather than the filter.
>   Pros:
>- no change needed to seccomp_run_filters() loop.
>   Cons:
>- the change in behavior technically originates external to the
>  filter, which allows for later filters to "enhance" a previously
>  applied filter's RET_KILL to kill the entire process, which may
>  be unexpected.
> 
> Signed-off-by: Kees Cook 

v2 of these patches all look good to me.

Reviewed-by: Tyler Hicks 

Thanks!

Tyler

> ---
>  include/linux/seccomp.h  |  3 ++-
>  include/uapi/linux/seccomp.h |  3 ++-
>  kernel/seccomp.c | 22 +-
>  3 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index ecc296c137cd..59d001ba655c 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -3,7 +3,8 @@
>  
>  #include 
>  
> -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC)
> +#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
> +  SECCOMP_FILTER_FLAG_KILL_PROCESS)
>  
>  #ifdef CONFIG_SECCOMP
>  
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 0f238a43ff1e..4b75d8c297b6 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -15,7 +15,8 @@
>  #define SECCOMP_SET_MODE_FILTER  1
>  
>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
> -#define SECCOMP_FILTER_FLAG_TSYNC1
> +#define SECCOMP_FILTER_FLAG_TSYNC1
> +#define SECCOMP_FILTER_FLAG_KILL_PROCESS 2
>  
>  /*
>   * All BPF programs must return a 32-bit value.
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 1f3347fc2605..297f8bfc3b72 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -44,6 +44,7 @@
>   * is only needed for handling filters shared across tasks.
>   * @prev: points to a previously installed, or inherited, filter
>   * @prog: the BPF program to evaluate
> + * @kill_process: if true, RET_KILL will kill process rather than thread.
>   *
>   * seccomp_filter objects are organized in a tree linked via the @prev
>   * pointer.  For any task, it appears to be a singly-linked list starting
> @@ -57,6 +58,7 @@
>   */
>  struct seccomp_filter {
>   refcount_t usage;
> + bool kill_process;
>   struct seccomp_filter *prev;
>   struct bpf_prog *prog;
>  };
> @@ -450,6 +452,10 @@ static long seccomp_attach_filter(unsigned int flags,
>   return ret;
>   }
>  
> + /* Set process-killing flag, if present. */
> + if (flags & SECCOMP_FILTER_FLAG_KILL_PROCESS)
> + filter->kill_process = true;
> +
>   /*
>* If there is an existing filter, make it the prev and don't drop its
>* task reference.
> @@ -665,7 +671,21 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>   seccomp_init_siginfo(&info, this_syscall, data);
>   do_coredump(&info);
>   }
> - do_exit(SIGSYS);
> + /*
> +  * The only way match can be NULL here is if something
> +  * went very wrong in seccomp_run_filters() (e.g. a NULL
> +  * filter list in struct seccomp) and the return action
> +  * falls back to failing closed. In this case, take the
> + 

Re: [PATCH v3] irqchip/gic-v3-its: Allow GIC ITS number more than MAX_NUMNODES

2017-08-07 Thread Hanjun Guo

On 2017/7/26 18:15, Hanjun Guo wrote:

From: Hanjun Guo 

When enabling ITS NUMA support on D05, I got the boot log:

[0.00] SRAT: PXM 0 -> ITS 0 -> Node 0
[0.00] SRAT: PXM 0 -> ITS 1 -> Node 0
[0.00] SRAT: PXM 0 -> ITS 2 -> Node 0
[0.00] SRAT: PXM 1 -> ITS 3 -> Node 1
[0.00] SRAT: ITS affinity exceeding max count[4]

This is wrong on D05 as we have 8 ITSs with 4 NUMA nodes.

So allocate the memory dynamically instead of using the fixed-size
its_srat_maps[MAX_NUMNODES]: count the number of ITS entries in the
SRAT, allocate its_srat_maps accordingly, and then build the mapping
of NUMA node to ITS ID. its_srat_maps is freed after ITS probing
because it is not needed after boot.

After doing this, I got what I wanted:

[0.00] SRAT: PXM 0 -> ITS 0 -> Node 0
[0.00] SRAT: PXM 0 -> ITS 1 -> Node 0
[0.00] SRAT: PXM 0 -> ITS 2 -> Node 0
[0.00] SRAT: PXM 1 -> ITS 3 -> Node 1
[0.00] SRAT: PXM 2 -> ITS 4 -> Node 2
[0.00] SRAT: PXM 2 -> ITS 5 -> Node 2
[0.00] SRAT: PXM 2 -> ITS 6 -> Node 2
[0.00] SRAT: PXM 3 -> ITS 7 -> Node 3

Fixes: dbd2b8267233 ("irqchip/gic-v3-its: Add ACPI NUMA node mapping")
Signed-off-by: Hanjun Guo 
Reviewed-by: Lorenzo Pieralisi 
Cc: Ganapatrao Kulkarni 
Cc: John Garry 
Cc: Marc Zyngier 
---

v2->v3:
   - Remove the NULL check for its_srat_maps as its_in_srat will be 0;
   - Add a warning if memory allocation fails for its_srat_maps;
   - Update commit log as Lorenzo suggested;
   - Add review tag


Sorry for the ping. I am not sure whether I missed any review comments
on v2; if I did, I apologize. Please let me know if I need to respin
another version.

Thanks
Hanjun
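
For readers following the thread, the count-then-allocate approach described in the commit log can be sketched roughly as below. This is a user-space illustration only, not the driver code: struct srat_entry, struct its_numa_map and build_its_maps() are hypothetical names, and the PXM is assumed to map one-to-one to the NUMA node, whereas the kernel translates it through its own helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the entry types found in an SRAT table. */
enum srat_type { SRAT_CPU, SRAT_MEM, SRAT_ITS };

struct srat_entry {
	enum srat_type type;
	int pxm;		/* proximity domain reported by firmware */
	unsigned int its_id;	/* only meaningful for SRAT_ITS entries */
};

/* Hypothetical record playing the role of one its_srat_maps[] element. */
struct its_numa_map {
	int numa_node;
	unsigned int its_id;
};

/*
 * Pass 1 counts the ITS entries, pass 2 fills an array of exactly that
 * size. Returns a malloc()ed array holding *count records, or NULL if
 * there are no ITS entries or the allocation fails.
 */
struct its_numa_map *build_its_maps(const struct srat_entry *tbl, size_t n,
				    size_t *count)
{
	struct its_numa_map *maps;
	size_t i, its_in_srat = 0, filled = 0;

	assert(tbl != NULL && count != NULL);
	*count = 0;

	for (i = 0; i < n; i++)
		if (tbl[i].type == SRAT_ITS)
			its_in_srat++;
	if (its_in_srat == 0)
		return NULL;

	maps = calloc(its_in_srat, sizeof(*maps));
	if (!maps)
		return NULL;

	for (i = 0; i < n; i++) {
		if (tbl[i].type != SRAT_ITS)
			continue;
		/* Assumption: node == PXM, for illustration only. */
		maps[filled].numa_node = tbl[i].pxm;
		maps[filled].its_id = tbl[i].its_id;
		filled++;
	}

	*count = filled;
	return maps;
}
```

The caller owns the returned array and frees it once it is no longer needed, which corresponds to freeing its_srat_maps after ITS probing. The table size no longer depends on the number of NUMA nodes, so 8 ITSs on 4 nodes fit.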


[PATCH] scheduler: enhancement to show_state_filter

2017-08-07 Thread Yafang Shao
Sometimes we want to dump only the tasks in TASK_RUNNING,
instead of dumping all tasks.
For example, when the loadavg is high, we want to dump the
tasks in TASK_RUNNING and TASK_UNINTERRUPTIBLE, which contribute
to the system load. But usually there are lots of tasks in the sleep
state, which occupy almost all of the kernel log buffer or even
overflow it, causing the useful messages to get lost. We could
enlarge the kernel log buffer, but that is not a good idea.

So I made this change to make show_state_filter() more flexible,
so that we can dump the tasks in TASK_RUNNING specifically.

Signed-off-by: Yafang Shao 
---
 drivers/tty/sysrq.c | 2 +-
 include/linux/sched.h   | 1 +
 include/linux/sched/debug.h | 6 --
 kernel/sched/core.c | 7 ---
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 3ffc1ce..86db51b 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -291,7 +291,7 @@ static void sysrq_handle_showstate(int key)
 
 static void sysrq_handle_showstate_blocked(int key)
 {
-   show_state_filter(TASK_UNINTERRUPTIBLE);
+   show_state_filter(TASK_UNINTERRUPTIBLE << 1);
 }
 static struct sysrq_key_op sysrq_showstate_blocked_op = {
.handler= sysrq_handle_showstate_blocked,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8337e2d..542a071 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -82,6 +82,7 @@
 #define TASK_NOLOAD1024
 #define TASK_NEW   2048
 #define TASK_STATE_MAX 4096
+#define TASK_ALL_BITS  (TASK_STATE_MAX - 1)
 
 #define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWPNn"
 
diff --git a/include/linux/sched/debug.h b/include/linux/sched/debug.h
index e0eaee5..c844689 100644
--- a/include/linux/sched/debug.h
+++ b/include/linux/sched/debug.h
@@ -1,6 +1,8 @@
 #ifndef _LINUX_SCHED_DEBUG_H
 #define _LINUX_SCHED_DEBUG_H
 
+#include 
+
 /*
  * Various scheduler/task debugging interfaces:
  */
@@ -10,13 +12,13 @@
 extern void dump_cpu_task(int cpu);
 
 /*
- * Only dump TASK_* tasks. (0 for all tasks)
+ * Only dump TASK_* tasks. (TASK_ALL_BITS for all tasks)
  */
 extern void show_state_filter(unsigned long state_filter);
 
 static inline void show_state(void)
 {
-   show_state_filter(0);
+   show_state_filter(TASK_ALL_BITS);
 }
 
 struct pt_regs;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0869b20..46d277c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5161,19 +5161,20 @@ void show_state_filter(unsigned long state_filter)
 */
touch_nmi_watchdog();
touch_all_softlockup_watchdogs();
-   if (!state_filter || (p->state & state_filter))
+   /* in case we want to set TASK_RUNNING specifically */
+   if ((p->state != 0 ? p->state << 1 : 1) & state_filter)
sched_show_task(p);
}
 
 #ifdef CONFIG_SCHED_DEBUG
-   if (!state_filter)
+   if (state_filter == TASK_ALL_BITS)
sysrq_sched_debug_show();
 #endif
rcu_read_unlock();
/*
 * Only show locks if all tasks are dumped:
 */
-   if (!state_filter)
+   if (state_filter == TASK_ALL_BITS)
debug_show_all_locks();
 }
 
-- 
1.8.3.1
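
To make the new filter encoding easier to follow, here is a small user-space sketch of the test used in show_state_filter() above. TASK_RUNNING is 0 and so cannot be selected by a plain bitmask; the patch shifts every non-zero state left by one and reserves bit 0 for TASK_RUNNING. The helper names state_to_filter_bit() and task_matches() are hypothetical; the constants are the values from include/linux/sched.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Task state values as defined in include/linux/sched.h. */
#define TASK_RUNNING		0
#define TASK_INTERRUPTIBLE	1
#define TASK_UNINTERRUPTIBLE	2
#define TASK_NEW		2048
#define TASK_STATE_MAX		4096
#define TASK_ALL_BITS		(TASK_STATE_MAX - 1)

/*
 * Hypothetical helper: translate a task state into filter bits.
 * Bit 0 stands for TASK_RUNNING, every other state moves up by one bit.
 */
unsigned long state_to_filter_bit(long state)
{
	assert(state >= 0);
	return state != 0 ? (unsigned long)state << 1 : 1UL;
}

/* Hypothetical helper: same condition as the one in the patch. */
bool task_matches(long state, unsigned long state_filter)
{
	return (state_to_filter_bit(state) & state_filter) != 0;
}
```

With this encoding, a filter of 1 selects only running tasks, TASK_UNINTERRUPTIBLE << 1 selects blocked tasks as sysrq-w does, and 1 | (TASK_UNINTERRUPTIBLE << 1) selects the tasks that contribute to the load. One thing the sketch makes visible: a state of 2048 (TASK_NEW) maps to filter bit 4096, which a mask of TASK_STATE_MAX - 1 does not cover, so that case may deserve a second look.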



[RFC v1 4/4] ipmi_bmc: bt-aspeed: port driver to IPMI BMC framework

2017-08-07 Thread Brendan Higgins
From: Benjamin Fair 

The driver was handling interaction with userspace on its own. This
patch changes it to use the functionality of the ipmi_bmc framework
instead.

Note that this removes the ability for the BMC to set SMS_ATN by making
an ioctl. If this functionality is required, it can be added back in
with a later patch.

Signed-off-by: Benjamin Fair 
Signed-off-by: Brendan Higgins 
---
 drivers/char/ipmi_bmc/ipmi_bmc_bt_aspeed.c | 258 +
 include/uapi/linux/bt-bmc.h|  18 --
 2 files changed, 74 insertions(+), 202 deletions(-)
 delete mode 100644 include/uapi/linux/bt-bmc.h

diff --git a/drivers/char/ipmi_bmc/ipmi_bmc_bt_aspeed.c b/drivers/char/ipmi_bmc/ipmi_bmc_bt_aspeed.c
index 70d434bc1cbf..7c8082c511ee 100644
--- a/drivers/char/ipmi_bmc/ipmi_bmc_bt_aspeed.c
+++ b/drivers/char/ipmi_bmc/ipmi_bmc_bt_aspeed.c
@@ -7,25 +7,19 @@
  * 2 of the License, or (at your option) any later version.
  */
 
-#include 
-#include 
 #include 
 #include 
 #include 
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 
-/*
- * This is a BMC device used to communicate to the host
- */
-#define DEVICE_NAME"ipmi-bt-host"
+#define DEVICE_NAME "ipmi-bmc-bt-aspeed"
 
 #define BT_IO_BASE 0xe4
 #define BT_IRQ 10
@@ -61,18 +55,17 @@
 #define BT_BMC_BUFFER_SIZE 256
 
 struct bt_bmc {
+   struct ipmi_bmc_bus bus;
struct device   dev;
-   struct miscdevice   miscdev;
+   struct ipmi_bmc_ctx *bmc_ctx;
+   struct bt_msg   request;
struct regmap   *map;
int offset;
int irq;
-   wait_queue_head_t   queue;
struct timer_list   poll_timer;
-   struct mutexmutex;
+   spinlock_t  lock;
 };
 
-static atomic_t open_count = ATOMIC_INIT(0);
-
 static const struct regmap_config bt_regmap_cfg = {
.reg_bits = 32,
.val_bits = 32,
@@ -158,27 +151,28 @@ static ssize_t bt_writen(struct bt_bmc *bt_bmc, u8 *buf, size_t n)
return n;
 }
 
+/* TODO(benjaminfair): support ioctl BT_BMC_IOCTL_SMS_ATN */
 static void set_sms_atn(struct bt_bmc *bt_bmc)
 {
bt_outb(bt_bmc, BT_CTRL_SMS_ATN, BT_CTRL);
 }
 
-static struct bt_bmc *file_bt_bmc(struct file *file)
+/* Called with bt_bmc->lock held */
+static bool __is_request_avail(struct bt_bmc *bt_bmc)
 {
-   return container_of(file->private_data, struct bt_bmc, miscdev);
+   return bt_inb(bt_bmc, BT_CTRL) & BT_CTRL_H2B_ATN;
 }
 
-static int bt_bmc_open(struct inode *inode, struct file *file)
+static bool is_request_avail(struct bt_bmc *bt_bmc)
 {
-   struct bt_bmc *bt_bmc = file_bt_bmc(file);
+   unsigned long flags;
+   bool result;
 
-   if (atomic_inc_return(&open_count) == 1) {
-   clr_b_busy(bt_bmc);
-   return 0;
-   }
+   spin_lock_irqsave(&bt_bmc->lock, flags);
+   result = __is_request_avail(bt_bmc);
+   spin_unlock_irqrestore(&bt_bmc->lock, flags);
 
-   atomic_dec(&open_count);
-   return -EBUSY;
+   return result;
 }
 
 /*
@@ -194,67 +188,43 @@ static int bt_bmc_open(struct inode *inode, struct file *file)
  *Length  NetFn/LUN  Seq Cmd Data
  *
  */
-static ssize_t bt_bmc_read(struct file *file, char __user *buf,
-  size_t count, loff_t *ppos)
+static void get_request(struct bt_bmc *bt_bmc)
 {
-   struct bt_bmc *bt_bmc = file_bt_bmc(file);
-   u8 len;
-   int len_byte = 1;
-   u8 kbuffer[BT_BMC_BUFFER_SIZE];
-   ssize_t ret = 0;
-   ssize_t nread;
+   u8 *request_buf = (u8 *) &bt_bmc->request;
+   unsigned long flags;
 
-   if (!access_ok(VERIFY_WRITE, buf, count))
-   return -EFAULT;
+   spin_lock_irqsave(&bt_bmc->lock, flags);
 
-   WARN_ON(*ppos);
-
-   if (wait_event_interruptible(bt_bmc->queue,
-bt_inb(bt_bmc, BT_CTRL) & BT_CTRL_H2B_ATN))
-   return -ERESTARTSYS;
-
-   mutex_lock(&bt_bmc->mutex);
-
-   if (unlikely(!(bt_inb(bt_bmc, BT_CTRL) & BT_CTRL_H2B_ATN))) {
-   ret = -EIO;
-   goto out_unlock;
+   if (!__is_request_avail(bt_bmc)) {
+   spin_unlock_irqrestore(&bt_bmc->lock, flags);
+   return;
}
 
set_b_busy(bt_bmc);
clr_h2b_atn(bt_bmc);
clr_rd_ptr(bt_bmc);
 
-   /*
-* The BT frames start with the message length, which does not
-* include the length byte.
-*/
-   kbuffer[0] = bt_read(bt_bmc);
-   len = kbuffer[0];
-
-   /* We pass the length back to userspace as well */
-   if (len + 1 > count)
-   len = count - 1;
-
-   while (len) {
-   nread = min_t(ssize_t, len, sizeof(kbuffer) - len_byte);
-
-   bt_readn(bt_bmc, kbuffer + 

[RFC v1 1/4] ipmi_bmc: framework for BT IPMI on BMCs

2017-08-07 Thread Brendan Higgins
From: Benjamin Fair 

This patch introduces a framework for writing IPMI drivers which run on
a Board Management Controller. It is similar in function to OpenIPMI.
The framework handles registering devices and routing messages.

Signed-off-by: Benjamin Fair 
Signed-off-by: Brendan Higgins 
---
 drivers/char/ipmi_bmc/Makefile   |   1 +
 drivers/char/ipmi_bmc/ipmi_bmc.c | 294 +++
 include/linux/ipmi_bmc.h | 184 
 3 files changed, 479 insertions(+)
 create mode 100644 drivers/char/ipmi_bmc/ipmi_bmc.c

diff --git a/drivers/char/ipmi_bmc/Makefile b/drivers/char/ipmi_bmc/Makefile
index 8bff32b55c24..9c7cd48d899f 100644
--- a/drivers/char/ipmi_bmc/Makefile
+++ b/drivers/char/ipmi_bmc/Makefile
@@ -2,5 +2,6 @@
 # Makefile for the ipmi bmc drivers.
 #
 
+obj-$(CONFIG_IPMI_BMC) += ipmi_bmc.o
 obj-$(CONFIG_IPMI_BMC_BT_I2C) += ipmi_bmc_bt_i2c.o
 obj-$(CONFIG_ASPEED_BT_IPMI_BMC) += ipmi_bmc_bt_aspeed.o
diff --git a/drivers/char/ipmi_bmc/ipmi_bmc.c b/drivers/char/ipmi_bmc/ipmi_bmc.c
new file mode 100644
index ..c1324ac9a83c
--- /dev/null
+++ b/drivers/char/ipmi_bmc/ipmi_bmc.c
@@ -0,0 +1,294 @@
+/*
+ * Copyright 2017 Google Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PFX "IPMI BMC core: "
+
+struct ipmi_bmc_ctx *ipmi_bmc_get_global_ctx()
+{
+   static struct ipmi_bmc_ctx global_ctx;
+
+   return &global_ctx;
+}
+
+int ipmi_bmc_send_response(struct ipmi_bmc_ctx *ctx,
+  struct bt_msg *bt_response)
+{
+   struct ipmi_bmc_bus *bus;
+   int ret = -ENODEV;
+
+   rcu_read_lock();
+   bus = rcu_dereference(ctx->bus);
+
+   if (bus)
+   ret = bus->send_response(bus, bt_response);
+
+   rcu_read_unlock();
+   return ret;
+}
+EXPORT_SYMBOL(ipmi_bmc_send_response);
+
+bool ipmi_bmc_is_response_open(struct ipmi_bmc_ctx *ctx)
+{
+   struct ipmi_bmc_bus *bus;
+   bool ret = false;
+
+   rcu_read_lock();
+   bus = rcu_dereference(ctx->bus);
+
+   if (bus)
+   ret = bus->is_response_open(bus);
+
+   rcu_read_unlock();
+   return ret;
+}
+EXPORT_SYMBOL(ipmi_bmc_is_response_open);
+
+int ipmi_bmc_register_device(struct ipmi_bmc_ctx *ctx,
+struct ipmi_bmc_device *device_in)
+{
+   struct ipmi_bmc_device *device;
+
+   mutex_lock(&ctx->drivers_mutex);
+   /* Make sure it hasn't already been registered. */
+   list_for_each_entry(device, &ctx->devices, link) {
+   if (device == device_in) {
+   mutex_unlock(&ctx->drivers_mutex);
+   return -EINVAL;
+   }
+   }
+
+   list_add_rcu(&device_in->link, &ctx->devices);
+   mutex_unlock(&ctx->drivers_mutex);
+
+   return 0;
+}
+EXPORT_SYMBOL(ipmi_bmc_register_device);
+
+int ipmi_bmc_unregister_device(struct ipmi_bmc_ctx *ctx,
+  struct ipmi_bmc_device *device_in)
+{
+   struct ipmi_bmc_device *device;
+   bool found = false;
+
+   mutex_lock(&ctx->drivers_mutex);
+   /* Make sure it is currently registered. */
+   list_for_each_entry(device, &ctx->devices, link) {
+   if (device == device_in) {
+   found = true;
+   break;
+   }
+   }
+   if (!found) {
+   mutex_unlock(&ctx->drivers_mutex);
+   return -ENXIO;
+   }
+
+   list_del_rcu(&device_in->link);
+   mutex_unlock(&ctx->drivers_mutex);
+   synchronize_rcu();
+
+   return 0;
+}
+EXPORT_SYMBOL(ipmi_bmc_unregister_device);
+
+int ipmi_bmc_register_default_device(struct ipmi_bmc_ctx *ctx,
+struct ipmi_bmc_device *device)
+{
+   int ret;
+
+   mutex_lock(&ctx->drivers_mutex);
+   if (!ctx->default_device) {
+   ctx->default_device = device;
+   ret = 0;
+   } else {
+   ret = -EBUSY;
+   }
+   mutex_unlock(&ctx->drivers_mutex);
+
+   return ret;
+}
+EXPORT_SYMBOL(ipmi_bmc_register_default_device);
+
+int ipmi_bmc_unregister_default_device(struct ipmi_bmc_ctx *ctx,
+  struct ipmi_bmc_device *device)
+{
+   int ret;
+
+   mutex_lock(&ctx->drivers_mutex);
+   if (ctx->default_device == device) {
+   ctx->default_device = NULL;
+   ret = 0;
+   } else {
+   ret = -ENXIO;
+   }
+ 
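
As a reading aid for the registration paths in the diff above, the following is a simplified user-space sketch of the same bookkeeping: refuse to register a device twice, and refuse to unregister one that is not on the list. It is not the framework code. struct bmc_device, struct bmc_ctx and both functions are hypothetical names, a plain singly-linked list stands in for the kernel list, and the mutex and RCU handling are left out on purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the framework's device and context. */
struct bmc_device {
	struct bmc_device *next;
};

struct bmc_ctx {
	struct bmc_device *devices;	/* head of the registered-device list */
};

/* Add dev to the list; returns -EINVAL if it is already registered. */
int bmc_register_device(struct bmc_ctx *ctx, struct bmc_device *dev)
{
	struct bmc_device *it;

	assert(ctx != NULL && dev != NULL);
	for (it = ctx->devices; it; it = it->next)
		if (it == dev)
			return -EINVAL;

	dev->next = ctx->devices;
	ctx->devices = dev;
	return 0;
}

/* Remove dev from the list; returns -ENXIO if it was never registered. */
int bmc_unregister_device(struct bmc_ctx *ctx, struct bmc_device *dev)
{
	struct bmc_device **link;

	assert(ctx != NULL && dev != NULL);
	for (link = &ctx->devices; *link; link = &(*link)->next) {
		if (*link == dev) {
			*link = dev->next;
			dev->next = NULL;
			return 0;
		}
	}
	return -ENXIO;
}
```

The error codes follow the ones used in the patch. In the real code the list is walked under the drivers mutex and removal is followed by synchronize_rcu(), so that readers still traversing the list finish before the caller reuses the device.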
