[PATCH] afs: Fix server record deletion

2018-04-18 Thread David Howells
AFS server records get removed from the net->fs_servers tree when they're
deleted, but not from the net->fs_addresses{4,6} lists, which can lead to
an oops in afs_find_server() when a server record has been removed, for
instance during rmmod.

Fix this by deleting the record from the by-address lists before posting it
for RCU destruction.

The reason this hasn't been noticed before is that the fileserver keeps
probing the local cache manager, thereby keeping the service record alive,
so the oops would only happen when a fileserver eventually gets bored and
stops pinging or if the module gets rmmod'd and a call comes in from the
fileserver during the window between the server records being destroyed and
the socket being closed.

The oops looks something like:

BUG: unable to handle kernel NULL pointer dereference at 001c
...
Workqueue: kafsd afs_process_async_call [kafs]
RIP: 0010:afs_find_server+0x271/0x36f [kafs]
...
Call Trace:
 ? worker_thread+0x230/0x2ac
 ? worker_thread+0x230/0x2ac
 afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
 afs_deliver_to_call+0x1ee/0x5e8 [kafs]
 ? worker_thread+0x230/0x2ac
 afs_process_async_call+0x5b/0xd0 [kafs]
 process_one_work+0x2c2/0x504
 ? worker_thread+0x230/0x2ac
 worker_thread+0x1d4/0x2ac
 ? rescuer_thread+0x29b/0x29b
 kthread+0x11f/0x127
 ? kthread_create_on_node+0x3f/0x3f
 ret_from_fork+0x24/0x30

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells 
---

 fs/afs/server.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/afs/server.c b/fs/afs/server.c
index e23be63998a8..629c74986cff 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -428,8 +428,15 @@ static void afs_gc_servers(struct afs_net *net, struct afs_server *gc_list)
 		}
 		write_sequnlock(&net->fs_lock);
 
-		if (deleted)
+		if (deleted) {
+			write_seqlock(&net->fs_addr_lock);
+			if (!hlist_unhashed(&server->addr4_link))
+				hlist_del_rcu(&server->addr4_link);
+			if (!hlist_unhashed(&server->addr6_link))
+				hlist_del_rcu(&server->addr6_link);
+			write_sequnlock(&net->fs_addr_lock);
 			afs_destroy_server(net, server);
+		}
 	}
 }
 



Re: [PATCH v6 3/4] MIPS: vmlinuz: Use generic ashldi3

2018-04-18 Thread Matt Redfearn

Hi James,

On 18/04/18 00:09, James Hogan wrote:

On Wed, Apr 11, 2018 at 08:50:18AM +0100, Matt Redfearn wrote:

diff --git a/arch/mips/boot/compressed/Makefile 
b/arch/mips/boot/compressed/Makefile
index adce180f3ee4..e03f522c33ac 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -46,9 +46,12 @@ $(obj)/uart-ath79.c: 
$(srctree)/arch/mips/ath79/early_printk.c
  
  vmlinuzobjs-$(CONFIG_KERNEL_XZ) += $(obj)/ashldi3.o $(obj)/bswapsi.o
  
-extra-y += ashldi3.c bswapsi.c

-$(obj)/ashldi3.o $(obj)/bswapsi.o: KBUILD_CFLAGS += -I$(srctree)/arch/mips/lib
-$(obj)/ashldi3.c $(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c
+extra-y += ashldi3.c
+$(obj)/ashldi3.c: $(obj)/%.c: $(srctree)/lib/%.c
+   $(call cmd,shipped)
+
+extra-y += bswapsi.c
+$(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c
$(call cmd,shipped)


ci20_defconfig:

arch/mips/boot/compressed/ashldi3.c:4:10: fatal error: libgcc.h: No such file or directory
  #include "libgcc.h"
^~

It looks like it had already copied ashldi3.c from arch/mips/lib/ when
building an older commit, and it hasn't been regenerated from lib/ since
the Makefile changed, so it's still using the old version.

I think it should be using FORCE and if_changed like this:

diff --git a/arch/mips/boot/compressed/Makefile 
b/arch/mips/boot/compressed/Makefile
index e03f522c33ac..abe77add8789 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -47,12 +47,12 @@ $(obj)/uart-ath79.c: 
$(srctree)/arch/mips/ath79/early_printk.c
  vmlinuzobjs-$(CONFIG_KERNEL_XZ) += $(obj)/ashldi3.o $(obj)/bswapsi.o
  
  extra-y += ashldi3.c

-$(obj)/ashldi3.c: $(obj)/%.c: $(srctree)/lib/%.c
-   $(call cmd,shipped)
+$(obj)/ashldi3.c: $(obj)/%.c: $(srctree)/lib/%.c FORCE
+   $(call if_changed,shipped)
  
  extra-y += bswapsi.c

-$(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c
-   $(call cmd,shipped)
+$(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c FORCE
+   $(call if_changed,shipped)
  
  targets := $(notdir $(vmlinuzobjs-y))
  
That resolves the build failures when checking out old -> new without
cleaning, since the .ashldi3.c.cmd is missing so it gets rebuilt.

It should also resolve issues if the path it copies from is updated in
future since the .ashldi3.c.cmd will get updated.

If you checkout new -> old without cleaning, the now removed
arch/mips/lib/ashldi3.c will get added which will trigger regeneration,
so it won't error.

However if you do new -> old -> new then the .ashldi3.c.cmd file isn't
updated while at old, so you get the same error as above. I'm not sure
there's much we can practically do about that, aside perhaps avoiding
the issue in future by somehow auto-deleting stale .*.cmd files.

Cc'ing kbuild folk in case they have any bright ideas.

At least the straightforward old->new upgrade will work with the above
fixup though. If you're okay with it I'm happy to apply as a fixup.


Unbelievable how fragile this change is proving to be :-/
Yeah fixup looks good to me.

Thanks,
Matt



Cheers
James



Re: [PATCH v6 01/11] ARM: sunxi: smp: Move assembly code into a file

2018-04-18 Thread Maxime Ripard
On Tue, Apr 17, 2018 at 07:25:15PM +0800, Chen-Yu Tsai wrote:
> On Tue, Apr 17, 2018 at 7:17 PM, Maxime Ripard
>  wrote:
> > On Tue, Apr 17, 2018 at 11:12:41AM +0800, Chen-Yu Tsai wrote:
> >> On Tue, Apr 17, 2018 at 5:50 AM, Mylène Josserand
> >>  wrote:
> >> > Move the assembly code for cluster cache enabling and resuming
> >> > into an assembly file instead of having it directly in C code.
> >> >
> >> > Remove the CFLAGS because we are using the ARM directive "arch"
> >> > instead.
> >> >
> >> > Signed-off-by: Mylène Josserand 
> >> > ---
> >> >  arch/arm/mach-sunxi/Makefile  |  4 +--
> >> >  arch/arm/mach-sunxi/headsmp.S | 80 
> >> > +
> >> >  arch/arm/mach-sunxi/mc_smp.c  | 82 
> >> > +++
> >> >  3 files changed, 85 insertions(+), 81 deletions(-)
> >> >  create mode 100644 arch/arm/mach-sunxi/headsmp.S
> >>
> >> I'm still not convinced about this whole "move ASM to separate
> >> file" thing, especially now that you aren't actually adding any
> >> sunxi-specific ASM code beyond a simple function call.
> >>
> >> Could you drop this for now?
> >
> > I'd really like to have this merged actually. There's a significant
> > > readability improvement, so even if there's no particular functional
> > improvement, I'd still call it a win.
> 
> What parts do you consider hard to read? The extra quotes? Trailing
> newline? Or perhaps the __stringify bits?

All of this, plus the clobbers and operands.

Maxime

-- 
Maxime Ripard, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com




[PATCH 1/6 RESEND] fs: use << for MS_* flags

2018-04-18 Thread Christian Brauner
Consistently use << to define MS_* constants.

Signed-off-by: Christian Brauner 
Cc: Alexander Viro 
---
 include/uapi/linux/fs.h | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index d2a8313fabd7..9662790a657c 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -105,22 +105,23 @@ struct inodes_stat_t {
 /*
  * These are the fs-independent mount-flags: up to 32 flags are supported
  */
-#define MS_RDONLY       1      /* Mount read-only */
-#define MS_NOSUID       2      /* Ignore suid and sgid bits */
-#define MS_NODEV        4      /* Disallow access to device special files */
-#define MS_NOEXEC       8      /* Disallow program execution */
-#define MS_SYNCHRONOUS  16     /* Writes are synced at once */
-#define MS_REMOUNT      32     /* Alter flags of a mounted FS */
-#define MS_MANDLOCK     64     /* Allow mandatory locks on an FS */
-#define MS_DIRSYNC      128    /* Directory modifications are synchronous */
-#define MS_NOATIME      1024   /* Do not update access times. */
-#define MS_NODIRATIME   2048   /* Do not update directory access times */
-#define MS_BIND         4096
-#define MS_MOVE         8192
-#define MS_REC          16384
-#define MS_VERBOSE      32768  /* War is peace. Verbosity is silence.
-                                  MS_VERBOSE is deprecated. */
-#define MS_SILENT       32768
+#define MS_RDONLY       (1<<0)  /* Mount read-only */
+#define MS_NOSUID       (1<<1)  /* Ignore suid and sgid bits */
+#define MS_NODEV        (1<<2)  /* Disallow access to device special files */
+#define MS_NOEXEC       (1<<3)  /* Disallow program execution */
+#define MS_SYNCHRONOUS  (1<<4)  /* Writes are synced at once */
+#define MS_REMOUNT      (1<<5)  /* Alter flags of a mounted FS */
+#define MS_MANDLOCK     (1<<6)  /* Allow mandatory locks on an FS */
+#define MS_DIRSYNC      (1<<7)  /* Directory modifications are synchronous */
+#define MS_NOATIME      (1<<10) /* Do not update access times. */
+#define MS_NODIRATIME   (1<<11) /* Do not update directory access times */
+#define MS_BIND         (1<<12)
+#define MS_MOVE         (1<<13)
+#define MS_REC          (1<<14)
+#define MS_VERBOSE      (1<<15) /* War is peace. Verbosity is silence.
+                                 * MS_VERBOSE is deprecated.
+                                 */
+#define MS_SILENT       (1<<15)
 #define MS_POSIXACL     (1<<16) /* VFS does not apply the umask */
 #define MS_UNBINDABLE   (1<<17) /* change to unbindable */
 #define MS_PRIVATE      (1<<18) /* change to private */
-- 
2.17.0
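
A quick way to convince oneself that the conversion above changes only the
spelling and not the values is to compare both forms side by side. The
following is a minimal userspace sketch; the table and the helper name
count_mismatches() are made up for illustration and are not part of the
patch.

```c
#include <stddef.h>

/* One row per flag touched by the patch: old decimal value, new shift form. */
struct flag_pair {
	const char *name;
	unsigned long old_value;
	unsigned long new_value;
};

static const struct flag_pair ms_flags[] = {
	{ "MS_RDONLY",      1,     1UL << 0  },
	{ "MS_NOSUID",      2,     1UL << 1  },
	{ "MS_NODEV",       4,     1UL << 2  },
	{ "MS_NOEXEC",      8,     1UL << 3  },
	{ "MS_SYNCHRONOUS", 16,    1UL << 4  },
	{ "MS_REMOUNT",     32,    1UL << 5  },
	{ "MS_MANDLOCK",    64,    1UL << 6  },
	{ "MS_DIRSYNC",     128,   1UL << 7  },
	{ "MS_NOATIME",     1024,  1UL << 10 },
	{ "MS_NODIRATIME",  2048,  1UL << 11 },
	{ "MS_BIND",        4096,  1UL << 12 },
	{ "MS_MOVE",        8192,  1UL << 13 },
	{ "MS_REC",         16384, 1UL << 14 },
	{ "MS_VERBOSE",     32768, 1UL << 15 },
	{ "MS_SILENT",      32768, 1UL << 15 },
};

/* Hypothetical helper: number of flags whose two spellings disagree. */
size_t count_mismatches(void)
{
	size_t i, bad = 0;

	for (i = 0; i < sizeof(ms_flags) / sizeof(ms_flags[0]); i++)
		if (ms_flags[i].old_value != ms_flags[i].new_value)
			bad++;
	return bad;
}
```

If count_mismatches() returns zero, the old and new definitions are
value-identical for every flag in the hunk.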



[PATCH] mtd: dataflash: replace msleep with usleep_range

2018-04-18 Thread Luca Ellero
Since msleep is based on jiffies, this 3 ms sleep actually becomes 20 ms.
Worse, since this sleep is used in a loop when writing, each single page
write (256 to 1024 bytes) takes an extra 17 ms.
When writing large files (for example u-boot is usually 512 KB) this delay
adds up to minutes.
See Documentation/timers/timers-howto.txt "Why not msleep for (1ms - 20ms)".

Signed-off-by: Luca Ellero 
---
 drivers/mtd/devices/mtd_dataflash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/devices/mtd_dataflash.c b/drivers/mtd/devices/mtd_dataflash.c
index aaaeaae..3a6f450 100644
--- a/drivers/mtd/devices/mtd_dataflash.c
+++ b/drivers/mtd/devices/mtd_dataflash.c
@@ -140,7 +140,7 @@ static int dataflash_waitready(struct spi_device *spi)
if (status & (1 << 7))  /* RDY/nBSY */
return status;
 
-   msleep(3);
+   usleep_range(3000, 4000);
}
 }
 
-- 
2.7.4
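
The 20 ms figure in the commit message can be reproduced with a simplified
model of msleep(). This is only a sketch: it assumes msleep() rounds the
request up to whole jiffies and then adds one more jiffy, and it assumes
HZ=100, which the message does not state. The function name
msleep_worst_case_ms() is hypothetical.

```c
/*
 * Simplified model (assumption, not kernel code): the request is rounded
 * up to whole jiffies and one extra jiffy is added, so the result is an
 * upper bound on the sleep expressed in milliseconds.
 */
unsigned int msleep_worst_case_ms(unsigned int msecs, unsigned int hz)
{
	/* Round the request up to a whole number of jiffies. */
	unsigned long long jiffies =
		((unsigned long long)msecs * hz + 999) / 1000;

	/* One extra jiffy, then convert back to milliseconds. */
	return (unsigned int)((jiffies + 1) * 1000 / hz);
}
```

Under these assumptions msleep_worst_case_ms(3, 100) gives 20, i.e. 17 ms
more than requested, which matches the numbers quoted above. usleep_range()
is backed by hrtimers and so is not tied to the jiffy granularity.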



Re: [PATCH 2/6] tracing: Add trace event error log

2018-04-18 Thread Masami Hiramatsu
On Fri, 13 Apr 2018 10:44:32 -0400
Steven Rostedt  wrote:

> On Fri, 13 Apr 2018 09:24:34 -0500
> Tom Zanussi  wrote:
> 
> > Yeah, I agree - I'd rather get it right than get it in now.  I thought
> > this made sense, and was based on input from Masami, which I may have
> > misinterpreted, but I'll wait for some more ideas about the best way to
> > do this.
> 
> Too bad we are not closer to November, as this would actually be a good
> Plumbers topic. Maybe it's not that important and we should wait until
> then. I'd like to get some brain storming ideas out before we decide on
> anything, and this is something I believe is better done face to face
> than over email.

OK, sounds good for me too :)
My point was that printk buffer is not good place for the parser error
of ftrace, nor each sub-features (like hist, trigger, probe_events etc.) 
has different place to show it. I just want to unify the user experience
over the ftrace UI.

Thanks,

-- 
Masami Hiramatsu 


[PATCH 5/5] f2fs: fix to avoid race during access gc_thread pointer

2018-04-18 Thread Chao Yu
Thread A                Thread B                Thread C
- f2fs_remount
 - stop_gc_thread
                        - f2fs_sbi_store
                                                - issue_discard_thread
   sbi->gc_thread = NULL;
                          sbi->gc_thread->gc_wake = 1
                                                  access sbi->gc_thread->gc_urgent

Previously, memory for sbi->gc_thread was allocated only when the background
gc thread mount option was enabled, and was released when that mount option
was turned off. However, several places still access the gc_thread pointer
without considering this race, which results in a NULL pointer dereference.

To fix this, keep the gc_thread structure valid in sbi all the time instead
of allocating and freeing it dynamically.

Signed-off-by: Chao Yu 
---
 fs/f2fs/debug.c   |  3 +--
 fs/f2fs/f2fs.h|  7 +++
 fs/f2fs/gc.c  | 58 +--
 fs/f2fs/segment.c |  4 ++--
 fs/f2fs/super.c   | 13 +++--
 fs/f2fs/sysfs.c   |  8 
 6 files changed, 60 insertions(+), 33 deletions(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 715beb85e9db..7bb036a3bb81 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -223,8 +223,7 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
si->cache_mem = 0;
 
/* build gc */
-   if (sbi->gc_thread)
-   si->cache_mem += sizeof(struct f2fs_gc_kthread);
+   si->cache_mem += sizeof(struct f2fs_gc_kthread);
 
/* build merge flush thread */
if (SM_I(sbi)->fcc_info)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 567c6bb57ae3..c553f63199e8 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1412,6 +1412,11 @@ static inline struct sit_info *SIT_I(struct f2fs_sb_info 
*sbi)
return (struct sit_info *)(SM_I(sbi)->sit_info);
 }
 
+static inline struct f2fs_gc_kthread *GC_I(struct f2fs_sb_info *sbi)
+{
+   return (struct f2fs_gc_kthread *)(sbi->gc_thread);
+}
+
 static inline struct free_segmap_info *FREE_I(struct f2fs_sb_info *sbi)
 {
return (struct free_segmap_info *)(SM_I(sbi)->free_info);
@@ -2954,6 +2959,8 @@ bool f2fs_overwrite_io(struct inode *inode, loff_t pos, 
size_t len);
 /*
  * gc.c
  */
+int init_gc_context(struct f2fs_sb_info *sbi);
+void destroy_gc_context(struct f2fs_sb_info * sbi);
 int start_gc_thread(struct f2fs_sb_info *sbi);
 void stop_gc_thread(struct f2fs_sb_info *sbi);
 block_t start_bidx_of_node(unsigned int node_ofs, struct inode *inode);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index da89ca16a55d..7d310e454b77 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -26,8 +26,8 @@
 static int gc_thread_func(void *data)
 {
struct f2fs_sb_info *sbi = data;
-   struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
-   wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
+   struct f2fs_gc_kthread *gc_th = GC_I(sbi);
+   wait_queue_head_t *wq = &gc_th->gc_wait_queue_head;
unsigned int wait_ms;
 
wait_ms = gc_th->min_sleep_time;
@@ -114,17 +114,15 @@ static int gc_thread_func(void *data)
return 0;
 }
 
-int start_gc_thread(struct f2fs_sb_info *sbi)
+int init_gc_context(struct f2fs_sb_info *sbi)
 {
struct f2fs_gc_kthread *gc_th;
-   dev_t dev = sbi->sb->s_bdev->bd_dev;
-   int err = 0;
 
gc_th = f2fs_kmalloc(sbi, sizeof(struct f2fs_gc_kthread), GFP_KERNEL);
-   if (!gc_th) {
-   err = -ENOMEM;
-   goto out;
-   }
+   if (!gc_th)
+   return -ENOMEM;
+
+   gc_th->f2fs_gc_task = NULL;
 
gc_th->urgent_sleep_time = DEF_GC_THREAD_URGENT_SLEEP_TIME;
gc_th->min_sleep_time = DEF_GC_THREAD_MIN_SLEEP_TIME;
@@ -139,26 +137,41 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
gc_th->atomic_file[FG_GC] = 0;
 
sbi->gc_thread = gc_th;
-   init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
-   sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
+
+   return 0;
+}
+
+void destroy_gc_context(struct f2fs_sb_info *sbi)
+{
+   kfree(GC_I(sbi));
+   sbi->gc_thread = NULL;
+}
+
+int start_gc_thread(struct f2fs_sb_info *sbi)
+{
+   struct f2fs_gc_kthread *gc_th = GC_I(sbi);
+   dev_t dev = sbi->sb->s_bdev->bd_dev;
+   int err = 0;
+
+   init_waitqueue_head(&gc_th->gc_wait_queue_head);
+   gc_th->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
"f2fs_gc-%u:%u", MAJOR(dev), MINOR(dev));
if (IS_ERR(gc_th->f2fs_gc_task)) {
err = PTR_ERR(gc_th->f2fs_gc_task);
-   kfree(gc_th);
-   sbi->gc_thread = NULL;
+   gc_th->f2fs_gc_task = NULL;
}
-out:
+
return err;
 }
 
 void stop_gc_thread(struct f2fs_sb_info *sbi)
 {
-   struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
-   if (!gc_th)
-   return;
-   kthread_stop(gc_th->f2fs_gc_task);
-   kfree(gc_th);

[PATCH 3/5] f2fs: avoid stucking GC due to atomic write

2018-04-18 Thread Chao Yu
f2fs doesn't allow abuse of the atomic write interface, so besides limiting
the total memory used by in-memory pages, we also need to limit atomic-write
usage when the filesystem is seriously fragmented. Otherwise we may run into
an infinite loop during foreground GC, because the target blocks in the
victim segment belong to a file that has been atomic-opened for a long time.

Now, we detect foreground GC failures caused by atomic writes; if the count
exceeds a threshold, we drop all atomic written data in cache. This should
keep the system running safely and prevent a DoS attack.

Signed-off-by: Chao Yu 
---
 fs/f2fs/f2fs.h|  1 +
 fs/f2fs/file.c|  5 +
 fs/f2fs/gc.c  | 27 +++
 fs/f2fs/gc.h  |  3 +++
 fs/f2fs/segment.c |  1 +
 fs/f2fs/segment.h |  2 ++
 6 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c1c3a1d11186..3453288d6a71 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2249,6 +2249,7 @@ enum {
FI_EXTRA_ATTR,  /* indicate file has extra attribute */
FI_PROJ_INHERIT,/* indicate file inherits projectid */
FI_PIN_FILE,/* indicate file should not be gced */
+   FI_ATOMIC_REVOKE_REQUEST,/* indicate atomic committed data has been 
dropped */
 };
 
 static inline void __mark_inode_dirty_flag(struct inode *inode,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7c90ded5a431..cddd9aee1bb2 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1698,6 +1698,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
 skip_flush:
set_inode_flag(inode, FI_HOT_DATA);
set_inode_flag(inode, FI_ATOMIC_FILE);
+   clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
 
F2FS_I(inode)->inmem_task = current;
@@ -1746,6 +1747,10 @@ static int f2fs_ioc_commit_atomic_write(struct file 
*filp)
ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false);
}
 err_out:
+   if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST)) {
+   clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
+   ret = -EINVAL;
+   }
up_write(&F2FS_I(inode)->dio_rwsem[WRITE]);
inode_unlock(inode);
mnt_drop_write_file(filp);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index bfb7a4a3a929..495876ca62b6 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -135,6 +135,8 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
gc_th->gc_urgent = 0;
gc_th->gc_wake= 0;
 
+   gc_th->atomic_file = 0;
+
sbi->gc_thread = gc_th;
init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
@@ -603,7 +605,7 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct 
f2fs_summary *sum,
  * This can be used to move blocks, aka LBAs, directly on disk.
  */
 static void move_data_block(struct inode *inode, block_t bidx,
-   unsigned int segno, int off)
+   int gc_type, unsigned int segno, int off)
 {
struct f2fs_io_info fio = {
.sbi = F2FS_I_SB(inode),
@@ -630,8 +632,10 @@ static void move_data_block(struct inode *inode, block_t 
bidx,
if (!check_valid_map(F2FS_I_SB(inode), segno, off))
goto out;
 
-   if (f2fs_is_atomic_file(inode))
+   if (f2fs_is_atomic_file(inode)) {
+   F2FS_I_SB(inode)->gc_thread->atomic_file++;
goto out;
+   }
 
if (f2fs_is_pinned_file(inode)) {
f2fs_pin_file_control(inode, true);
@@ -737,8 +741,10 @@ static void move_data_page(struct inode *inode, block_t 
bidx, int gc_type,
if (!check_valid_map(F2FS_I_SB(inode), segno, off))
goto out;
 
-   if (f2fs_is_atomic_file(inode))
+   if (f2fs_is_atomic_file(inode)) {
+   F2FS_I_SB(inode)->gc_thread->atomic_file++;
goto out;
+   }
if (f2fs_is_pinned_file(inode)) {
if (gc_type == FG_GC)
f2fs_pin_file_control(inode, true);
@@ -900,7 +906,8 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, 
struct f2fs_summary *sum,
start_bidx = start_bidx_of_node(nofs, inode)
+ ofs_in_node;
if (f2fs_encrypted_file(inode))
-   move_data_block(inode, start_bidx, segno, off);
+   move_data_block(inode, start_bidx, gc_type,
+   segno, off);
else
move_data_page(inode, start_bidx, gc_type,
segno, off);
@@ -1017,6 +1024,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
.ilist = 

[PATCH 1/5] f2fs: fix race in between GC and atomic open

2018-04-18 Thread Chao Yu
Thread                                  GC thread
- f2fs_ioc_start_atomic_write
 - get_dirty_pages
 - filemap_write_and_wait_range
                                        - f2fs_gc
                                         - do_garbage_collect
                                          - gc_data_segment
                                           - move_data_page
                                            - f2fs_is_atomic_file
                                            - set_page_dirty
 - set_inode_flag(, FI_ATOMIC_FILE)

Dirty data page can still be generated by GC in race condition as
above call stack.

This patch adds fi->dio_rwsem[WRITE] in f2fs_ioc_start_atomic_write
to avoid such race.

Signed-off-by: Chao Yu 
---
 fs/f2fs/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 78b3a58cfe21..408471bf4799 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1677,6 +1677,8 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
 
inode_lock(inode);
 
+   down_write(&F2FS_I(inode)->dio_rwsem[WRITE]);
+
if (f2fs_is_atomic_file(inode))
goto out;
 
@@ -1702,6 +1704,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
stat_inc_atomic_write(inode);
stat_update_max_atomic_write(inode);
 out:
+   up_write(&F2FS_I(inode)->dio_rwsem[WRITE]);
inode_unlock(inode);
mnt_drop_write_file(filp);
return ret;
-- 
2.15.0.55.gc2ece9dc4de6



Re: [RFC 2/6] dmaengine: xilinx_dma: Pass AXI4-Stream control words to netdev dma client

2018-04-18 Thread Peter Ujfalusi

On 2018-04-17 18:54, Lars-Peter Clausen wrote:
> On 04/17/2018 04:53 PM, Peter Ujfalusi wrote:
>> On 2018-04-17 16:58, Lars-Peter Clausen wrote:
> There are two options.
>
> Either you extend the generic interfaces so it can cover your usecase in a
> generic way. E.g. the ability to attach meta data to transfer.

 Fwiw I have this patch as part of a bigger work to achieve similar results:
>>>
>>> That's good stuff. Is this in a public tree somewhere?
>>
>> Not atm. I can not send the user of the new API and I did not wanted to
>> send something like this out of the blue w/o context.
>>
>> But as it is a generic patch, I can send it as well. The only thing is
>> that the need for the memcpy, so I might end up with
>> ptr = get_metadata_ptr(desc, ); /* size: in RX the valid size */
>>
>> and set_metadata_size(); /* in TX to tell how the client placed */
>>
>> Or something like that, the attach_metadata() as it is works just fine,
>> but high throughput might not like the memcpy.
>>
> 
> In the most abstracted way I'd say metadata and data are two different data
> streams that are correlated and send/received at the same time.

In my case the metadata is sideband information or parameters for/from
the remote end, such as a timestamp, algorithm parameters, keys, etc.

It is tied to the data payload, but it is not part of it.

But the API should be generic enough to cover other use cases where
clients need to provide additional information.
For me, the metadata is part of the descriptor we give and receive back
from the DMA, others might have sideband channel to send that.

For metadata handling we could have:

struct dma_desc_metadata_ops {
   /* To give a buffer for the DMA with the metadata, as it was in my
* original patch
*/
   int (*desc_attach_metadata)(struct dma_async_tx_descriptor *desc,
   void *data, size_t len);

   void *(*desc_get_metadata_ptr)(struct dma_async_tx_descriptor *desc,
  size_t *payload_len, size_t *max_len);
   int (*desc_set_payload_len)(struct dma_async_tx_descriptor *desc,
  size_t payload_len);
};

Probably a simple flag variable to indicate which of the two modes are
supported:
1. Client provided metadata buffer handling
Clients provide the buffer via desc_attach_metadata(), the DMA driver
will do whatever it needs to do, copy it in place, send it differently,
use parameters.
In RX the received metadata is going to be placed to the provided buffer.
2. Ability to give the metadata pointer to user to work on it.
In TX, clients can use desc_get_metadata_ptr() to get the pointer,
current payload size and maximum size of the metadata and can work
directly on the buffer to place the data. Then desc_set_payload_len() to
let the DMA know how much data is actually placed there.
In RX, desc_get_metadata_ptr() will give the user the pointer and the
payload size so it can process that information correctly.

DMA driver can implement either or both, but clients must only use
either 1 or 2 to work with the metadata.
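
To make mode 2 a bit more concrete, here is a userspace mock of how a client
might fill TX metadata in place. Everything in it is hypothetical: struct
mock_desc, the 32-byte metadata area and the mock_*/client_* function names
are stand-ins for illustration, not dmaengine API and not the proposed ops
themselves.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a descriptor that carries a metadata area. */
struct mock_desc {
	unsigned char metadata[32];
	size_t payload_len;
};

/* Mode 2: give the client a pointer into the descriptor's metadata area. */
void *mock_get_metadata_ptr(struct mock_desc *desc, size_t *payload_len,
			    size_t *max_len)
{
	*payload_len = desc->payload_len;
	*max_len = sizeof(desc->metadata);
	return desc->metadata;
}

/* Mode 2: record how many metadata bytes the client actually placed. */
int mock_set_payload_len(struct mock_desc *desc, size_t payload_len)
{
	if (payload_len > sizeof(desc->metadata))
		return -1;
	desc->payload_len = payload_len;
	return 0;
}

/* Client side, TX: write straight into the descriptor, then set the size. */
int client_fill_tx_metadata(struct mock_desc *desc, const void *src,
			    size_t len)
{
	size_t cur_len, max_len;
	void *ptr = mock_get_metadata_ptr(desc, &cur_len, &max_len);

	if (len > max_len)
		return -1;
	memcpy(ptr, src, len);
	return mock_set_payload_len(desc, len);
}
```

The point of the sketch is only the call order for TX: get the pointer and
limits, write the metadata, then report the payload length. With mode 1 the
client would instead hand over its own buffer once and leave the placement
to the DMA driver.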


> Think multi-planar transfer, like for audio when the right and left channel
> are in separate buffers and not interleaved. Or video with different
> color/luminance components in separate buffers. This is something that is at
> the moment not covered by the dmaengine API either.

Hrm, true, but it is hardly the metadata use case. It is more like
different DMA transfer type.

> Or you can implement a interface that is specific to your DMA controller 
> and
> any client using this interface knows it is talking to your DMA 
> controller.

 Hrm, so we can have DMA driver specific calls? The reason why TI's 
 keystone 2
 navigator DMA support was rejected that it was introducing NAV specific 
 calls
 for clients to configure features not yet supported by the framework.
>>>
>>> In my opinion it is OK, somebody else might have different ideas. I mean it
>>> is not nice, but it is better than the alternative of overloading the
>>> generic API with driver specific semantics or introducing some kind of IOCTL
>>> catch all callback.
>>
>> True, but the generic API can be extended as well to cover new grounds,
>> features. Like this metadata thing.
>>
>>> If there is tight coupling between the DMA core and client and there is no
>>> intention of using a generic client the best solution might even be to no
>>> use DMAengine at all.
>>
>> This is how the knav stuff ended up. Well it is only used by networking
>> atm, so it is 'fine' to have custom API, but it is not portable.
> 
> I totally agree generic APIs are better, but not everybody has the resources
> to rewrite the whole framework just because they want to do this tiny thing
> that isn't covered by the framework yet. In that case it is better to go
> with a custom API (that might evolve into a generic API), rather than
> overloading the generic API and putting a strain on 

Re: [PATCH 1/1] i2c: dev: check i2c_msg len before memdup_user() to prevent ZERO_SIZE_PTR deref

2018-04-18 Thread Uwe Kleine-König
Hello,

On Wed, Apr 18, 2018 at 03:16:45AM +0300, Alexander Popov wrote:
> Currently i2cdev_ioctl_rdwr() doesn't check i2c_msg len against zero
> before calling memdup_user(). If this len is zero memdup_user() returns
> ZERO_SIZE_PTR, which is later considered as valid since
> IS_ERR(ZERO_SIZE_PTR) is false. That causes ZERO_SIZE_PTR deref oops.

You're saying that

memdup_user(ptr, 0)

reads from *ptr? I'd say this is a bug in memdup_user, not its user.

If however the problem only happens later in

if (msgs[i].flags & I2C_M_RECV_LEN) {
if (!(msgs[i].flags & I2C_M_RD) || msgs[i].buf[0] < 1 || ...)

Your commit log is wrong (and I think the patch, too).

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


[PATCH 4/6] rhashtable: improve rhashtable_walk stability when stop/start used.

2018-04-18 Thread NeilBrown
When a walk of an rhashtable is interrupted with rhashtable_walk_stop()
and then rhashtable_walk_start(), the location to restart from is based
on a 'skip' count in the current hash chain, and this can be incorrect
if insertions or deletions have happened.  This does not happen when
the walk is not stopped and started as iter->p is a placeholder which
is safe to use while holding the RCU read lock.

In rhashtable_walk_start() we can revalidate that 'p' is still in the
same hash chain.  If it isn't then the current method is still used.

With this patch, if a rhashtable walker ensures that the current
object remains in the table over a stop/start period (possibly by
elevating the reference count if that is sufficient), it can be sure
that a walk will not miss objects that were in the hashtable for the
whole time of the walk.

rhashtable_walk_start() may not find the object even though it is
still in the hashtable if a rehash has moved it to a new table.  In
this case it will (eventually) get -EAGAIN and will need to proceed
through the whole table again to be sure to see everything at least
once.

Acked-by: Herbert Xu 
Signed-off-by: NeilBrown 
---
 lib/rhashtable.c |   44 +---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 28e1be9f681b..16cde54a553b 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -723,6 +723,7 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
__acquires(RCU)
 {
struct rhashtable *ht = iter->ht;
+   bool rhlist = ht->rhlist;
 
rcu_read_lock();
 
@@ -731,13 +732,52 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter)
list_del(&iter->walker.list);
spin_unlock(&ht->lock);
 
-   if (!iter->walker.tbl && !iter->end_of_table) {
+   if (iter->end_of_table)
+   return 0;
+   if (!iter->walker.tbl) {
iter->walker.tbl = rht_dereference_rcu(ht->tbl, ht);
iter->slot = 0;
iter->skip = 0;
return -EAGAIN;
}
 
+   if (iter->p && !rhlist) {
+   /*
+* We need to validate that 'p' is still in the table, and
+* if so, update 'skip'
+*/
+   struct rhash_head *p;
+   int skip = 0;
+   rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
+   skip++;
+   if (p == iter->p) {
+   iter->skip = skip;
+   goto found;
+   }
+   }
+   iter->p = NULL;
+   } else if (iter->p && rhlist) {
+   /* Need to validate that 'list' is still in the table, and
+* if so, update 'skip' and 'p'.
+*/
+   struct rhash_head *p;
+   struct rhlist_head *list;
+   int skip = 0;
+   rht_for_each_rcu(p, iter->walker.tbl, iter->slot) {
+   for (list = container_of(p, struct rhlist_head, rhead);
+list;
+list = rcu_dereference(list->next)) {
+   skip++;
+   if (list == iter->list) {
+   iter->p = p;
+   iter->skip = skip;
+   goto found;
+   }
+   }
+   }
+   iter->p = NULL;
+   }
+found:
return 0;
 }
 EXPORT_SYMBOL_GPL(rhashtable_walk_start_check);
@@ -913,8 +953,6 @@ void rhashtable_walk_stop(struct rhashtable_iter *iter)
iter->walker.tbl = NULL;
spin_unlock(&ht->lock);
 
-   iter->p = NULL;
-
 out:
rcu_read_unlock();
 }
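
The revalidation step in the non-rhlist branch boils down to re-walking the
current chain, counting entries, and checking whether the remembered object
is still there. A minimal userspace sketch of that idea follows; struct node
and revalidate_skip() are hypothetical names, and a plain singly linked list
stands in for the RCU-protected hash chain.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry in a hash chain. */
struct node {
	struct node *next;
};

/*
 * Walk the chain and return the 1-based position of 'p', which plays the
 * role of the new 'skip' count. Returns 0 if 'p' is no longer in the
 * chain, in which case the caller falls back to the old skip value.
 */
int revalidate_skip(const struct node *chain, const struct node *p)
{
	const struct node *cur;
	int skip = 0;

	for (cur = chain; cur; cur = cur->next) {
		skip++;
		if (cur == p)
			return skip;
	}
	return 0;
}
```

If an entry ahead of 'p' was removed while the walk was stopped, the
recomputed position shrinks accordingly, which is what keeps the walk from
skipping objects that stayed in the table the whole time.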




Re: [PATCH 1/1] i2c: dev: check i2c_msg len before memdup_user() to prevent ZERO_SIZE_PTR deref

2018-04-18 Thread Alexander Popov
On 18.04.2018 10:07, Uwe Kleine-König wrote:
> Hello,

Hello Uwe,

Thanks for your reply.

> On Wed, Apr 18, 2018 at 03:16:45AM +0300, Alexander Popov wrote:
>> Currently i2cdev_ioctl_rdwr() doesn't check i2c_msg len against zero
>> before calling memdup_user(). If this len is zero memdup_user() returns
>> ZERO_SIZE_PTR, which is later considered as valid since
>> IS_ERR(ZERO_SIZE_PTR) is false. That causes ZERO_SIZE_PTR deref oops.
> 
> You're saying that
> 
>   memdup_user(ptr, 0)
> 
> reads from *ptr? I'd say this is a bug in memdup_user, not its user.

No, I don't say that.

memdup_user(ptr, 0) returns ZERO_SIZE_PTR, which is later considered as valid
since IS_ERR(ZERO_SIZE_PTR) is false:

msgs[i].buf = memdup_user(data_ptrs[i], msgs[i].len);
if (IS_ERR(msgs[i].buf)) {
res = PTR_ERR(msgs[i].buf);
break;
}
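
Why the IS_ERR() check above lets ZERO_SIZE_PTR through can be shown with
userspace mirrors of the relevant definitions. This is a sketch under
assumptions: the values used here (MAX_ERRNO of 4095 and ZERO_SIZE_PTR of
(void *)16) are recalled from include/linux/err.h and include/linux/slab.h
and should be checked against the tree; is_err() and zero_or_null_ptr() are
illustrative names, not the kernel macros themselves.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO	4095		/* assumed, as in include/linux/err.h */
#define ZERO_SIZE_PTR	((void *)16)	/* assumed, as in include/linux/slab.h */

/* Mirror of IS_ERR(): true only for the top MAX_ERRNO addresses. */
bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Mirror of ZERO_OR_NULL_PTR(): true for NULL and for ZERO_SIZE_PTR. */
bool zero_or_null_ptr(const void *ptr)
{
	return (uintptr_t)ptr <= (uintptr_t)ZERO_SIZE_PTR;
}
```

With these definitions is_err(ZERO_SIZE_PTR) is false while
zero_or_null_ptr(ZERO_SIZE_PTR) is true, so an error check alone does not
catch a zero-length duplicate.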

That causes ZERO_SIZE_PTR deref oops after that:

root@syzkaller:~# ./repro
[   22.015442] kasan: CONFIG_KASAN_INLINE enabled
[   22.066965] kasan: GPF could be caused by NULL-ptr deref or user memory access
[   22.068624] general protection fault:  [#1] SMP KASAN
[   22.069705] Dumping ftrace buffer:
[   22.070399](ftrace buffer empty)
[   22.071033] Modules linked in:
[   22.071562] CPU: 0 PID: 3899 Comm: repro.exe Not tainted 4.17.0-rc1 #2
[   22.072632] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   22.074219] RIP: 0010:i2cdev_ioctl_rdwr+0x12b/0x7b0
[   22.075023] RSP: 0018:880061f3fa68 EFLAGS: 00010346
[   22.075877] RAX: 0002 RBX:  RCX: 
[   22.076973] RDX:  RSI: 2000 RDI: 88006a2e9542
[   22.078086] RBP: 880061f3fac0 R08:  R09: 880060b44780
[   22.079166] R10: 11000c3e7f1d R11:  R12: dc00
[   22.080251] R13: 88006a2e9540 R14: 0010 R15: 0001
[   22.081339] FS:  020bc880() GS:88006ba0()
knlGS:
[   22.082615] CS:  0010 DS:  ES:  CR0: 80050033
[   22.083526] CR2: 22c3 CR3: 6724a000 CR4: 06f0
[   22.084631] Call Trace:
[   22.085501]  ? i2cdev_ioctl_rdwr+0xf/0x7b0
[   22.086865]  i2cdev_ioctl+0x4ec/0x940
[   22.088677]  ? kasan_check_read+0x11/0x20
[   22.090555]  ? i2cdev_ioctl_smbus+0x6a0/0x6a0
[   22.091862]  ? do_raw_spin_trylock+0x1e0/0x1e0
[   22.092428]  ? kasan_check_write+0x14/0x20
[   22.092946]  ? trace_hardirqs_off+0xd/0x10
[   22.093451]  ? _raw_spin_unlock_irqrestore+0xa6/0xe0
[   22.094013]  ? debug_check_no_obj_freed+0x341/0x7eb
[   22.094547]  ? i2cdev_ioctl_smbus+0x6a0/0x6a0
[   22.095086]  do_vfs_ioctl+0x1cd/0x17b0
[   22.095482]  ? kasan_check_read+0x11/0x20
[   22.095978]  ? rcu_is_watching+0x7b/0x150
[   22.096428]  ? ioctl_preallocate+0x350/0x350
[   22.096908]  ? __fget_light+0x2fc/0x4c0
[   22.097351]  ? fget_raw+0x20/0x20
[   22.097721]  ? kmem_cache_free+0x31c/0x450
[   22.098164]  ? putname+0xfa/0x150
[   22.098511]  ? do_sys_open+0x31c/0x710
[   22.099792]  ? security_file_ioctl+0x8c/0xc0
[   22.102080]  ksys_ioctl+0x94/0xb0
[   22.103204]  __x64_sys_ioctl+0x7c/0xd0
[   22.103643]  do_syscall_64+0x193/0x920
[   22.104186]  ? trace_event_raw_event_sys_exit+0x2e0/0x2e0
[   22.105061]  ? syscall_return_slowpath+0x6a0/0x6a0
[   22.106717]  ? syscall_return_slowpath+0x2de/0x6a0
[   22.108183]  ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[   22.109597]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   22.110167]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   22.110735] RIP: 0033:0x44df89
[   22.111077] RSP: 002b:7fff7fb01ca8 EFLAGS: 0213 ORIG_RAX:
0010
[   22.111932] RAX: ffda RBX: 00400418 RCX: 0044df89
[   22.112958] RDX: 2080 RSI: 0707 RDI: 0003
[   22.114078] RBP: 7fff7fb01cc0 R08:  R09: 00401af0
[   22.115784] R10:  R11: 0213 R12: 00401b90
[   22.116870] R13:  R14: 006bd018 R15: 
[   22.117953] Code: 00 e8 7a 53 bd fb 41 83 e7 01 0f 84 e8 03 00 00 e8 6b 53 bd
fb 4d 85 f6 0f 84 12 06 00 00 4c 89 f0 4c 89 f1 48 c1 e8 03 83 e1 07 <42> 0f b6
04 20 38 c8 7f 08 84 c0 0f 85 e7 05 00 00 45 0f b6 36
[   22.120532] RIP: i2cdev_ioctl_rdwr+0x12b/0x7b0 RSP: 880061f3fa68
[   22.121290] ---[ end trace b365c176b1d95614 ]---


> If however the problem only happens later in
> 
>   if (msgs[i].flags & I2C_M_RECV_LEN) {
>   if (!(msgs[i].flags & I2C_M_RD) || msgs[i].buf[0] < 1 || ...)

Yes, that's true. I think I should make the commit message more verbose. I'll
come with v2.

> Your commit log is wrong (and I think the patch, too).

I believe this bug is not a memdup_user() issue. There is a nice selection from
LKML discussions about ZERO_SIZE_PTR, which convinces me:
http://yarchive.net/comp/linux/malloc_0.html
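
The point about IS_ERR() and ZERO_SIZE_PTR can be checked in user space.
The sketch below is only a model: it copies the values of ZERO_SIZE_PTR
and MAX_ERRNO from the kernel headers, while is_err(), zero_or_null_ptr()
and buf_is_readable() are hypothetical stand-ins written for this
illustration, not the kernel macros themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the kernel headers; everything else is a model. */
#define MAX_ERRNO 4095
#define ZERO_SIZE_PTR ((void *)16)

/* True only for the top MAX_ERRNO addresses, which encode -errno values. */
static bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* True for NULL and for the zero-length allocation marker. */
static bool zero_or_null_ptr(const void *ptr)
{
    return (uintptr_t)ptr <= (uintptr_t)ZERO_SIZE_PTR;
}

/* Hypothetical helper: may a caller safely read buf[0]? */
static bool buf_is_readable(const void *buf)
{
    return !is_err(buf) && !zero_or_null_ptr(buf);
}

/* Walks through the three cases discussed in the thread. */
static void zero_size_ptr_demo(void)
{
    int real_object = 0;

    /* The error check alone lets the zero-size marker through. */
    assert(!is_err(ZERO_SIZE_PTR));
    /* A combined check rejects it, and still accepts a real buffer. */
    assert(!buf_is_readable(ZERO_SIZE_PTR));
    assert(buf_is_readable(&real_object));
}
```

An error check alone accepts the marker, so a caller that goes on to read
buf[0] has to reject a zero length (or the marker) separately.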

Best regards,

Re: [v4 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct

2018-04-18 Thread Michal Hocko
On Sun 15-04-18 02:24:51, Yang Shi wrote:
> mmap_sem is on the hot path of the kernel and is heavily contended, but it
> is also abused. It is used to protect arg_start|end and env_start|end when
> reading /proc/$PID/cmdline and /proc/$PID/environ, which doesn't make
> sense: those proc files just expect to read 4 values atomically, the values
> are not related to VM, and they can be set to arbitrary values by C/R.
> 
> And, the mmap_sem contention may cause unexpected issue like below:
> 
> INFO: task ps:14018 blocked for more than 120 seconds.
>Tainted: GE 4.9.79-009.ali3000.alios7.x86_64 #1
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
>  ps  D0 14018  1 0x0004
>   885582f84000 885e8682f000 880972943000 885ebf499bc0
>   8828ee12 c900349bfca8 817154d0 0040
>   00ff812f872a 885ebf499bc0 024000d000948300 880972943000
>  Call Trace:
>   [] ? __schedule+0x250/0x730
>   [] schedule+0x36/0x80
>   [] rwsem_down_read_failed+0xf0/0x150
>   [] call_rwsem_down_read_failed+0x18/0x30
>   [] down_read+0x20/0x40
>   [] proc_pid_cmdline_read+0xd9/0x4e0
>   [] ? do_filp_open+0xa5/0x100
>   [] __vfs_read+0x37/0x150
>   [] ? security_file_permission+0x9b/0xc0
>   [] vfs_read+0x96/0x130
>   [] SyS_read+0x55/0xc0
>   [] entry_SYSCALL_64_fastpath+0x1a/0xc5
> 
> Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
> for them to mitigate the abuse of mmap_sem.
> 
> So, introduce a new spinlock in mm_struct to protect concurrent access
> to arg_start|end, env_start|end and others. Also take mmap_sem for read
> instead of write in prctl: that still protects against the race between
> prctl and sys_brk, which might break check_data_rlimit(), and makes prctl
> friendlier to other VM operations.
> 
> This patch just eliminates the abuse of mmap_sem, but it can't resolve the
> above hung task warning completely since the later access_remote_vm() call
> needs to acquire mmap_sem. The mmap_sem scalability issue will be solved in the
> future.
> 
> Signed-off-by: Yang Shi 
> Cc: Alexey Dobriyan 
> Cc: Michal Hocko 
> Cc: Matthew Wilcox 
> Cc: Mateusz Guzik 
> Cc: Cyrill Gorcunov 

Yes, looks good to me. As mentioned in other emails, prctl_set_mm_map
really deserves a comment explaining why we are doing the down_read.

What about something like the following?
"
arg_lock protects concurrent updates but we still need mmap_sem for read
to exclude races with do_brk.
"
Acked-by: Michal Hocko 
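
The arg_lock half of the scheme discussed here can be sketched in user
space as follows. This is a minimal model under stated assumptions:
struct mm_model, struct arg_snapshot and the helper functions are
hypothetical names, the spinlock is a C11 atomic_flag rather than a kernel
spinlock, and the mmap_sem read side that excludes do_brk is not modelled
at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the four mm_struct fields and their lock. */
struct mm_model {
    atomic_flag arg_lock;
    unsigned long arg_start, arg_end, env_start, env_end;
};

/* A private copy of the four values, taken as one unit. */
struct arg_snapshot {
    unsigned long arg_start, arg_end, env_start, env_end;
};

static void mm_model_init(struct mm_model *mm)
{
    assert(mm != NULL);
    atomic_flag_clear(&mm->arg_lock);
    mm->arg_start = mm->arg_end = mm->env_start = mm->env_end = 0;
}

static void model_spin_lock(atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ; /* spin until the current holder clears the flag */
}

static void model_spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Writer side: update all four values while holding the lock. */
static void mm_set_args(struct mm_model *mm, struct arg_snapshot v)
{
    model_spin_lock(&mm->arg_lock);
    mm->arg_start = v.arg_start;
    mm->arg_end = v.arg_end;
    mm->env_start = v.env_start;
    mm->env_end = v.env_end;
    model_spin_unlock(&mm->arg_lock);
}

/* Reader side: copy under the lock, then use the copy without holding it. */
static struct arg_snapshot mm_get_args(struct mm_model *mm)
{
    struct arg_snapshot v;

    model_spin_lock(&mm->arg_lock);
    v.arg_start = mm->arg_start;
    v.arg_end = mm->arg_end;
    v.env_start = mm->env_start;
    v.env_end = mm->env_end;
    model_spin_unlock(&mm->arg_lock);
    return v;
}
```

The reader holds the lock only long enough to copy four words, which is
why a spinlock is a reasonable fit for this access pattern.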

> ---
> v3 --> v4:
> * Protected values update with down_read + spin_lock to prevent from race
>   condition between prctl and sys_brk and made prctl more friendly to VM
>   operations per Michal's suggestion
> 
> v2 --> v3:
> * Restored down_write in prctl syscall
> * Elaborate the limitation of this patch suggested by Michal
> * Protect those fields by the new lock except brk and start_brk per Michal's
>   suggestion
> * Based off Cyrill's non PR_SET_MM_MAP operations deprecation patch
>   (https://lkml.org/lkml/2018/4/5/541)
> 
> v1 --> v2:
> * Use spinlock instead of rwlock per Matthew's suggestion
> * Replace down_write to down_read in prctl_set_mm (see commit log for details)
>  fs/proc/base.c   | 8 
>  include/linux/mm_types.h | 2 ++
>  kernel/fork.c| 1 +
>  kernel/sys.c | 6 --
>  mm/init-mm.c | 1 +
>  5 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index eafa39a..3551757 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -239,12 +239,12 @@ static ssize_t proc_pid_cmdline_read(struct file *file, 
> char __user *buf,
>   goto out_mmput;
>   }
>  
> - down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
>   arg_start = mm->arg_start;
>   arg_end = mm->arg_end;
>   env_start = mm->env_start;
>   env_end = mm->env_end;
> - up_read(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
>  
>   BUG_ON(arg_start > arg_end);
>   BUG_ON(env_start > env_end);
> @@ -929,10 +929,10 @@ static ssize_t environ_read(struct file *file, char 
> __user *buf,
>   if (!mmget_not_zero(mm))
>   goto free;
>  
> - down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
>   env_start = mm->env_start;
>   env_end = mm->env_end;
> - up_read(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
>  
>   while (count > 0) {
>   size_t this_len, max_len;
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2161234..49dd59e 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -413,6 +413,8 @@ struct mm_struct {
>   unsigned long exec_vm;  /* VM_EXEC & ~VM_WRITE & ~VM_STACK */
>   unsigned long stack_vm; /* VM_STACK */
>   unsigned long def_flags;
> +
> + 

Re: [RESEND PATCH v2 0/4] ARM: davinci: remove the mach-specific aemif driver - part 1

2018-04-18 Thread Sekhar Nori
On Tuesday 17 April 2018 10:59 PM, Santosh Shilimkar wrote:
> On 4/17/2018 5:36 AM, Bartosz Golaszewski wrote:
>> 2018-04-17 12:53 GMT+02:00 Sekhar Nori :
>>> Hi Bartosz,
> 
> [...]
> 
>>>> This series applies on top of v8 of David Lechner's CCF series.
>>>
>>> Are there any patches in the series that can safely be applied to
>>> v4.17-rc1?
> 
>>
>> Hi Sekhar,
>>
>> yes, all of them. They're not linked to David's work in any way.
>>
> Please separate the driver/emif/* patches from the series. These
> are not fixes so they are not rcx candidates. But will queue them
> up for next merge window. Thanks !!

Thanks Santosh! Can you please queue the drivers/emif/ patches on an
immutable branch based off v4.17-rc1 which I can then merge to queue the
platform patches?

Thanks,
Sekhar



Re: [PATCH v3 1/6] ilog2: create truly constant version for sparse

2018-04-18 Thread Luc Van Oostenryck
On Wed, Apr 18, 2018 at 10:12:54AM +0200, Martin Wilck wrote:
> On Tue, 2018-04-17 at 17:07 -0700, Linus Torvalds wrote:
> > On Tue, Apr 17, 2018 at 4:35 PM, Martin Wilck 
> > wrote:
> > > Sparse emits errors about ilog2() in array indices because of the
> > > use of
> > > __ilog2_32() and __ilog2_64(),
> > 
> > If sparse warns about it, then presumably gcc with -Wvla warns about
> > it too?
> 
> No, it doesn't (gcc 7.3.0). -> https://paste.opensuse.org/27471594
> It doesn't even warn on an expression like this:
> 
>   #define SIZE (1<<10)
>   static int foo[ilog2(SIZE)];
> 
> sparse 0.5.2 doesn't warn about that either. It emits "error: bad
> integer constant expression" only if ilog2 is used in an array

sparse supports VLAs at the syntactic level but not much more. Anything
that needs the array size, directly or indirectly, will give this error.
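
For illustration, a log2 that stays an integer constant expression can be
built from nothing but comparisons, masks and the conditional operator.
The sketch below is a minimal example, not the macro from the patch:
CONST_ILOG2 is a hypothetical name and it is only valid for values below
1 << 16.

```c
#include <assert.h>

/*
 * Hypothetical constant-expression log2 for values below 1 << 16.
 * Only comparisons, masks and ?: are used, so the result can size an
 * array or feed a static assertion.
 */
#define CONST_ILOG2(n) (            \
    (n) < 2 ? 0 :                   \
    ((n) & (1UL << 15)) ? 15 :      \
    ((n) & (1UL << 14)) ? 14 :      \
    ((n) & (1UL << 13)) ? 13 :      \
    ((n) & (1UL << 12)) ? 12 :      \
    ((n) & (1UL << 11)) ? 11 :      \
    ((n) & (1UL << 10)) ? 10 :      \
    ((n) & (1UL << 9)) ? 9 :        \
    ((n) & (1UL << 8)) ? 8 :        \
    ((n) & (1UL << 7)) ? 7 :        \
    ((n) & (1UL << 6)) ? 6 :        \
    ((n) & (1UL << 5)) ? 5 :        \
    ((n) & (1UL << 4)) ? 4 :        \
    ((n) & (1UL << 3)) ? 3 :        \
    ((n) & (1UL << 2)) ? 2 : 1)

#define SIZE (1 << 10)

/* An array sized by the macro: accepted only if the size is constant. */
static int foo[CONST_ILOG2(SIZE)];

/* Checked at compile time, so no run-time helper is involved. */
static_assert(CONST_ILOG2(SIZE) == 10, "log2 of 1024 is 10");

/* Number of elements in foo, derived from its declared size. */
static unsigned long foo_len(void)
{
    return sizeof(foo) / sizeof(foo[0]);
}
```

Because no function call appears in the expansion, the array size does not
depend on the compiler folding a builtin.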

-- Luc Van Oostenryck


Re: [RFC PATCH 19/35] ovl: readd reflink/copyfile/dedup support

2018-04-18 Thread Amir Goldstein
On Tue, Apr 17, 2018 at 11:31 PM, Amir Goldstein  wrote:
> On Thu, Apr 12, 2018 at 6:08 PM, Miklos Szeredi  wrote:
>> Since set of arguments are so similar, handle in a common helper.
>>
>> Signed-off-by: Miklos Szeredi 
>> ---
>>  fs/overlayfs/file.c | 79 
>> +
>>  1 file changed, 79 insertions(+)
>>
>> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
>> index 9670e160967e..39b1b73334ad 100644
>> --- a/fs/overlayfs/file.c
>> +++ b/fs/overlayfs/file.c
>> @@ -352,6 +352,81 @@ long ovl_ioctl(struct file *file, unsigned int cmd, 
>> unsigned long arg)
>> return ret;
>>  }
>>
>> +enum ovl_copyop {
>> +   OVL_COPY,
>> +   OVL_CLONE,
>> +   OVL_DEDUPE,
>> +};
>> +
>> +static ssize_t ovl_copyfile(struct file *file_in, loff_t pos_in,
>> +   struct file *file_out, loff_t pos_out,
>> +   u64 len, unsigned int flags, enum ovl_copyop op)
>> +{
>> +   struct inode *inode_out = file_inode(file_out);
>> +   struct fd real_in, real_out;
>> +   const struct cred *old_cred;
>> +   int ret;
>> +
>> +   ret = ovl_real_file(file_out, &real_out);
>> +   if (ret)
>> +   return ret;
>> +
>> +   ret = ovl_real_file(file_in, &real_in);
>> +   if (ret) {
>> +   fdput(real_out);
>> +   return ret;
>> +   }
>> +
>> +   old_cred = ovl_override_creds(file_inode(file_out)->i_sb);
>> +   switch (op) {
>> +   case OVL_COPY:
>> +   ret = vfs_copy_file_range(real_in.file, pos_in,
>> + real_out.file, pos_out, len, 
>> flags);
>
> Problem:
> vfs_copy_file_range(ovl_lower_file, ovl_upper_file) on non samefs
> will get -EXDEV from  ovl_copy_file_range(), so will not fall back
> to do_splice_direct().
> We may be better off checking in_sb != out_sb and returning
> -EOPNOTSUPP? not sure.
>
>
>> +   break;
>> +
>> +   case OVL_CLONE:
>> +   ret = vfs_clone_file_range(real_in.file, pos_in,
>> +  real_out.file, pos_out, len);
>> +   break;
>> +
>> +   case OVL_DEDUPE:
>> +   ret = vfs_dedupe_file_range_one(real_in.file, pos_in, len,
>> +   real_out.file, pos_out);
>
> Problem:
> real_out can be a readonly fd (for is_admin), so we will be deduping
> the lower file.
> I guess this problem is mitigated in current code by may_write_real().
>
> How can we deal with that sort of "write leak" without patching
>  mnt_want_write_file()?
>

Possible solution:
Add interface file_operations->permission().

At least in rw_verify_area() and clone_verify_area() it is clear
how this would be used. Instead of calling security_file_permission()
directly, call it via a helper file_permission(), as with inode_permission().

My understanding of the VFS permission checks model is too
limited to say if this makes sense.

Thanks,
Amir.


[PATCH] m68k/defconfig: Update defconfigs for v4.17-rc1

2018-04-18 Thread Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven 
---
 arch/m68k/configs/amiga_defconfig| 14 +-
 arch/m68k/configs/apollo_defconfig   | 14 +-
 arch/m68k/configs/atari_defconfig| 14 +-
 arch/m68k/configs/bvme6000_defconfig | 14 +-
 arch/m68k/configs/hp300_defconfig| 14 +-
 arch/m68k/configs/mac_defconfig  | 14 +-
 arch/m68k/configs/multi_defconfig| 14 +-
 arch/m68k/configs/mvme147_defconfig  | 14 +-
 arch/m68k/configs/mvme16x_defconfig  | 14 +-
 arch/m68k/configs/q40_defconfig  | 14 +-
 arch/m68k/configs/sun3_defconfig | 14 +-
 arch/m68k/configs/sun3x_defconfig| 14 +-
 12 files changed, 108 insertions(+), 60 deletions(-)

diff --git a/arch/m68k/configs/amiga_defconfig 
b/arch/m68k/configs/amiga_defconfig
index 37a8e5ab87285e7d..b5abd401cec95a63 100644
--- a/arch/m68k/configs/amiga_defconfig
+++ b/arch/m68k/configs/amiga_defconfig
@@ -98,8 +98,8 @@ CONFIG_NF_CONNTRACK_SANE=m
 CONFIG_NF_CONNTRACK_SIP=m
 CONFIG_NF_CONNTRACK_TFTP=m
 CONFIG_NF_TABLES=m
-CONFIG_NF_TABLES_INET=m
-CONFIG_NF_TABLES_NETDEV=m
+CONFIG_NF_TABLES_INET=y
+CONFIG_NF_TABLES_NETDEV=y
 CONFIG_NFT_EXTHDR=m
 CONFIG_NFT_META=m
 CONFIG_NFT_RT=m
@@ -204,7 +204,7 @@ CONFIG_NF_SOCKET_IPV4=m
 CONFIG_NFT_CHAIN_ROUTE_IPV4=m
 CONFIG_NFT_DUP_IPV4=m
 CONFIG_NFT_FIB_IPV4=m
-CONFIG_NF_TABLES_ARP=m
+CONFIG_NF_TABLES_ARP=y
 CONFIG_NF_FLOW_TABLE_IPV4=m
 CONFIG_NF_LOG_ARP=m
 CONFIG_NFT_CHAIN_NAT_IPV4=m
@@ -259,7 +259,7 @@ CONFIG_IP6_NF_RAW=m
 CONFIG_IP6_NF_NAT=m
 CONFIG_IP6_NF_TARGET_MASQUERADE=m
 CONFIG_IP6_NF_TARGET_NPT=m
-CONFIG_NF_TABLES_BRIDGE=m
+CONFIG_NF_TABLES_BRIDGE=y
 CONFIG_NFT_BRIDGE_META=m
 CONFIG_NFT_BRIDGE_REJECT=m
 CONFIG_NF_LOG_BRIDGE=m
@@ -310,7 +310,6 @@ CONFIG_NET_MPLS_GSO=m
 CONFIG_MPLS_ROUTING=m
 CONFIG_MPLS_IPTUNNEL=m
 CONFIG_NET_NSH=m
-CONFIG_NET_L3_MASTER_DEV=y
 CONFIG_AF_KCM=m
 # CONFIG_WIRELESS is not set
 CONFIG_PSAMPLE=m
@@ -414,6 +413,7 @@ CONFIG_ARIADNE=y
 # CONFIG_NET_VENDOR_MARVELL is not set
 # CONFIG_NET_VENDOR_MICREL is not set
 # CONFIG_NET_VENDOR_NETRONOME is not set
+# CONFIG_NET_VENDOR_NI is not set
 CONFIG_HYDRA=y
 CONFIG_APNE=y
 CONFIG_ZORRO8390=y
@@ -485,6 +485,7 @@ CONFIG_RTC_DRV_MSM6242=m
 CONFIG_RTC_DRV_RP5C01=m
 # CONFIG_VIRTIO_MENU is not set
 # CONFIG_IOMMU_SUPPORT is not set
+CONFIG_DAX=m
 CONFIG_HEARTBEAT=y
 CONFIG_PROC_HARDWARE=y
 CONFIG_AMIGA_BUILTIN_SERIAL=y
@@ -621,6 +622,7 @@ CONFIG_CRYPTO_CRYPTD=m
 CONFIG_CRYPTO_MCRYPTD=m
 CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_CFB=m
 CONFIG_CRYPTO_LRW=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_KEYWRAP=m
@@ -647,6 +649,8 @@ CONFIG_CRYPTO_KHAZAD=m
 CONFIG_CRYPTO_SALSA20=m
 CONFIG_CRYPTO_SEED=m
 CONFIG_CRYPTO_SERPENT=m
+CONFIG_CRYPTO_SM4=m
+CONFIG_CRYPTO_SPECK=m
 CONFIG_CRYPTO_TEA=m
 CONFIG_CRYPTO_TWOFISH=m
 CONFIG_CRYPTO_LZO=m
diff --git a/arch/m68k/configs/apollo_defconfig 
b/arch/m68k/configs/apollo_defconfig
index 6a466266b852fb77..078a623459af2a17 100644
--- a/arch/m68k/configs/apollo_defconfig
+++ b/arch/m68k/configs/apollo_defconfig
@@ -96,8 +96,8 @@ CONFIG_NF_CONNTRACK_SANE=m
 CONFIG_NF_CONNTRACK_SIP=m
 CONFIG_NF_CONNTRACK_TFTP=m
 CONFIG_NF_TABLES=m
-CONFIG_NF_TABLES_INET=m
-CONFIG_NF_TABLES_NETDEV=m
+CONFIG_NF_TABLES_INET=y
+CONFIG_NF_TABLES_NETDEV=y
 CONFIG_NFT_EXTHDR=m
 CONFIG_NFT_META=m
 CONFIG_NFT_RT=m
@@ -202,7 +202,7 @@ CONFIG_NF_SOCKET_IPV4=m
 CONFIG_NFT_CHAIN_ROUTE_IPV4=m
 CONFIG_NFT_DUP_IPV4=m
 CONFIG_NFT_FIB_IPV4=m
-CONFIG_NF_TABLES_ARP=m
+CONFIG_NF_TABLES_ARP=y
 CONFIG_NF_FLOW_TABLE_IPV4=m
 CONFIG_NF_LOG_ARP=m
 CONFIG_NFT_CHAIN_NAT_IPV4=m
@@ -257,7 +257,7 @@ CONFIG_IP6_NF_RAW=m
 CONFIG_IP6_NF_NAT=m
 CONFIG_IP6_NF_TARGET_MASQUERADE=m
 CONFIG_IP6_NF_TARGET_NPT=m
-CONFIG_NF_TABLES_BRIDGE=m
+CONFIG_NF_TABLES_BRIDGE=y
 CONFIG_NFT_BRIDGE_META=m
 CONFIG_NFT_BRIDGE_REJECT=m
 CONFIG_NF_LOG_BRIDGE=m
@@ -308,7 +308,6 @@ CONFIG_NET_MPLS_GSO=m
 CONFIG_MPLS_ROUTING=m
 CONFIG_MPLS_IPTUNNEL=m
 CONFIG_NET_NSH=m
-CONFIG_NET_L3_MASTER_DEV=y
 CONFIG_AF_KCM=m
 # CONFIG_WIRELESS is not set
 CONFIG_PSAMPLE=m
@@ -392,6 +391,7 @@ CONFIG_VETH=m
 # CONFIG_NET_VENDOR_MICREL is not set
 # CONFIG_NET_VENDOR_NATSEMI is not set
 # CONFIG_NET_VENDOR_NETRONOME is not set
+# CONFIG_NET_VENDOR_NI is not set
 # CONFIG_NET_VENDOR_QUALCOMM is not set
 # CONFIG_NET_VENDOR_RENESAS is not set
 # CONFIG_NET_VENDOR_ROCKER is not set
@@ -446,6 +446,7 @@ CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_GENERIC=m
 # CONFIG_VIRTIO_MENU is not set
 # CONFIG_IOMMU_SUPPORT is not set
+CONFIG_DAX=m
 CONFIG_HEARTBEAT=y
 CONFIG_PROC_HARDWARE=y
 CONFIG_EXT4_FS=y
@@ -580,6 +581,7 @@ CONFIG_CRYPTO_CRYPTD=m
 CONFIG_CRYPTO_MCRYPTD=m
 CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_CHACHA20POLY1305=m
+CONFIG_CRYPTO_CFB=m
 CONFIG_CRYPTO_LRW=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_KEYWRAP=m
@@ -606,6 +608,8 @@ CONFIG_CRYPTO_KHAZAD=m
 CONFIG_CRYPTO_SALSA20=m
 CONFIG_CRYPTO_SEED=m
 CONFIG_CRYPTO_SERPENT=m
+CONFIG_CRYPTO_SM4=m

[PATCH v2 0/3] intel-iommu: fix mapping PSI missing for iommu_map()

2018-04-18 Thread Peter Xu
v2:
- cc correct people and iommu list

(PSI stands for: Page Selective Invalidations)

Intel IOMMU has the caching mode to ease emulation of the device.
When that bit is set, we need to send PSIs even for newly mapped
pages.  However, the current driver does not fully obey the rule.  E.g.,
the iommu_map() API only does the mapping and never sends the PSIs.
That is a problem for emulated IOMMU devices, since without that
information they are never able to build up the shadow page tables.
This patchset tries to fix the problem.

Patch 1 is a tracing enhancement that helped me to triage the problem.
It might even be useful in the future.

Patch 2 generalized a helper to notify the MAP PSIs.

Patch 3 fixes the real problem by making sure every domain mapping
will trigger the MAP PSI notifications.

Without the patchset, nested device assignment (assign one device
firstly to L1 guest, then to L2 guest) won't work for QEMU.  After
applying the patchset, it works.
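
The rule the series enforces can be modelled in user space as below. This
is a sketch under stated assumptions: struct iommu_model and both
functions are hypothetical, and counters stand in for the real PSI and
write-buffer flush operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one IOMMU; counters replace the hardware flushes. */
struct iommu_model {
    bool caching_mode;
    unsigned int psi_flushes;
    unsigned int write_buffer_flushes;
};

/* Decision taken for a non-present to present mapping on one IOMMU. */
static void mapping_notify_one(struct iommu_model *iommu)
{
    if (iommu->caching_mode)
        iommu->psi_flushes++;          /* emulated IOMMU: must be told */
    else
        iommu->write_buffer_flushes++; /* real hardware: cheaper flush */
}

/*
 * Map first, then notify every IOMMU that backs the domain.
 * map_result stands in for the return value of the real mapping step.
 */
static int domain_mapping_model(struct iommu_model *iommus, size_t count,
                                int map_result)
{
    assert(iommus != NULL || count == 0);

    if (map_result)
        return map_result; /* mapping failed: nothing to notify */

    for (size_t i = 0; i < count; i++)
        mapping_notify_one(&iommus[i]);

    return 0;
}
```

Putting the notification inside the mapping helper means a caller cannot
create a mapping and forget the PSI.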

Please review.  Thanks.

Peter Xu (3):
  intel-iommu: add some traces for PSIs
  intel-iommu: generalize __mapping_notify_one()
  intel-iommu: fix iotlb psi missing for mappings

 drivers/iommu/dmar.c|  3 ++
 drivers/iommu/intel-iommu.c | 68 -
 2 files changed, 52 insertions(+), 19 deletions(-)

-- 
2.14.3



[PATCH v2 1/3] intel-iommu: add some traces for PSIs

2018-04-18 Thread Peter Xu
These traces help to debug and triage missing PSI notifications.

Signed-off-by: Peter Xu 
---
 drivers/iommu/dmar.c| 3 +++
 drivers/iommu/intel-iommu.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 9a7ffd13c7f0..62ae26c3f7b7 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1325,6 +1325,9 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, 
u64 addr,
struct qi_desc desc;
int ih = 0;
 
+   pr_debug("%s: iommu %d did %u addr 0x%llx order %u type %llx\n",
+__func__, iommu->seq_id, did, addr, size_order, type);
+
if (cap_write_drain(iommu->cap))
dw = 1;
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 582fd01cb7d1..a64da83e867c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1396,6 +1396,9 @@ static void __iommu_flush_iotlb(struct intel_iommu 
*iommu, u16 did,
u64 val = 0, val_iva = 0;
unsigned long flag;
 
+   pr_debug("%s: iommu %d did %u addr 0x%llx order %u type %llx\n",
+__func__, iommu->seq_id, did, addr, size_order, type);
+
switch (type) {
case DMA_TLB_GLOBAL_FLUSH:
/* global flush doesn't need set IVA_REG */
-- 
2.14.3



[PATCH v2 2/3] intel-iommu: generalize __mapping_notify_one()

2018-04-18 Thread Peter Xu
Generalize this new helper to notify one newly created mapping on one
single IOMMU.  We can further leverage this helper in the next patch.

Signed-off-by: Peter Xu 
---
 drivers/iommu/intel-iommu.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a64da83e867c..bf111e60857c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1607,6 +1607,18 @@ static void iommu_flush_iotlb_psi(struct intel_iommu 
*iommu,
iommu_flush_dev_iotlb(domain, addr, mask);
 }
 
+/* Notification for newly created mappings */
+static inline void __mapping_notify_one(struct intel_iommu *iommu,
+   struct dmar_domain *domain,
+   unsigned long pfn, unsigned int pages)
+{
+   /* It's a non-present to present mapping. Only flush if caching mode */
+   if (cap_caching_mode(iommu->cap))
+   iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);
+   else
+   iommu_flush_write_buffer(iommu);
+}
+
 static void iommu_flush_iova(struct iova_domain *iovad)
 {
struct dmar_domain *domain;
@@ -3626,13 +3638,7 @@ static dma_addr_t __intel_map_single(struct device *dev, 
phys_addr_t paddr,
if (ret)
goto error;
 
-   /* it's a non-present to present mapping. Only flush if caching mode */
-   if (cap_caching_mode(iommu->cap))
-   iommu_flush_iotlb_psi(iommu, domain,
- mm_to_dma_pfn(iova_pfn),
- size, 0, 1);
-   else
-   iommu_flush_write_buffer(iommu);
+   __mapping_notify_one(iommu, domain, mm_to_dma_pfn(iova_pfn), size);
 
start_paddr = (phys_addr_t)iova_pfn << PAGE_SHIFT;
start_paddr += paddr & ~PAGE_MASK;
@@ -3851,11 +3857,7 @@ static int intel_map_sg(struct device *dev, struct 
scatterlist *sglist, int nele
return 0;
}
 
-   /* it's a non-present to present mapping. Only flush if caching mode */
-   if (cap_caching_mode(iommu->cap))
-   iommu_flush_iotlb_psi(iommu, domain, start_vpfn, size, 0, 1);
-   else
-   iommu_flush_write_buffer(iommu);
+   __mapping_notify_one(iommu, domain, start_vpfn, size);
 
return nelems;
 }
-- 
2.14.3



[PATCH v2 3/3] intel-iommu: fix iotlb psi missing for mappings

2018-04-18 Thread Peter Xu
When caching mode is enabled for IOMMU, we should send explicit IOTLB
PSIs even for newly created mappings.  However these events are missing
for all intel_iommu_map() callers, e.g., iommu_map().  One direct user
is the vfio-pci driver.

To make sure we always send the PSIs when necessary, this patch first
introduces a domain_mapping() helper for page mappings, then fixes the
problem by moving the explicit map IOTLB PSI logic into that new helper.
With that, iommu_domain_identity_map() uses the simplified version
(__domain_mapping()) to avoid sending the notifications, while all other
cases always send them.

For the VM case, we send the PSIs to all the backend IOMMUs for the domain.

This patch allows nested device assignment to work with QEMU (assign the
device first to the L1 guest, then assign it again to the L2 guest).

Signed-off-by: Peter Xu 
---
 drivers/iommu/intel-iommu.c | 43 ++-
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index bf111e60857c..eb0f0911342f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2353,18 +2353,47 @@ static int __domain_mapping(struct dmar_domain *domain, 
unsigned long iov_pfn,
return 0;
 }
 
+static int domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
+ struct scatterlist *sg, unsigned long phys_pfn,
+ unsigned long nr_pages, int prot)
+{
+   int ret;
+   struct intel_iommu *iommu;
+
+   /* Do the real mapping first */
+   ret = __domain_mapping(domain, iov_pfn, sg, phys_pfn, nr_pages, prot);
+   if (ret)
+   return ret;
+
+   /* Notify about the new mapping */
+   if (domain_type_is_vm(domain)) {
+  /* VM typed domains can have more than one IOMMUs */
+  int iommu_id;
+  for_each_domain_iommu(iommu_id, domain) {
+  iommu = g_iommus[iommu_id];
+  __mapping_notify_one(iommu, domain, iov_pfn, nr_pages);
+  }
+   } else {
+  /* General domains only have one IOMMU */
+  iommu = domain_get_iommu(domain);
+  __mapping_notify_one(iommu, domain, iov_pfn, nr_pages);
+   }
+
+   return 0;
+}
+
 static inline int domain_sg_mapping(struct dmar_domain *domain, unsigned long 
iov_pfn,
struct scatterlist *sg, unsigned long 
nr_pages,
int prot)
 {
-   return __domain_mapping(domain, iov_pfn, sg, 0, nr_pages, prot);
+   return domain_mapping(domain, iov_pfn, sg, 0, nr_pages, prot);
 }
 
 static inline int domain_pfn_mapping(struct dmar_domain *domain, unsigned long 
iov_pfn,
 unsigned long phys_pfn, unsigned long 
nr_pages,
 int prot)
 {
-   return __domain_mapping(domain, iov_pfn, NULL, phys_pfn, nr_pages, 
prot);
+   return domain_mapping(domain, iov_pfn, NULL, phys_pfn, nr_pages, prot);
 }
 
 static void domain_context_clear_one(struct intel_iommu *iommu, u8 bus, u8 
devfn)
@@ -2669,9 +2698,9 @@ static int iommu_domain_identity_map(struct dmar_domain 
*domain,
 */
dma_pte_clear_range(domain, first_vpfn, last_vpfn);
 
-   return domain_pfn_mapping(domain, first_vpfn, first_vpfn,
- last_vpfn - first_vpfn + 1,
- DMA_PTE_READ|DMA_PTE_WRITE);
+   return __domain_mapping(domain, first_vpfn, NULL,
+   first_vpfn, last_vpfn - first_vpfn + 1,
+   DMA_PTE_READ|DMA_PTE_WRITE);
 }
 
 static int domain_prepare_identity_map(struct device *dev,
@@ -3638,8 +3667,6 @@ static dma_addr_t __intel_map_single(struct device *dev, 
phys_addr_t paddr,
if (ret)
goto error;
 
-   __mapping_notify_one(iommu, domain, mm_to_dma_pfn(iova_pfn), size);
-
start_paddr = (phys_addr_t)iova_pfn << PAGE_SHIFT;
start_paddr += paddr & ~PAGE_MASK;
return start_paddr;
@@ -3857,8 +3884,6 @@ static int intel_map_sg(struct device *dev, struct 
scatterlist *sglist, int nele
return 0;
}
 
-   __mapping_notify_one(iommu, domain, start_vpfn, size);
-
return nelems;
 }
 
-- 
2.14.3



Re: [PATCH] blk-mq: Clear out elevator private data

2018-04-18 Thread Paolo Valente


> On 17 Apr 2018, at 23:42, Kees Cook wrote:
> 
> Some elevators may not correctly check rq->rq_flags & RQF_ELVPRIV, and
> may attempt to read rq->elv fields. When requests got reused, this
> caused BFQ to think it already had a bfqq (rq->elv.priv[1]) allocated.

Hi Kees,
where does BFQ get confused and operate on a request not destined to
it?  I'm asking because I took care to always avoid such a
mistake.

Thanks,
Paolo

> This could lead to odd behaviors like having the sense buffer address
> slowly start incrementing. This eventually tripped HARDENED_USERCOPY
> and KASAN.
> 
> This patch wipes all of rq->elv instead of just rq->elv.icq. While
> it shouldn't technically be needed, this ends up being a robustness
> improvement that should lead to at least finding bugs in elevators faster.
> 
> Reported-by: Oleksandr Natalenko 
> Fixes: bd166ef183c26 ("blk-mq-sched: add framework for MQ capable IO 
> schedulers")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Kees Cook 
> ---
> In theory, BFQ needs to also check the RQF_ELVPRIV flag, but I'll leave that
> to Paolo to figure out. Also, my Fixes line is kind of a best-guess. This
> is where icq was originally wiped, so it seemed as good a commit as any.
> ---
> block/blk-mq.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0dc9e341c2a7..859df3160303 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -363,7 +363,7 @@ static struct request *blk_mq_get_request(struct 
> request_queue *q,
> 
>   rq = blk_mq_rq_ctx_init(data, tag, op);
>   if (!op_is_flush(op)) {
> - rq->elv.icq = NULL;
> + memset(&rq->elv, 0, sizeof(rq->elv));
>   if (e && e->type->ops.mq.prepare_request) {
>   if (e->type->icq_cache && rq_ioc(bio))
>   blk_mq_sched_assign_ioc(rq, bio);
> @@ -461,7 +461,7 @@ void blk_mq_free_request(struct request *rq)
>   e->type->ops.mq.finish_request(rq);
>   if (rq->elv.icq) {
>   put_io_context(rq->elv.icq->ioc);
> - rq->elv.icq = NULL;
> + memset(&rq->elv, 0, sizeof(rq->elv));
>   }
>   }
> 
> -- 
> 2.7.4
> 
> 
> -- 
> Kees Cook
> Pixel Security
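
The difference between clearing only rq->elv.icq and wiping all of rq->elv
can be modelled in user space. The sketch below is illustrative only:
struct elv_model and struct request_model are hypothetical stand-ins, and
the layout (one icq pointer plus two private slots) is an assumption based
on the rq->elv.icq and rq->elv.priv[1] fields named in the patch
description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: one icq pointer plus two scheduler-private slots. */
struct elv_model {
    void *icq;
    void *priv[2];
};

/* Hypothetical stand-in for a request that gets recycled. */
struct request_model {
    struct elv_model elv;
};

/* Old behaviour: only icq is reset, priv[] keeps the last user's data. */
static void reset_icq_only(struct request_model *rq)
{
    assert(rq != NULL);
    rq->elv.icq = NULL;
}

/* Patched behaviour: every elevator-private field is wiped. */
static void reset_all(struct request_model *rq)
{
    assert(rq != NULL);
    memset(&rq->elv, 0, sizeof(rq->elv));
}

/* What a scheduler would conclude if it trusted priv[1] blindly. */
static int looks_already_set_up(const struct request_model *rq)
{
    return rq->elv.priv[1] != NULL;
}
```

With reset_icq_only() a recycled request still looks set up to a scheduler
that reads priv[1] without checking the flag first; with reset_all() it
does not.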



Re: [v4 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct

2018-04-18 Thread Michal Hocko
On Wed 18-04-18 12:02:17, Cyrill Gorcunov wrote:
> On Wed, Apr 18, 2018 at 10:05:55AM +0200, Michal Hocko wrote:
> > 
> > Yes, looks good to me. As mentioned in other emails prctl_set_mm_map
> > really deserves a comment explaining why we are doing the down_read
> > 
> > What about something like the following?
> > "
> > arg_lock protects concurrent updates but we still need mmap_sem for read
> > to exclude races with do_brk.
> > "
> > Acked-by: Michal Hocko 
> 
> Yes, thanks! Andrew, could you slightly update the changelog please?

No, I meant it to be a comment in the _code_.

-- 
Michal Hocko
SUSE Labs


Re: [RFC v4 3/4] irqflags: Avoid unnecessary calls to trace_ if you can

2018-04-18 Thread Masami Hiramatsu
On Mon, 16 Apr 2018 21:07:47 -0700
Joel Fernandes  wrote:

> With TRACE_IRQFLAGS, we call trace_ API too many times. We don't need
> to if local_irq_restore or local_irq_save didn't actually do anything.
> 
> This gives around a 4% improvement in performance when doing the
> following command: "time find / > /dev/null"
> 
> Also it's best to avoid these calls where possible, since in this series,
> the RCU code in tracepoint.h seems to call these quite a bit and I'd
> like to keep this overhead low.

Can we assume that "flags" carries only a single irq-disable bit?
Since this also skips calling raw_local_irq_restore(flags), any other
state kept in the flags on some arch would not be restored, which may
change the result. In that case, we can do it as below (skipping only
the trace_hardirqs_* calls):

int disabled = irqs_disabled();

if (!raw_irqs_disabled_flags(flags) && disabled)
trace_hardirqs_on();

raw_local_irq_restore(flags);

if (raw_irqs_disabled_flags(flags) && !disabled)
trace_hardirqs_off();
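
The same ordering can be exercised in a small user-space model. This is
only a sketch: struct irq_model and model_irq_restore() are hypothetical,
a single bool stands in for the CPU interrupt state, and counters replace
the trace hooks and the raw restore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one flag for the CPU state, counters for the hooks. */
struct irq_model {
    bool disabled;          /* current interrupt state */
    unsigned int trace_on;  /* calls to the trace_hardirqs_on() stand-in */
    unsigned int trace_off; /* calls to the trace_hardirqs_off() stand-in */
    unsigned int restores;  /* calls to the raw restore stand-in */
};

/* flags_disabled plays the role of raw_irqs_disabled_flags(flags). */
static void model_irq_restore(struct irq_model *m, bool flags_disabled)
{
    assert(m != NULL);

    bool disabled = m->disabled;

    /* Trace the transition to "on" before interrupts are enabled. */
    if (!flags_disabled && disabled)
        m->trace_on++;

    /* The raw restore always runs, whatever the trace hooks did. */
    m->disabled = flags_disabled;
    m->restores++;

    /* Trace the transition to "off" after interrupts are disabled. */
    if (flags_disabled && !disabled)
        m->trace_off++;
}
```

The trace counters move only on a real state change, while the restore
counter moves on every call, which is the property asked for above.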

Thank you,

> 
> Cc: Steven Rostedt 
> Cc: Peter Zilstra 
> Cc: Ingo Molnar 
> Cc: Mathieu Desnoyers 
> Cc: Tom Zanussi 
> Cc: Namhyung Kim 
> Cc: Thomas Glexiner 
> Cc: Boqun Feng 
> Cc: Paul McKenney 
> Cc: Frederic Weisbecker 
> Cc: Randy Dunlap 
> Cc: Masami Hiramatsu 
> Cc: Fenguang Wu 
> Cc: Baohong Liu 
> Cc: Vedang Patel 
> Signed-off-by: Joel Fernandes 
> ---
>  include/linux/irqflags.h | 21 +++--
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
> index 9700f00bbc04..ea8df0ac57d3 100644
> --- a/include/linux/irqflags.h
> +++ b/include/linux/irqflags.h
> @@ -104,19 +104,20 @@ do {\
>  #define local_irq_save(flags)\
>   do {\
>   raw_local_irq_save(flags);  \
> - trace_hardirqs_off();   \
> + if (!raw_irqs_disabled_flags(flags))\
> + trace_hardirqs_off();   \
>   } while (0)
>  
>  
> -#define local_irq_restore(flags) \
> - do {\
> - if (raw_irqs_disabled_flags(flags)) {   \
> - raw_local_irq_restore(flags);   \
> - trace_hardirqs_off();   \
> - } else {\
> - trace_hardirqs_on();\
> - raw_local_irq_restore(flags);   \
> - }   \
> +#define local_irq_restore(flags) \
> + do { \
> + if (raw_irqs_disabled_flags(flags) && !irqs_disabled()) { \
> + raw_local_irq_restore(flags); \
> + trace_hardirqs_off(); \
> + } else if (!raw_irqs_disabled_flags(flags) && irqs_disabled()) { \
> + trace_hardirqs_on(); \
> + raw_local_irq_restore(flags); \
> + } \
>   } while (0)
>  
>  #define safe_halt()  \
> -- 
> 2.17.0.484.g0c8726318c-goog
> 


-- 
Masami Hiramatsu 


Re: [PATCH v6 11/11] ARM: shmobile: Convert file to use cntvoff

2018-04-18 Thread Geert Uytterhoeven
Hi Mylène,

On Mon, Apr 16, 2018 at 11:50 PM, Mylène Josserand
 wrote:
> Now that a common function is available for CNTVOFF's
> initialization, let's convert shmobile-apmu code to use
> this function.

Thanks for your patch, works fine on Renesas ALT with R-Car E2,
which suffers from lack of CNTVOFF initialization.

> Signed-off-by: Mylène Josserand 

Reviewed-by: Geert Uytterhoeven 
Tested-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: INFO: task hung in fsnotify_mark_destroy_workfn

2018-04-18 Thread Jan Kara
Hello,

On Tue 17-04-18 18:02:02, syzbot wrote:
> syzbot hit the following crash on upstream commit
> a27fc14219f2e3c4a46ba9177b04d9b52c875532 (Mon Apr 16 21:07:39 2018 +)
> Merge branch 'parisc-4.17-3' of
> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=e38306788a2e7102a3b6
> 
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=5126465372815360
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5956756370882560
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5914490758943236750
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+e38306788a2e7102a...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 

Removed binder messages from the lockup splat so that it's more readable.

> INFO: task kworker/u4:4:853 blocked for more than 120 seconds.
>   Not tainted 4.17.0-rc1+ #6
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u4:4D11512   853  2 0x8000
> Workqueue: events_unbound fsnotify_mark_destroy_workfn
> Call Trace:
>  context_switch kernel/sched/core.c:2848 [inline]
>  __schedule+0x801/0x1e30 kernel/sched/core.c:3490
>  schedule+0xef/0x430 kernel/sched/core.c:3549
>  schedule_timeout+0x1b5/0x240 kernel/time/timer.c:1777
>  do_wait_for_common kernel/sched/completion.c:83 [inline]
>  __wait_for_common kernel/sched/completion.c:104 [inline]
>  wait_for_common kernel/sched/completion.c:115 [inline]
>  wait_for_completion+0x3e7/0x870 kernel/sched/completion.c:136
>  __synchronize_srcu+0x189/0x240 kernel/rcu/srcutree.c:924
>  synchronize_srcu+0x408/0x54f kernel/rcu/srcutree.c:1002
>  fsnotify_mark_destroy_workfn+0x1aa/0x530 fs/notify/mark.c:759
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

OK, so we are waiting for the grace period on fsnotify_mark_srcu. Seems
like someone is holding fsnotify_mark_srcu too long or srcu period cannot
finish for some other reason. However the reproducer basically contains
only one binder ioctl and I have no idea how that's connected with fsnotify
in any way. So either the reproducer is wrong, or binder is corrupting
memory and fsnotify is just a victim, or something like that...

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 1/4] ASoC: dwc: I2S Controller instance param added

2018-04-18 Thread Mukunda,Vijendar



On Tuesday 17 April 2018 09:39 PM, Mark Brown wrote:
> On Tue, Apr 17, 2018 at 10:29:51AM +0530, Vijendar Mukunda wrote:
>
>> +#define I2S_SP_INSTANCE1
>> +#define I2S_BT_INSTANCE2
>
> This is obviously very specific to the system you're working with and
> therefore doesn't belong in the generic driver.  The device should be
> dealing with its own configuration, it shouldn't need to know about what
> specifically is connected to it.  It's not even clear what they're doing
> in this driver given that there doesn't appear to be any use of the
> information, it feels like this is something that the machine driver
> should be encapsulating.
>
> Like I said with previous reviews this use of magic numbers for the
> interfaces is a bit of a red flag, internally within a driver they're
> fine but they shouldn't leak out too much except with things like
> numbering an array.

I will remove the macros from the designware header file and respin
the patch set.


RE: [PATCH net-next 3/3] net: phy: Enable C45 PHYs with vendor specific address space

2018-04-18 Thread Vicenţiu Galanopulo


> > Having dev-addr stored in devices_addrs, in get_phy_c45_ids(), when
> > probing the identifiers, dev-addr can be extracted from devices_addrs
> > and probed if devices_addrs[current_identifier] is not 0.
> 
> > I must clearly be missing something, but why are you introducing all these
> > conditionals instead of updating the existing code to be able to operate
> > against an arbitrary dev-addr value, and then just making sure the first
> > thing you do is fetch that property from Device Tree? There is no way
> > someone is going to be testing with your specific use case in the future
> > (except yourselves) so unless you make supporting an arbitrary "dev-addr"
> > value become part of how the code works, this is going to be breaking badly.
>

Hi Florian,

My intention was to have this patch as a "plugin" and modify the existing
kernel API little to none. I was thinking that with an #ifdef, ideally, all
changes could be part of a CONFIG kernel option.
Updating the existing code, instead of the conditionals, might run into just
that, and the change could propagate across multiple modules. This is from my
first RFC patch, reviewed by Andrew:
of_mdiobus_register(), when it loops over the children, looks for 
the new property. If found, it passed dev-id to 
of_mdiobus_register_phy().
That passes it to get_phy_device(). I think get_phy_device() can 
then set the ID in c45_ids, before passing it to get_phy_id().
get_phy_c45_ids() will first look at devices in package and can add 
further devices to c45_ids. It will then probe both those found, and 
the static
one you added.

  Andrew

[Vicenţiu Galanopulo]
Just to make sure I understand. Do you want me to change the signature 
of all of_mdiobus_register_phy(), get_phy_device(), get_phy_id() and
get_phy_c45_ids() and include the dev_addr parameter obtained from the 
device tree?  (a propagation of this parameter across all functions 
all the way to
get_phy_c45_devs_in_pkg?) This will impact xgbe-mdio.c, fixed_phy.c 
because get_phy_device() is used in these files. 

 
 The "catch" is to transport the dev-addr value from of_mdio.c (location of the 
 loop of the PHY device tree node which reads all PHY node properties) to 
phy_device.c (this is where you can get the PHY ID).
My understanding from Andrew's comment is that the key here is the c45_ids, and 
that these could be filled in of_mdio.c, first, with the IDs from dev-addr (he 
called them "static" as they are queried directly by using the value of 
dev-addr) and afterwards, in phy_device.c (following the lookup loop - in a 
"dynamic" way).
There's nothing more to this patch than some functionality from phy_device.c 
ported to of_mdio.c, to enable the extraction of the PHY IDs. 
I guess the code redundancy could be reduced (between of_mdio.c and 
phy_device.c) and maybe you or Andrew could comment on this if you would like 
to go with this patch approach.

Not sure I understand your comment about the specific use case and the breaking 
badly part.  
Right now I'm able to test because I have access to a PHY with dev-addr = 0x1e. 
But the whole mechanism in this patch starts to work the moment you set 
 in the device tree. If you don't set that, nothing happens. If you 
set it to a bogus value, no PHY ID will be found at that address. Besides that, 
the PHY ID extraction code is the same as what is currently working in 
phy_device.c. 
80-90% of the patch is based on what already exists in phy_device.c and 
of_mdio.c. Where is the breaking badly part supposed to happen? 

> And please, can you keep me copied for next submissions?
Yes, the "to" list was pretty long and I somehow missed you. Sorry.
 
Vicentiu


Re: [PATCH v3 2/2] iio: afe: unit-converter: new driver

2018-04-18 Thread Jonathan Cameron
On Mon, 16 Apr 2018 09:12:45 +0200
Peter Rosin  wrote:

> On 2018-04-15 19:31, Jonathan Cameron wrote:
> > On Tue, 10 Apr 2018 17:28:02 +0200
> > Peter Rosin  wrote:
> >   
> >> If an ADC channel measures the midpoint of a voltage divider, the
> >> interesting voltage is often the voltage over the full resistance.
> >> E.g. if the full voltage is too big for the ADC to handle.
> >> Likewise, if an ADC channel measures the voltage across a shunt
> >> resistor, the interesting value is often the current through the
> >> resistor.
> >>
> >> This driver solves both problems by allowing to linearly scale a channel
> >> and by allowing changes to the type of the channel. Or both.
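The two cases in the description above come down to one linear scale each. A rough user-space sketch follows; it is not the driver code, and the function names and the choice of units (ohms, millivolts, microvolts, milliohms) are assumptions made only for this illustration:

```c
#include <stdint.h>

/*
 * Voltage divider: the ADC samples the midpoint between r_top and
 * r_bottom, so the full voltage is the reading scaled by
 * (r_top + r_bottom) / r_bottom.  Resistances are in ohms.
 */
static int64_t divider_full_mv(int64_t adc_mv, int64_t r_top, int64_t r_bottom)
{
	return adc_mv * (r_top + r_bottom) / r_bottom;
}

/*
 * Shunt resistor: the ADC samples the voltage across the shunt, so the
 * current is I = V / R.  With the input in microvolts and the shunt in
 * milliohms the result is in milliamps.
 */
static int64_t shunt_current_ma(int64_t adc_uv, int64_t shunt_mohm)
{
	return adc_uv / shunt_mohm;
}
```

For example, a 1000 mV reading behind a 9000/1000 ohm divider corresponds to 10000 mV at the top of the divider, and 50000 uV across a 100 mOhm shunt corresponds to 500 mA.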
> >>
> >> Signed-off-by: Peter Rosin   
> > So I 'think' the only outstanding question is Andrew's one about the driver
> > name.  We aren't in a hurry at this point in the kernel cycle, so let's
> > wait until that discussion has ended.  Assuming that we do possibly end
> > up with a change, then please roll all the patches up into a single series
> > to avoid me getting confused.  
> 
> Yeah, sure, sorry for the split series, but the lt6106 that's present in
> one of our newer designs didn't occur to me until just seconds after
> firing the first half of the series. Which is kind of typical...
> 
> Anyway, about the driver naming. The suggestion I like best so far is
> linear-scaler from Linus W, but thinking about it some more I think I
> like iio-rescale even better.
> 
> Any objections to iio-rescale?

Works for me. But then I rarely care 'that much' about naming and am
responsible for plenty of previous confusing choices ;)

Jonathan

> 
> Cheers,
> Peter



[RESEND][PATCH 0/4] Few NFC fixes from android-4.14 tree

2018-04-18 Thread Amit Pundir
Hi,

Resending a few NFC fixes I picked up from the android-4.14 tree[1]
for review and comments. They seem reasonable upstream candidates.
My last attempt was not timed properly and it got lost between the
Christmas-New Year break and then Meltdown-Spectre happened.

I would also like to point out that I have not feature tested these
patches at all. I only made small cosmetic changes to the original patches
(removed the Android-only tag and internal bug ID) and build tested for
arm/arm64 defconfigs, before posting them here for review.

I would really appreciate any comments or feedback on how to take it forward.

Regards,
Amit Pundir
[1] https://android.googlesource.com/kernel/common/+log/android-4.14

Suren Baghdasaryan (4):
  NFC: st21nfca: Fix out of bounds kernel access when handling ATR_REQ
  NFC: st21nfca: Fix memory OOB and leak issues in connectivity events
handler
  NFC: Fix possible memory corruption when handling SHDLC I-Frame
commands
  NFC: fdp: Fix possible buffer overflow in WCS4000 NFC driver

 drivers/nfc/fdp/i2c.c  | 10 ++
 drivers/nfc/st21nfca/dep.c |  3 ++-
 drivers/nfc/st21nfca/se.c  | 18 ++
 net/nfc/hci/core.c | 10 ++
 4 files changed, 36 insertions(+), 5 deletions(-)

-- 
2.7.4



[RESEND][PATCH 4/4] NFC: fdp: Fix possible buffer overflow in WCS4000 NFC driver

2018-04-18 Thread Amit Pundir
From: Suren Baghdasaryan 

Possible buffer overflow when reading next_read_size bytes into
tmp buffer after next_read_size was extracted from a previous packet.

Signed-off-by: Suren Baghdasaryan 
Signed-off-by: Amit Pundir 
---
 drivers/nfc/fdp/i2c.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/nfc/fdp/i2c.c b/drivers/nfc/fdp/i2c.c
index c4da50e07bbc..08a4f82a2965 100644
--- a/drivers/nfc/fdp/i2c.c
+++ b/drivers/nfc/fdp/i2c.c
@@ -176,6 +176,16 @@ static int fdp_nci_i2c_read(struct fdp_i2c_phy *phy, struct sk_buff **skb)
/* Packet that contains a length */
if (tmp[0] == 0 && tmp[1] == 0) {
phy->next_read_size = (tmp[2] << 8) + tmp[3] + 3;
+   /*
+* Ensure next_read_size does not exceed sizeof(tmp)
+* for reading that many bytes during next iteration
+*/
+   if (phy->next_read_size > FDP_NCI_I2C_MAX_PAYLOAD) {
+   dev_dbg(>dev, "%s: corrupted packet\n",
+   __func__);
+   phy->next_read_size = 5;
+   goto flush;
+   }
} else {
phy->next_read_size = FDP_NCI_I2C_MIN_PAYLOAD;
 
-- 
2.7.4
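
To make the check in the patch above easier to follow outside the driver, here is a small user-space sketch of the same idea: a length taken from a packet header is validated before it is used as the size of the next read. The constant values and the helper name are assumptions for this example only (the driver has its own definitions):

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch; the driver defines its own limits. */
#define MIN_PAYLOAD 5
#define MAX_PAYLOAD 261

/*
 * Return the number of bytes to read next, given the first 4 bytes of
 * the packet just read.  A header starting with two zero bytes carries a
 * big-endian length in bytes 2..3, and 3 is added as in the patch.
 * If that length would not fit in a MAX_PAYLOAD-sized buffer, *corrupted
 * is set and the minimum read size is returned instead.
 */
static size_t next_read_size(const uint8_t hdr[4], int *corrupted)
{
	size_t len;

	*corrupted = 0;
	if (hdr[0] != 0 || hdr[1] != 0)
		return MIN_PAYLOAD;

	len = ((size_t)hdr[2] << 8) + hdr[3] + 3;
	if (len > MAX_PAYLOAD) {
		*corrupted = 1;
		return MIN_PAYLOAD;
	}
	return len;
}
```

The point of the sketch is only that the untrusted length never reaches the caller unchecked; what the driver does after detecting corruption (the "goto flush" path) is not modelled here.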



[RESEND][PATCH 3/4] NFC: Fix possible memory corruption when handling SHDLC I-Frame commands

2018-04-18 Thread Amit Pundir
From: Suren Baghdasaryan 

When handling SHDLC I-Frame commands "pipe" field used for indexing
into an array should be checked before usage. If left unchecked it
might access memory outside of the array of size NFC_HCI_MAX_PIPES(127).

Signed-off-by: Suren Baghdasaryan 
Signed-off-by: Amit Pundir 
---
 net/nfc/hci/core.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/nfc/hci/core.c b/net/nfc/hci/core.c
index ac8030c4bcf8..19cb2e473ea6 100644
--- a/net/nfc/hci/core.c
+++ b/net/nfc/hci/core.c
@@ -209,6 +209,11 @@ void nfc_hci_cmd_received(struct nfc_hci_dev *hdev, u8 pipe, u8 cmd,
}
create_info = (struct hci_create_pipe_resp *)skb->data;
 
+   if (create_info->pipe >= NFC_HCI_MAX_PIPES) {
+   status = NFC_HCI_ANY_E_NOK;
+   goto exit;
+   }
+
/* Save the new created pipe and bind with local gate,
 * the description for skb->data[3] is destination gate id
 * but since we received this cmd from host controller, we
@@ -232,6 +237,11 @@ void nfc_hci_cmd_received(struct nfc_hci_dev *hdev, u8 pipe, u8 cmd,
}
delete_info = (struct hci_delete_pipe_noti *)skb->data;
 
+   if (delete_info->pipe >= NFC_HCI_MAX_PIPES) {
+   status = NFC_HCI_ANY_E_NOK;
+   goto exit;
+   }
+
hdev->pipes[delete_info->pipe].gate = NFC_HCI_INVALID_GATE;
hdev->pipes[delete_info->pipe].dest_host = NFC_HCI_INVALID_HOST;
break;
-- 
2.7.4
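
The pattern in the patch above, reduced to a stand-alone sketch: an index received from the controller is range-checked before it is used to address a fixed-size table. The struct, the helper and the two INVALID_* placeholder values are made up for this example; only the table size of 127 comes from the commit message:

```c
#include <stdint.h>

#define MAX_PIPES    127   /* array size quoted in the commit message */
#define INVALID_GATE 0xff  /* placeholder value for this sketch */
#define INVALID_HOST 0xff  /* placeholder value for this sketch */

struct pipe_entry {
	uint8_t gate;
	uint8_t dest_host;
};

struct hci_dev_sketch {
	struct pipe_entry pipes[MAX_PIPES];
};

/* Clear one pipe entry; returns 0 on success, -1 if the index is out of range. */
static int delete_pipe(struct hci_dev_sketch *dev, uint8_t pipe)
{
	if (pipe >= MAX_PIPES)
		return -1;

	dev->pipes[pipe].gate = INVALID_GATE;
	dev->pipes[pipe].dest_host = INVALID_HOST;
	return 0;
}
```

Since the index is a u8, values 127..255 are the ones that would otherwise land past the end of the array.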



Re: [PATCH v6 01/11] ARM: sunxi: smp: Move assembly code into a file

2018-04-18 Thread Chen-Yu Tsai
On Wed, Apr 18, 2018 at 4:45 PM, Maxime Ripard
 wrote:
> On Tue, Apr 17, 2018 at 07:25:15PM +0800, Chen-Yu Tsai wrote:
>> On Tue, Apr 17, 2018 at 7:17 PM, Maxime Ripard
>>  wrote:
>> > On Tue, Apr 17, 2018 at 11:12:41AM +0800, Chen-Yu Tsai wrote:
>> >> On Tue, Apr 17, 2018 at 5:50 AM, Mylène Josserand
>> >>  wrote:
>> >> > Move the assembly code for cluster cache enabling and resuming
>> >> > into an assembly file instead of having it directly in C code.
>> >> >
>> >> > Remove the CFLAGS because we are using the ARM directive "arch"
>> >> > instead.
>> >> >
>> >> > Signed-off-by: Mylène Josserand 
>> >> > ---
>> >> >  arch/arm/mach-sunxi/Makefile  |  4 +--
>> >> >  arch/arm/mach-sunxi/headsmp.S | 80 
>> >> > +
>> >> >  arch/arm/mach-sunxi/mc_smp.c  | 82 
>> >> > +++
>> >> >  3 files changed, 85 insertions(+), 81 deletions(-)
>> >> >  create mode 100644 arch/arm/mach-sunxi/headsmp.S
>> >>
>> >> I'm still not convinced about this whole "move ASM to separate
>> >> file" thing, especially now that you aren't actually adding any
>> >> sunxi-specific ASM code beyond a simple function call.
>> >>
>> >> Could you drop this for now?
>> >
>> > I'd really like to have this merged actually. There's a significant
>> > readability improvement, so even if there's no particular functional
>> > improvement, I'd still call it a win.
>>
>> What parts do you consider hard to read? The extra quotes? Trailing
>> newline? Or perhaps the __stringify bits?
>
> All of this, plus the clobbers and operands.

Ok. Let's move it then.

The kbuild reports indicate this still needs some work though.

ChenYu


Re: [RFC PATCH spi] spi: pxa2xx: pxa2xx_spi_transfer_one() can be static

2018-04-18 Thread Jarkko Nikula

On 04/17/18 22:53, kbuild test robot wrote:


Fixes: d5898e19c0d7 ("spi: pxa2xx: Use core message processing loop")
Signed-off-by: Fengguang Wu 
---
  spi-pxa2xx.c |6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
index c852ea5..40f1346 100644
--- a/drivers/spi/spi-pxa2xx.c
+++ b/drivers/spi/spi-pxa2xx.c
@@ -911,9 +911,9 @@ static bool pxa2xx_spi_can_dma(struct spi_controller *master,
   xfer->len >= chip->dma_burst_size;
  }
  
-int pxa2xx_spi_transfer_one(struct spi_controller *master,
-   struct spi_device *spi,
-   struct spi_transfer *transfer)
+static int pxa2xx_spi_transfer_one(struct spi_controller *master,
+  struct spi_device *spi,
+  struct spi_transfer *transfer)


Thanks Fengguang. I don't understand how I managed to drop "static" 
while doing manual s/pump_transfers/pxa2xx_spi_transfer_one/ :-)


Reviewed-by: Jarkko Nikula 


Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-18 Thread Dave Young
Hi Rahul,
On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> On production servers running variety of workloads over time, kernel
> panic can happen sporadically after days or even months. It is
> important to collect as much debug logs as possible to root cause
> and fix the problem, that may not be easy to reproduce. Snapshot of
> underlying hardware/firmware state (like register dump, firmware
> logs, adapter memory, etc.), at the time of kernel panic will be very
> helpful while debugging the culprit device driver.
> 
> This series of patches add new generic framework that enable device
> drivers to collect device specific snapshot of the hardware/firmware
> state of the underlying device in the crash recovery kernel. In crash
> recovery kernel, the collected logs are added as elf notes to
> /proc/vmcore, which is copied by user space scripts for post-analysis.
> 
> The sequence of actions done by device drivers to append their device
> specific hardware/firmware logs to /proc/vmcore are as follows:
> 
> 1. During probe (before hardware is initialized), device drivers
> register to the vmcore module (via vmcore_add_device_dump()), with
> callback function, along with buffer size and log name needed for
> firmware/hardware log collection.

I assumed the elf notes info should be prepared during the kexec_[file_]load
phase. But I did not read the old comments, so I am not sure whether this has
been discussed or not.

If this is done in the 2nd kernel, one question is that the driver can be
loaded later than vmcore init. How do we guarantee that this works if the
vmcore is read before the driver is loaded?

Also it is possible that the kdump initramfs does not contain the driver
module.

Am I missing something?

> 
> 2. vmcore module allocates the buffer with requested size. It adds
> an elf note and invokes the device driver's registered callback
> function.
> 
> 3. Device driver collects all hardware/firmware logs into the buffer
> and returns control back to vmcore module.
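
Steps 1-3 above can be modelled in user space roughly as follows. This is only a sketch of the register/allocate/callback flow; the struct, the function names and the callback signature are hypothetical and are not the API proposed in the series:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are the kernel API. */
typedef int (*dump_cb)(void *buf, size_t size);

struct device_dump {
	char name[32];
	size_t size;
	void *buf;
};

/*
 * The "driver" passes a log name, a buffer size and a callback (step 1).
 * The "core" allocates the buffer and invokes the callback (step 2),
 * which fills the buffer and returns (step 3).
 */
static int add_device_dump(struct device_dump *d, const char *name,
			   size_t size, dump_cb cb)
{
	int ret;

	d->buf = calloc(1, size);
	if (!d->buf)
		return -1;

	strncpy(d->name, name, sizeof(d->name) - 1);
	d->name[sizeof(d->name) - 1] = '\0';
	d->size = size;

	ret = cb(d->buf, size);
	if (ret) {
		free(d->buf);
		d->buf = NULL;
	}
	return ret;
}

/* Example callback: a real driver would copy register/firmware state here. */
static int fake_collect(void *buf, size_t size)
{
	memset(buf, 0xab, size);
	return 0;
}
```

In this model a dump only exists once add_device_dump() has been called, which is the ordering concern raised earlier in this reply: a reader that runs before the driver registers sees nothing.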
> 
> The device specific hardware/firmware logs can be seen as elf notes:
> 
> # readelf -n /proc/vmcore
> 
> Displaying notes found at file offset 0x1000 with length 0x04003288:
>   Owner Data size Description
>   VMCOREDD_cxgb4_:02:00.4 0x02000fd8  Unknown note type: (0x0700)
>   VMCOREDD_cxgb4_:04:00.4 0x02000fd8  Unknown note type: (0x0700)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   VMCOREINFO   0x074f Unknown note type: (0x)
> 
> Patch 1 adds API to vmcore module to allow drivers to register callback
> to collect the device specific hardware/firmware logs.  The logs will
> be added to /proc/vmcore as elf notes.
> 
> Patch 2 updates read and mmap logic to append device specific hardware/
> firmware logs as elf notes.
> 
> Patch 3 shows a cxgb4 driver example using the API to collect
> hardware/firmware logs in crash recovery kernel, before hardware is
> initialized.
> 
> Thanks,
> Rahul
> 
> RFC v1: https://lkml.org/lkml/2018/3/2/542
> RFC v2: https://lkml.org/lkml/2018/3/16/326
> 
> ---
> v4:
> - Made __vmcore_add_device_dump() static.
> - Moved compile check to define vmcore_add_device_dump() to
>   crash_dump.h to fix compilation when vmcore.c is not compiled in.
> - Convert ---help--- to help in Kconfig as indicated by checkpatch.
> - Rebased to tip.
> 
> v3:
> - Dropped sysfs crashdd module.
> - Exported dumps as elf notes. Suggested by Eric Biederman
>   .  Added as patch 2 in this version.
> - Added CONFIG_PROC_VMCORE_DEVICE_DUMP to allow configuring device
>   dump support.
> - Moved logic related to adding dumps from crashdd to vmcore module.
> - Rename all crashdd* to vmcoredd*.
> - Updated comments.
> 
> v2:
> - Added ABI Documentation for crashdd.
> - Directly use octal permission instead of macro.
> 
> Changes since rfc v2:
> - Moved exporting crashdd from procfs to sysfs. Suggested by
>   Stephen Hemminger 
> - Moved code from fs/proc/crashdd.c to fs/crashdd/ directory.
> - Replaced all proc API with sysfs API and updated comments.
> - Calling driver callback before creating the binary file under
>   crashdd sysfs.
> - Changed binary dump file permission from S_IRUSR to S_IRUGO.
> - Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP.
> 
> rfc v2:
> - Collecting logs in 2nd kernel instead of during kernel panic.
>   Suggested by Eric Biederman .
> - Added new crashdd 

Re: [PATCH v3 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-04-18 Thread Yang, Shunyong
Hi, Gary and Sohil,

On Tue, 2018-04-17 at 13:38 -0400, Hook, Gary wrote:
> On 4/13/2018 8:08 PM, Mehta, Sohil wrote:
> > 
> > On Fri, 2018-04-06 at 08:17 -0500, Gary R Hook wrote:
> > > 
> > >   
> > > +
> > > +void amd_iommu_debugfs_setup(struct amd_iommu *iommu)
> > > +{
> > > + char name[MAX_NAME_LEN + 1];
> > > + struct dentry *d_top;
> > > +
> > > + if (!debugfs_initialized())
> > Probably not needed.
> Right.

When is this check needed?
IMO, this function checks whether debugfs is ready before we use it.
I just want to understand when we should use debugfs_initialized().

Thanks.
Shunyong.

> 
> > 
> > 
> > > 
> > > + return;
> > > +
> > > + mutex_lock(_iommu_debugfs_lock);
> > > + if (!amd_iommu_debugfs) {
> > > + d_top = iommu_debugfs_setup();
> > > + if (d_top)
> > > > + amd_iommu_debugfs = debugfs_create_dir("amd", d_top);
> > > + }
> > > + mutex_unlock(_iommu_debugfs_lock);
> > 
> > You can do the above only once if you iterate over the IOMMUs here
> >   instead of doing it in amd_iommu_init.
> I'm not sure it matters, given the finite number of IOMMUs in a
> system, 
> and the fact that this work is done exactly once. However, removal of
> a 
> lock is fine thing, so I'll move this around.
> 
> > 
> > 
> > > 
> > > + if (amd_iommu_debugfs) {
> > > > + snprintf(name, MAX_NAME_LEN, "iommu%02d", iommu->index);
> > > > + iommu->debugfs = debugfs_create_dir(name,
> > > > + amd_iommu_debugfs);
> > > > + if (!iommu->debugfs) {
> > > > + debugfs_remove_recursive(amd_iommu_debugfs);
> > > + amd_iommu_debugfs = NULL;
> > > + }
> > > + }
> > > +}
> > -Sohil
> > 


Re: [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Oleksandr Andrushchenko

On 04/17/2018 11:57 PM, Dongwon Kim wrote:

On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:

On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:

Yeah, I definitely agree on the idea of expanding the use case to the
general domain where dmabuf sharing is used. However, what you are
targeting with proposed changes is identical to the core design of
hyper_dmabuf.

On top of this basic functionalities, hyper_dmabuf has driver level
inter-domain communication, that is needed for dma-buf remote tracking
(no fence forwarding though), event triggering and event handling, extra
meta data exchange and hyper_dmabuf_id that represents grefs
(grefs are shared implicitly on driver level)

This really isn't a positive design aspect of hyperdmabuf imo. The core
code in xen-zcopy (ignoring the ioctl side, which will be cleaned up) is
very simple & clean.

If there's a clear need later on we can extend that. But for now xen-zcopy
seems to cover the basic use-case needs, so gets the job done.


Also it is designed with frontend (common core framework) + backend
(hypervisor specific comm and memory sharing) structure for portability.
We just can't limit this feature to Xen because we want to use the same
uapis not only for Xen but also other applicable hypervisors, like ACRN.

See the discussion around udmabuf and the needs for kvm. I think trying to
make an ioctl/uapi that works for multiple hypervisors is misguided - it
likely won't work.

On top of that the 2nd hypervisor you're aiming to support is ACRN. That's
not even upstream yet, nor have I seen any patches proposing to land linux
support for ACRN. Since it's not upstream, it doesn't really matter for
upstream consideration. I'm doubting that ACRN will use the same grant
references as xen, so the same uapi won't work on ACRN as on Xen anyway.

Yeah, ACRN doesn't have grant-table. Only Xen supports it. But that is why
hyper_dmabuf has been architectured with the concept of backend.
If you look at the structure of backend, you will find that
backend is just a set of standard function calls as shown here:

struct hyper_dmabuf_bknd_ops {
 /* backend initialization routine (optional) */
 int (*init)(void);

 /* backend cleanup routine (optional) */
 int (*cleanup)(void);

 /* retrieving id of current virtual machine */
 int (*get_vm_id)(void);

 /* get pages shared via hypervisor-specific method */
 int (*share_pages)(struct page **pages, int vm_id,
int nents, void **refs_info);

 /* make shared pages unshared via hypervisor specific method */
 int (*unshare_pages)(void **refs_info, int nents);

 /* map remotely shared pages on importer's side via
  * hypervisor-specific method
  */
 struct page ** (*map_shared_pages)(unsigned long ref, int vm_id,
int nents, void **refs_info);

 /* unmap and free shared pages on importer's side via
  * hypervisor-specific method
  */
 int (*unmap_shared_pages)(void **refs_info, int nents);

 /* initialize communication environment */
 int (*init_comm_env)(void);

 void (*destroy_comm)(void);

 /* upstream ch setup (receiving and responding) */
 int (*init_rx_ch)(int vm_id);

 /* downstream ch setup (transmitting and parsing responses) */
 int (*init_tx_ch)(int vm_id);

 int (*send_req)(int vm_id, struct hyper_dmabuf_req *req, int wait);
};

All of these can be mapped with any hypervisor specific implementation.
We designed backend implementation for Xen using grant-table, Xen event
and ring buffer communication. For ACRN, we have another backend using Virt-IO
for both memory sharing and communication.

We tried to define this structure of backend to make it general enough (or
it can be even modified or extended to support more cases.) so that it can
fit to other hypervisor cases. Only requirements/expectation on the hypervisor
are page-level memory sharing and inter-domain communication, which I think
are standard features of modern hypervisor.
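
The frontend/backend split described here is the usual table-of-function-pointers pattern. A trimmed-down, stand-alone sketch is below; it models only two of the hooks from the struct quoted above, and the two example backends and all function names are invented for illustration:

```c
#include <stddef.h>

/* Hypothetical, reduced ops table; only two hooks are modelled. */
struct bknd_ops {
	int (*init)(void);       /* optional, may be NULL */
	int (*get_vm_id)(void);
};

/* Two made-up backends standing in for different hypervisors. */
static int backend_a_vm_id(void) { return 1; }
static int backend_b_init(void)  { return 0; }
static int backend_b_vm_id(void) { return 7; }

static const struct bknd_ops backend_a_ops = {
	.init = NULL,
	.get_vm_id = backend_a_vm_id,
};

static const struct bknd_ops backend_b_ops = {
	.init = backend_b_init,
	.get_vm_id = backend_b_vm_id,
};

/* Frontend code only ever calls through the table. */
static int frontend_start(const struct bknd_ops *ops)
{
	if (ops->init && ops->init() != 0)
		return -1;
	return ops->get_vm_id();
}
```

The frontend never names a hypervisor; whether one user-facing interface can stay meaningful across backends with different sharing primitives is the open question in this thread, and the sketch does not settle it.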

And please review common UAPIs that hyper_dmabuf and xen-zcopy supports. They
are very general. One is getting FD (dmabuf) and get those shared. The other
is generating dmabuf from global handle (secure handle hiding gref behind it).
On top of this, hyper_dmabuf has "unshare" and "query" which are also useful
for any cases.

So I don't know why we wouldn't want to try to make these standard in most of
hypervisor cases instead of limiting it to certain hypervisor like Xen.
Frontend-backend structure is optimal for this I think.


So I am wondering we can start with this hyper_dmabuf then modify it for
your use-case if needed and polish and fix any glitches if we want to
to use this for all general dma-buf usecases.

Imo xen-zcopy is a much more reasonable starting point for upstream, which
can 

[PATCH v2] module: Fix display of wrong module .text address

2018-04-18 Thread Thomas Richter
Fixes: ef0010a30935 ("vsprintf: don't use 'restricted_pointer()'
when not restricting") for /sys/module/*/sections/.text file.

Reading file /proc/modules shows the correct address:
[root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
qeth_l2 94208 1 - Live 0x03ff80401000

and reading file /sys/module/qeth_l2/sections/.text
[root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text
0x18ea8363
displays a random address.

This breaks the perf tool which uses this address on s390
to calculate start of .text section in memory.

Fix this by printing the correct (unhashed) address.

Thanks to Jessica Yu for helping on this.

Suggested-by: Linus Torvalds 
Signed-off-by: Thomas Richter 
Cc: Jessica Yu 
Cc: sta...@vger.kernel.org
---
 kernel/module.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/module.c b/kernel/module.c
index a6e43a5806a1..40b42000bd80 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1472,7 +1472,8 @@ static ssize_t module_sect_show(struct module_attribute *mattr,
 {
struct module_sect_attr *sattr =
container_of(mattr, struct module_sect_attr, mattr);
-   return sprintf(buf, "0x%pK\n", (void *)sattr->address);
+   return sprintf(buf, "0x%px\n", kptr_restrict < 2 ?
+  (void *)sattr->address : 0);
 }
 
 static void free_sect_attrs(struct module_sect_attrs *sect_attrs)
-- 
2.14.3
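
The selection made by the new sprintf() call can be shown in isolation. This is a user-space stand-in, not kernel code: the helper names are invented, and the "%016llx" format only approximates a 64-bit "%px" output:

```c
#include <stdint.h>
#include <stdio.h>

/* Address to expose: the real one while the restrict level is below 2, else 0. */
static uint64_t visible_sect_addr(uint64_t addr, int restrict_level)
{
	return restrict_level < 2 ? addr : 0;
}

/* Format it as "0x" plus 16 hex digits and a newline; returns the length written. */
static int format_sect_addr(char *buf, size_t len, uint64_t addr,
			    int restrict_level)
{
	return snprintf(buf, len, "0x%016llx\n",
			(unsigned long long)visible_sect_addr(addr, restrict_level));
}
```

With a restrict level of 0 or 1 the caller gets the unhashed address that tools such as perf need; with level 2 the output is all zeroes rather than a hashed value.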



Re: [RFC 2/6] dmaengine: xilinx_dma: Pass AXI4-Stream control words to netdev dma client

2018-04-18 Thread Peter Ujfalusi


On 2018-04-17 18:42, Vinod Koul wrote:
> On Tue, Apr 17, 2018 at 04:46:43PM +0300, Peter Ujfalusi wrote:
> 
>> @@ -709,6 +709,11 @@ struct dma_filter {
>>   *  be called after period_len bytes have been transferred.
>>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
>>   * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
>> + * @device_attach_metadata: Some DMA engines can send and receive side band
>> + *  information, commands or parameters which is not transferred within the
>> + *  data stream itself. In such case clients can set the metadata to the
>> + *  given descriptor and it is going to be sent to the peripheral, or in
>> + *  case of DEV_TO_MEM the provided buffer will receive the metadata.
>>   * @device_config: Pushes a new configuration to a channel, return 0 or an error
>>   *  code
>>   * @device_pause: Pauses any transfer happening on a channel. Returns
>> @@ -796,6 +801,9 @@ struct dma_device {
>>  struct dma_chan *chan, dma_addr_t dst, u64 data,
>>  unsigned long flags);
>>  
>> +int (*device_attach_metadata)(struct dma_async_tx_descriptor *desc,
>> +  void *data, size_t len);
> 
> while i am okay with the concept, I would not want to go again the custom
> pointer route, this is a no-go for me.
> 
> Instead lets add the vendor data, define that explicitly. We can use struct,
> tokens or something else to define these. But lets try to stay away from
> opaque objects please :-)

The DMA does not interpret the metadata, it is information which can be
only understood by the client driver and the remote peripheral. It is
just chunk of data (parameters, timestamps, keys, etc) that needs to
travel along with the payload.

The content is not relevant for the DMA itself.

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


Re: [v5,05/13] ARM: dts: ipq4019: Add ipq4019-ap.dk04.dtsi

2018-04-18 Thread Sven Eckelmann
On Mittwoch, 18. April 2018 08:59:46 CEST Sven Eckelmann wrote:
[...]
> I would not know how to disable QSEE on these boards and thus would assume 
> that it should be part of this dtsi.


Just did some reviews of the reserved-memory regions in other QCA devices and 
it looks like this tz and smem are often directly added to the SoC dtsi. So I 
will prepare a similar change for qcom-ipq4019.dtsi and this would then solve 
it for AP-DK01/04/07 and no changes in the board-family specific dtsi would be 
necessary.

But maybe someone has an objection because tz and smem can actually be 
disabled in a sane way on these SoCs and thus it would be better to have these 
regions in the board specific dts(i) files. We will see...

Kind regards,
Sven



Re: [PATCH] Bluetooth: hci_qca: Avoid missing rampatch failure with userspace fw loader

2018-04-18 Thread Marcel Holtmann
Hi Amit,

> AOSP use userspace firmware loader to load firmwares, which will
> return -EAGAIN in case qca/rampatch_00440302.bin is not found.
> Since there is no rampatch for dragonboard820c QCA controller
> revision, just make it work as is.
> 
> CC: Loic Poulain 
> CC: Nicolas Dechesne 
> CC: Marcel Holtmann 
> CC: Johan Hedberg 
> CC: Stable 
> Signed-off-by: Amit Pundir 
> ---
> drivers/bluetooth/hci_qca.c | 6 ++
> 1 file changed, 6 insertions(+)

patch has been applied to bluetooth-next tree.

Regards

Marcel



Re: [PATCH] mm:memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create

2018-04-18 Thread Minchan Kim
On Wed, Apr 18, 2018 at 09:20:02AM +0200, Michal Hocko wrote:
> On Wed 18-04-18 11:29:12, Minchan Kim wrote:
> > If there are heavy memory pressure, page allocation with __GFP_NOWAIT
> > fails easily although it's order-0 request.
> > I got below warning 9 times for normal boot.
> > 
> > [   17.072747] c0 0  : page allocation failure: order:0, 
> > mode:0x220(GFP_NOWAIT|__GFP_NOTRACK)
> > < snip >
> > [   17.072789] c0 0  Call trace:
> > [   17.072803] c0 0  [] dump_backtrace+0x0/0x4
> > [   17.072813] c0 0  [] dump_stack+0xa4/0xc0
> > [   17.072822] c0 0  [] warn_alloc+0xd4/0x15c
> > [   17.072829] c0 0  [] 
> > __alloc_pages_nodemask+0xf88/0x10fc
> > [   17.072838] c0 0  [] alloc_slab_page+0x40/0x18c
> > [   17.072843] c0 0  [] new_slab+0x2b8/0x2e0
> > [   17.072849] c0 0  [] ___slab_alloc+0x25c/0x464
> > [   17.072858] c0 0  [] __kmalloc+0x394/0x498
> > [   17.072865] c0 0  [] 
> > memcg_kmem_get_cache+0x114/0x2b8
> > [   17.072870] c0 0  [] kmem_cache_alloc+0x98/0x3e8
> > [   17.072878] c0 0  [] mmap_region+0x3bc/0x8c0
> > [   17.072884] c0 0  [] do_mmap+0x40c/0x43c
> > [   17.072890] c0 0  [] vm_mmap_pgoff+0x15c/0x1e4
> > [   17.072898] c0 0  [] sys_mmap+0xb0/0xc8
> > [   17.072904] c0 0  [] el0_svc_naked+0x24/0x28
> > [   17.072908] c0 0  Mem-Info:
> > [   17.072920] c0 0  active_anon:17124 inactive_anon:193 isolated_anon:0
> > [   17.072920] c0 0   active_file:7898 inactive_file:712955 
> > isolated_file:55
> > [   17.072920] c0 0   unevictable:0 dirty:27 writeback:18 unstable:0
> > [   17.072920] c0 0   slab_reclaimable:12250 slab_unreclaimable:23334
> > [   17.072920] c0 0   mapped:19310 shmem:212 pagetables:816 bounce:0
> > [   17.072920] c0 0   free:36561 free_pcp:1205 free_cma:35615
> > [   17.072933] c0 0  Node 0 active_anon:68496kB inactive_anon:772kB 
> > active_file:31592kB inactive_file:2851820kB unevictable:0kB 
> > isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB 
> > writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB 
> > all_unreclaimable? no
> > [   17.072945] c0 0  DMA free:142188kB min:3056kB low:3820kB 
> > high:4584kB active_anon:10052kB inactive_anon:12kB active_file:312kB 
> > inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB 
> > managed:1604728kB mlocked:0kB slab_reclaimable:3592kB 
> > slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB 
> > free_pcp:1436kB local_pcp:124kB free_cma:142492kB
> > [   17.072949] c0 0  lowmem_reserve[]: 0 1842 1842
> > [   17.072966] c0 0  Normal free:4056kB min:4172kB low:5212kB 
> > high:6252kB active_anon:58376kB inactive_anon:760kB active_file:31348kB 
> > inactive_file:1439040kB unevictable:0kB writepending:180kB 
> > present:2000636kB managed:1923688kB mlocked:0kB slab_reclaimable:45408kB 
> > slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB 
> > free_pcp:3392kB local_pcp:688kB free_cma:0kB
> > [   17.072971] c0 0  lowmem_reserve[]: 0 0 0
> > [   17.072982] c0 0  DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 
> > 1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
> > [   17.073024] c0 0  Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 
> > 24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 
> > 0*4096kB = 3872kB
> > [   17.073069] c0 0  721350 total pagecache pages
> > [   17.073073] c0 0  0 pages in swap cache
> > [   17.073078] c0 0  Swap cache stats: add 0, delete 0, find 0/0
> > [   17.073081] c0 0  Free swap  = 0kB
> > [   17.073085] c0 0  Total swap = 0kB
> > [   17.073089] c0 0  945512 pages RAM
> > [   17.073093] c0 0  0 pages HighMem/MovableOnly
> > [   17.073097] c0 0  63408 pages reserved
> > [   17.073100] c0 0  51200 pages cma reserved
> > 
> > Let's not make user scared.
> 
> This is not a proper explanation. So what exactly happens when this
> allocation fails? I would suggest something like the following
> "
> __memcg_schedule_kmem_cache_create tries to create a shadow slab cache
> and the worker allocation failure is not really critical because we will
> retry on the next kmem charge. We might miss some charges but that
> shouldn't be critical. The excessive allocation failure report is not
> very much helpful. Replace it with a rate limited single line output so
> that we know that there is a lot of these failures and that we need to
> do something about it in future.
> "
> 
> With the last part to be implemented of course.

If you want to see the warning and act on it in the future, I don't see any
reason to change it, because I didn't see warning output so excessive that it
would slow the system down without ratelimiting.

It was just a report from non-MM people who were concerned that something
might be going wrong on the system. I just wanted them to relax, since it's
not critical.


Re: [PATCH v7 0/4] Bluetooth: hci_qca: Add serdev support

2018-04-18 Thread Marcel Holtmann
Hi Thierry,

> This patchset enables the Qualcomm BT controller QCA6174 node in the
> device tree of the db820c board. This allows the bluetooth chipset to
> be probed and registered against the hci layer by using the serdev
> framework.
> 
> This patchset also contains the documentation for the compatible
> string "qcom,qca6174-bt" related to this chipset.
> 
> v7:
> - Add a new patch enabling regulators and gpios for the bt/wlan
>  combo chip
> 
> v6:
> - Move pinctrl properties into subnodes
> - fix binding documentation
> 
> v5:
> - Rename 'bt-disable-n' gpio as 'enable'
> 
> v4:
> - Fix dt binding documentation
> - Address some other issues in patch #3
> 
> v3:
> - Address comments for patch #3 (details in patch)
> 
> v2:
> - Fix author email
> 
> 
> Srinivas Kandagatla (1):
>  arm64: dts: apq8096-db820c: Enable wlan and bt en pins
> 
> Thierry Escande (3):
>  arm64: dts: apq8096-db820c: enable bluetooth node
>  dt-bindings: net: bluetooth: Add qualcomm-bluetooth
>  Bluetooth: hci_qca: Add serdev support
> 
> .../devicetree/bindings/net/qualcomm-bluetooth.txt |  30 ++
> arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi  |  26 +
> .../boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi|  32 ++
> arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi   |  62 
> arch/arm64/boot/dts/qcom/msm8996.dtsi  |  10 ++
> drivers/bluetooth/Kconfig  |   1 +
> drivers/bluetooth/hci_qca.c| 109 -
> 7 files changed, 268 insertions(+), 2 deletions(-)
> create mode 100644 
> Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt

all 4 patches have been applied to bluetooth-next tree.

Regards

Marcel



Re: [PATCH] mm:memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create

2018-04-18 Thread Michal Hocko
On Wed 18-04-18 16:41:17, Minchan Kim wrote:
> On Wed, Apr 18, 2018 at 09:20:02AM +0200, Michal Hocko wrote:
> > On Wed 18-04-18 11:29:12, Minchan Kim wrote:
[...]
> > > Let's not make user scared.
> > 
> > This is not a proper explanation. So what exactly happens when this
> > allocation fails? I would suggest something like the following
> > "
> > __memcg_schedule_kmem_cache_create tries to create a shadow slab cache
> > and the worker allocation failure is not really critical because we will
> > retry on the next kmem charge. We might miss some charges but that
> > shouldn't be critical. The excessive allocation failure report is not
> > very much helpful. Replace it with a rate limited single line output so
> > that we know that there is a lot of these failures and that we need to
> > do something about it in future.
> > "
> > 
> > With the last part to be implemented of course.
> 
> If you want to see warning and catch on it in future, I don't see any reason
> to change it. Because I didn't see any excessive warning output that it could
> make system slow unless we did ratelimiting.

Yeah, but a single line would be just as informative and less scary to
users.

> It was a just report from non-MM guys who have a concern that somethings
> might go wrong on the system. I just wanted them relax since it's not
> critical.

I do agree with __GFP_NOWARN but I think a single line warning is due
and helpful for further debugging.
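
To make the suggestion concrete: the idea is to emit at most a fixed number
of one-line messages per time window and to count the ones that were
suppressed. The sketch below is only a userspace model of that
interval/burst logic. The struct and function names are hypothetical, and it
is neither the kernel's ratelimit implementation nor part of the patch under
discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of an interval/burst rate limit. */
struct ratelimit {
	uint64_t interval;    /* window length, in caller-defined ticks */
	unsigned int burst;   /* messages allowed per window */
	uint64_t begin;       /* start of the current window */
	unsigned int printed; /* messages emitted in the current window */
	unsigned int missed;  /* running count of suppressed messages */
};

/* Return true if a message may be emitted at time 'now'. */
static bool ratelimit_ok(struct ratelimit *rl, uint64_t now)
{
	/* Open a new window once the old one has expired. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->missed++;
	return false;
}
```

With burst set to 1, a caller would print a single line when ratelimit_ok()
returns true and stay silent otherwise, so repeated failures inside one
window produce one message instead of a full allocation-failure dump each
time.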

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 5/6] drm/atmel-hlcdc: add support for connecting to tda998x HDMI encoder

2018-04-18 Thread Peter Rosin
On 2018-04-18 09:36, Boris Brezillon wrote:
> On Tue, 17 Apr 2018 15:10:51 +0200
> Peter Rosin  wrote:
> 
>> When the of-graph points to a tda998x-compatible HDMI encoder, register
>> as a component master and bind to the encoder/connector provided by
>> the tda998x driver.
> 
> Can't we do the opposite: make the tda998x driver expose its devices as
> drm bridges. I'd rather not add another way to connect external
> encoders (or bridges) to display controller drivers, especially since,
> when I asked DRM maintainers/devs what was the good approach to
> represent such external encoders they pointed me to the drm_bridge
> interface.

From the cover letter:

"However, I don't know if the tilcdc driver is interfacing with the
tda998x driver in a sane and modern way"

So, which way is the future? Should bridges become components or should
existing bridge-like components no longer be components? Are there others?

Cheers,
Peter


[PATCH 4/7] docs/vm: pagemap: change document title

2018-04-18 Thread Mike Rapoport
"pagemap from the Userspace Perspective" is not very descriptive for
unaware readers. Since the document describes how to examine a process's page
tables, let's title it "Examining Process Page Tables".

Signed-off-by: Mike Rapoport 
---
 Documentation/vm/pagemap.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/vm/pagemap.rst b/Documentation/vm/pagemap.rst
index 9644bc0..7ba8cbd 100644
--- a/Documentation/vm/pagemap.rst
+++ b/Documentation/vm/pagemap.rst
@@ -1,8 +1,8 @@
 .. _pagemap:
 
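-======================================
-pagemap from the Userspace Perspective
-======================================
+=============================
+Examining Process Page Tables
+=============================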
 
 pagemap is a new (as of 2.6.25) set of interfaces in the kernel that allow
 userspace programs to examine the page tables and related information by
-- 
2.7.4



Re: [PATCH v7 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

2018-04-18 Thread Linus Walleij
I wonder why I am starting to get CCed on Xen patches all of a sudden.

I happened to run into Jürgen at a conference only last weekend, but
I still don't know anything whatsoever about Xen or how it works.

If get_maintainer.pl has started to return my name on this stuff I
really want to know why :/

Yours,
Linus Walleij


[PATCH] arm64: dts: correct SATA addresses for Stingray

2018-04-18 Thread Srinath Mannam
Correct all SATA ahci and phy controller register
addresses and interrupt lines to proper values.

Fixes: 344a2e514182 ("arm64: dts: Add SATA DT nodes for Stingray SoC")

Signed-off-by: Srinath Mannam 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
Reviewed-by: Andrew Gospodarek 
---
 .../boot/dts/broadcom/stingray/stingray-sata.dtsi  | 80 +++---
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/boot/dts/broadcom/stingray/stingray-sata.dtsi 
b/arch/arm64/boot/dts/broadcom/stingray/stingray-sata.dtsi
index 4b5465d..8c68e0c 100644
--- a/arch/arm64/boot/dts/broadcom/stingray/stingray-sata.dtsi
+++ b/arch/arm64/boot/dts/broadcom/stingray/stingray-sata.dtsi
@@ -36,11 +36,11 @@
#size-cells = <1>;
ranges = <0x0 0x0 0x67d0 0x0080>;
 
-   sata0: ahci@21 {
+   sata0: ahci@0 {
compatible = "brcm,iproc-ahci", "generic-ahci";
-   reg = <0x0021 0x1000>;
+   reg = <0x 0x1000>;
reg-names = "ahci";
-   interrupts = ;
+   interrupts = ;
#address-cells = <1>;
#size-cells = <0>;
status = "disabled";
@@ -52,9 +52,9 @@
};
};
 
-   sata_phy0: sata_phy@212100 {
+   sata_phy0: sata_phy@2100 {
compatible = "brcm,iproc-sr-sata-phy";
-   reg = <0x00212100 0x1000>;
+   reg = <0x2100 0x1000>;
reg-names = "phy";
#address-cells = <1>;
#size-cells = <0>;
@@ -66,11 +66,11 @@
};
};
 
-   sata1: ahci@31 {
+   sata1: ahci@1 {
compatible = "brcm,iproc-ahci", "generic-ahci";
-   reg = <0x0031 0x1000>;
+   reg = <0x0001 0x1000>;
reg-names = "ahci";
-   interrupts = ;
+   interrupts = ;
#address-cells = <1>;
#size-cells = <0>;
status = "disabled";
@@ -82,9 +82,9 @@
};
};
 
-   sata_phy1: sata_phy@312100 {
+   sata_phy1: sata_phy@12100 {
compatible = "brcm,iproc-sr-sata-phy";
-   reg = <0x00312100 0x1000>;
+   reg = <0x00012100 0x1000>;
reg-names = "phy";
#address-cells = <1>;
#size-cells = <0>;
@@ -96,11 +96,11 @@
};
};
 
-   sata2: ahci@12 {
+   sata2: ahci@2 {
compatible = "brcm,iproc-ahci", "generic-ahci";
-   reg = <0x0012 0x1000>;
+   reg = <0x0002 0x1000>;
reg-names = "ahci";
-   interrupts = ;
+   interrupts = ;
#address-cells = <1>;
#size-cells = <0>;
status = "disabled";
@@ -112,9 +112,9 @@
};
};
 
-   sata_phy2: sata_phy@122100 {
+   sata_phy2: sata_phy@22100 {
compatible = "brcm,iproc-sr-sata-phy";
-   reg = <0x00122100 0x1000>;
+   reg = <0x00022100 0x1000>;
reg-names = "phy";
#address-cells = <1>;
#size-cells = <0>;
@@ -126,11 +126,11 @@
};
};
 
-   sata3: ahci@13 {
+   sata3: ahci@3 {
compatible = "brcm,iproc-ahci", "generic-ahci";
-   reg = <0x0013 0x1000>;
+   reg = <0x0003 0x1000>;
reg-names = "ahci";
-   interrupts = ;
+   interrupts = ;
#address-cells = <1>;
#size-cells = <0>;
status = "disabled";
@@ -142,9 +142,9 @@
};
};
 
-   sata_phy3: sata_phy@132100 {
+   sata_phy3: sata_phy@32100 {
compatible = "brcm,iproc-sr-sata-phy";
-   reg = <0x00132100 0x1000>;
+   reg = <0x00032100 0x1000>;
reg-names = "phy";
#address-cells = <1>;
#size-cells = <0>;
@@ -156,11 +156,11 @@

Re: [PATCH v2 5/6] drm/atmel-hlcdc: add support for connecting to tda998x HDMI encoder

2018-04-18 Thread Boris Brezillon
On Wed, 18 Apr 2018 10:02:12 +0200
Peter Rosin  wrote:

> On 2018-04-18 09:36, Boris Brezillon wrote:
> > On Tue, 17 Apr 2018 15:10:51 +0200
> > Peter Rosin  wrote:
> >   
> >> When the of-graph points to a tda998x-compatible HDMI encoder, register
> >> as a component master and bind to the encoder/connector provided by
> >> the tda998x driver.  
> > 
> > Can't we do the opposite: make the tda998x driver expose its devices as
> > drm bridges. I'd rather not add another way to connect external
> > encoders (or bridges) to display controller drivers, especially since,
> > when I asked DRM maintainers/devs what was the good approach to
> > represent such external encoders they pointed me to the drm_bridge
> > interface.  
> 
> From the cover letter:
> 
> "However, I don't know if the tilcdc driver is interfacing with the
> tda998x driver in a sane and modern way"
> 
> So, which way is the future? Should bridges become components or should
> existing bridge-like components no longer be components? Are there others?

Well, what I was told a while ago is that drm_bridge will take over from
drm_encoder_slave and custom drm_encoder/drm_connector implementations
when it comes to representing bridges.

AFAIU, using the component framework to bind all elements of the
pipeline to the display controller is orthogonal to how you represent
elements in the pipeline. I mean, you could have a bridge that
registers as a component, so that display controller drivers that want
to use the component framework don't have to re-code the
component-to-bridge glue every time, and those that don't use the
component framework can still get access to the bridge.


Re: [PATCH] x86/Centaur: show more HW features in /proc/cpuinfo

2018-04-18 Thread David Wang


> -Original Mail-
> Sender: Thomas Gleixner [mailto:t...@linutronix.de]
> Time: 2018/4/17 18:19
> Receiver: David Wang 
> CC: mi...@redhat.com; h...@zytor.com; mi...@kernel.org;
> gre...@linuxfoundation.org; x...@kernel.org; linux-
> ker...@vger.kernel.org; brucech...@via-alliance.com;
> cooper...@zhaoxin.com; qiyuanw...@zhaoxin.com;
> benjamin...@viatech.com; luke...@viacpu.com; tim...@zhaoxin.com
> Subject: Re: [PATCH] x86/Centaur: show more HW features in /proc/cpuinfo
> 
> On Sun, 8 Apr 2018, David Wang wrote:
> 
> > We add this patch to show correct HW features(arch_perfmon,
> > tpr_shadow, vnmi, flexpriority, ept and vpid) when user execute "cat
> /proc/cpuinfo".
> 
> See the other mail vs. the changelog.
>

 
OK. Thanks.

> >
> > Signed-off-by: David Wang 
> > ---
> >  arch/x86/kernel/cpu/centaur.c | 49
> > +++
> >  1 file changed, 49 insertions(+)
> >
> > diff --git a/arch/x86/kernel/cpu/centaur.c
> > b/arch/x86/kernel/cpu/centaur.c index e5ec0f1..969fb8f 100644
> > --- a/arch/x86/kernel/cpu/centaur.c
> > +++ b/arch/x86/kernel/cpu/centaur.c
> > @@ -112,6 +112,44 @@ static void early_init_centaur(struct cpuinfo_x86
> *c)
> > }
> >  }
> >
> > +static void centaur_detect_vmx_virtcap(struct cpuinfo_x86 *c) {
> > +#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW   0x0020
> > +#define X86_VMX_FEATURE_PROC_CTLS_VNMI 0x0040
> > +#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS 0x8000
> > +#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC   0x0001
> > +#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x0002
> > +#define X86_VMX_FEATURE_PROC_CTLS2_VPID        0x0020
> 
> Please move the defines outside the function. This is horrible to read,

OK.

> 
> > +
> > +   u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
> > +
> > +   clear_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
> > +   clear_cpu_cap(c, X86_FEATURE_VNMI);
> > +   clear_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
> > +   clear_cpu_cap(c, X86_FEATURE_EPT);
> > +   clear_cpu_cap(c, X86_FEATURE_VPID);
> 
> Why are you clearing the capabilities? They are cleared at boot time.
> 

OK. It's really useless. 

> > +   rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low,
> vmx_msr_high);
> > +   msr_ctl = vmx_msr_high | vmx_msr_low;
> > +
> > +   if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)
> > +   set_cpu_cap(c, X86_FEATURE_TPR_SHADOW);
> > +   if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI)
> > +   set_cpu_cap(c, X86_FEATURE_VNMI);
> > +   if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) {
> > +   rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
> > + vmx_msr_low, vmx_msr_high);
> > +   msr_ctl2 = vmx_msr_high | vmx_msr_low;
> > +   if ((msr_ctl2 &
> X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) &&
> > +   (msr_ctl &
> X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW))
> > +   set_cpu_cap(c, X86_FEATURE_FLEXPRIORITY);
> > +   if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT)
> > +   set_cpu_cap(c, X86_FEATURE_EPT);
> > +   if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
> > +   set_cpu_cap(c, X86_FEATURE_VPID);
> > +   }
> > +}
> > +
> >  static void init_centaur(struct cpuinfo_x86 *c)  {  #ifdef
> > CONFIG_X86_32 @@ -128,6 +166,14 @@ static void init_centaur(struct
> > cpuinfo_x86 *c)
> > clear_cpu_cap(c, 0*32+31);
> >  #endif
> > early_init_centaur(c);
> > +
> > +   if (c->cpuid_level > 9) {
> > +   unsigned eax = cpuid_eax(10);
> 
> Missing newline between variable declaration and code. checkpatch.pl
> should have told you that.
> 

OK. 

> > +   /* Check for version and the number of counters */
> > +   if ((eax & 0xff) && (((eax >> 8) & 0xff) > 1))
> 
> Magic constants and a comment which does not explain how the check
> works.
> 

OK. I will explain this in more detail in the comments.
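
For reference, the check decodes two fields of CPUID leaf 0xA EAX: bits 7:0
hold the architectural performance monitoring version ID, and bits 15:8 hold
the number of general-purpose counters per logical processor. A minimal
sketch with named constants follows; the macro and helper names are
hypothetical and not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the CPUID.0AH:EAX fields used by the check. */
#define PERFMON_VERSION_MASK	0xffu	/* EAX[7:0]: version ID */
#define PERFMON_NUM_CTRS_SHIFT	8	/* EAX[15:8]: GP counters per CPU */
#define PERFMON_NUM_CTRS_MASK	0xffu

/* Same condition as the patch: non-zero version and more than one counter. */
static bool has_arch_perfmon(uint32_t eax)
{
	uint32_t version = eax & PERFMON_VERSION_MASK;
	uint32_t counters = (eax >> PERFMON_NUM_CTRS_SHIFT) &
			    PERFMON_NUM_CTRS_MASK;

	return version != 0 && counters > 1;
}
```

Written this way, the comment only has to say why more than one counter is
required; the field layout is carried by the names.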

> > +   set_cpu_cap(c, X86_FEATURE_ARCH_PERFMON);
> 
> Thanks,
> 
>   tglx

Thanks,

---
David





Re: [PATCH v6 6/7] remoteproc/davinci: use the reset framework

2018-04-18 Thread Philipp Zabel
On Tue, 2018-04-17 at 19:30 +0200, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski 
> 
> Switch to using the reset framework instead of handcoded reset routines
> we used so far.
> 
> Signed-off-by: Bartosz Golaszewski 

[...]
> @@ -268,6 +282,15 @@ static int da8xx_rproc_probe(struct platform_device 
> *pdev)
>   return PTR_ERR(dsp_clk);
>   }
>  
> + dsp_reset = devm_reset_control_get_exclusive(dev, NULL);
> + if (IS_ERR(dsp_reset)) {
> + if (PTR_ERR(dsp_reset) != -EPROBE_DEFER)
> + dev_err(dev, "unable to get reset control: %ld\n",
> + PTR_ERR(dsp_reset));
> +
> + return PTR_ERR(dsp_reset);
> + }
> +
>   if (dev->of_node) {
>   ret = of_reserved_mem_device_init(dev);
>   if (ret) {
[...]
> @@ -309,7 +333,7 @@ static int da8xx_rproc_probe(struct platform_device *pdev)
>* *not* in reset, but da8xx_rproc_start() needs the DSP to be
>* held in reset at the time it is called.

Given this requirement, devm_reset_control_get_exclusive above is the
correct choice.

>*/
> - ret = davinci_clk_reset_assert(drproc->dsp_clk);
> + ret = reset_control_assert(dsp_reset);

Reviewed-by: Philipp Zabel 

regards
Philipp


Re: [PATCH v6 28/30] drm/rockchip: Disable PSR from reboot notifier

2018-04-18 Thread Enric Balletbo Serra
Hi Andrzej, Tomasz

2018-04-16 15:12 GMT+02:00 Tomasz Figa :
> Hi Andrzej,
>
> On Mon, Apr 16, 2018 at 6:57 PM Andrzej Hajda  wrote:
>
>> On 05.04.2018 11:49, Enric Balletbo i Serra wrote:
>> > From: Tomasz Figa 
>> >
>> > It looks like the driver subsystem detaches devices from power domains
>> > at shutdown without consent of the drivers.
>
>> It looks bit strange. Could you elaborate more on it. Could you show the
>> code performing the detach?
>
> It not only looks strange, but it is strange. The code was present in 4.4:
>
> https://elixir.bootlin.com/linux/v4.4.128/source/drivers/base/platform.c#L553
>
> but was apparently removed in 4.5:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/base/platform.c?h=next-20180416=2d30bb0b3889adf09b342722b2ce596c0763bc93
>
> So we might not need this patch anymore.
>

Right, it seems that we don't need this patch anymore. I'll run a few more
tests and will likely remove this patch from the series. Thanks for
catching this.

Best regards,
  Enric

> Best regards,
> Tomasz


[PATCH 5/6 RESEND] statfs: add ST_PRIVATE

2018-04-18 Thread Christian Brauner
This lets userspace query whether a mountpoint was made MS_PRIVATE.

Signed-off-by: Christian Brauner 
Cc: Alexander Viro 
---
 fs/statfs.c| 2 ++
 include/linux/statfs.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/statfs.c b/fs/statfs.c
index 2fc6f9c3793c..26cda2586d7e 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -33,6 +33,8 @@ static int flags_by_mnt(int mnt_flags)
flags |= ST_UNBINDABLE;
if (mnt_flags & MNT_SHARED)
flags |= ST_SHARED;
+   else
+   flags |= ST_PRIVATE;
return flags;
 }
 
diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index 5416b2936dd9..1ea4a45aa6c3 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -41,6 +41,7 @@ struct kstatfs {
 #define ST_NODIRATIME  (1<<11) /* do not update directory access times */
 #define ST_RELATIME    (1<<12) /* update atime relative to mtime/ctime */
 #define ST_UNBINDABLE  (1<<17) /* change to unbindable */
+#define ST_PRIVATE (1<<18) /* change to private */
 #define ST_SHARED  (1<<20) /* change to shared */
 
 #endif
-- 
2.17.0



[PATCH 0/6 RESEND] statfs: handle mount propagation

2018-04-18 Thread Christian Brauner
Hey,

This is a resend of this to CC more people and because it seems to have
gotten lost in the prior merge window. I should've sent it afterwards
right away.

This series:
- unifies the definition of constants in statfs.h and fs.h
  Please note the comments by Greg and others on this part:
  https://patchwork.kernel.org/patch/10340403/
  https://patchwork.kernel.org/patch/10340379/
  I haven't yet changed the fs.h and statfs.h header changes to not use
  bitshifts. I wanted to wait and see what Al thinks of it.
- extends statfs to handle mount propagation. This will let userspace
  easily query a given mountpoint for MS_UNBINDABLE, MS_SHARED,
  MS_PRIVATE and MS_SLAVE without always having to do costly parsing of
  /proc//mountinfo.
  To this end the flags:
  - ST_UNBINDABLE
  - ST_SHARED
  - ST_PRIVATE
  - ST_SLAVE
  are added. They have the same value as their MS_* counterparts.
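
As a rough illustration of how userspace might consume these flags, the
sketch below decodes the propagation bits from an f_flags word. The ST_*
values are copied from the patches in this series and are assumptions
outside of it; the helper name propagation_names() is hypothetical. Because
the series sets ST_SLAVE independently of ST_SHARED/ST_PRIVATE, the helper
reports every bit that is set rather than a single propagation type.

```c
#include <stddef.h>
#include <stdio.h>

/* Values as proposed in this series; not guaranteed by any released kernel. */
#define ST_UNBINDABLE	(1UL << 17)
#define ST_PRIVATE	(1UL << 18)
#define ST_SLAVE	(1UL << 19)
#define ST_SHARED	(1UL << 20)

/* Write a space-separated list of the propagation flags set in f_flags. */
static const char *propagation_names(unsigned long f_flags, char *buf,
				     size_t len)
{
	static const struct {
		unsigned long bit;
		const char *name;
	} tbl[] = {
		{ ST_UNBINDABLE, "unbindable" },
		{ ST_PRIVATE, "private" },
		{ ST_SLAVE, "slave" },
		{ ST_SHARED, "shared" },
	};
	size_t used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		int n;

		if (!(f_flags & tbl[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", tbl[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated; buf stays terminated */
		used += (size_t)n;
	}
	return buf;
}
```

A caller would pass the f_flags field filled in by statfs(2), on a kernel
that carries this series.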

The patchset was made against Al's vfs/for-next tree.

Thanks!
Christian

Christian Brauner (6):
  fs: use << for MS_* flags
  statfs: use << to align with fs header
  statfs: add ST_UNBINDABLE
  statfs: add ST_SHARED
  statfs: add ST_PRIVATE
  statfs: add ST_SLAVE

 fs/statfs.c | 16 +++-
 include/linux/statfs.h  | 30 +-
 include/uapi/linux/fs.h | 33 +
 3 files changed, 49 insertions(+), 30 deletions(-)

-- 
2.17.0



[PATCH 6/6 RESEND] statfs: add ST_SLAVE

2018-04-18 Thread Christian Brauner
This lets userspace query whether a mountpoint was made MS_SLAVE.

Signed-off-by: Christian Brauner 
Cc: Alexander Viro 
---
 fs/statfs.c| 10 +-
 include/linux/statfs.h |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/statfs.c b/fs/statfs.c
index 26cda2586d7e..86e957d16a68 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include "internal.h"
+#include "pnode.h"
 
 static int flags_by_mnt(int mnt_flags)
 {
@@ -52,8 +53,15 @@ static int flags_by_sb(int s_flags)
 
 static int calculate_f_flags(struct vfsmount *mnt)
 {
-   return ST_VALID | flags_by_mnt(mnt->mnt_flags) |
+   int flags = 0;
+
+   flags = ST_VALID | flags_by_mnt(mnt->mnt_flags) |
flags_by_sb(mnt->mnt_sb->s_flags);
+
+   if (IS_MNT_SLAVE(real_mount(mnt)))
+   flags |= ST_SLAVE;
+
+   return flags;
 }
 
 static int statfs_by_dentry(struct dentry *dentry, struct kstatfs *buf)
diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index 1ea4a45aa6c3..663fa5498a7d 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -42,6 +42,7 @@ struct kstatfs {
 #define ST_RELATIME    (1<<12) /* update atime relative to mtime/ctime */
 #define ST_UNBINDABLE  (1<<17) /* change to unbindable */
 #define ST_PRIVATE (1<<18) /* change to private */
+#define ST_SLAVE   (1<<19) /* change to slave */
 #define ST_SHARED  (1<<20) /* change to shared */
 
 #endif
-- 
2.17.0



[PATCH 4/6 RESEND] statfs: add ST_SHARED

2018-04-18 Thread Christian Brauner
This lets userspace query whether a mountpoint was made MS_SHARED.

Signed-off-by: Christian Brauner 
Cc: Alexander Viro 
---
 fs/statfs.c| 2 ++
 include/linux/statfs.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/statfs.c b/fs/statfs.c
index 61b3063d3921..2fc6f9c3793c 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -31,6 +31,8 @@ static int flags_by_mnt(int mnt_flags)
flags |= ST_RELATIME;
if (mnt_flags & MNT_UNBINDABLE)
flags |= ST_UNBINDABLE;
+   if (mnt_flags & MNT_SHARED)
+   flags |= ST_SHARED;
return flags;
 }
 
diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index e1b84d0388c1..5416b2936dd9 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -41,5 +41,6 @@ struct kstatfs {
 #define ST_NODIRATIME  (1<<11) /* do not update directory access times */
 #define ST_RELATIME    (1<<12) /* update atime relative to mtime/ctime */
 #define ST_UNBINDABLE  (1<<17) /* change to unbindable */
+#define ST_SHARED  (1<<20) /* change to shared */
 
 #endif
-- 
2.17.0



[PATCH 3/6 RESEND] statfs: add ST_UNBINDABLE

2018-04-18 Thread Christian Brauner
This lets userspace query whether a mountpoint was made MS_UNBINDABLE.

Signed-off-by: Christian Brauner 
Cc: Alexander Viro 
---
 fs/statfs.c| 2 ++
 include/linux/statfs.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/statfs.c b/fs/statfs.c
index 5b2a24f0f263..61b3063d3921 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -29,6 +29,8 @@ static int flags_by_mnt(int mnt_flags)
flags |= ST_NODIRATIME;
if (mnt_flags & MNT_RELATIME)
flags |= ST_RELATIME;
+   if (mnt_flags & MNT_UNBINDABLE)
+   flags |= ST_UNBINDABLE;
return flags;
 }
 
diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index b336c04e793c..e1b84d0388c1 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -40,5 +40,6 @@ struct kstatfs {
 #define ST_NOATIME (1<<10) /* do not update access times */
 #define ST_NODIRATIME  (1<<11) /* do not update directory access times */
 #define ST_RELATIME    (1<<12) /* update atime relative to mtime/ctime */
+#define ST_UNBINDABLE  (1<<17) /* change to unbindable */
 
 #endif
-- 
2.17.0



[PATCH] bpf, x86_32: add eBPF JIT compiler for ia32 (x86_32)

2018-04-18 Thread Wang YanQing
The JIT compiler emits ia32 instructions. Currently, it supports eBPF
only; classic BPF is supported through the conversion done by the BPF core.

Almost all instructions from the eBPF ISA are supported, except the following:
BPF_ALU64 | BPF_DIV | BPF_K
BPF_ALU64 | BPF_DIV | BPF_X
BPF_ALU64 | BPF_MOD | BPF_K
BPF_ALU64 | BPF_MOD | BPF_X
BPF_STX | BPF_XADD | BPF_W
BPF_STX | BPF_XADD | BPF_DW

It doesn't support BPF_JMP|BPF_CALL with BPF_PSEUDO_CALL either.

IA32 has few general purpose registers, EAX|EDX|ECX|EBX|ESI|EDI,
and we can't treat all six of them as real general purpose registers:
MUL instructions need EAX:EDX, shift instructions need ECX, and ESI|EDI
are used by string manipulation instructions.

So I decided to use the stack to emulate all eBPF 64-bit registers. This
simplifies the implementation considerably, because we don't need to deal
with the flexible memory addressing modes on ia32. For example, we don't
need to write the code below for a single BPF_ADD instruction:
if (src_reg is a register && dst_reg is a register)
{
   //one instruction encoding for ADD instruction
} else if (only src is a register)
{
   //another different instruction encoding for ADD instruction
} else if (only dst is a register)
{
   //another different instruction encoding for ADD instruction
} else
{
   //src and dst are all on stack.
   //another different instruction encoding for ADD instruction
}

If you think the above if-else chain isn't so painful, try to imagine
it for the BPF_ALU64|BPF_*SHIFT* instructions :)
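
To give a feel for what stack emulation means in practice, here is a small C
model (not code from the patch) that keeps one 64-bit eBPF register as two
32-bit halves, the way it would sit in two stack slots. The names reg64,
emu_add64 and emu_lsh64 are hypothetical. The add mirrors an add/adc pair;
the shift shows the case split that makes BPF_ALU64 shifts more involved on
a 32-bit target.

```c
#include <stdint.h>

/* Hypothetical model: one eBPF 64-bit register kept as two 32-bit slots. */
struct reg64 {
	uint32_t lo;
	uint32_t hi;
};

/* dst += src, the way an add/adc instruction pair would compute it. */
static void emu_add64(struct reg64 *dst, const struct reg64 *src)
{
	uint32_t lo = dst->lo + src->lo;
	uint32_t carry = lo < dst->lo;	/* unsigned wraparound means carry */

	dst->lo = lo;
	dst->hi = dst->hi + src->hi + carry;
}

/* dst <<= n, split into the cases a 32-bit code generator has to cover. */
static void emu_lsh64(struct reg64 *dst, unsigned int n)
{
	n &= 63;
	if (n == 0)
		return;
	if (n < 32) {
		/* Bits shifted out of the low half move into the high half. */
		dst->hi = (dst->hi << n) | (dst->lo >> (32 - n));
		dst->lo <<= n;
	} else {
		/* The whole low half moves up; the low half becomes zero. */
		dst->hi = dst->lo << (n - 32);
		dst->lo = 0;
	}
}
```

Even in this simplified model the shift needs three cases, before any
question of whether the operands live in registers or on the stack.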

Tested on my PC (Intel(R) Core(TM) i5-5200U CPU) and VirtualBox.

Testing results on i5-5200U:

1) test_bpf: Summary: 349 PASSED, 0 FAILED, [319/341 JIT'ed]
2) test_progs: Summary: 81 PASSED, 2 FAILED.
   test_progs reports "libbpf: incorrect bpf_call opcode" for
   test_l4lb_noinline and test_xdp_noinline, because there is
   no llvm-6.0 on my machine and the current implementation doesn't
   support BPF_CALL, so I think we can ignore it.
3) test_lpm: OK
4) test_lru_map: OK
5) test_verifier: Summary: 823 PASSED, 5 FAILED
   test_verifier reports "invalid bpf_context access off=68 size=1/2/4/8"
   for all 5 FAILED testcases, and it reports them even with the JIT
   turned off, so I think the JIT can do nothing to fix them.

All of the above tests were run with each of the following flags enabled
separately: bpf_jit_enable=1 and bpf_jit_harden=2

Below are some numbers for this JIT implementation:
Note:
  I ran test_progs 100 times in a loop for every testcase. The numbers
  are in the format total/times=avg. The numbers that test_bpf reports
  show almost the same relation.

a: jit_enable=0 and jit_harden=0            b: jit_enable=1 and jit_harden=0
  test_pkt_access:PASS:ipv4:15622/100=156     test_pkt_access:PASS:ipv4:10057/100=100
  test_pkt_access:PASS:ipv6:9130/100=91       test_pkt_access:PASS:ipv6:5055/100=50
  test_xdp:PASS:ipv4:240198/100=2401          test_xdp:PASS:ipv4:145945/100=1459
  test_xdp:PASS:ipv6:137326/100=1373          test_xdp:PASS:ipv6:67337/100=673
  test_l4lb:PASS:ipv4:61100/100=611           test_l4lb:PASS:ipv4:38137/100=381
  test_l4lb:PASS:ipv6:101000/100=1010         test_l4lb:PASS:ipv6:57779/100=577

c: jit_enable=0 and jit_harden=2            d: jit_enable=1 and jit_harden=2
  test_pkt_access:PASS:ipv4:15214/100=152     test_pkt_access:PASS:ipv4:12650/100=126
  test_pkt_access:PASS:ipv6:9132/100=91       test_pkt_access:PASS:ipv6:7074/100=70
  test_xdp:PASS:ipv4:237252/100=2372          test_xdp:PASS:ipv4:147211/100=1472
  test_xdp:PASS:ipv6:135977/100=1359          test_xdp:PASS:ipv6:85783/100=857
  test_l4lb:PASS:ipv4:61324/100=613           test_l4lb:PASS:ipv4:53222/100=532
  test_l4lb:PASS:ipv6:100833/100=1008         test_l4lb:PASS:ipv6:76322/100=763

Yes, the numbers are pretty good without jit_harden turned on. If we want to
speed up the jit_harden case, we need to move BPF_REG_AX to a *real* register
instead of stack emulation, but if we do that, we have to face all the pain I
described above. We can do it in a next step.

See Documentation/networking/filter.txt for more information.

Signed-off-by: Wang YanQing 
---
 arch/x86/Kconfig                     |    2 +-
 arch/x86/include/asm/nospec-branch.h |   26 +-
 arch/x86/net/Makefile                |   10 +-
 arch/x86/net/bpf_jit32.S             |  147 +++
 arch/x86/net/bpf_jit_comp32.c        | 2239 ++
 5 files changed, 2419 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/net/bpf_jit32.S
 create mode 100644 arch/x86/net/bpf_jit_comp32.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 00fcf81..1f5fa2f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -137,7 +137,7 @@ config X86
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
-	select HAVE_EBPF_JIT			if X86_64
+	select HAVE_EBPF_JIT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_EXIT_THREAD
 	select HAVE_FENTRY			if X86_64 || DYNAMIC_FTRACE
diff --git 

Re: [PATCH v3 06/11] iio: inkern: add module put/get on iio dev module when requesting channels

2018-04-18 Thread Jonathan Cameron
On Tue, 17 Apr 2018 12:19:06 -0700
Dmitry Torokhov  wrote:

> Hi Eugen,
> 
> On Tue, Apr 17, 2018 at 10:39:24AM +0300, Eugen Hristev wrote:
> > 
> > 
> > On 17.04.2018 02:58, Dmitry Torokhov wrote:  
> > > On Sun, Apr 15, 2018 at 08:33:21PM +0100, Jonathan Cameron wrote:  
> > > > On Tue, 10 Apr 2018 11:57:52 +0300
> > > > Eugen Hristev  wrote:
> > > >   
> > > > > When requesting channels for a particular consumer device,
> > > > > > besides requesting the device (incrementing the reference counter), also
> > > > > > do it for the driver module of the iio dev. This will avoid the situation
> > > > > where the producer IIO device can be removed and the consumer is still
> > > > > present in the kernel.
> > > > > 
> > > > > Signed-off-by: Eugen Hristev 
> > > > > ---
> > > > >   drivers/iio/inkern.c | 8 +++-
> > > > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/iio/inkern.c b/drivers/iio/inkern.c
> > > > > index ec98790..68d9b87 100644
> > > > > --- a/drivers/iio/inkern.c
> > > > > +++ b/drivers/iio/inkern.c
> > > > > @@ -11,6 +11,7 @@
> > > > >   #include 
> > > > >   #include 
> > > > >   #include 
> > > > > +#include 
> > > > >   #include 
> > > > >   #include "iio_core.h"
> > > > > @@ -152,6 +153,7 @@ static int __of_iio_channel_get(struct 
> > > > > iio_channel *channel,
> > > > >   if (index < 0)
> > > > >   goto err_put;
> > > > > >   channel->channel = &indio_dev->channels[index];
> > > > > + try_module_get(channel->indio_dev->driver_module);  
> > > > 
> > > > And if it fails? (the module we are trying to get is going away...)
> > > > 
> > > > We should try and handle it I think. Be it by just erroring out of here.
> > > 
> > > Even more, this has nothing to do with modules. A device can go away for
> > > any number of reasons (we unbind it manually via sysfs, we pull the USB
> > > plug from the host in case it is USB-connected device, we unload I2C
> > > adapter for the bus device resides on, we kick underlying PCI device)
> > > and we should be able to handle this in some fashion. Handling errors
> > > from reads and ignoring garbage is one of methods.
> > > 
> > > FWIW this is a NACK from me.
> > > 
> > > Thanks.  
> > Hello,
> > 
> > This patch is actually a "best effort attempt" for the consumer driver
> > (touch driver) to get a reference to the producer of the data (the IIO
> > device), when it requests the specific channels.
> > As of this moment, there is no attempt whatsoever for the consumer to have a
> > reference on the producer driver. Thus, the producer can be removed at any
> > time, and the consumer will fail ungraciously.  
> 
> This is the root of the issue. The consumer should be prepared to handle
> errors from producer.
> 
> > I can change the perspective from "best effort" to "mandatory" to get a
> > reference to the producer, or you wish to stop trying to get any reference
> > at all (remove this patch completely) ?  
> 
> You should take reference to the device itself (if it is not taken
> already), so it does not disappear completely and you can continue using
> IIO API to access it, and IIO API should be prepared to deal with "dead"
> devices, but as I pointed in my other email, trying to pin the driver
> is quite pointless as there are myriad other ways of device stopping
> working besides module unloading.
> 
> In any case, I think this problem is outside of the scope of this
> patchset that adds a generic resistive touchscreen, so if you want to
> continue working on this I'd recommend moving it into a separate series.
> 
> Thanks.
> 
Agreed, this one has come up a number of times before.  Quite a lot of
work got done by (IIRC) Lars Peter Clausen to stabilize things in various
unexpected 'going away' events.  Of course there may be paths we have
added since that (it was years ago) that can cause trouble...

Anyhow, separate issue as Dmitry says, let's deal with it separately.

Jonathan
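
For reference, the "error out if the get fails" pattern raised earlier in the
thread can be sketched as below. This is a userspace model only: the
model_module and model_channel types and both helpers are hypothetical
stand-ins, not the kernel's struct module or the IIO API. As Dmitry notes,
pinning the module would not cover the other ways a device can go away, so
this shows only the error handling, not a complete fix.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's definitions. */
struct model_module {
	bool going_away;	/* set once unloading has started */
	int refcnt;
};

struct model_channel {
	struct model_module *owner;	/* module pinned by this channel */
};

/* Models try_module_get(): refuses a module that is going away. */
static bool model_try_module_get(struct model_module *m)
{
	if (m->going_away)
		return false;
	m->refcnt++;
	return true;
}

/* Take the reference or fail the request; the result is never ignored. */
static int model_channel_get(struct model_channel *ch, struct model_module *m)
{
	if (!model_try_module_get(m))
		return -ENODEV;
	ch->owner = m;
	return 0;
}
```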



Re: [PATCH v6 11/11] ARM: shmobile: Convert file to use cntvoff

2018-04-18 Thread Mylène Josserand
Hello,

On Wed, 18 Apr 2018 11:36:27 +0200
Geert Uytterhoeven  wrote:

> Hi Mylène,
> 
> On Mon, Apr 16, 2018 at 11:50 PM, Mylène Josserand
>  wrote:
> > Now that a common function is available for CNTVOFF's
> > initialization, let's convert shmobile-apmu code to use
> > this function.  
> 
> Thanks for your patch, works fine on Renesas ALT with R-Car E2,
> which suffers from lack of CNTVOFF initialization.

Great to know that it works on this board.

> 
> > Signed-off-by: Mylène Josserand   
> 
> Reviewed-by: Geert Uytterhoeven 
> Tested-by: Geert Uytterhoeven 
> 
> Gr{oetje,eeting}s,
> 
> Geert
> 

Thank you again for testing it :)

Best regards,

-- 
Mylène Josserand, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com


Re: [PATCH] locking/rwsem: Synchronize task state & waiter->task of readers

2018-04-18 Thread Benjamin Herrenschmidt
On Tue, 2018-04-10 at 13:22 -0400, Waiman Long wrote:
> It was observed occasionally on PowerPC systems that there was a reader
> who had not been woken up even though its waiter->task had been cleared.
> 
> One probable cause of this missed wakeup may be the fact that the
> waiter->task and the task state have not been properly synchronized as
> the lock release-acquire pair of different locks in the wakeup code path
> does not provide a full memory barrier guarantee. So smp_store_mb()
> is now used to set waiter->task to NULL to provide a proper memory
> barrier for synchronization.
> 
> Signed-off-by: Waiman Long 

That looks right... nothing in either lock or unlock will prevent a
store going past a load.
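
The store-then-load pattern in question can be written down with C11 atomics
as a minimal sketch. The waiter_model type and the two helpers are
hypothetical names, and the C11 seq_cst fence stands in for the kernel's
smp_mb(); this models the ordering argument, not the rwsem code itself. In
real use the two functions would run on different CPUs.

```c
#include <stdatomic.h>
#include <stddef.h>

enum { MODEL_TASK_RUNNING, MODEL_TASK_UNINTERRUPTIBLE };

/* Hypothetical model of the two shared variables. */
struct waiter_model {
	_Atomic(void *) task;	/* non-NULL while the reader is queued */
	atomic_int state;	/* task state of the sleeping reader */
};

/* Wakeup side: [S] waiter->task, full barrier, [L] state. */
static int wake_side(struct waiter_model *w)
{
	atomic_store_explicit(&w->task, NULL, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&w->state, memory_order_relaxed);
}

/* Sleep side: [S] state, full barrier, [L] waiter->task. */
static void *sleep_side(struct waiter_model *w)
{
	atomic_store_explicit(&w->state, MODEL_TASK_UNINTERRUPTIBLE,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&w->task, memory_order_relaxed);
}
```

With both fences in place, at least one side must observe the other's store:
either the waker sees the sleeping state, or the sleeper sees task == NULL.
Without a full barrier on the wakeup side, both loads could return stale
values, which is the missed wakeup.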

> ---
>  kernel/locking/rwsem-xadd.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index e795908..b3c588c 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -209,6 +209,23 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>   smp_store_release(&waiter->task, NULL);
>   }
>  
> + /*
> +  * To avoid missed wakeup of reader, we need to make sure
> +  * that task state and waiter->task are properly synchronized.
> +  *
> +  *       wakeup                        sleep
> +  *       ------                        -----
> +  * __rwsem_mark_wake:           rwsem_down_read_failed*:
> +  *   [S] waiter->task             [S] set_current_state(state)
> +  *       MB                            MB
> +  * try_to_wake_up:
> +  *   [L] state                    [L] waiter->task
> +  *
> +  * For the wakeup path, the original lock release-acquire pair
> +  * does not provide enough guarantee of proper synchronization.
> +  */
> + smp_mb();
> +
>   adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
>   if (list_empty(&sem->wait_list)) {
>   /* hit end of list above */


RE: [PATCH 2/6 v2] iommu: of: make of_pci_map_rid() available for other devices too

2018-04-18 Thread Nipun Gupta


> -----Original Message-----
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Tuesday, April 17, 2018 10:23 PM
> To: Nipun Gupta ; robh...@kernel.org;
> frowand.l...@gmail.com
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; catalin.mari...@arm.com;
> h...@lst.de; gre...@linuxfoundation.org; j...@8bytes.org;
> m.szyprow...@samsung.com; shawn...@kernel.org; bhelg...@google.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linuxppc-
> d...@lists.ozlabs.org; linux-...@vger.kernel.org; Bharat Bhushan
> ; stuyo...@gmail.com; Laurentiu Tudor
> ; Leo Li 
> Subject: Re: [PATCH 2/6 v2] iommu: of: make of_pci_map_rid() available for
> other devices too
> 
> On 17/04/18 11:21, Nipun Gupta wrote:
> > iommu-map property is also used by devices with fsl-mc. This patch
> > moves the of_pci_map_rid to generic location, so that it can be used
> > by other busses too.
> >
> > Signed-off-by: Nipun Gupta 
> > ---
> >   drivers/iommu/of_iommu.c | 106 +--
> 
> Doesn't this break "msi-parent" parsing for !CONFIG_OF_IOMMU? I guess you
> don't want fsl-mc to have to depend on PCI, but this looks like a step in the
> wrong direction.

Thanks for pointing this out.
Agreed, this will break "msi-parent" parsing in the !CONFIG_OF_IOMMU case.

> 
> I'm not entirely sure where of_map_rid() fits best, but from a quick look around
> the least-worst option might be drivers/of/of_address.c, unless Rob and Frank
> have a better idea of where generic DT-based ID translation routines could live?
> 
> >   drivers/of/irq.c |   6 +--
> >   drivers/pci/of.c | 101
> >   include/linux/of_iommu.h |  11 +
> >   include/linux/of_pci.h   |  10 -
> >   5 files changed, 117 insertions(+), 117 deletions(-)
> >

[...]

> >   struct of_pci_iommu_alias_info {
> > struct device *dev;
> > struct device_node *np;
> > @@ -149,9 +249,9 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data)
> > struct of_phandle_args iommu_spec = { .args_count = 1 };
> > int err;
> >
> > -   err = of_pci_map_rid(info->np, alias, "iommu-map",
> > -"iommu-map-mask", &iommu_spec.np,
> > -iommu_spec.args);
> > +   err = of_map_rid(info->np, alias, "iommu-map",
> > +"iommu-map-mask", &iommu_spec.np,
> > +iommu_spec.args);
> 
> Super-nit: Apparently I missed rewrapping this to 2 lines in d87beb749281, but if
> it's being touched again, that would be nice ;)

Sure.. I'll take care of this in the next version :)

Regards,
Nipun
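
For readers unfamiliar with what the helper under discussion computes, the
translation through an "iommu-map" style table can be sketched as below. This
is a simplified userspace model: the rid_map_entry layout follows the
(rid-base, target, out-base, length) tuples of the devicetree binding, but the
type, the model_map_rid() name and the -ENOENT return for "no match" are
assumptions of this sketch, not the kernel's of_map_rid() implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* One (rid-base, target, out-base, length) tuple; hypothetical type. */
struct rid_map_entry {
	uint32_t rid_base;	/* first input ID covered by this entry */
	uint32_t target;	/* stands in for the phandle of the target */
	uint32_t out_base;	/* first output ID */
	uint32_t length;	/* number of consecutive IDs covered */
};

/*
 * Translate rid through the map. The mask is applied to the input first,
 * as "iommu-map-mask" describes. Returns 0 and fills *target and *id_out
 * on a match, -ENOENT (a choice made for this sketch) otherwise.
 */
static int model_map_rid(const struct rid_map_entry *map, size_t n,
			 uint32_t mask, uint32_t rid,
			 uint32_t *target, uint32_t *id_out)
{
	uint32_t masked = rid & mask;

	for (size_t i = 0; i < n; i++) {
		const struct rid_map_entry *e = &map[i];

		if (masked < e->rid_base || masked - e->rid_base >= e->length)
			continue;
		*target = e->target;
		*id_out = masked - e->rid_base + e->out_base;
		return 0;
	}
	return -ENOENT;
}
```

Nothing in this lookup is PCI-specific, which is the argument for hosting it
in a bus-neutral location.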


Re: [PATCH] mm:memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create

2018-04-18 Thread Michal Hocko
On Tue 17-04-18 20:08:24, Matthew Wilcox wrote:
> On Wed, Apr 18, 2018 at 11:29:12AM +0900, Minchan Kim wrote:
> > If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> > fails easily even though it is an order-0 request.
> > I got the warning below 9 times during a normal boot.
> > 
> > [   17.072747] c0 0  : page allocation failure: order:0, mode:0x220(GFP_NOWAIT|__GFP_NOTRACK)
> > 
> > Let's not make user scared.
> >  
> > -   cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
> > +   cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
> > if (!cw)
> 
> Not arguing against this patch.  But how many places do we want to use
> GFP_NOWAIT without __GFP_NOWARN?  Not many, and the few which do do this
> seem like they simply haven't added it yet.  Maybe this would be a good idea?
> 
> -#define GFP_NOWAIT  (__GFP_KSWAPD_RECLAIM)
> +#define GFP_NOWAIT  (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)

We have tried something like this in the past and Linus was strongly
against it. I do not have a reference handy, but his argument was that each
__GFP_NOWARN should be explicit rather than implicit because it is
a deliberate decision to make.
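
A small userspace model of the difference, as a sketch only: the MODEL_GFP_*
values and model_alloc() are hypothetical and do not match the kernel's flag
values or allocator. With the explicit style, every call site that tolerates
failure silently says so itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag bits for this model; not the kernel's values. */
#define MODEL_GFP_KSWAPD_RECLAIM	0x1u
#define MODEL_GFP_NOWARN		0x2u
#define MODEL_GFP_NOWAIT		(MODEL_GFP_KSWAPD_RECLAIM)

static unsigned int model_warnings;	/* failure warnings emitted so far */

/* Allocate, or simulate a failure; warn unless the caller opted out. */
static void *model_alloc(size_t size, unsigned int flags, int simulate_failure)
{
	if (simulate_failure) {
		if (!(flags & MODEL_GFP_NOWARN)) {
			model_warnings++;
			fprintf(stderr, "allocation failure, flags %#x\n", flags);
		}
		return NULL;
	}
	return malloc(size);
}
```

A caller that has a fallback passes MODEL_GFP_NOWAIT | MODEL_GFP_NOWARN; one
that passes plain MODEL_GFP_NOWAIT still gets the warning, so an unexpected
failure is not hidden by the definition of the flag.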

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3] module: Fix display of wrong module .text address

2018-04-18 Thread Thomas-Mich Richter
On 04/18/2018 09:17 AM, Tobin C. Harding wrote:
> On Wed, Apr 18, 2018 at 09:14:36AM +0200, Thomas Richter wrote:
>> Reading file /proc/modules shows the correct address:
>> [root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
>> qeth_l2 94208 1 - Live 0x03ff80401000
>>
>> and reading file /sys/module/qeth_l2/sections/.text
>> [root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text
>> 0x18ea8363
>> displays a random address.
>>
>> This breaks the perf tool which uses this address on s390
>> to calculate start of .text section in memory.
>>
>> Fix this by printing the correct (unhashed) address.
>>
>> Thanks to Jessica Yu for helping on this.
>>
>> Fixes: ef0010a30935 ("vsprintf: don't use 'restricted_pointer()' when not restricting")
>> Cc:  # v4.15+
>> Suggested-by: Linus Torvalds 
>> Signed-off-by: Thomas Richter 
>> Cc: Jessica Yu 
>> ---
> 
> What's changed in each version please?
> 
> 
> thanks,
> Tobin.
> 

V2: Changed sprintf format string from %#lx to 0x%px (suggested by Kees Cook).
V3: Changed sprintf argument from 0 to NULL to avoid sparse warning.

-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Vorsitzende des Aufsichtsrats: Martina Koederitz 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294



Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Roger Pau Monné
On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote:
> On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> 3.2 Backend exports dma-buf to xen-front
> 
> In this case Dom0 pages are shared with DomU. As before, DomU can only write
> to these pages, not any other page from Dom0, so it can be still considered
> safe.
> But, the following must be considered (highlighted in xen-front's Kernel
> documentation):
>  - If guest domain dies then pages/grants received from the backend cannot
>    be claimed back - think of it as memory lost to Dom0 (won't be used for
>    any other guest)
>  - Misbehaving guest may send too many requests to the backend exhausting
>    its grant references and memory (consider this from security POV). As the
>    backend runs in the trusted domain we also assume that it is trusted as
>    well, e.g. must take measures to prevent DDoS attacks.

I cannot parse the above sentence:

"As the backend runs in the trusted domain we also assume that it is
trusted as well, e.g. must take measures to prevent DDoS attacks."

What's the relation between being trusted and protecting from DoS
attacks?

In any case, all? PV protocols are implemented with the frontend
sharing pages to the backend, and I think there's a reason why this
model is used, and it should continue to be used.

Having to add logic in the backend to prevent such attacks means
that:

 - We need more code in the backend, which increases complexity and
   chances of bugs.
 - Such code/logic could be wrong, thus allowing DoS.

> 4. xen-front/backend/xen-zcopy synchronization
> 
> 4.1. As I already said in 2) all the inter VM communication happens between
> xen-front and the backend, xen-zcopy is NOT involved in that.
> When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues a
> XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE).
> This call is synchronous, so xen-front expects that backend does free the
> buffer pages on return.
> 
> 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
>   - closes all dumb handles/fd's of the buffer according to [3]
>   - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-zcopy to make
>     sure the buffer is freed (think of it as it waits for dma-buf->release
>     callback)

So this zcopy thing keeps some kind of track of the memory usage? Why
can't the user-space backend keep track of the buffer usage?

>   - replies to xen-front that the buffer can be destroyed.
> This way deletion of the buffer happens synchronously on both Dom0 and DomU
> sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns with time-out
> error (BTW, wait time is a parameter of this IOCTL), Xen will defer grant
> reference removal and will retry later until those are free.
> 
> Hope this helps understand how buffers are synchronously deleted in case
> of xen-zcopy with a single protocol command.
> 
> I think the above logic can also be re-used by the hyper-dmabuf driver with
> some additional work:
> 
> 1. xen-zcopy can be split into 2 parts and extend:
> 1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and
> vice versa,

I don't know much about the dma-buf implementation in Linux, but
gntdev is a user-space device, and AFAICT user-space applications
don't have any notion of dma buffers. How are such buffers useful for
user-space? Why can't this just be called memory?

Also, (with my FreeBSD maintainer hat) how is this going to translate
to other OSes? So far the operations performed by the gntdev device
are mostly OS-agnostic because they just map/unmap memory, and in fact
they are implemented by Linux and FreeBSD.

> implement "wait" ioctl (wait for dma-buf->release): currently these are
> DRM_XEN_ZCOPY_DUMB_FROM_REFS, DRM_XEN_ZCOPY_DUMB_TO_REFS and
> DRM_XEN_ZCOPY_DUMB_WAIT_FREE
> 1.2. Xen balloon driver [6] to allow allocating contiguous buffers (not
> needed by current hyper-dmabuf, but is a must for xen-zcopy use-cases)

I think this needs clarifying. In which memory space do you need those
regions to be contiguous?

Do they need to be contiguous in host physical memory, or guest
physical memory?

If it's in guest memory space, isn't there any generic interface that
you can use?

If it's in host physical memory space, why do you need this buffer to
be contiguous in host physical memory space? The IOMMU should hide all
this.

Thanks, Roger.


Re: [PATCH v2 4/6] drm/atmel-hlcdc: support bus-width (12/16/18/24) in endpoint nodes

2018-04-18 Thread Peter Rosin
On 2018-04-18 09:29, Boris Brezillon wrote:
> On Tue, 17 Apr 2018 15:10:50 +0200
> Peter Rosin  wrote:
> 
>> This beats the heuristic that the connector is involved in what format
>> should be output for cases where this fails.
>>
>> E.g. if there is a bridge that changes format between the encoder and the
>> connector, or if some of the RGB pins between the lcd controller and the
>> encoder are not routed on the PCB.
>>
>> This is critical for the devices that have the "conflicting output
>> formats" issue (SAM9N12, SAM9X5, SAMA5D3), since the most significant
>> RGB bits move around depending on the selected output mode. For
>> devices that do not have the "conflicting output formats" issue
>> (SAMA5D2, SAMA5D4), this is completely irrelevant.
>>
>> Signed-off-by: Peter Rosin 
>> ---
>>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c | 85 --
>>  1 file changed, 65 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
>> index d73281095fac..2e718959981e 100644
>> --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
>> +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c
>> @@ -19,12 +19,14 @@
>>   */
>>  
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>>  
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  
>>  #include 
>> @@ -226,6 +228,68 @@ static void atmel_hlcdc_crtc_atomic_enable(struct drm_crtc *c,
>>  #define ATMEL_HLCDC_RGB888_OUTPUT   BIT(3)
>>  #define ATMEL_HLCDC_OUTPUT_MODE_MASK GENMASK(3, 0)
>>  
>> +static int atmel_hlcdc_connector_output_mode(struct drm_connector_state *state)
>> +{
>> +struct drm_connector *connector = state->connector;
>> +struct drm_display_info *info = &connector->display_info;
>> +unsigned int supported_fmts = 0;
>> +struct device_node *ep;
>> +int j;
>> +
>> +/*
>> + * Use the connector index as an approximation of the
>> + * endpoint node index. We know it's true for our case
>> + * depending on the driver implementation.
>> + */
>> +ep = of_graph_get_endpoint_by_regs(connector->dev->dev->of_node, 0,
>> +   connector->index);
>> +
> 
> Hm, this sounds a bit fragile. Can't we have a reference to the of_node
> attached to the connector? Or maybe we can parse this earlier and set a
> constraint on the accepted modes.
> 
>> +if (ep) {
>> +int bus_fmt = drm_of_media_bus_fmt(ep);
> 
> Hm, you're extracting this piece of information from the DT every time
> an atomic modeset is done. I'd really prefer to have this done once at

Yes, not happy about it either. I looked for other sensible places to
hook the info at probe time, but this was just the simplest. I'll take
another look...

> probe time. Since this property is attached to the connector, maybe we
> should overwrite the info->bus_formats[] array or mark some of its
> entries as invalid.

I find it very wrong to mix the connector format with what you want to
output. In my mind it's a broken assumption that they are related. It is
only correct for trivial cases. Also note my comment about the connector
index and the endpoint index, they are only coincidentally the same
based on our implementation. If the driver has more than one port or
initializes endpoints out of order for some reason, this is no longer
true.

I think it would be better to store this info somewhere near the encoder,
since that is what I find closest to what I'm trying to change.

As I said, I'll take another look and see if I can hook this in at some
other place.
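
A minimal sketch of the "parse once at probe time and keep it near the
encoder" idea, as a userspace model. Only the BIT(3) value for RGB888 is taken
from the quoted defines; the other three bit positions, the model_encoder type
and both helpers are assumptions made for illustration and are not the
driver's code.

```c
#include <errno.h>

#define MODEL_BIT(n)		(1u << (n))
#define MODEL_RGB444_OUTPUT	MODEL_BIT(0)	/* assumed bit position */
#define MODEL_RGB565_OUTPUT	MODEL_BIT(1)	/* assumed bit position */
#define MODEL_RGB666_OUTPUT	MODEL_BIT(2)	/* assumed bit position */
#define MODEL_RGB888_OUTPUT	MODEL_BIT(3)	/* as in the quoted define */

/* Hypothetical per-encoder state filled in once at probe time. */
struct model_encoder {
	unsigned int output_mode;	/* 0: no constraint from the DT */
};

/* Map a bus-width value (12/16/18/24) to an output mode bit. */
static int model_output_mode_from_bus_width(unsigned int bus_width)
{
	switch (bus_width) {
	case 12:
		return (int)MODEL_RGB444_OUTPUT;
	case 16:
		return (int)MODEL_RGB565_OUTPUT;
	case 18:
		return (int)MODEL_RGB666_OUTPUT;
	case 24:
		return (int)MODEL_RGB888_OUTPUT;
	default:
		return -EINVAL;
	}
}

/* Probe-time step: cache the mode so the modeset path never parses DT. */
static int model_encoder_probe(struct model_encoder *enc,
			       int has_bus_width, unsigned int bus_width)
{
	int mode;

	enc->output_mode = 0;
	if (!has_bus_width)
		return 0;
	mode = model_output_mode_from_bus_width(bus_width);
	if (mode < 0)
		return mode;
	enc->output_mode = (unsigned int)mode;
	return 0;
}
```

The atomic modeset path would then read enc->output_mode and fall back to the
connector's bus formats only when it is 0.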

>> +
>> +of_node_put(ep);
>> +
>> +if (bus_fmt < 0)
>> +return bus_fmt;
>> +
>> +switch (bus_fmt) {
>> +case 0:
>> +break;
>> +case MEDIA_BUS_FMT_RGB444_1X12:
>> +return ATMEL_HLCDC_RGB444_OUTPUT;
>> +case MEDIA_BUS_FMT_RGB565_1X16:
>> +return ATMEL_HLCDC_RGB565_OUTPUT;
>> +case MEDIA_BUS_FMT_RGB666_1X18:
>> +return ATMEL_HLCDC_RGB666_OUTPUT;
>> +case MEDIA_BUS_FMT_RGB888_1X24:
>> +return ATMEL_HLCDC_RGB888_OUTPUT;
>> +default:
>> +return -EINVAL;
>> +}
>> +}
>> +
>> +for (j = 0; j < info->num_bus_formats; j++) {
>> +switch (info->bus_formats[j]) {
>> +case MEDIA_BUS_FMT_RGB444_1X12:
>> +supported_fmts |= ATMEL_HLCDC_RGB444_OUTPUT;
>> +break;
>> +case MEDIA_BUS_FMT_RGB565_1X16:
>> +supported_fmts |= ATMEL_HLCDC_RGB565_OUTPUT;
>> +break;
>> +case MEDIA_BUS_FMT_RGB666_1X18:
>> +supported_fmts |= ATMEL_HLCDC_RGB666_OUTPUT;
>> +break;
>> +case 

Re: [RFC PATCH 31/35] Revert "vfs: add d_real_inode() helper"

2018-04-18 Thread Amir Goldstein
On Thu, Apr 12, 2018 at 6:08 PM, Miklos Szeredi  wrote:
> This reverts commit a118084432d642eeccb961c7c8cc61525a941fcb.
>
> No user of d_real_inode() remains, so it can be removed.
>

FYI, there is a new user in v4.17-rc1 added by commit
f0a2aa5a2a40 tracing/uprobe: Add support for overlayfs

Seems like this patch got merged without any CC to overlayfs
mailing list nor maintainer?

Not sure yet if overlayfs-rorw patches would allow reverting this
change.

Thanks,
Amir.


Re: [PATCH 17/30] Documentation: kconfig: document a new Kconfig macro language

2018-04-18 Thread Ulf Magnusson
On Tue, Apr 17, 2018 at 5:07 PM, Masahiro Yamada
 wrote:
> 2018-04-15 17:08 GMT+09:00 Ulf Magnusson :
>> On Fri, Apr 13, 2018 at 7:06 AM, Masahiro Yamada
>>  wrote:
>>> Add a document for the macro language introduced to Kconfig.
>>>
>>> The motivation of this work is to move the compiler option tests to
>>> Kconfig from Makefile.  A number of kernel features require the
>>> compiler support.  Enabling such features blindly in Kconfig ends up
>>> with a lot of nasty build-time testing in Makefiles.  If a chosen
>>> feature turns out unsupported by the compiler, what the build system
>>> can do is either to disable it (silently!) or to forcibly break the
>>> build, despite Kconfig has let the user to enable it.
>>>
>>> This change was strongly prompted by Linus Torvalds.  You can find
>>> his suggestions [1] [2] in ML.  The original idea was to add a new
>>> 'option', but I found generalized text expansion would make Kconfig
>>> more powerful and lovely.  While polishing up the implementation, I
>>> noticed sort of similarity between Make and Kconfig.  This might be
>>> too immature to be called 'language', but anyway here it is.  All
>>> ideas are from Make (you can even say it is addicted), so people
>>> will easily understand how it works.
>>>
>>> [1]: https://lkml.org/lkml/2016/12/9/577
>>> [2]: https://lkml.org/lkml/2018/2/7/527
>>>
>>> Signed-off-by: Masahiro Yamada 
>>> ---
>>>
>>> Changes in v3: None
>>> Changes in v2: None
>>>
>>>  Documentation/kbuild/kconfig-macro-language.txt | 179
>>>  MAINTAINERS |   2 +-
>>>  2 files changed, 180 insertions(+), 1 deletion(-)
>>>  create mode 100644 Documentation/kbuild/kconfig-macro-language.txt
>>>
>>> diff --git a/Documentation/kbuild/kconfig-macro-language.txt b/Documentation/kbuild/kconfig-macro-language.txt
>>> new file mode 100644
>>> index 000..1f6281b
>>> --- /dev/null
>>> +++ b/Documentation/kbuild/kconfig-macro-language.txt
>>> @@ -0,0 +1,179 @@
>>> +Concept
>>> +---
>>> +
>>> +The basic idea was inspired by Make. When we look at Make, we notice sort of
>>> +two languages in one. One language describes dependency graphs consisting of
>>> +targets and prerequisites. The other is a macro language for performing textual
>>> +substitution.
>>> +
>>> +There is clear distinction between the two language stages. For example, you
>>> +can write a makefile like follows:
>>> +APP := foo
>>> +SRC := foo.c
>>> +CC := gcc
>>> +
>>> +$(APP): $(SRC)
>>> +$(CC) -o $(APP) $(SRC)
>>> +
>>> +The macro language replaces the variable references with their expanded form,
>>> +and handles as if the source file were input like follows:
>>> +
>>> +foo: foo.c
>>> +gcc -o foo foo.c
>>> +
>>> +Then, Make analyzes the dependency graph and determines the targets to be
>>> +updated.
>>> +
>>> +The idea is quite similar in Kconfig - it is possible to describe a Kconfig
>>> +file like this:
>>> +
>>> +CC := gcc
>>> +
>>> +config CC_HAS_FOO
>>> +def_bool $(shell $(srctree)/scripts/gcc-check-foo.sh $(CC))
>>> +
>>> +The macro language in Kconfig processes the source file into the following
>>> +intermediate:
>>> +
>>> +config CC_HAS_FOO
>>> +def_bool y
>>> +
>>> +Then, Kconfig moves onto the evaluation stage to resolve inter-symbol
>>> +dependency, which is explained in kconfig-language.txt.
>>> +
>>> +
>>> +Variables
>>> +-
>>> +
>>> +Like in Make, a variable in Kconfig works as a macro variable.  A macro
>>> +variable is expanded "in place" to yield a text string that may then be expanded
>>> +further. To get the value of a variable, enclose the variable name in $( ).
>>> +As a special case, single-letter variable names can omit the parentheses and are
>>> +simply referenced like $X. Unlike Make, Kconfig does not support curly braces
>>> +as in ${CC}.
>>
>> Do we need single-letter variable names for anything? It looks like
>> we're deviating a bit from Make behavior already.
>>
>> I suspect they're just a side effect of Make having automatic variables like $@.
>> The Make manual discourages them otherwise:
>>
>> "A dollar sign followed by a character other than a dollar sign,
>> open-parenthesis or open-brace treats that single character as the
>> variable name. Thus, you could reference the variable x with `$x'.
>> However, this practice is strongly discouraged, except in the case of
>> the automatic variables (see section Automatic Variables)."
>>
>
> OK.  We do not need two ways to do the same thing.
>
> I will consider it
> although supporting single-letter variable is not costly.
>
>
>
> --
> Best Regards
> Masahiro Yamada

Can you think of any cases where dynamic generation of Kconfig symbol
names would be a good solution by the way?
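
For context, the textual-substitution stage described in the quoted document
can be approximated by the small C sketch below. It is a simplified model: the
model_var type and model_expand() are hypothetical, the expansion is a single
non-recursive pass over $(NAME) references only, and treating an undefined
name as an error is a choice made for this sketch rather than the behaviour of
Make or of the proposed Kconfig code.

```c
#include <stddef.h>
#include <string.h>

/* One macro variable: NAME := value (hypothetical type). */
struct model_var {
	const char *name;
	const char *value;
};

/*
 * Copy src to dst, replacing each $(NAME) with the value of NAME.
 * Returns 0 on success, -1 on an unterminated reference, an unknown
 * name, or if the result does not fit in dstsz bytes including the NUL.
 */
static int model_expand(const struct model_var *vars, size_t nvars,
			const char *src, char *dst, size_t dstsz)
{
	size_t out = 0;

	if (dstsz == 0)
		return -1;
	while (*src) {
		if (src[0] == '$' && src[1] == '(') {
			const char *name = src + 2;
			const char *end = strchr(name, ')');
			const char *val = NULL;
			size_t len;

			if (!end)
				return -1;
			len = (size_t)(end - name);
			for (size_t i = 0; i < nvars; i++) {
				if (strlen(vars[i].name) == len &&
				    !strncmp(vars[i].name, name, len)) {
					val = vars[i].value;
					break;
				}
			}
			if (!val)
				return -1;
			len = strlen(val);
			if (out + len >= dstsz)
				return -1;
			memcpy(dst + out, val, len);
			out += len;
			src = end + 1;
		} else {
			if (out + 1 >= dstsz)
				return -1;
			dst[out++] = *src++;
		}
	}
	dst[out] = '\0';
	return 0;
}
```

Given APP := foo, SRC := foo.c and CC := gcc, expanding
"$(CC) -o $(APP) $(SRC)" yields the "gcc -o foo foo.c" line from the quoted
makefile example; only after this stage would the dependency or symbol
evaluation run.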


Re: [PATCH AUTOSEL for 4.14 015/161] printk: Add console owner and waiter logic to load balance console writes

2018-04-18 Thread Petr Mladek
On Tue 2018-04-17 13:45:59, Sasha Levin wrote:
> On Tue, Apr 17, 2018 at 02:24:54PM +0200, Petr Mladek wrote:
> >Back to the trend. Last week I got autosel mails even for
> >patches that were still being discussed, had issues, and
> >were far from upstream:
> >
> > https://lkml.kernel.org/r/dm5pr2101mb1032ab19b489d46b717b50d4fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> > https://lkml.kernel.org/r/dm5pr2101mb10327fa0a7e0d2c901e33b79fb...@dm5pr2101mb1032.namprd21.prod.outlook.com
> >
> >It might be a good idea if the mail asked to add Fixes: tag
> >or stable mailing list. But the mail suggested to add the
> >unfinished patch into stable branch directly (even before
> >upstreaming?).
> 
> I obviously didn't suggest that this patch will go in -stable before
> it's upstream.
> 
> I've started doing those because some folks can't be arsed to reply to a
> review request for a patch that is months old. I found that if I send
> these mails while the discussion is still going on I'd get a much better
> response rate from people.

I see. It makes sense.

> If you think any of these patches should go in stable there were two
> ways about it:
>
>  - You end up adding the -stable tag yourself, and it would follow the
>usual route where Greg picks it up.
>  - You reply to that mail, and the patch would wait in a list until my
>script notices it made it upstream, at which point it would get
>queued for stable.

It would be great if the options are described in the mail.

I wonder if it would make sense to add also a tag that would
say that the commit is not suitable for stable. It might
help both sides. The maintainers will be able to share
their opinion and eventually reduce mails from autosel.
You would get feedback that maintainers considered
the patch for stable. It might be even useful for
teaching the AI.


> >Now, there are only a handful of printk patches in each
> >release, so it is still doable. I just do not understand
> >how other maintainers, from much more busy subsystems,
> >could cope with this trend.
> 
> So yes, I'm aware that the volume of patches is huge, but there's not
> much I can do about it because it's just a subset of the kernel's patch
> volume and since the kernel gets more and more patches each release, the
> volume of stable commits is bound to grow as well.

Yes, but the growth in stable is much faster than the growth
in mainline at the moment. It might be fine if it were caused
just by engaging subsystems that ignored stable so far. But
I am not sure that is the case. Also I am not sure about
your plans.

Anyway, I am surprised that the patches might go into stable
so easily (no response -> accepted). While it is pretty
hard to get through the review process for mainline.

Of course, many patches go into mainline without review
as well. But the difference is that they are pushed by
people that are familiar and responsible for the affected
area.

I could understand the pain. There are surely people that
do not care about stable, because it takes time, it is hard
to make decisions, flashbacks to the old code are painful,
etc. Well, this is the reason why the maintenance support
is and should be limited.

Anyway, I think that it cannot be done reasonably without
maintainers. You should be careful so that even the currently
cooperating maintainers will not start considering autosel
mails as a spam. (It is not my case. printk is small thing.
But I could imagine that it might stop being bearable
in bigger subsystems. As is already the case with xfs.)

Best Regards,
Petr


Re: kernel panics with 4.14.X versions

2018-04-18 Thread Pavlos Parissis
On 17/04/2018 02:12 μμ, Jan Kara wrote:
> On Tue 17-04-18 01:31:24, Pavlos Parissis wrote:
>> On 16/04/2018 04:40 μμ, Jan Kara wrote:
> 
> 
> 
>>> How easily can you hit this?
>>
>> Very easily, I only need to wait 1-2 days for a crash to occur.
> 
> I wouldn't call that very easily but opinions may differ :). Anyway it's
> good (at least for debugging) that it's reproducible.
> 

Unfortunately, I can't reproduce it on demand, so waiting 1-2 days is the
only option I have.

>>> Are you able to run debug kernels
>>
>> Well, I was under the impression I do as I have:
>>   grep -E 'DEBUG_KERNEL|DEBUG_INFO' /boot/config-4.14.32-1.el7.x86_64
>>   CONFIG_DEBUG_INFO=y
>>   # CONFIG_DEBUG_INFO_REDUCED is not set
>>   # CONFIG_DEBUG_INFO_SPLIT is not set
>>   # CONFIG_DEBUG_INFO_DWARF4 is not set
>>   CONFIG_DEBUG_KERNEL=y
>>
>> Do you think that my kernel doesn't produce a proper crash dump?
>> I have a production cluster where I can run any kernel we need, so if I need
>> to compile again with different settings I can certainly do that.
> 
> OK, good. So please try running 4.16 as you mention below to verify whether
> this is just a -stable regression or also a problem in the current upstream
> kernel. Based on your results with 4.16 I'll prepare a debug patch for you to
> apply on top of 4.14.32 so that we can debug this further.
> 
>>> / inspect
>>> crash dumps when the issue occurs?
>>
>> I can't do that as the server isn't responsive and I can only power cycle it.
> 
> Well, kernel crash dumps work in that situation as well - when the kernel
> panics, it will kexec into a new kernel and dump memory of the old kernel
> to disk. It can then be investigated with the 'crash' utility. But
> obviously you don't have this set up and don't have experience with this so
> let's go via a standard 'debug patch' route.
> 
>>> Also testing with the latest mainline
>>> kernel (4.16) would be welcome whether this isn't just an issue with the
>>> backport of fsnotify fixes from Miklos.
>>
>> I can try the kernel-ml-4.16.2 from elrepo (we use CentOS 7).
> 
> Yes, that would be good.
> 

I have a production server running 4.16.2 and no kernel crashes so far.
Let's wait another day before we say anything.

Cheers,
Pavlos





[PATCH net] net: mvpp2: Fix DMA address mask size

2018-04-18 Thread Maxime Chevallier
PPv2 TX/RX descriptors use 40-bit DMA addresses, but 41-bit masks were
used (GENMASK_ULL(40, 0)).

This commit fixes that by using the correct mask.

Fixes: e7c5359f2eed ("net: mvpp2: introduce PPv2.2 HW descriptors and adapt accessors")
Signed-off-by: Maxime Chevallier 
---
 drivers/net/ethernet/marvell/mvpp2.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 9deb79b6dcc8..4202f9b5b966 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -916,6 +916,8 @@ static struct {
 
 #define MVPP2_MIB_COUNTERS_STATS_DELAY		(1 * HZ)
 
+#define MVPP2_DESC_DMA_MASK	DMA_BIT_MASK(40)
+
 /* Definitions */
 
 /* Shared Packet Processor resources */
@@ -1429,7 +1431,7 @@ static dma_addr_t mvpp2_txdesc_dma_addr_get(struct mvpp2_port *port,
 	if (port->priv->hw_version == MVPP21)
 		return tx_desc->pp21.buf_dma_addr;
 	else
-		return tx_desc->pp22.buf_dma_addr_ptp & GENMASK_ULL(40, 0);
+		return tx_desc->pp22.buf_dma_addr_ptp & MVPP2_DESC_DMA_MASK;
 }
 
 static void mvpp2_txdesc_dma_addr_set(struct mvpp2_port *port,
@@ -1447,7 +1449,7 @@ static void mvpp2_txdesc_dma_addr_set(struct mvpp2_port *port,
 	} else {
 		u64 val = (u64)addr;
 
-		tx_desc->pp22.buf_dma_addr_ptp &= ~GENMASK_ULL(40, 0);
+		tx_desc->pp22.buf_dma_addr_ptp &= ~MVPP2_DESC_DMA_MASK;
 		tx_desc->pp22.buf_dma_addr_ptp |= val;
 		tx_desc->pp22.packet_offset = offset;
 	}
@@ -1507,7 +1509,7 @@ static dma_addr_t mvpp2_rxdesc_dma_addr_get(struct mvpp2_port *port,
 	if (port->priv->hw_version == MVPP21)
 		return rx_desc->pp21.buf_dma_addr;
 	else
-		return rx_desc->pp22.buf_dma_addr_key_hash & GENMASK_ULL(40, 0);
+		return rx_desc->pp22.buf_dma_addr_key_hash & MVPP2_DESC_DMA_MASK;
 }
 
 static unsigned long mvpp2_rxdesc_cookie_get(struct mvpp2_port *port,
@@ -1516,7 +1518,7 @@ static unsigned long mvpp2_rxdesc_cookie_get(struct mvpp2_port *port,
 	if (port->priv->hw_version == MVPP21)
 		return rx_desc->pp21.buf_cookie;
 	else
-		return rx_desc->pp22.buf_cookie_misc & GENMASK_ULL(40, 0);
+		return rx_desc->pp22.buf_cookie_misc & MVPP2_DESC_DMA_MASK;
 }
 
 static size_t mvpp2_rxdesc_size_get(struct mvpp2_port *port,
@@ -8789,7 +8791,7 @@ static int mvpp2_probe(struct platform_device *pdev)
 	}
 
 	if (priv->hw_version == MVPP22) {
-		err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(40));
+		err = dma_set_mask(&pdev->dev, MVPP2_DESC_DMA_MASK);
 		if (err)
 			goto err_mg_clk;
 		/* Sadly, the BM pools all share the same register to
-- 
2.11.0
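
For readers following the mask arithmetic in the changelog above, here is a
minimal userspace sketch (not driver code). GENMASK_ULL(h, l) sets bits l
through h inclusive, so GENMASK_ULL(40, 0) covers 41 bits, while
DMA_BIT_MASK(40) covers exactly 40. The DEMO_ macros restate the kernel
definitions for illustration only, and demo_mask_width() is a hypothetical
helper that does not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative userspace restatements of the kernel macros (assumption:
 * they behave like GENMASK_ULL() and DMA_BIT_MASK() for these inputs). */
#define DEMO_GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
#define DEMO_DMA_BIT_MASK(n) \
	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: count the set bits in a mask. */
static int demo_mask_width(uint64_t mask)
{
	int bits = 0;

	while (mask) {
		bits += (int)(mask & 1);
		mask >>= 1;
	}
	return bits;
}
```

With these definitions, demo_mask_width(DEMO_GENMASK_ULL(40, 0)) is 41 and
demo_mask_width(DEMO_DMA_BIT_MASK(40)) is 40; the GENMASK spelling of the
40-bit mask would be DEMO_GENMASK_ULL(39, 0).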



Re: [PATCH v6 05/11] ARM: smp: Add initialization of CNTVOFF

2018-04-18 Thread Geert Uytterhoeven
Allo Mylène,

On Mon, Apr 16, 2018 at 11:50 PM, Mylène Josserand wrote:
> The CNTVOFF register from arch timer is uninitialized.
> It should be done by the bootloader but it is currently not the case,
> even for boot CPU because this SoC is booting in secure mode.
> It leads to a random offset value meaning that each CPU will have a
> different time, which isn't working very well.
>
> Add assembly code used for boot CPU and secondary CPU cores to make
> sure that the CNTVOFF register is initialized. Because this code can
> be used by different platforms, add this assembly file in ARM's common
> folder.

Thanks for your patch!

> Signed-off-by: Mylène Josserand 

Reviewed-by: Geert Uytterhoeven 
Tested-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH 2/6 RESEND] statfs: use << to align with fs header

2018-04-18 Thread Christian Brauner
Consistently use << to define ST_* constants. This also aligns them with
their MS_* counterparts in fs.h.

Signed-off-by: Christian Brauner 
Cc: Alexander Viro 
---
 include/linux/statfs.h | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index 3142e98546ac..b336c04e793c 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -27,18 +27,18 @@ struct kstatfs {
  * ABI.  The exception is ST_VALID which has the same value as MS_REMOUNT
  * which doesn't make any sense for statfs.
  */
-#define ST_RDONLY	0x0001	/* mount read-only */
-#define ST_NOSUID	0x0002	/* ignore suid and sgid bits */
-#define ST_NODEV	0x0004	/* disallow access to device special files */
-#define ST_NOEXEC	0x0008	/* disallow program execution */
-#define ST_SYNCHRONOUS	0x0010	/* writes are synced at once */
-#define ST_VALID	0x0020	/* f_flags support is implemented */
-#define ST_MANDLOCK	0x0040	/* allow mandatory locks on an FS */
-/* 0x0080 used for ST_WRITE in glibc */
-/* 0x0100 used for ST_APPEND in glibc */
-/* 0x0200 used for ST_IMMUTABLE in glibc */
-#define ST_NOATIME	0x0400	/* do not update access times */
-#define ST_NODIRATIME	0x0800	/* do not update directory access times */
-#define ST_RELATIME	0x1000	/* update atime relative to mtime/ctime */
+#define ST_RDONLY	(1<<0) /* mount read-only */
+#define ST_NOSUID	(1<<1) /* ignore suid and sgid bits */
+#define ST_NODEV	(1<<2) /* disallow access to device special files */
+#define ST_NOEXEC	(1<<3) /* disallow program execution */
+#define ST_SYNCHRONOUS	(1<<4) /* writes are synced at once */
+#define ST_VALID	(1<<5) /* f_flags support is implemented */
+#define ST_MANDLOCK	(1<<6) /* allow mandatory locks on an FS */
+/* (1<<7) used for ST_WRITE in glibc */
+/* (1<<8) used for ST_APPEND in glibc */
+/* (1<<9) used for ST_IMMUTABLE in glibc */
+#define ST_NOATIME	(1<<10) /* do not update access times */
+#define ST_NODIRATIME	(1<<11) /* do not update directory access times */
+#define ST_RELATIME	(1<<12) /* update atime relative to mtime/ctime */
 
 #endif
-- 
2.17.0
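
Since the ST_* values are ABI, the change above is only safe if every shift
expression equals the hex literal it replaces. A minimal userspace sketch of
that check follows; the values are copied from the hunk, while struct
demo_flag, demo_flags[] and demo_flags_match() are hypothetical names used
only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct demo_flag {
	const char *name;
	unsigned long old_value;	/* hex spelling removed by the patch */
	unsigned long new_value;	/* shift spelling added by the patch */
};

static const struct demo_flag demo_flags[] = {
	{ "ST_RDONLY",      0x0001, 1UL << 0 },
	{ "ST_NOSUID",      0x0002, 1UL << 1 },
	{ "ST_NODEV",       0x0004, 1UL << 2 },
	{ "ST_NOEXEC",      0x0008, 1UL << 3 },
	{ "ST_SYNCHRONOUS", 0x0010, 1UL << 4 },
	{ "ST_VALID",       0x0020, 1UL << 5 },
	{ "ST_MANDLOCK",    0x0040, 1UL << 6 },
	{ "ST_NOATIME",     0x0400, 1UL << 10 },
	{ "ST_NODIRATIME",  0x0800, 1UL << 11 },
	{ "ST_RELATIME",    0x1000, 1UL << 12 },
};

/* Return 1 if every flag keeps its value under the new spelling, else 0. */
static int demo_flags_match(void)
{
	size_t i;

	for (i = 0; i < sizeof(demo_flags) / sizeof(demo_flags[0]); i++)
		if (demo_flags[i].old_value != demo_flags[i].new_value)
			return 0;
	return 1;
}
```

demo_flags_match() returns 1 for the table above, i.e. the patch is a pure
respelling and no flag value changes.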



Re: [PATCH v2 1/7] powerpc: Add TIDR CPU feature for Power9

2018-04-18 Thread Andrew Donnellan

On 18/04/18 11:08, Alastair D'Silva wrote:

From: Alastair D'Silva 

This patch adds a CPU feature bit to show whether the CPU has
the TIDR register available, enabling as_notify/wait in userspace.

Signed-off-by: Alastair D'Silva 


Per my previous email:

Reviewed-by: Andrew Donnellan 


--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH] drm/xen-front: Remove CMA support

2018-04-18 Thread Oleksandr Andrushchenko

On 04/17/2018 12:08 PM, Oleksandr Andrushchenko wrote:

On 04/17/2018 12:04 PM, Daniel Vetter wrote:

On Tue, Apr 17, 2018 at 10:40:12AM +0300, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

Even if xen-front allocates its buffers from contiguous memory,
those are still not contiguous in PA space, i.e. the buffer is only
contiguous in IPA space.
The only use-case for this mode was if xen-front is used to allocate
dumb buffers which are later used by some other driver requiring
contiguous memory, but there is currently no such use-case, or
it can be worked around with xen-front.

Please also mention the nents confusion here, and the patch that fixes it.

Or just outright take the commit message from my patch with all the
details:

ok, if you don't mind then I'll use your commit message entirely

    drm/xen: Disable CMA support

    It turns out this was only needed to paper over a bug in the CMA
    helpers, which was addressed in

    commit 998fb1a0f478b83492220ff79583bf9ad538bdd8
    Author: Liviu Dudau 
    Date:   Fri Nov 10 13:33:10 2017 +

        drm: gem_cma_helper.c: Allow importing of contiguous scatterlists with nents > 1

    Without this the following pipeline didn't work:

    domU:
    1. xen-front allocates a non-contig buffer
    2. creates grants out of it

    dom0:
    3. converts the grants into a dma-buf. Since they're non-contig, the
       scatter-list is huge.
    4. imports it into rcar-du, which requires dma-contig memory for
       scanout.

    -> On this given platform there's an IOMMU, so in theory this should
    work. But in practice this failed, because of the huge number of sg
    entries, even though the IOMMU driver mapped it all into a dma-contig
    range.

    With a guest-contig buffer allocated in step 1, this problem doesn't
    exist. But there's technically no reason to require guest-contig
    memory for xen buffer sharing using grants.

With the commit message improved:

Acked-by: Daniel Vetter 

Thank you,
I'll wait for a day and apply to drm-misc-next if this is ok

applied to drm-misc-next


Signed-off-by: Oleksandr Andrushchenko 


Suggested-by: Daniel Vetter 
---
  Documentation/gpu/xen-front.rst | 12 
  drivers/gpu/drm/xen/Kconfig | 13 
  drivers/gpu/drm/xen/Makefile    |  9 +--
  drivers/gpu/drm/xen/xen_drm_front.c | 62 +++-
  drivers/gpu/drm/xen/xen_drm_front.h | 42 ++-
  drivers/gpu/drm/xen/xen_drm_front_gem.c | 12 +---
  drivers/gpu/drm/xen/xen_drm_front_gem.h |  3 -
  drivers/gpu/drm/xen/xen_drm_front_gem_cma.c | 79 
-

  drivers/gpu/drm/xen/xen_drm_front_shbuf.c   | 22 --
  drivers/gpu/drm/xen/xen_drm_front_shbuf.h   |  8 ---
  10 files changed, 21 insertions(+), 241 deletions(-)
  delete mode 100644 drivers/gpu/drm/xen/xen_drm_front_gem_cma.c

diff --git a/Documentation/gpu/xen-front.rst b/Documentation/gpu/xen-front.rst

index 009d942386c5..d988da7d1983 100644
--- a/Documentation/gpu/xen-front.rst
+++ b/Documentation/gpu/xen-front.rst
@@ -18,18 +18,6 @@ Buffers allocated by the frontend driver
  .. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h
 :doc: Buffers allocated by the frontend driver
  -With GEM CMA helpers
-
-
-.. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h
-   :doc: With GEM CMA helpers
-
-Without GEM CMA helpers
-~~~
-
-.. kernel-doc:: drivers/gpu/drm/xen/xen_drm_front.h
-   :doc: Without GEM CMA helpers
-
  Buffers allocated by the backend
  
diff --git a/drivers/gpu/drm/xen/Kconfig b/drivers/gpu/drm/xen/Kconfig

index 4f4abc91f3b6..4cca160782ab 100644
--- a/drivers/gpu/drm/xen/Kconfig
+++ b/drivers/gpu/drm/xen/Kconfig
@@ -15,16 +15,3 @@ config DRM_XEN_FRONTEND
  help
    Choose this option if you want to enable a para-virtualized
    frontend DRM/KMS driver for Xen guest OSes.
-
-config DRM_XEN_FRONTEND_CMA
-    bool "Use DRM CMA to allocate dumb buffers"
-    depends on DRM_XEN_FRONTEND
-    select DRM_KMS_CMA_HELPER
-    select DRM_GEM_CMA_HELPER
-    help
-  Use DRM CMA helpers to allocate display buffers.
-  This is useful for the use-cases when guest driver needs to
-  share or export buffers to other drivers which only expect
-  contiguous buffers.
-  Note: in this mode driver cannot use buffers allocated
-  by the backend.
diff --git a/drivers/gpu/drm/xen/Makefile b/drivers/gpu/drm/xen/Makefile

index 352730dc6c13..712afff5ffc3 100644
--- a/drivers/gpu/drm/xen/Makefile
+++ b/drivers/gpu/drm/xen/Makefile
@@ -5,12 +5,7 @@ drm_xen_front-objs := xen_drm_front.o \
    xen_drm_front_conn.o \
    xen_drm_front_evtchnl.o \
    

Re: [PATCH/RFC] crypto: Add platform dependencies for CRYPTO_DEV_CCREE

2018-04-18 Thread Geert Uytterhoeven
Hi Arnd,

On Tue, Apr 17, 2018 at 9:53 PM, Arnd Bergmann wrote:
> On Tue, Apr 17, 2018 at 8:14 PM, Geert Uytterhoeven wrote:
>> The ARM TrustZone CryptoCell is found on ARM SoCs only.  Hence make it
>> depend on ARM or ARM64, unless compile-testing.
>>
>> Drop the dependency on HAS_DMA, as DMA is always available on ARM and
>> ARM64 platforms, and doing so will increase compile coverage.
>>
>> Signed-off-by: Geert Uytterhoeven 
>> ---
>> Is ARM || ARM64 OK?
>> Or should this be limited to either ARM or ARM64? Or something else?
>
> ARM || ARM64 seems fine, but don't you need '|| (HAS_DMA && COMPILE_TEST)'?
>
> I assume the HAS_DMA dependency was added to prevent compile
> testing to run into a build error.

Probably it was. But in v4.17-rc1, dummies are present in the NO_DMA case,
so everything compile-tests fine.

Gr{oetje,eeting}s,

Geert



Re: [GIT PULL V3] Thermal SoC management updates for v4.17-rc1

2018-04-18 Thread Zhang Rui
Hi, Eduardo,

On Sat, 2018-04-14 at 11:30 -0700, Eduardo Valentin wrote:
> Hello Linus,
> 
> Please find thermal-soc changes for v4.17-rc1.
> Rui asked me to send the pull request directly to you
> as we are close to the end of the merge window.
> Essentially this pull removes the series that caused
> warning regression. I will work with the developer
> to get that fixed later on, but I am still sending
> the other few patches that are unrelated to that.
> Let me know if this causes any issues and can still
> be pulled.
> 
> Changelog:
> - New i.MX7 thermal sensor
> - Mediatek driver now supports MT7622 SoC
> - Removal of min max cpu cooling DT property
> 
> Differences in V3:
> - Rebased on top of current linus/master, to avoid any merge issues
> from previously pulled thermal code.
> 
> Differences in V2:
> - Reordered the patches to drop exynos changes for now until we get
> agreement on the fix on that driver for the compilation warns
> caused by the confusing conversion functions.
> 
> 
> The following changes since commit
> 48023102b7078a6674516b1fe0d639669336049d:
> 
>   Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs (2018-04-13 16:55:41 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal linus
> 
> for you to fetch changes up to
> 15a32df1918259be6c23fc36014fc26ee66c836c:
> 
>   dt-bindings: thermal: Remove "cooling-{min|max}-level" properties
> (2018-04-14 09:37:55 -0700)
> 
This pull request missed this merge window.
So do you want to split it into 2 separate pull requests, one for 4.17-rc
and another for 4.18-rc1?

> 
> Anson Huang (1):
>   thermal: imx: add i.MX7 thermal sensor support
> 
> Bartlomiej Zolnierkiewicz (1):
>   dt-bindings: thermal: remove no longer needed samsung thermal
> properties
> 
> Sean Wang (2):
>   dt-bindings: thermal: add binding for MT7622 SoC
>   thermal: mediatek: add support for MT7622 SoC
> 
> Viresh Kumar (1):
>   dt-bindings: thermal: Remove "cooling-{min|max}-level"
> properties
> 
IMO, together with the refreshed exynos fixes, the one from Viresh and
the one from Bartlomiej can be queued for 4.17-rc, and the others have
to wait until next merge window.

thanks,
rui


>  .../devicetree/bindings/thermal/exynos-thermal.txt |  23 +-
>  .../devicetree/bindings/thermal/imx-thermal.txt|   9 +-
>  .../bindings/thermal/mediatek-thermal.txt  |   1 +
>  .../devicetree/bindings/thermal/thermal.txt|  16 +-
>  drivers/thermal/imx_thermal.c  | 295
> -
>  drivers/thermal/mtk_thermal.c  |  35 +++
>  6 files changed, 281 insertions(+), 98 deletions(-)


Re: [patch v2] mm, oom: fix concurrent munlock and oom reaper unmap

2018-04-18 Thread Michal Hocko
On Tue 17-04-18 19:52:41, David Rientjes wrote:
> Since exit_mmap() is done without the protection of mm->mmap_sem, it is
> possible for the oom reaper to concurrently operate on an mm until
> MMF_OOM_SKIP is set.
> 
> This allows munlock_vma_pages_all() to concurrently run while the oom
> reaper is operating on a vma.  Since munlock_vma_pages_range() depends on
> clearing VM_LOCKED from vm_flags before actually doing the munlock to
> determine if any other vmas are locking the same memory, the check for
> VM_LOCKED in the oom reaper is racy.
> 
> This is especially noticeable on architectures such as powerpc where
> clearing a huge pmd requires serialize_against_pte_lookup().  If the pmd
> is zapped by the oom reaper during follow_page_mask() after the check for
> pmd_none() is bypassed, this ends up dereferencing a NULL ptl.
> 
> Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
> reaped.  This prevents the concurrent munlock_vma_pages_range() and
> unmap_page_range().  The oom reaper will simply not operate on an mm that
> has the bit set and leave the unmapping to exit_mmap().

This will further complicate the protocol and actually theoretically
restores the oom lockup issues because the oom reaper doesn't set
MMF_OOM_SKIP when racing with exit_mmap so we fully rely on nothing
blocking there... So the resulting code is more fragile and tricky.

Can we try a simpler way and get back to what I was suggesting before
[1] and simply not play tricks with
	down_write(&mm->mmap_sem);
	up_write(&mm->mmap_sem);

and use the write lock in exit_mmap for oom_victims?

Andrea wanted to make this more clever but this is the second fallout
which could have been prevented. The patch would be smaller and the
locking protocol easier.

[1] http://lkml.kernel.org/r/20170727065023.gb20...@dhcp22.suse.cz

> Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
> Cc: sta...@vger.kernel.org [4.14+]
> Signed-off-by: David Rientjes 
> ---
>  v2:
>   - oom reaper only sets MMF_OOM_SKIP if MMF_UNSTABLE was never set (either
> by itself or by exit_mmap()), per Tetsuo
>   - s/kick_all_cpus_sync/serialize_against_pte_lookup/ in changelog as more
> isolated way of forcing cpus as non-idle on power
> 
>  mm/mmap.c | 38 --
>  mm/oom_kill.c | 28 +---
>  2 files changed, 33 insertions(+), 33 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3015,6 +3015,25 @@ void exit_mmap(struct mm_struct *mm)
>   /* mm's last user has gone, and its about to be pulled down */
>   mmu_notifier_release(mm);
>  
> + if (unlikely(mm_is_oom_victim(mm))) {
> + /*
> +  * Wait for oom_reap_task() to stop working on this mm.  Because
> +  * MMF_UNSTABLE is already set before calling down_read(),
> +  * oom_reap_task() will not run on this mm after up_write().
> +  * oom_reap_task() also depends on a stable VM_LOCKED flag to
> +  * indicate it should not unmap during munlock_vma_pages_all().
> +  *
> +  * mm_is_oom_victim() cannot be set from under us because
> +  * victim->mm is already set to NULL under task_lock before
> +  * calling mmput() and victim->signal->oom_mm is set by the oom
> +  * killer only if victim->mm is non-NULL while holding
> +  * task_lock().
> +  */
> + set_bit(MMF_UNSTABLE, &mm->flags);
> + down_write(&mm->mmap_sem);
> + up_write(&mm->mmap_sem);
> + }
> +
>   if (mm->locked_vm) {
>   vma = mm->mmap;
>   while (vma) {
> @@ -3036,26 +3055,9 @@ void exit_mmap(struct mm_struct *mm)
>   /* update_hiwater_rss(mm) here? but nobody should be looking */
>   /* Use -1 here to ensure all VMAs in the mm are unmapped */
>   unmap_vmas(&tlb, vma, 0, -1);
> -
> - if (unlikely(mm_is_oom_victim(mm))) {
> - /*
> -  * Wait for oom_reap_task() to stop working on this
> -  * mm. Because MMF_OOM_SKIP is already set before
> -  * calling down_read(), oom_reap_task() will not run
> -  * on this "mm" post up_write().
> -  *
> -  * mm_is_oom_victim() cannot be set from under us
> -  * either because victim->mm is already set to NULL
> -  * under task_lock before calling mmput and oom_mm is
> -  * set not NULL by the OOM killer only if victim->mm
> -  * is found not NULL while holding the task_lock.
> -  */
> - set_bit(MMF_OOM_SKIP, &mm->flags);
> - down_write(&mm->mmap_sem);
> - up_write(&mm->mmap_sem);
> - }
>   free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
>   tlb_finish_mmu(&tlb, 0, -1);
> + set_bit(MMF_OOM_SKIP, &mm->flags);
>  
>

Re: [PATCH v6 4/7] remoteproc/davinci: prepare and unprepare the clock where needed

2018-04-18 Thread Bartosz Golaszewski
2018-04-18 7:27 GMT+02:00 Sekhar Nori:
> On Tuesday 17 April 2018 11:00 PM, Bartosz Golaszewski wrote:
>> From: Bartosz Golaszewski 
>>
>> We're currently switching the platform to using the common clock
>> framework. We need to explicitly prepare and unprepare the rproc
>> clock.
>>
>> Signed-off-by: Bartosz Golaszewski 
>> Acked-by: Suman Anna 
>> Reviewed-by: David Lechner 
>
> Reviewed-by: Sekhar Nori 
>
> This should be safe to apply to v4.17-rc1 as well (for inclusion in v4.18).
>
> Bartosz, I noticed that CONFIG_REMOTEPROC and the DA8XX driver is not
> enabled in davinci_all_defconfig. Can you please send a patch enabling
> that too?
>
> Thanks,
> Sekhar

Sure, will do.

Bart


Re: [PATCH v3 3/3] Documentation/i2c: adopt kernel commenting style in examples

2018-04-18 Thread Wolfram Sang
On Fri, Apr 13, 2018 at 10:42:57AM -0700, Sam Hansen wrote:
> The example I2C code is rewritten to adopt the preferred kernel block
> commenting style.
> 
> Signed-off-by: Sam Hansen 

Applied to for-current, thanks!





Re: [PATCH 1/1] i2c: dev: check i2c_msg len before memdup_user() to prevent ZERO_SIZE_PTR deref

2018-04-18 Thread Uwe Kleine-König
On Wed, Apr 18, 2018 at 10:56:03AM +0300, Alexander Popov wrote:
> On 18.04.2018 10:07, Uwe Kleine-König wrote:
> > Hello,
> 
> Hello Uwe,
> 
> Thanks for your reply.
> 
> > On Wed, Apr 18, 2018 at 03:16:45AM +0300, Alexander Popov wrote:
> >> Currently i2cdev_ioctl_rdwr() doesn't check i2c_msg len against zero
> >> before calling memdup_user(). If this len is zero memdup_user() returns
> >> ZERO_SIZE_PTR, which is later considered as valid since
> >> IS_ERR(ZERO_SIZE_PTR) is false. That causes ZERO_SIZE_PTR deref oops.
> > 
> > You're saying that
> > 
> > memdup_user(ptr, 0)
> > 
> > reads from *ptr? I'd say this is a bug in memdup_user, not its user.
> 
> No, I don't say that.
> 
> memdup_user(ptr, 0) returns ZERO_SIZE_PTR, which is later considered as valid
> since IS_ERR(ZERO_SIZE_PTR) is false:
> 
>   msgs[i].buf = memdup_user(data_ptrs[i], msgs[i].len);
>   if (IS_ERR(msgs[i].buf)) {
>   res = PTR_ERR(msgs[i].buf);
>   break;
>   }
> 
> That causes ZERO_SIZE_PTR deref oops after that:
> 
> root@syzkaller:~# ./repro
> [   22.015442] kasan: CONFIG_KASAN_INLINE enabled
> [   22.066965] kasan: GPF could be caused by NULL-ptr deref or user memory access
> [   22.068624] general protection fault:  [#1] SMP KASAN
> [   22.069705] Dumping ftrace buffer:
> [   22.070399]    (ftrace buffer empty)
> [   22.071033] Modules linked in:
> [   22.071562] CPU: 0 PID: 3899 Comm: repro.exe Not tainted 4.17.0-rc1 #2
> [   22.072632] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [   22.074219] RIP: 0010:i2cdev_ioctl_rdwr+0x12b/0x7b0
> [   22.075023] RSP: 0018:880061f3fa68 EFLAGS: 00010346
> [   22.075877] RAX: 0002 RBX:  RCX:
> [   22.076973] RDX:  RSI: 2000 RDI: 88006a2e9542
> [   22.078086] RBP: 880061f3fac0 R08:  R09: 880060b44780
> [   22.079166] R10: 11000c3e7f1d R11:  R12: dc00
> [   22.080251] R13: 88006a2e9540 R14: 0010 R15: 0001
> [   22.081339] FS:  020bc880() GS:88006ba0() knlGS:
> [   22.082615] CS:  0010 DS:  ES:  CR0: 80050033
> [   22.083526] CR2: 22c3 CR3: 6724a000 CR4: 06f0
> [   22.084631] Call Trace:
> [   22.085501]  ? i2cdev_ioctl_rdwr+0xf/0x7b0
> [   22.086865]  i2cdev_ioctl+0x4ec/0x940
> [   22.088677]  ? kasan_check_read+0x11/0x20
> [   22.090555]  ? i2cdev_ioctl_smbus+0x6a0/0x6a0
> [   22.091862]  ? do_raw_spin_trylock+0x1e0/0x1e0
> [   22.092428]  ? kasan_check_write+0x14/0x20
> [   22.092946]  ? trace_hardirqs_off+0xd/0x10
> [   22.093451]  ? _raw_spin_unlock_irqrestore+0xa6/0xe0
> [   22.094013]  ? debug_check_no_obj_freed+0x341/0x7eb
> [   22.094547]  ? i2cdev_ioctl_smbus+0x6a0/0x6a0
> [   22.095086]  do_vfs_ioctl+0x1cd/0x17b0
> [   22.095482]  ? kasan_check_read+0x11/0x20
> [   22.095978]  ? rcu_is_watching+0x7b/0x150
> [   22.096428]  ? ioctl_preallocate+0x350/0x350
> [   22.096908]  ? __fget_light+0x2fc/0x4c0
> [   22.097351]  ? fget_raw+0x20/0x20
> [   22.097721]  ? kmem_cache_free+0x31c/0x450
> [   22.098164]  ? putname+0xfa/0x150
> [   22.098511]  ? do_sys_open+0x31c/0x710
> [   22.099792]  ? security_file_ioctl+0x8c/0xc0
> [   22.102080]  ksys_ioctl+0x94/0xb0
> [   22.103204]  __x64_sys_ioctl+0x7c/0xd0
> [   22.103643]  do_syscall_64+0x193/0x920
> [   22.104186]  ? trace_event_raw_event_sys_exit+0x2e0/0x2e0
> [   22.105061]  ? syscall_return_slowpath+0x6a0/0x6a0
> [   22.106717]  ? syscall_return_slowpath+0x2de/0x6a0
> [   22.108183]  ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
> [   22.109597]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> [   22.110167]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   22.110735] RIP: 0033:0x44df89
> [   22.111077] RSP: 002b:7fff7fb01ca8 EFLAGS: 0213 ORIG_RAX: 0010
> [   22.111932] RAX: ffda RBX: 00400418 RCX: 0044df89
> [   22.112958] RDX: 2080 RSI: 0707 RDI: 0003
> [   22.114078] RBP: 7fff7fb01cc0 R08:  R09: 00401af0
> [   22.115784] R10:  R11: 0213 R12: 00401b90
> [   22.116870] R13:  R14: 006bd018 R15:
> [   22.117953] Code: 00 e8 7a 53 bd fb 41 83 e7 01 0f 84 e8 03 00 00 e8 6b 53 bd fb 4d 85 f6 0f 84 12 06 00 00 4c 89 f0 4c 89 f1 48 c1 e8 03 83 e1 07 <42> 0f b6 04 20 38 c8 7f 08 84 c0 0f 85 e7 05 00 00 45 0f b6 36
> [   22.120532] RIP: i2cdev_ioctl_rdwr+0x12b/0x7b0 RSP: 880061f3fa68
> [   22.121290] ---[ end trace b365c176b1d95614 ]---
> 
> 
> > If however the problem only happens later in
> > 
> > if (msgs[i].flags & I2C_M_RECV_LEN) {
> > if (!(msgs[i].flags & I2C_M_RD) || msgs[i].buf[0] < 1 || ...)
> 
> Yes, that's true. I think I should make the commit 
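
The point being argued above can be shown with a small userspace sketch
(not the kernel code, and not the proposed fix itself): a zero-length
allocation yields ZERO_SIZE_PTR, which is a small non-NULL value, and the
IS_ERR() range check only catches the top MAX_ERRNO addresses. The DEMO_
constants restate the kernel values as an assumption for illustration, and
the demo_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the kernel's MAX_ERRNO and ZERO_SIZE_PTR values. */
#define DEMO_MAX_ERRNO		4095
#define DEMO_ZERO_SIZE_PTR	((void *)16)

/* A real, dereferenceable buffer to compare against. */
static char demo_real_buffer[4];

/* Same range test as IS_ERR(): only the last MAX_ERRNO addresses match. */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Same idea as ZERO_OR_NULL_PTR(): NULL or the zero-size marker. */
static int demo_zero_or_null_ptr(const void *ptr)
{
	return (uintptr_t)ptr <= (uintptr_t)DEMO_ZERO_SIZE_PTR;
}

/* Hypothetical helper: 1 only if the buffer may actually be dereferenced. */
static int demo_buf_is_usable(const void *buf)
{
	return !demo_is_err(buf) && !demo_zero_or_null_ptr(buf);
}
```

Here demo_is_err(DEMO_ZERO_SIZE_PTR) is 0, so an IS_ERR()-only check lets
the marker through, while demo_buf_is_usable(DEMO_ZERO_SIZE_PTR) is 0. The
patch under discussion takes the simpler route named in its subject line and
checks the message length before calling memdup_user().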

Re: [PATCH] x86/centaur: report correct CPU/cache topology

2018-04-18 Thread David Wang


> -Original Mail-
> Sender: Thomas Gleixner [mailto:t...@linutronix.de]
> Time : 2018/4/17 18:16
> Receiver: David Wang 
> CC: mi...@redhat.com; h...@zytor.com; mi...@kernel.org;
> x...@kernel.org; linux-kernel@vger.kernel.org; brucechang@via-
> alliance.com; cooper...@zhaoxin.com; qiyuanw...@zhaoxin.com;
> benjamin...@viatech.com; luke...@viacpu.com; tim...@zhaoxin.com
> Subject: Re: [PATCH] x86/centaur: report correct CPU/cache topology
> 
> On Wed, 4 Apr 2018, David Wang wrote:
> 
> > This patch is used to support multi-core Centaur CPU. After using this
> > patch, we can get correct CPU topology and correct cache topology.
> 
> David. This changelog is pretty useless. First of all, please do not use
> 'This patch ..'. We all know already that this is a patch.
> Documentation/process/submitting-patches.rst has a good explanation
> about writing changelogs.
> 
> The changelog should explain why it does something. Let me give you an
> example:
> 
>   Centaur CPUs enumerate the cache topology in the same way as Intel CPUs,
>   but the functionality is unused so far. The Centaur init code also misses
>   to initialize x86_cpuinfo::max_cores so the CPU topology cannot be
>   described correctly.
> 
>   Initialize x86_cpuinfo::max_cores and invoke init_intel_cacheinfo() to
>   make CPU and cache topology information available and correct.
> 
> See? I'm neither using 'this patch' nor 'We/I' as I'm not impersonating
> the code. It's all factual instead.
> > Signed-off-by: David Wang 
> > ---
> >  arch/x86/kernel/cpu/centaur.c | 20 
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/arch/x86/kernel/cpu/centaur.c
> > b/arch/x86/kernel/cpu/centaur.c index e5ec0f1..713e4db 100644
> > --- a/arch/x86/kernel/cpu/centaur.c
> > +++ b/arch/x86/kernel/cpu/centaur.c
> > @@ -112,6 +112,19 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
> > }
> >  }
> >
> > +static int centaur_num_cpu_cores(struct cpuinfo_x86 *c) {
> > +   unsigned int eax, ebx, ecx, edx;
> > +
> > +   if (c->cpuid_level < 4)
> > +   return 1;
> > +   cpuid_count(4, 0, &eax, &ebx, &ecx, &edx);
> > +   if (eax & 0x1f)
> > +   return (eax >> 26) + 1;
> > +   else
> > +   return 1;
> 
> This is a bad copy of intel_num_cpu_cores(). See for the subtle
> difference. Please rename the intel function and move it to common.c
> 
> >  static void init_centaur(struct cpuinfo_x86 *c)  {  #ifdef
> > CONFIG_X86_32 @@ -128,6 +141,13 @@ static void init_centaur(struct
> > cpuinfo_x86 *c)
> > clear_cpu_cap(c, 0*32+31);
> >  #endif
> > early_init_centaur(c);
> > +
> > +   init_intel_cacheinfo(c);
> > +   c->x86_max_cores = centaur_num_cpu_cores(c); #ifdef
> CONFIG_X86_32
> > +   detect_ht(c);
> > +#endif
> 
> Can you please create a stub inline of detect_ht() for the !32bit case and
> get rid of these #ifdefs in the code. That wants to be a separate patch
> which also cleans up the existing call sites.
> Thanks,
> 
>   tglx

I will send patch v2 to solve all problems you listed.
Thanks,
---
David
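
As a side note on the decoding quoted in the review above, the following is
a minimal userspace sketch of how the EAX value of CPUID leaf 4 maps to a
core count. It takes the register value as a plain argument instead of
executing CPUID, mirrors only the arithmetic of the quoted hunk, and
demo_cores_from_cpuid4_eax() is a hypothetical name. It does not attempt to
reproduce intel_num_cpu_cores() or the difference the review refers to.

```c
#include <assert.h>
#include <stdint.h>

/*
 * EAX of CPUID leaf 4, subleaf 0:
 *   bits  4:0  cache type (0 means no cache is described)
 *   bits 31:26 maximum number of addressable core IDs, minus 1
 */
static unsigned int demo_cores_from_cpuid4_eax(uint32_t eax)
{
	if (eax & 0x1f)
		return (eax >> 26) + 1;
	return 1;	/* leaf carries no cache information: assume one core */
}
```

For example, an EAX value with cache type 1 and bits 31:26 equal to 3
decodes to 4 cores, and any value with a zero cache-type field falls back
to 1.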





Re: [PATCH v3 2/3] Documentation/i2c: sync docs with current state of i2c-tools

2018-04-18 Thread Wolfram Sang

> +The above functions are made available by linking against the libi2c library,
> +which is provided by the i2c-tools project.  See:
> +https://git.kernel.org/pub/scm/utils/i2c-tools/i2c-tools.git/.

In the beginning, we say that '#include ' is needed.
Shouldn't we mention i2c-tools there already and in what case it is
needed (only for SMBus)? I'd think so.

Sam, would you be open to do this as an incremental patch?





Re: [PATCH v4 05/15] KVM: s390: enable/disable AP interpretive execution

2018-04-18 Thread Pierre Morel

On 17/04/2018 20:11, Tony Krowiak wrote:

On 04/17/2018 12:55 PM, Pierre Morel wrote:

On 17/04/2018 18:22, Tony Krowiak wrote:

On 04/17/2018 12:13 PM, Pierre Morel wrote:

On 17/04/2018 17:02, Tony Krowiak wrote:

On 04/16/2018 06:51 AM, Pierre Morel wrote:

On 15/04/2018 23:22, Tony Krowiak wrote:

The VFIO AP device model exploits interpretive execution of AP
instructions (APIE) to provide guests passthrough access to AP
devices. This patch introduces a new interface to enable and
disable APIE.

Signed-off-by: Tony Krowiak 
---
  arch/s390/include/asm/kvm-ap.h   |   16 
  arch/s390/include/asm/kvm_host.h |    1 +
  arch/s390/kvm/kvm-ap.c   |   20 
  arch/s390/kvm/kvm-s390.c |    9 +
  4 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/arch/s390/include/asm/kvm-ap.h b/arch/s390/include/asm/kvm-ap.h

index 736e93e..a6c092e 100644
--- a/arch/s390/include/asm/kvm-ap.h
+++ b/arch/s390/include/asm/kvm-ap.h
@@ -35,4 +35,20 @@
   */
  void kvm_ap_build_crycbd(struct kvm *kvm);

+/**
+ * kvm_ap_interpret_instructions
+ *
+ * Indicate whether AP instructions shall be interpreted. If 
they are not
+ * interpreted, all AP instructions will be intercepted and 
routed back to

+ * userspace.
+ *
+ * @kvm: the virtual machine attributes
+ * @enable: indicates whether AP instructions are to be 
interpreted (true) or

+ *    or not (false).
+ *
+ * Returns 0 if completed successfully; otherwise, returns 
-EOPNOTSUPP

+ * indicating that AP instructions are not installed on the guest.
+ */
+int kvm_ap_interpret_instructions(struct kvm *kvm, bool enable);
+
  #endif /* _ASM_KVM_AP */
diff --git a/arch/s390/include/asm/kvm_host.h 
b/arch/s390/include/asm/kvm_host.h

index 3162783..5470685 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -715,6 +715,7 @@ struct kvm_s390_crypto {
  __u32 crycbd;
  __u8 aes_kw;
  __u8 dea_kw;
+    __u8 apie;
  };

  #define APCB0_MASK_SIZE 1
diff --git a/arch/s390/kvm/kvm-ap.c b/arch/s390/kvm/kvm-ap.c
index 991bae4..55d11b5 100644
--- a/arch/s390/kvm/kvm-ap.c
+++ b/arch/s390/kvm/kvm-ap.c
@@ -58,3 +58,23 @@ void kvm_ap_build_crycbd(struct kvm *kvm)
  }
  }
  EXPORT_SYMBOL(kvm_ap_build_crycbd);
+
+int kvm_ap_interpret_instructions(struct kvm *kvm, bool enable)
+{
+    int ret = 0;
+
+    mutex_lock(&kvm->lock);
+
+    if (!test_kvm_cpu_feat(kvm, KVM_S390_VM_CPU_FEAT_AP)) {


Do we really need to test CPU_FEAT_AP?


Yes we do.


really? why?


The KVM_S390_VM_CPU_FEAT_AP will not be enabled by KVM if the AP
instructions are not installed on the host. I assume - but have
no way of verifying - that if the AP instructions are not installed
on the host, that interpretation would fail. Do you know what would
happen if AP instructions are interpreted when not installed on
the host?


If the host has no AP instructions (its ECA.28=0) but it sets ECA.28 for a guest,
there will be no AP instructions available in the guest.


Then there's the answer to your question; this is why we need to test 
CPU_FEAT_AP.


We can postpone this discussion until we discuss VSIE.
For this specific call I just wanted to point out that obviously this 
function should not be called if the guest has no AP instructions.


I understand that KVM_S390_VM_CPU_FEAT_AP means AP instructions 
are interpreted.

shouldn't we add this information in the name?
like KVM_S390_VM_CPU_FEAT_APIE


KVM_S390_VM_CPU_FEAT_AP does NOT mean AP instructions are 
interpreted, it means

AP instructions are installed.


Right, the same error I have made throughout this review.
But AFAIK it means AP instructions are provided to the guest.
Then should this function be called if the guest has no AP 
instructions?








+    ret = -EOPNOTSUPP;
+    goto done;
+    }
+
+    kvm->arch.crypto.apie = enable;
+    kvm_s390_vcpu_crypto_reset_all(kvm);
+
+done:
+    mutex_unlock(&kvm->lock);
+    return ret;
+}
+EXPORT_SYMBOL(kvm_ap_interpret_instructions);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 55cd897..1dc8566 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1901,6 +1901,9 @@ static void kvm_s390_crypto_init(struct 
kvm *kvm)

  kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb;
  kvm_ap_build_crycbd(kvm);

+    /* Default setting indicating SIE shall interpret AP 
instructions */

+    kvm->arch.crypto.apie = 1;
+
  if (!test_kvm_facility(kvm, 76))
  return;

@@ -2434,6 +2437,12 @@ static void 
kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)

  {
  vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;

+    vcpu->arch.sie_block->eca &= ~ECA_APIE;
+    if (vcpu->kvm->arch.crypto.apie &&
+    test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_AP))


Do we call xxx_crypto_setup() if KVM does not support AP 
interpretation?


Yes, kvm_s390_vcpu_crypto_setup(vcpu) is called by 
kvm_arch_vcpu_setup(vcpu)
as 

Re: [PATCH v3 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-04-18 Thread Yang, Shunyong
Hi, Sohil

On Wed, 2018-04-18 at 07:27 +, Mehta, Sohil wrote:
> On Wed, 2018-04-18 at 05:58 +, Yang, Shunyong wrote:
> > 
> > Hi, Gary and Sohil,
> > 
> > On Tue, 2018-04-17 at 13:38 -0400, Hook, Gary wrote:
> > > 
> > > On 4/13/2018 8:08 PM, Mehta, Sohil wrote:
> > > > 
> > > >  
> > > > On Fri, 2018-04-06 at 08:17 -0500, Gary R Hook wrote:
> > > > > 
> > > > >  
> > > > >   
> > > > > +
> > > > > +void amd_iommu_debugfs_setup(struct amd_iommu *iommu)
> > > > > +{
> > > > > + char name[MAX_NAME_LEN + 1];
> > > > > + struct dentry *d_top;
> > > > > +
> > > > > + if (!debugfs_initialized())
> > > > Probably not needed.
> > > Right.
> > When is this check needed?
> > IMO, this function is there to check whether debugfs is ready before
> > we use it. I just want to understand when we should use
> > debugfs_initialized();
> > 
> You are right, debugfs_initialized() can be used to check if debugfs is
> ready. However, in this case we can also rely on debugfs_create_dir(),
> which is called in iommu_debugfs_setup().
> 
> debugfs_create_dir() says:
> 
>  * If debugfs is not enabled in the kernel, the value -%ENODEV will be
>  * returned.

It seems "If debugfs is not enabled in the kernel" means that
CONFIG_DEBUG_FS is not configured. The following is the code used when
that option is not set:

  static inline struct dentry *debugfs_create_dir(const char *name,
                                                  struct dentry *parent)
  {
          return ERR_PTR(-ENODEV);
  }

Looking into the code, debugfs_initialized() returns the value of
debugfs_registered, which is set to true after debugfs_init() has been
called. However, debugfs_create_dir() doesn't call debugfs_initialized()
or check the debugfs_registered value. So there is a tiny difference
between checking status via debugfs_create_dir() and via
debugfs_initialized(), although either achieves the functionality here.

Maybe the original design is to call debugfs_initialized() before
calling debugfs_create_xxx()?

Thanks.
Shunyong.
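
To make the return-value argument above concrete, here is a small userspace model of relying on the result of the create call instead of a separate readiness check. Everything prefixed model_ is hypothetical: the error-pointer helpers only imitate the kernel's ERR_PTR()/IS_ERR() idiom, and checking for both NULL and an error pointer is an assumption of this sketch, not a statement about what the posted patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitation of the kernel's error-pointer idiom. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct model_dentry {
	const char *name;
};

static struct model_dentry model_root = { "root" };

/* Mirrors the quoted stub for a kernel without debugfs: always -ENODEV. */
static struct model_dentry *model_create_dir_disabled(const char *name)
{
	(void)name;
	return model_err_ptr(-ENODEV);
}

/* Stand-in for the "debugfs available" case: hands back a valid object. */
static struct model_dentry *model_create_dir_enabled(const char *name)
{
	assert(name != NULL);
	model_root.name = name;
	return &model_root;
}

/* Caller that relies on the return value rather than a readiness check. */
static bool model_setup(struct model_dentry *(*create)(const char *))
{
	struct model_dentry *d = create("amd");

	return d != NULL && !model_is_err(d);
}
```

In this model the caller never asks whether the facility is initialized; it simply treats a NULL or error-pointer result as "not available".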

> 
> Sohil
> 
> > 
> > Thanks.
> > Shunyong.
> > 
> > > 
> > >  
> > > > 
> > > >  
> > > >  
> > > > > 
> > > > >  
> > > > > + return;
> > > > > +
> > > > > + mutex_lock(_iommu_debugfs_lock);
> > > > > + if (!amd_iommu_debugfs) {
> > > > > + d_top = iommu_debugfs_setup();
> > > > > + if (d_top)
> > > > > + amd_iommu_debugfs =
> > > > > debugfs_create_dir("amd", d_top);
> > > > > + }
> > > > > + mutex_unlock(_iommu_debugfs_lock);


Re: Issue with Enable LTR while pcie_aspm off

2018-04-18 Thread Srinath Mannam
Hi Bjorn,

Thank you very much for your time and solution.

On Tue, Apr 17, 2018 at 10:41 PM, Bjorn Helgaas  wrote:
> On Tue, Apr 17, 2018 at 02:33:52PM +0530, Srinath Mannam wrote:
>> Hi Bjorn,
>>
>> Thank you for more insight you have given about the problem.
>>
>> For us the issue comes before we disable apst feature.
>> on APST quirk set, NVMe driver disable apst by send a command to NVMe
>> controller.
>> We see issue at the time of NVMe initialization only.
>>
>> So APST quirk did not helped.
>
> OK, thanks for checking that.
>
>> On Tue, Apr 17, 2018 at 3:05 AM, Bjorn Helgaas  wrote:
>> > On Mon, Apr 16, 2018 at 09:03:33PM +0530, Srinath Mannam wrote:
>> >> On Sat, Apr 14, 2018 at 9:39 PM, Bjorn Helgaas  wrote:
>> >> > On Sat, Apr 14, 2018 at 09:04:05AM +0530, Srinath Mannam wrote:
>
>> >> >> But In our platform we required to disable ASPM.
>> >>
>> >> > We're trying to figure out exactly *why* you must disable ASPM.  If
>> >> > it's because of a hardware defect, e.g., the device advertises ASPM
>> >> > support but it's actually broken, we probably need to add a quirk.
>> >> > Given the complexity of ASPM, it's surprising we don't have similar
>> >> > quirks already.
>> >>
>> >> We see issues with ASPM enabled. Some link issues observed so for
>> >> time being we are using with aspm disabled until we fix that issue.
>> >> ...
>
>> >> with LTR enabled also we observed some problem, that after LTR
>> >> messages received from EP, we see completion timeout with config
>> >> write.
>> >
>> >> So I thought If LTR configuration function also part of aspm file,
>> >> as it was under CONFIG_ASPM.  using pcie_aspm boot arg I can disable
>> >> both ASPM and LTR.
>> >
>> >> If this is not possible, then I will go for alternative solution of
>> >> quirk implementation as you suggest.
>> >
>> > Is this platform a lab prototype or is it already shipping?  If it's
>> > already shipping, you probably need some sort of upstream solution
>> > like an NVMe or PCIe quirk, but if not, maybe you can just hack your
>> > bringup kernel to disable ASPM and LTR until you fix the root cause.
>> >
>> we are at evolution stage so we need to fix this ASAP.
>> As you said earlier, Can I add sysfs interface to enable LTR same as
>> we do L1SS or in the part of aspm cap init function.
>
> I don't know what evolution stage means, but it sounds like you do
> need an upstream fix.
>
> I don't understand the root cause of the problem yet, so I don't know
> what the best solution for you is.  Turning off LTR and ASPM
> completely is a pretty big hammer.  Ideally we could isolate this to
> certain devices or certain pieces of ASPM so we could still get some
> of the benefit.
>
>   - Do you need to disable LTR on the entire system, or just for the
> Samsung NVMe device?
>
We see both ASPM and LTR issues with the Samsung SM961/PM961 device.
But the same device works fine on multiple other platforms, which means we
have to debug our RC.

>   - Do you need to disable LTR if you replace the Samsung device with
> something else?
>
So far we see the issue with the Samsung device only. We need to explore more.

>   - LTR is only needed for the ASPM L1.2 substate.  Do you need to
> completely disable ASPM, or is it sufficient to disable LTR and
> the ASPM L1.2 substate?

We need to disable the common clock configuration, which is done as part of ASPM.
If we enable common clock, we see issues with the Samsung device.
>
> Here are some patches you can try.  These require some tweaking based
> on what the root cause of the problem is.
>
>   1  PCI/ACPI: Request control of LTR from the platform
>   2  PCI/ASPM: Disable ASPM L1.2 Substate if we don't have LTR
>   3  PCI: Enable LTR only if ASPM is enabled
>   4  PCI: Disable LTR for Samsung NVMe SSD Controller SM961/PM961
>
> If LTR is completely broken for all devices on your platform, and
> disabling it and the ASPM L1.2 substate is sufficient, you could use
> patches 1 and 2 along with a PCI or DMI quirk to clear
> host->native_ltr.  That should prevent use of LTR and the ASPM L1.2
> substate.
>
> Patch 3 should disable LTR as well as all of ASPM (not just the ASPM
> L1.2 substate) if you boot with "pcie_aspm=off".  I don't like this
> very well because the user experience is poor -- you need a release
> note or something telling the user to boot with "pcie_aspm=off", which
> is a really ugly solution.
>
> If the problem is some sort of interaction between the Samsung SSD and
> your platform, you could use patches 2 and 4 to disable LTR and ASPM
> L1.2 just for that device in your system.  Of course, you would need
> to add a DMI or similar check in the quirk, because we can't disable
> LTR and ASPM L1.2 for the Samsung SSD in *all* systems.
>
> I think patches 1-3 are candidates for the mainline kernel, regardless
> of whether you need them yourself.
>
As you suggested, I am using patches 2 and 4, which suit our requirements.
Until we 

Re: [RFC PATCH v2 4/6] sched/fair: Introduce an energy estimation helper function

2018-04-18 Thread Leo Yan
On Wed, Apr 18, 2018 at 09:13:39AM +0100, Quentin Perret wrote:
> On Tuesday 17 Apr 2018 at 23:22:13 (+0800), Leo Yan wrote:
> > On Fri, Apr 06, 2018 at 04:36:05PM +0100, Dietmar Eggemann wrote:
> > > From: Quentin Perret 
> > > 
> > > In preparation for the definition of an energy-aware wakeup path, a
> > > helper function is provided to estimate the consequence on system energy
> > > when a specific task wakes-up on a specific CPU. compute_energy()
> > > estimates the OPPs to be reached by all frequency domains and estimates
> > > the consumption of each online CPU according to its energy model and its
> > > percentage of busy time.
> > > 
> > > Cc: Ingo Molnar 
> > > Cc: Peter Zijlstra 
> > > Signed-off-by: Quentin Perret 
> > > Signed-off-by: Dietmar Eggemann 
> > > ---
> > >  include/linux/sched/energy.h | 20 +
> > >  kernel/sched/fair.c  | 68 
> > > 
> > >  kernel/sched/sched.h |  2 +-
> > >  3 files changed, 89 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
> > > index 941071eec013..b4110b145228 100644
> > > --- a/include/linux/sched/energy.h
> > > +++ b/include/linux/sched/energy.h
> > > @@ -27,6 +27,24 @@ static inline bool sched_energy_enabled(void)
> > >   return static_branch_unlikely(_energy_present);
> > >  }
> > >  
> > > +static inline
> > > +struct capacity_state *find_cap_state(int cpu, unsigned long util)
> > > +{
> > > + struct sched_energy_model *em = *per_cpu_ptr(energy_model, cpu);
> > > + struct capacity_state *cs = NULL;
> > > + int i;
> > > +
> > > + util += util >> 2;
> > > +
> > > + for (i = 0; i < em->nr_cap_states; i++) {
> > > + cs = >cap_states[i];
> > > + if (cs->cap >= util)
> > > + break;
> > > + }
> > > +
> > > + return cs;
> > 
> > 'cs' is possible to return NULL.
> 
> Only if em->nr_cap_states==0, and that shouldn't be possible if
> sched_energy_present==True, so this code should be safe :-)

You are right. Thanks for explanation.
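
The point settled above can be checked with a userspace copy of the loop. This is a sketch only: struct cap_state_model, struct energy_model_sketch and find_cap_state_model() are hypothetical names that mirror the shape of the quoted helper, and the table values used to exercise it are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace mirrors of the structures in the quoted patch. */
struct cap_state_model {
	unsigned long cap;
	unsigned long power;
};

struct energy_model_sketch {
	int nr_cap_states;
	struct cap_state_model *cap_states;
};

static struct cap_state_model *
find_cap_state_model(struct energy_model_sketch *em, unsigned long util)
{
	struct cap_state_model *cs = NULL;
	int i;

	assert(em != NULL);

	util += util >> 2;	/* add 25% headroom, as in the quoted code */

	for (i = 0; i < em->nr_cap_states; i++) {
		cs = &em->cap_states[i];
		if (cs->cap >= util)
			break;
	}

	/*
	 * cs is assigned on every iteration, so it is NULL only when the
	 * loop body never runs, i.e. when nr_cap_states == 0. If no state
	 * is large enough, the last (highest) state is returned.
	 */
	return cs;
}
```

With a non-empty table the function always returns a valid entry, falling back to the highest capacity state when the padded utilization exceeds every entry.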

> > > +}
> > > +
> > >  static inline struct cpumask *freq_domain_span(struct freq_domain *fd)
> > >  {
> > >   return >span;
> > > @@ -42,6 +60,8 @@ struct freq_domain;
> > >  static inline bool sched_energy_enabled(void) { return false; }
> > >  static inline struct cpumask
> > >  *freq_domain_span(struct freq_domain *fd) { return NULL; }
> > > +static inline struct capacity_state
> > > +*find_cap_state(int cpu, unsigned long util) { return NULL; }
> > >  static inline void init_sched_energy(void) { }
> > >  #define for_each_freq_domain(fdom) for (; fdom; fdom = NULL)
> > >  #endif
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 6960e5ef3c14..8cb9fb04fff2 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6633,6 +6633,74 @@ static int wake_cap(struct task_struct *p, int 
> > > cpu, int prev_cpu)
> > >  }
> > >  
> > >  /*
> > > + * Returns the util of "cpu" if "p" wakes up on "dst_cpu".
> > > + */
> > > +static unsigned long cpu_util_next(int cpu, struct task_struct *p, int 
> > > dst_cpu)
> > > +{
> > > + unsigned long util, util_est;
> > > + struct cfs_rq *cfs_rq;
> > > +
> > > + /* Task is where it should be, or has no impact on cpu */
> > > + if ((task_cpu(p) == dst_cpu) || (cpu != task_cpu(p) && cpu != dst_cpu))
> > > + return cpu_util(cpu);
> > > +
> > > + cfs_rq = _rq(cpu)->cfs;
> > > + util = READ_ONCE(cfs_rq->avg.util_avg);
> > > +
> > > + if (dst_cpu == cpu)
> > > + util += task_util(p);
> > > + else
> > > + util = max_t(long, util - task_util(p), 0);
> > 
> > > I tried to understand the logic here; the code below is clearer to
> > > me:
> > 
> > int prev_cpu = task_cpu(p);
> > 
> > > cfs_rq = &cpu_rq(cpu)->cfs;
> > util = READ_ONCE(cfs_rq->avg.util_avg);
> > 
> > /* Bail out if src and dst CPUs are the same one */
> > if (prev_cpu == cpu && dst_cpu == cpu)
> > return util;
> > 
> > /* Remove task utilization for src CPU */
> > if (cpu == prev_cpu)
> > util = max_t(long, util - task_util(p), 0);
> > 
> > /* Add task utilization for dst CPU */
> > if (dst_cpu == cpu)
> > util += task_util(p);
> > 
> > BTW, CPU utilization is decayed value and task_util() is not decayed
> > value, so 'util - task_util(p)' calculates a smaller value than the
> > prev CPU pure utilization, right?
> 
> task_util() is the raw PELT signal, without UTIL_EST, so I think it's
> fine to do `util - task_util()`.
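
As a rough cross-check of the two formulations quoted above, the following userspace sketch models both and compares them over every (cpu, prev_cpu, dst_cpu) combination. It rests on simplifying assumptions that are not in the patch: util_est is ignored, cpu_util(cpu) is taken to equal util_avg, and the function names are hypothetical.

```c
#include <assert.h>

/* max(util - task, 0), matching the max_t(long, ...) use in the patch. */
static long clamp_sub(long util, long task_util)
{
	long d = util - task_util;

	return d > 0 ? d : 0;
}

/* Shape of the logic in the quoted patch (util_est part left out). */
static long util_next_patch(int cpu, int prev_cpu, int dst_cpu,
			    long util, long task_util)
{
	/* Task is where it should be, or has no impact on cpu. */
	if (prev_cpu == dst_cpu || (cpu != prev_cpu && cpu != dst_cpu))
		return util;

	if (dst_cpu == cpu)
		return util + task_util;

	return clamp_sub(util, task_util);
}

/* Shape of the alternative proposed in the reply. */
static long util_next_alt(int cpu, int prev_cpu, int dst_cpu,
			  long util, long task_util)
{
	/* Bail out if src and dst CPUs are the same one. */
	if (prev_cpu == cpu && dst_cpu == cpu)
		return util;

	/* Remove task utilization for src CPU. */
	if (cpu == prev_cpu)
		util = clamp_sub(util, task_util);

	/* Add task utilization for dst CPU. */
	if (dst_cpu == cpu)
		util += task_util;

	return util;
}

/* Returns 1 if both variants agree for every combination of CPUs. */
static int variants_agree(int ncpu, long util, long task_util)
{
	int cpu, prev, dst;

	assert(ncpu > 0);
	for (cpu = 0; cpu < ncpu; cpu++)
		for (prev = 0; prev < ncpu; prev++)
			for (dst = 0; dst < ncpu; dst++)
				if (util_next_patch(cpu, prev, dst, util, task_util) !=
				    util_next_alt(cpu, prev, dst, util, task_util))
					return 0;
	return 1;
}
```

Under these assumptions the two variants compute the same value; the early bail-out in the alternative matters because subtracting and then re-adding the task's utilization is not a no-op once the subtraction clamps at zero.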
> 
> > 
> > Another question is can we reuse the function cpu_util_wake() and
> > > just compensate task util for dst cpu?
> 
> Well it's not that simple. cpu_util_wake() will give you the max between
> the util_avg and the util_est value, so which task_util() should you add
> 

Re: [PATCH v2 03/14] staging: iio: ad7746: Fix bound checkings

2018-04-18 Thread Jonathan Cameron
On Mon, 16 Apr 2018 11:47:05 -0300
Hernán Gonzalez  wrote:

> On Sun, Apr 15, 2018 at 12:05 PM, Jonathan Cameron  wrote:
> > On Fri, 13 Apr 2018 13:36:40 -0300
> > Hernán Gonzalez  wrote:
> >  
> >> Also remove unnecessary parenthesis  
> > I am probably missing something.  I'm not sure what you mean
> > by fix bound checking?   There are superfluous brackets, but
> > I don't see any functional change to indicate there was anything
> > wrong with the original checks.
> >  
> 
> Maybe I'm wrong but | is a bitwise operator while || is a logical one.
> There are no functional changes as you said but, from K&R, "One must
> distinguish the bitwise operators & and | from the logical operators
> && and ||, which imply left-to-right evaluation of a truth value. For
> example, if x is 1 and y is 2, then x & y is zero while x && y is one"
> so it'd be slightly faster if the first condition is true, and it
> would be the "correct" operator to use in this case, even though it
> doesn't affect the result.
Got you, I missed the operator change entirely. Doh.

Jonathan
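
The difference discussed above is easy to observe in a few lines of standalone C. This is an illustrative sketch, not driver code: check(), the evaluations counter and the two out_of_range_* helpers are hypothetical names, and the limit passed in is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how many operand expressions were actually evaluated. */
static int evaluations;

static bool check(bool result)
{
	evaluations++;
	return result;
}

/* Bitwise OR: both operands are always evaluated. */
static bool out_of_range_bitwise(int val, int max)
{
	assert(max >= 0);
	return check(val < 0) | check(val > max);
}

/* Logical OR: the right operand is skipped when the left one is true. */
static bool out_of_range_logical(int val, int max)
{
	assert(max >= 0);
	return check(val < 0) || check(val > max);
}
```

Both helpers return the same truth value for every input, which is why the patch has no functional effect; only the number of comparisons performed differs, and only when the first one is already true.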

> 
> >>
> >> Signed-off-by: Hernán Gonzalez 
> >> ---
> >>  drivers/staging/iio/cdc/ad7746.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/staging/iio/cdc/ad7746.c 
> >> b/drivers/staging/iio/cdc/ad7746.c
> >> index 516aa93..d793785 100644
> >> --- a/drivers/staging/iio/cdc/ad7746.c
> >> +++ b/drivers/staging/iio/cdc/ad7746.c
> >> @@ -458,7 +458,7 @@ static int ad7746_write_raw(struct iio_dev *indio_dev,
> >>   ret = 0;
> >>   break;
> >>   case IIO_CHAN_INFO_CALIBBIAS:
> >> - if ((val < 0) | (val > 0x)) {
> >> + if (val < 0 || val > 0x) {
> >>   ret = -EINVAL;
> >>   goto out;
> >>   }
> >> @@ -470,7 +470,7 @@ static int ad7746_write_raw(struct iio_dev *indio_dev,
> >>   ret = 0;
> >>   break;
> >>   case IIO_CHAN_INFO_OFFSET:
> >> - if ((val < 0) | (val > 43008000)) { /* 21pF */
> >> + if (val < 0 || val > 43008000) { /* 21pF */
> >>   ret = -EINVAL;
> >>   goto out;
> >>   }  
> >  



Re: [PATCH RFC tools/memory-model 2/5] tools/memory-model: Add litmus test for multicopy atomicity

2018-04-18 Thread Andrea Parri
On Mon, Apr 16, 2018 at 09:22:48AM -0700, Paul E. McKenney wrote:
> This commit adds a litmus test suggested by Alan Stern that is forbidden
> on multicopy atomic systems, but allowed on non-multicopy atomic systems.
> Note that other-multicopy atomic systems are examples of non-multicopy
> atomic systems.
> 
> Suggested-by: Alan Stern 
> Signed-off-by: Paul E. McKenney 
> ---
>  .../litmus-tests/SB+poonceoncescoh.litmus  | 31 
> ++
>  1 file changed, 31 insertions(+)
>  create mode 100644 tools/memory-model/litmus-tests/SB+poonceoncescoh.litmus

We seem to be missing an entry in litmus-tests/README...


> 
> diff --git a/tools/memory-model/litmus-tests/SB+poonceoncescoh.litmus 
> b/tools/memory-model/litmus-tests/SB+poonceoncescoh.litmus
> new file mode 100644
> index ..991a2d6dec63
> --- /dev/null
> +++ b/tools/memory-model/litmus-tests/SB+poonceoncescoh.litmus
> @@ -0,0 +1,31 @@
> +C SB+poonceoncescoh
> +
> +(*
> + * Result: Sometimes
> + *
> + * This litmus test demonstrates that LKMM is not multicopy atomic.
> + *)
> +
> +{}
> +
> +P0(int *x, int *y)
> +{
> + int r1;
> + int r2;
> +
> + WRITE_ONCE(*x, 1);
> + r1 = READ_ONCE(*x);
> + r2 = READ_ONCE(*y);
> +}
> +
> +P1(int *x, int *y)
> +{
> + int r3;
> + int r4;
> +
> + WRITE_ONCE(*y, 1);
> + r3 = READ_ONCE(*y);
> + r4 = READ_ONCE(*x);
> +}
> +
> +exists (0:r2=0 /\ 1:r4=0 /\ 0:r1=1 /\ 1:r3=1)

This test has a normalised name: why not use that?

  Andrea


> -- 
> 2.5.2
> 


Re: [v4 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct

2018-04-18 Thread Cyrill Gorcunov
On Wed, Apr 18, 2018 at 11:03:14AM +0200, Michal Hocko wrote:
> > > 
> > > What about something like the following?
> > > "
> > > arg_lock protects concurrent updates but we still need mmap_sem for read
> > > to exclude races with do_brk.
> > > "
> > > Acked-by: Michal Hocko 
> > 
> > Yes, thanks! Andrew, could you slightly update the changelog please?
> 
> No, I meant it to be a comment in the _code_.

Ah, I see. Then small patch on top should do the trick.


[PATCH v2 5/5] f2fs: fix to avoid race during access gc_thread pointer

2018-04-18 Thread Chao Yu
Thread A                  Thread B                  Thread C
- f2fs_remount
 - stop_gc_thread
                          - f2fs_sbi_store
                                                    - issue_discard_thread
   sbi->gc_thread = NULL;
                            sbi->gc_thread->gc_wake = 1
                                                      access sbi->gc_thread->gc_urgent

Previously, we allocated memory for sbi->gc_thread based on the background
gc thread mount option, and that memory could be released if the mount
option was turned off. However, several places still access the gc_thread
pointer without considering this race, resulting in a NULL pointer
dereference.

To fix this issue, keep the gc_thread structure valid in sbi all
the time instead of allocating and freeing it dynamically.

Signed-off-by: Chao Yu 
---
v2: avoid double destroy_gc_context.
 fs/f2fs/debug.c   |  3 +--
 fs/f2fs/f2fs.h|  7 +++
 fs/f2fs/gc.c  | 58 +--
 fs/f2fs/segment.c |  4 ++--
 fs/f2fs/super.c   | 11 +--
 fs/f2fs/sysfs.c   |  8 
 6 files changed, 58 insertions(+), 33 deletions(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 715beb85e9db..7bb036a3bb81 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -223,8 +223,7 @@ static void update_mem_info(struct f2fs_sb_info *sbi)
si->cache_mem = 0;
 
/* build gc */
-   if (sbi->gc_thread)
-   si->cache_mem += sizeof(struct f2fs_gc_kthread);
+   si->cache_mem += sizeof(struct f2fs_gc_kthread);
 
/* build merge flush thread */
if (SM_I(sbi)->fcc_info)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 567c6bb57ae3..c553f63199e8 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1412,6 +1412,11 @@ static inline struct sit_info *SIT_I(struct f2fs_sb_info 
*sbi)
return (struct sit_info *)(SM_I(sbi)->sit_info);
 }
 
+static inline struct f2fs_gc_kthread *GC_I(struct f2fs_sb_info *sbi)
+{
+   return (struct f2fs_gc_kthread *)(sbi->gc_thread);
+}
+
 static inline struct free_segmap_info *FREE_I(struct f2fs_sb_info *sbi)
 {
return (struct free_segmap_info *)(SM_I(sbi)->free_info);
@@ -2954,6 +2959,8 @@ bool f2fs_overwrite_io(struct inode *inode, loff_t pos, 
size_t len);
 /*
  * gc.c
  */
+int init_gc_context(struct f2fs_sb_info *sbi);
+void destroy_gc_context(struct f2fs_sb_info * sbi);
 int start_gc_thread(struct f2fs_sb_info *sbi);
 void stop_gc_thread(struct f2fs_sb_info *sbi);
 block_t start_bidx_of_node(unsigned int node_ofs, struct inode *inode);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index da89ca16a55d..7d310e454b77 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -26,8 +26,8 @@
 static int gc_thread_func(void *data)
 {
struct f2fs_sb_info *sbi = data;
-   struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
-   wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head;
+   struct f2fs_gc_kthread *gc_th = GC_I(sbi);
+   wait_queue_head_t *wq = &gc_th->gc_wait_queue_head;
unsigned int wait_ms;
 
wait_ms = gc_th->min_sleep_time;
@@ -114,17 +114,15 @@ static int gc_thread_func(void *data)
return 0;
 }
 
-int start_gc_thread(struct f2fs_sb_info *sbi)
+int init_gc_context(struct f2fs_sb_info *sbi)
 {
struct f2fs_gc_kthread *gc_th;
-   dev_t dev = sbi->sb->s_bdev->bd_dev;
-   int err = 0;
 
gc_th = f2fs_kmalloc(sbi, sizeof(struct f2fs_gc_kthread), GFP_KERNEL);
-   if (!gc_th) {
-   err = -ENOMEM;
-   goto out;
-   }
+   if (!gc_th)
+   return -ENOMEM;
+
+   gc_th->f2fs_gc_task = NULL;
 
gc_th->urgent_sleep_time = DEF_GC_THREAD_URGENT_SLEEP_TIME;
gc_th->min_sleep_time = DEF_GC_THREAD_MIN_SLEEP_TIME;
@@ -139,26 +137,41 @@ int start_gc_thread(struct f2fs_sb_info *sbi)
gc_th->atomic_file[FG_GC] = 0;
 
sbi->gc_thread = gc_th;
-   init_waitqueue_head(&sbi->gc_thread->gc_wait_queue_head);
-   sbi->gc_thread->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
+
+   return 0;
+}
+
+void destroy_gc_context(struct f2fs_sb_info *sbi)
+{
+   kfree(GC_I(sbi));
+   sbi->gc_thread = NULL;
+}
+
+int start_gc_thread(struct f2fs_sb_info *sbi)
+{
+   struct f2fs_gc_kthread *gc_th = GC_I(sbi);
+   dev_t dev = sbi->sb->s_bdev->bd_dev;
+   int err = 0;
+
+   init_waitqueue_head(&gc_th->gc_wait_queue_head);
+   gc_th->f2fs_gc_task = kthread_run(gc_thread_func, sbi,
"f2fs_gc-%u:%u", MAJOR(dev), MINOR(dev));
if (IS_ERR(gc_th->f2fs_gc_task)) {
err = PTR_ERR(gc_th->f2fs_gc_task);
-   kfree(gc_th);
-   sbi->gc_thread = NULL;
+   gc_th->f2fs_gc_task = NULL;
}
-out:
+
return err;
 }
 
 void stop_gc_thread(struct f2fs_sb_info *sbi)
 {
-   struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
-   if (!gc_th)
-   return;
-   

Re: [PATCH v6 05/11] ARM: smp: Add initialization of CNTVOFF

2018-04-18 Thread Mylène Josserand
Hello Geert,

On Wed, 18 Apr 2018 11:30:47 +0200
Geert Uytterhoeven  wrote:

> Allo Mylène,
> 
> On Mon, Apr 16, 2018 at 11:50 PM, Mylène Josserand
>  wrote:
> > The CNTVOFF register from arch timer is uninitialized.
> > It should be done by the bootloader but it is currently not the case,
> > even for boot CPU because this SoC is booting in secure mode.
> > It leads to an random offset value meaning that each CPU will have a
> > different time, which isn't working very well.
> >
> > Add assembly code used for boot CPU and secondary CPU cores to make
> > sure that the CNTVOFF register is initialized. Because this code can
> > be used by different platforms, add this assembly file in ARM's common
> > folder.  
> 
> Thanks for your patch!
> 
> > Signed-off-by: Mylène Josserand   
> 
> Reviewed-by: Geert Uytterhoeven 
> Tested-by: Geert Uytterhoeven 
> 
> Gr{oetje,eeting}s,
> 
> Geert
> 

Great, thank you very much for your test!

Best regards,

-- 
Mylène Josserand, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com


Re: [v4 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct

2018-04-18 Thread Kirill Tkhai
On 14.04.2018 21:24, Yang Shi wrote:
> mmap_sem is on the hot path of the kernel and is heavily contended, but it
> is abused too. It is used to protect arg_start|end and env_start|end when
> reading /proc/$PID/cmdline and /proc/$PID/environ, but that doesn't make
> sense since those proc files just expect to read 4 values atomically that
> are not related to VM; they could be set to arbitrary values by C/R.
> 
> And, the mmap_sem contention may cause unexpected issue like below:
> 
> INFO: task ps:14018 blocked for more than 120 seconds.
>Tainted: GE 4.9.79-009.ali3000.alios7.x86_64 #1
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
>  ps  D0 14018  1 0x0004
>   885582f84000 885e8682f000 880972943000 885ebf499bc0
>   8828ee12 c900349bfca8 817154d0 0040
>   00ff812f872a 885ebf499bc0 024000d000948300 880972943000
>  Call Trace:
>   [] ? __schedule+0x250/0x730
>   [] schedule+0x36/0x80
>   [] rwsem_down_read_failed+0xf0/0x150
>   [] call_rwsem_down_read_failed+0x18/0x30
>   [] down_read+0x20/0x40
>   [] proc_pid_cmdline_read+0xd9/0x4e0
>   [] ? do_filp_open+0xa5/0x100
>   [] __vfs_read+0x37/0x150
>   [] ? security_file_permission+0x9b/0xc0
>   [] vfs_read+0x96/0x130
>   [] SyS_read+0x55/0xc0
>   [] entry_SYSCALL_64_fastpath+0x1a/0xc5
> 
> Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
> for them to mitigate the abuse of mmap_sem.
> 
> So, introduce a new spinlock in mm_struct to protect the concurrent
> access to arg_start|end, env_start|end and others, and take mmap_sem
> for read instead of write to protect against the race between prctl and
> sys_brk which might break check_data_rlimit(); this makes prctl more
> friendly to other VM operations.
> 
> This patch just eliminates the abuse of mmap_sem, but it can't resolve the
> above hung task warning completely since the later access_remote_vm() call
> needs acquire mmap_sem. The mmap_sem scalability issue will be solved in the
> future.
> 
> Signed-off-by: Yang Shi 
> Cc: Alexey Dobriyan 
> Cc: Michal Hocko 
> Cc: Matthew Wilcox 
> Cc: Mateusz Guzik 
> Cc: Cyrill Gorcunov 
> ---
> v3 --> v4:
> * Protected values update with down_read + spin_lock to prevent from race
>   condition between prctl and sys_brk and made prctl more friendly to VM
>   operations per Michal's suggestion
> 
> v2 --> v3:
> * Restored down_write in prctl syscall
> * Elaborate the limitation of this patch suggested by Michal
> * Protect those fields by the new lock except brk and start_brk per Michal's
>   suggestion
> * Based off Cyrill's non PR_SET_MM_MAP oprations deprecation patch
>   (https://lkml.org/lkml/2018/4/5/541)
> 
> v1 --> v2:
> * Use spinlock instead of rwlock per Mattew's suggestion
> * Replace down_write to down_read in prctl_set_mm (see commit log for details)
>  fs/proc/base.c   | 8 
>  include/linux/mm_types.h | 2 ++
>  kernel/fork.c| 1 +
>  kernel/sys.c | 6 --
>  mm/init-mm.c | 1 +
>  5 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index eafa39a..3551757 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -239,12 +239,12 @@ static ssize_t proc_pid_cmdline_read(struct file *file, 
> char __user *buf,
>   goto out_mmput;
>   }
>  
> - down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
>   arg_start = mm->arg_start;
>   arg_end = mm->arg_end;
>   env_start = mm->env_start;
>   env_end = mm->env_end;
> - up_read(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
>  
>   BUG_ON(arg_start > arg_end);
>   BUG_ON(env_start > env_end);
> @@ -929,10 +929,10 @@ static ssize_t environ_read(struct file *file, char 
> __user *buf,
>   if (!mmget_not_zero(mm))
>   goto free;
>  
> - down_read(&mm->mmap_sem);
> + spin_lock(&mm->arg_lock);
>   env_start = mm->env_start;
>   env_end = mm->env_end;
> - up_read(&mm->mmap_sem);
> + spin_unlock(&mm->arg_lock);
>  
>   while (count > 0) {
>   size_t this_len, max_len;
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2161234..49dd59e 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -413,6 +413,8 @@ struct mm_struct {
>   unsigned long exec_vm;  /* VM_EXEC & ~VM_WRITE & ~VM_STACK */
>   unsigned long stack_vm; /* VM_STACK */
>   unsigned long def_flags;
> +
> + spinlock_t arg_lock; /* protect the below fields */

What is the reason a spinlock is used to protect these fields?
There may be several readers, say, doing "ps axf", and it's
OK for them to access these fields in parallel. Why should
they delay each other?

An rwlock seems more suitable here.

>   unsigned long start_code, 

Re: [PATCH 03/11] fs: add frozen sb state helpers

2018-04-18 Thread Jan Kara
On Tue 17-04-18 17:59:36, Luis R. Rodriguez wrote:
> On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> > On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > > 
> > > I'll note that its still not perfectly clear if really the semantics 
> > > behind
> > > freeze_bdev() match what I described above fully. That still needs to be
> > > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > > I think we'll need to distinguish the ioctl interface. Worst possible case
> > > is that bdev semantics and in-kernel semantics differ somehow, then that
> > > will really create a holy fucking mess.
> > 
> > I believe nobody really thought about mixing those two interfaces to fs
> > freezing and so the behavior is basically defined by the implementation.
> > That is:
> > 
> > freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
> 
> Note below as well on your *future* freeze_super() implementation.
> 
> > freeze_bdev() on sb frozen by freeze_bdev() -> success
> > ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> > ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > 
> > thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
> 
> Phew, so this is what we want for the in-kernel freezing so we're good
> and *can* combine these then.
> 
> > ioctl_fsthaw() on sb frozen by freeze_bdev() -> success
> > 
> > What I propose is the following API:
> > 
> > freeze_super_excl()
> >   - freezes superblock, returns EBUSY if the superblock is already frozen
> > (either by another freeze_super_excl() or by freeze_super())
> > freeze_super()
> >   - this function will make sure superblock is frozen when the function
> > returns with success. 
> 
> That's straight forward.
> 
> > It can be nested with other freeze_super() or
> > freeze_super_excl() calls 
> 
> This is where it can get hairy. More below.
> 
> > (this second part is different from how
> > freeze_bdev() behaves currently but AFAICT this behavior is actually
> > what all current users of freeze_bdev() really want - just make sure
> > fs cannot be written to)
> 
> If we can agree to this, then sure. However there are two types of
> possible nested calls to consider, one where the sb was already frozen
> by an IOCTL, and the other where it was initiated by either another
> freeze_super_excl() or another freeze_super() call which is currently
> being processed. For the first type, its easy to say the device is
> already frozen as such return success. If the freezing is ongoing,
> we may want to wait or not wait, and this will depend on our current
> use cases for freeze_bdev().

A side note since I'm not sure I wrote this down in my previous email:
I want ioctl_fsfreeze() to directly use freeze_super_excl().

Now to your freeze in progress question: freeze_super_excl() can
immediately return EBUSY when there's freezing in progress. OTOH
freeze_super() always has to wait for the current freeze / thaw to finish
and then do what's necessary. I don't see a use case where you'd like to
have freeze_super() not wait.

> As you noted above, freeze_bdev() currently returns EBUSY if we had
> the sb already frozen by ioctl_fsfreeze(). It may be a welcomed
> enhancement to correct the semantics first to address the first case,
> but keep the EBUSY for the other case. A secondary patch could then
> add a completion mechanism and let callers decide to either wait or not.
> *Iff* the caller did not opt-in to wait we keep the EBUSY return.

You're now speaking about steps to transition to the new API, right? I'd
structure the transition as follows:

1) Move bdev->bd_fsfreeze_count to a superblock.
2) Make freeze_super() grab the counter as well, thaw_super() drops it and
  unfreezes the filesystem only if the counter dropped to zero.
3) Rename freeze_super() to freeze_super_excl().
4) Only now I'd go for messing with freeze_bdev() as it now combines sanely
with freeze_super_excl(). Probably I'd just implement new freeze_super()
with the desired semantics (including waiting for ongoing operation to
finish).
5) And then switch all users (there are 4 in the kernel) from freeze_bdev()
to freeze_super() with the justification in each case why the new semantics
is actually desirable.
6) Drop old freeze_bdev() - note that only one freeze_bdev() user (in
drivers/md/dm.c) is actually interested in passing bdev, all the others are
better off just passing in superblock to new freeze_super(). Anyway for
that user in dm we might still provide a convenience wrapper to grab the
superblock and call new freeze_super() on it.
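
The counting semantics from steps 1 and 2, combined with the exclusive
variant, can be sketched roughly as follows. This is a simplified user-space
model, not kernel code: struct sb_model is hypothetical, locking and waiting
for a freeze in progress are left out, and returning -EINVAL when thawing a
superblock that is not frozen is an assumption carried over from the current
behavior.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, stripped-down superblock: only the freeze state. */
struct sb_model {
	int freeze_count;	/* step 1: the counter lives in the superblock */
	bool frozen;
};

/* Exclusive freeze: fails if anybody already holds a freeze. */
static int freeze_super_excl(struct sb_model *sb)
{
	if (sb->freeze_count > 0)
		return -EBUSY;
	sb->freeze_count = 1;
	sb->frozen = true;
	return 0;
}

/* Nestable freeze: takes a reference, freezing only on the first one. */
static int freeze_super(struct sb_model *sb)
{
	if (sb->freeze_count++ == 0)
		sb->frozen = true;
	return 0;
}

/* Thaw drops one reference; the fs is unfrozen only when it hits zero. */
static int thaw_super(struct sb_model *sb)
{
	if (sb->freeze_count == 0)
		return -EINVAL;
	if (--sb->freeze_count == 0)
		sb->frozen = false;
	return 0;
}
```

In this model a freeze_super() caller can nest on top of an ioctl-initiated
freeze_super_excl(), while a second exclusive freezer still gets EBUSY.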

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Roger Pau Monné
On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko wrote:
> On 04/18/2018 10:35 AM, Roger Pau Monné wrote:
> > On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko wrote:
> > > On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> > > > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> > > > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> > > 3.2 Backend exports dma-buf to xen-front
> > > 
> > > > In this case Dom0 pages are shared with DomU. As before, DomU can only write
> > > > to these pages, not any other page from Dom0, so it can be still considered
> > > > safe.
> > > But, the following must be considered (highlighted in xen-front's Kernel
> > > documentation):
> > >   - If guest domain dies then pages/grants received from the backend 
> > > cannot
> > >     be claimed back - think of it as memory lost to Dom0 (won't be used 
> > > for
> > > any
> > >     other guest)
> > >   - Misbehaving guest may send too many requests to the backend exhausting
> > >     its grant references and memory (consider this from security POV). As 
> > > the
> > >     backend runs in the trusted domain we also assume that it is trusted 
> > > as
> > > well,
> > >     e.g. must take measures to prevent DDoS attacks.
> > I cannot parse the above sentence:
> > 
> > "As the backend runs in the trusted domain we also assume that it is
> > trusted as well, e.g. must take measures to prevent DDoS attacks."
> > 
> > What's the relation between being trusted and protecting from DoS
> > attacks?
> I mean that we trust the backend that it can prevent Dom0
> from crashing in case DomU's frontend misbehaves, e.g.
> if the frontend sends too many memory requests etc.
> > In any case, all? PV protocols are implemented with the frontend
> > sharing pages to the backend, and I think there's a reason why this
> > model is used, and it should continue to be used.
> This is the first use-case above. But there are real-world
> use-cases (embedded in my case) when physically contiguous memory
> needs to be shared, one of the possible ways to achieve this is
> to share contiguous memory from Dom0 to DomU (the second use-case above)
> > Having to add logic in the backend to prevent such attacks means
> > that:
> > 
> >   - We need more code in the backend, which increases complexity and
> > chances of bugs.
> >   - Such code/logic could be wrong, thus allowing DoS.
> You can live without this code at all, but this is then up to
> backend which may make Dom0 down because of DomU's frontend doing evil
> things

IMO we should design protocols that do not allow such attacks instead
of having to defend against them.

> > > 4. xen-front/backend/xen-zcopy synchronization
> > > 
> > > > 4.1. As I already said in 2) all the inter VM communication happens between
> > > > xen-front and the backend, xen-zcopy is NOT involved in that.
> > > > When xen-front wants to destroy a display buffer (dumb/dma-buf) it issues a
> > > > XENDISPL_OP_DBUF_DESTROY command (opposite to XENDISPL_OP_DBUF_CREATE).
> > > This call is synchronous, so xen-front expects that backend does free the
> > > buffer pages on return.
> > > 
> > > 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
> > >    - closes all dumb handles/fd's of the buffer according to [3]
> > >    - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-zcopy to make
> > > sure
> > >      the buffer is freed (think of it as it waits for dma-buf->release
> > > callback)
> > So this zcopy thing keeps some kind of track of the memory usage? Why
> > can't the user-space backend keep track of the buffer usage?
> Because there is no dma-buf UAPI which allows to track the buffer life cycle
> (e.g. wait until dma-buf's .release callback is called)
> > >    - replies to xen-front that the buffer can be destroyed.
> > > > This way deletion of the buffer happens synchronously on both Dom0 and DomU
> > > > sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns with time-out
> > > > error (BTW, wait time is a parameter of this IOCTL), Xen will defer grant
> > > > reference removal and will retry later until those are free.
> > > 
> > > Hope this helps understand how buffers are synchronously deleted in case
> > > of xen-zcopy with a single protocol command.
> > > 
> > > > I think the above logic can also be re-used by the hyper-dmabuf driver with
> > > > some additional work:
> > > 
> > > 1. xen-zcopy can be split into 2 parts and extend:
> > > 1.1. Xen gntdev driver [4], [5] to allow creating dma-buf from grefs and
> > > > vice versa,
> > I don't know much about the dma-buf implementation in Linux, but
> > gntdev is a user-space device, and AFAICT user-space applications
> > don't have any notion of dma buffers. How are such buffers useful for
> > user-space? Why can't this just be called memory?
> A dma-buf is seen by user-space as a file descriptor and you can
> pass it to different drivers then. For example, you can share a buffer
> used by 

Re: [PATCH] net: don't use kvzalloc for DMA memory

2018-04-18 Thread Michael S. Tsirkin
On Wed, Apr 18, 2018 at 01:38:43PM -0700, Eric Dumazet wrote:
> 
> 
> On 04/18/2018 10:55 AM, Michael S. Tsirkin wrote:
> 
> > Imagine you want to pass some data to card.
> > Natural thing is to just put it in a variable and start DMA.
> > However DMA API disallows stack access nowadays,
> > so it's natural to put this within struct device.
> > 
> > See e.g.
> > 
> > commit a725ee3e44e39dab1ec82cc745899a785d2a555e
> > Author: Andy Lutomirski 
> > Date:   Mon Jul 18 15:34:49 2016 -0700
> > 
> > virtio-net: Remove more stack DMA
> >
> 
> Andy just moved the problem to another one, since at that time we already
> had vmalloc() fallback for at least 2 years.
> 
> Note that my original patch had :
> 
> p = kzalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> if (!p)
>   p = vzalloc(alloc_size);
> 
> So really, normal (less than PAGE_SIZE) allocations would have 
> almost-zero-chance to end up to vmalloc(one_page)

Thanks Eric, I'll fix virtio.


-- 
MST



Re: [PATCH 2/2] cpufreq: brcmstb-avs-cpufreq: prefer SCMI cpufreq if supported

2018-04-18 Thread Viresh Kumar
On 18-04-18, 08:56, Markus Mayer wrote:
> From: Jim Quinlan 
> 
> If the SCMI cpufreq driver is supported, we bail, so that the new
> approach can be used.
> 
> Signed-off-by: Jim Quinlan 
> Signed-off-by: Markus Mayer 
> ---
>  drivers/cpufreq/brcmstb-avs-cpufreq.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/cpufreq/brcmstb-avs-cpufreq.c b/drivers/cpufreq/brcmstb-avs-cpufreq.c
> index b07559b9ed99..b4861a730162 100644
> --- a/drivers/cpufreq/brcmstb-avs-cpufreq.c
> +++ b/drivers/cpufreq/brcmstb-avs-cpufreq.c
> @@ -164,6 +164,8 @@
>  #define BRCM_AVS_CPU_INTR"brcm,avs-cpu-l2-intr"
>  #define BRCM_AVS_HOST_INTR   "sw_intr"
>  
> +#define ARM_SCMI_COMPAT  "arm,scmi"
> +
>  struct pmap {
>   unsigned int mode;
>   unsigned int p1;
> @@ -511,6 +513,20 @@ static int brcm_avs_prepare_init(struct platform_device *pdev)
>   struct device *dev;
>   int host_irq, ret;
>  
> + /*
> +  * If the SCMI cpufreq driver is supported, we bail, so that the more
> +  * modern approach can be used.
> +  */
> + if (IS_ENABLED(CONFIG_ARM_SCMI_PROTOCOL)) {
> + struct device_node *np;
> +
> + np = of_find_compatible_node(NULL, NULL, ARM_SCMI_COMPAT);
> + if (np) {
> + of_node_put(np);
> + return -ENXIO;
> + }
> + }
> +

What about adding a !CONFIG_ARM_SCMI_PROTOCOL dependency in Kconfig so that the
driver isn't compiled at all?

-- 
viresh


Re: [PATCH] checkpatch: Add a --strict test for structs with bool member definitions

2018-04-18 Thread Julia Lawall


On Wed, 18 Apr 2018, Joe Perches wrote:

> On Tue, 2018-04-17 at 17:07 +0800, yuank...@codeaurora.org wrote:
> > Hi julia,
> >
> > On 2018-04-15 05:19 AM, Julia Lawall wrote:
> > > On Wed, 11 Apr 2018, Joe Perches wrote:
> > >
> > > > On Thu, 2018-04-12 at 08:22 +0200, Julia Lawall wrote:
> > > > > On Wed, 11 Apr 2018, Joe Perches wrote:
> > > > > > On Wed, 2018-04-11 at 09:29 -0700, Andrew Morton wrote:
> > > > > > > We already have some 500 bools-in-structs
> > > > > >
> > > > > > I got at least triple that only in include/
> > > > > > so I expect there are at probably an order
> > > > > > of magnitude more than 500 in the kernel.
> > > > > >
> > > > > > I suppose some cocci script could count the
> > > > > > actual number of instances.  A regex can not.
> > > > >
> > > > > I got 12667.
> > > >
> > > > Could you please post the cocci script?
> > > >
> > > > > > I'm not sure to understand the issue.  Will using a bitfield help if there
> > > > > > are no other bitfields in the structure?
> > > >
> > > > IMO, not really.
> > > >
> > > > The primary issue is described by Linus here:
> > > > https://lkml.org/lkml/2017/11/21/384
> > > >
> > > > I personally do not find a significant issue with
> > > > uncontrolled sizes of bool in kernel structs as
> > > > all of the kernel structs are transitory and not
> > > > written out to storage.
> > > >
> > > > I suppose bool bitfields are also OK, but for the
> > > > RMW required.
> > > >
> > > > Using unsigned int :1 bitfield instead of bool :1
> > > > has the negative of truncation so that the uint
> > > > has to be set with !! instead of a simple assign.
> > >
> > > > At least with gcc 5.4.0, a number of structures become larger with
> > > > unsigned int :1. bool:1 seems to mostly solve this problem.  The structure
> > > > ichx_desc, defined in drivers/gpio/gpio-ich.c seems to become larger with
> > > > both approaches.
> >
> > [ZJ] Hopefully, this could make it better in your environment.
> >   IMHO, this is just for double check.
>
> I doubt this is actually better or smaller code.
>
> Check the actual object code using objdump and the
> struct alignment using pahole.

I didn't have a chance to try it, but it looks quite likely to result in a
smaller data structure based on the other examples that I looked at.

julia

>
> > diff --git a/drivers/gpio/gpio-ich.c b/drivers/gpio/gpio-ich.c
> > index 4f6d643..b46e170 100644
> > --- a/drivers/gpio/gpio-ich.c
> > +++ b/drivers/gpio/gpio-ich.c
> > @@ -70,6 +70,18 @@ static const u8 avoton_reglen[3] = {
> >   #define ICHX_READ(reg, base_res)   inl((reg) + (base_res)->start)
> >
> >   struct ichx_desc {
> > > +   /* GPO_BLINK is available on this chipset */
> > > +   bool have_blink:1;
> > > +
> > > +   /* Whether the chipset has GPIO in GPE0_STS in the PM IO region */
> > > +   bool uses_gpe0:1;
> > > +
> > > +   /*
> > > +    * Some chipsets don't let reading output values on GPIO_LVL register
> > > +    * this option allows driver caching written output values
> > > +    */
> > > +   bool use_outlvl_cache:1;
> > +
> >  /* Max GPIO pins the chipset can have */
> >  uint ngpio;
> >
> > @@ -77,24 +89,12 @@ struct ichx_desc {
> >  const u8 (*regs)[3];
> >  const u8 *reglen;
> >
> > -   /* GPO_BLINK is available on this chipset */
> > -   bool have_blink;
> > -
> > > -   /* Whether the chipset has GPIO in GPE0_STS in the PM IO region */
> > -   bool uses_gpe0;
> > -
> >  /* USE_SEL is bogus on some chipsets, eg 3100 */
> >  u32 use_sel_ignore[3];
> >
> > >  /* Some chipsets have quirks, let these use their own request/get */
> >  int (*request)(struct gpio_chip *chip, unsigned offset);
> >  int (*get)(struct gpio_chip *chip, unsigned offset);
> > -
> > -   /*
> > > -* Some chipsets don't let reading output values on GPIO_LVL register
> > -* this option allows driver caching written output values
> > -*/
> > -   bool use_outlvl_cache;
> >   };
> >
> >
> > ZJ
>


Re: [PATCH] checkpatch: Add a --strict test for structs with bool member definitions

2018-04-18 Thread Julia Lawall


On Wed, 18 Apr 2018, Joe Perches wrote:

> On Thu, 2018-04-19 at 06:40 +0200, Julia Lawall wrote:
> >
> > On Wed, 18 Apr 2018, Joe Perches wrote:
> >
> > > On Tue, 2018-04-17 at 17:07 +0800, yuank...@codeaurora.org wrote:
> > > > Hi julia,
> > > >
> > > > On 2018-04-15 05:19 AM, Julia Lawall wrote:
> > > > > On Wed, 11 Apr 2018, Joe Perches wrote:
> > > > >
> > > > > > On Thu, 2018-04-12 at 08:22 +0200, Julia Lawall wrote:
> > > > > > > On Wed, 11 Apr 2018, Joe Perches wrote:
> > > > > > > > On Wed, 2018-04-11 at 09:29 -0700, Andrew Morton wrote:
> > > > > > > > > We already have some 500 bools-in-structs
> > > > > > > >
> > > > > > > > I got at least triple that only in include/
> > > > > > > > so I expect there are at probably an order
> > > > > > > > of magnitude more than 500 in the kernel.
> > > > > > > >
> > > > > > > > I suppose some cocci script could count the
> > > > > > > > actual number of instances.  A regex can not.
> > > > > > >
> > > > > > > I got 12667.
> > > > > >
> > > > > > Could you please post the cocci script?
> > > > > >
> > > > > > > > I'm not sure to understand the issue.  Will using a bitfield help if there
> > > > > > > > are no other bitfields in the structure?
> > > > > >
> > > > > > IMO, not really.
> > > > > >
> > > > > > The primary issue is described by Linus here:
> > > > > > https://lkml.org/lkml/2017/11/21/384
> > > > > >
> > > > > > I personally do not find a significant issue with
> > > > > > uncontrolled sizes of bool in kernel structs as
> > > > > > all of the kernel structs are transitory and not
> > > > > > written out to storage.
> > > > > >
> > > > > > I suppose bool bitfields are also OK, but for the
> > > > > > RMW required.
> > > > > >
> > > > > > Using unsigned int :1 bitfield instead of bool :1
> > > > > > has the negative of truncation so that the uint
> > > > > > has to be set with !! instead of a simple assign.
> > > > >
> > > > > > At least with gcc 5.4.0, a number of structures become larger with
> > > > > > unsigned int :1. bool:1 seems to mostly solve this problem.  The structure
> > > > > > ichx_desc, defined in drivers/gpio/gpio-ich.c seems to become larger with
> > > > > > both approaches.
> > > >
> > > > [ZJ] Hopefully, this could make it better in your environment.
> > > >   IMHO, this is just for double check.
> > >
> > > I doubt this is actually better or smaller code.
> > >
> > > Check the actual object code using objdump and the
> > > struct alignment using pahole.
> >
> > I didn't have a chance to try it, but it looks quite likely to result in a
> > smaller data structure based on the other examples that I looked at.
>
> I _really_ doubt there is any difference in size between the
> below in any architecture
>
> struct foo {
>   int bar;
>   bool baz:1;
>   int qux;
> };
>
> and
>
> struct foo {
>   int bar;
>   bool baz;
>   int qux;
> };
>
> Where there would be a difference in size is
>
> struct foo {
>   int bar;
>   bool baz1:1;
>   bool baz2:1;
>   int qux;
> };
>
> and
>
> struct foo {
>   int bar;
>   bool baz1;
>   bool baz2;
>   int qux;
> };

In the situation of the example there are two bools together in the middle
of the structure and one at the end.  Somehow, even converting to bool:1
increases the size.  But it seems plausible that putting all three bools
together and converting them all to :1 would reduce the size.  I don't
know.  The size increase (more than 8 bytes) seems out of proportion for 3
bools.

I was able to check around 3000 structures that were not declared with any
attributes, that don't declare named types internally, and that are
compiled for x86.  Around 10% become smaller when using bool:1, typically
by at most 8 bytes.
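
For anyone who wants to check a particular layout without pahole, a small
user-space program along these lines may help. It is only a sketch: the struct
names are made up, the layouts mirror the examples quoted above plus one
scattered/grouped pair, and the numbers it prints depend on the compiler and
ABI, so none are claimed here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One bool between two ints, plain and as a bitfield. */
struct plain1 { int bar; bool baz; int qux; };
struct bits1  { int bar; bool baz:1; int qux; };

/* Two adjacent bools, plain and as bitfields. */
struct plain2 { int bar; bool baz1; bool baz2; int qux; };
struct bits2  { int bar; bool baz1:1; bool baz2:1; int qux; };

/* Three bools scattered between ints versus grouped as bitfields. */
struct scattered { bool a; int bar; bool b; int qux; bool c; };
struct grouped   { int bar; int qux; bool a:1; bool b:1; bool c:1; };

/* Print each size; results are implementation-defined. */
static void print_sizes(void)
{
	printf("plain1=%zu bits1=%zu\n",
	       sizeof(struct plain1), sizeof(struct bits1));
	printf("plain2=%zu bits2=%zu\n",
	       sizeof(struct plain2), sizeof(struct bits2));
	printf("scattered=%zu grouped=%zu\n",
	       sizeof(struct scattered), sizeof(struct grouped));
}
```

Calling print_sizes() from main() on the target compiler shows whether the
conversion or the regrouping is what changes the size for a given layout.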

julia

>
>

