date:20190922

[PATCH -next] fuse: Make fuse_args_to_req static

2019-09-22 Thread YueHaibing

Fix sparse warning:

fs/fuse/dev.c:468:6: warning: symbol 'fuse_args_to_req' was not declared. 
Should it be static?

Reported-by: Hulk Robot 
Signed-off-by: YueHaibing 
---
 fs/fuse/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 46d68d4..7844d35 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -465,7 +465,7 @@ static void fuse_force_creds(struct fuse_conn *fc, struct 
fuse_req *req)
req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 }
 
-void fuse_args_to_req(struct fuse_req *req, struct fuse_args *args)
+static void fuse_args_to_req(struct fuse_req *req, struct fuse_args *args)
 {
req->in.h.opcode = args->opcode;
req->in.h.nodeid = args->nodeid;
-- 
2.7.4

Re: [PATCH] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()

2019-09-22 Thread Anshuman Khandual




On 09/16/2019 11:17 AM, Anshuman Khandual wrote:
> In add_memory_resource() the memory range to be hot added first gets into
> the memblock via memblock_add() before arch_add_memory() is called on it.
> Reverse sequence should be followed during memory hot removal which already
> is being followed in add_memory_resource() error path. This now ensures
> required re-order between memblock_[free|remove]() and arch_remove_memory()
> during memory hot-remove.
> 
> Cc: Andrew Morton 
> Cc: Oscar Salvador 
> Cc: Michal Hocko 
> Cc: David Hildenbrand 
> Cc: Pavel Tatashin 
> Cc: Dan Williams 
> Signed-off-by: Anshuman Khandual 
> ---
> Original patch https://lkml.org/lkml/2019/9/3/327
> 
> Memory hot remove now works on arm64 without this because a recent commit
> 60bb462fc7ad ("drivers/base/node.c: simplify 
> unregister_memory_block_under_nodes()").
> 
> David mentioned that re-ordering should still make sense for consistency
> purpose (removing stuff in the reverse order they were added). This patch
> is now detached from arm64 hot-remove series.
> 
> https://lkml.org/lkml/2019/9/3/326
> 
>  mm/memory_hotplug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c73f09913165..355c466e0621 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1770,13 +1770,13 @@ static int __ref try_remove_memory(int nid, u64 
> start, u64 size)
>  
>   /* remove memmap entry */
>   firmware_map_remove(start, start + size, "System RAM");
> - memblock_free(start, size);
> - memblock_remove(start, size);
>  
>   /* remove memory block devices before removing memory */
>   remove_memory_block_devices(start, size);
>  
>   arch_remove_memory(nid, start, size, NULL);
> + memblock_free(start, size);
> + memblock_remove(start, size);
>   __release_memory_resource(start, size);
>  
>   try_offline_node(nid);
> 

Hello Andrew,

Any feedbacks on this, does it look okay ?

- Anshuman

[RFC-v2 1/2] ext4: Add ext4_ilock & ext4_iunlock API

2019-09-22 Thread Ritesh Harjani

This adds ext4_ilock/iunlock types of APIs.
This is the preparation APIs to make shared
locking/unlocking & restarting with exclusive
locking/unlocking easier in next patch.

Along with above this also addresses the AIM7
regression problem which was only fixed for XFS
in,
commit 942491c9e6d6 ("xfs: fix AIM7 regression")

Signed-off-by: Ritesh Harjani 
---
 fs/ext4/ext4.h| 33 +
 fs/ext4/extents.c | 16 ++--
 fs/ext4/file.c| 63 +--
 fs/ext4/inode.c   |  4 +--
 fs/ext4/ioctl.c   | 16 ++--
 fs/ext4/super.c   | 12 -
 fs/ext4/xattr.c   | 16 ++--
 7 files changed, 98 insertions(+), 62 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2ab91815f52d..9ffafbe6bc3f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2945,6 +2945,39 @@ static inline int ext4_update_inode_size(struct inode 
*inode, loff_t newsize)
return changed;
 }
 
+#define EXT4_IOLOCK_EXCL   (1 << 0)
+#define EXT4_IOLOCK_SHARED (1 << 1)
+
+static inline void ext4_ilock(struct inode *inode, unsigned int iolock)
+{
+   if (iolock == EXT4_IOLOCK_EXCL)
+   inode_lock(inode);
+   else
+   inode_lock_shared(inode);
+}
+
+static inline void ext4_iunlock(struct inode *inode, unsigned int iolock)
+{
+   if (iolock == EXT4_IOLOCK_EXCL)
+   inode_unlock(inode);
+   else
+   inode_unlock_shared(inode);
+}
+
+static inline int ext4_ilock_nowait(struct inode *inode, unsigned int iolock)
+{
+   if (iolock == EXT4_IOLOCK_EXCL)
+   return inode_trylock(inode);
+   else
+   return inode_trylock_shared(inode);
+}
+
+static inline void ext4_ilock_demote(struct inode *inode, unsigned int iolock)
+{
+   BUG_ON(iolock != EXT4_IOLOCK_EXCL);
+   downgrade_write(>i_rwsem);
+}
+
 int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
  loff_t len);
 
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index a869e206bd81..ef37f4d4ee7e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4680,7 +4680,7 @@ static long ext4_zero_range(struct file *file, loff_t 
offset,
else
max_blocks -= lblk;
 
-   inode_lock(inode);
+   ext4_ilock(inode, EXT4_IOLOCK_EXCL);
 
/*
 * Indirect files do not support unwritten extnets
@@ -4790,7 +4790,7 @@ static long ext4_zero_range(struct file *file, loff_t 
offset,
 
ext4_journal_stop(handle);
 out_mutex:
-   inode_unlock(inode);
+   ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
return ret;
 }
 
@@ -4856,7 +4856,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t 
offset, loff_t len)
if (mode & FALLOC_FL_KEEP_SIZE)
flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
 
-   inode_lock(inode);
+   ext4_ilock(inode, EXT4_IOLOCK_EXCL);
 
/*
 * We only support preallocation for extent-based files only
@@ -4887,7 +4887,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t 
offset, loff_t len)
EXT4_I(inode)->i_sync_tid);
}
 out:
-   inode_unlock(inode);
+   ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
return ret;
 }
@@ -5387,7 +5387,7 @@ int ext4_collapse_range(struct inode *inode, loff_t 
offset, loff_t len)
return ret;
}
 
-   inode_lock(inode);
+   ext4_ilock(inode, EXT4_IOLOCK_EXCL);
/*
 * There is no need to overlap collapse range with EOF, in which case
 * it is effectively a truncate operation
@@ -5486,7 +5486,7 @@ int ext4_collapse_range(struct inode *inode, loff_t 
offset, loff_t len)
 out_mmap:
up_write(_I(inode)->i_mmap_sem);
 out_mutex:
-   inode_unlock(inode);
+   ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
return ret;
 }
 
@@ -5537,7 +5537,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, 
loff_t len)
return ret;
}
 
-   inode_lock(inode);
+   ext4_ilock(inode, EXT4_IOLOCK_EXCL);
/* Currently just for extent based files */
if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
ret = -EOPNOTSUPP;
@@ -5664,7 +5664,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, 
loff_t len)
 out_mmap:
up_write(_I(inode)->i_mmap_sem);
 out_mutex:
-   inode_unlock(inode);
+   ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
return ret;
 }
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index d2ff383a8b9f..ce1cecbae932 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -57,14 +57,15 @@ static ssize_t ext4_dio_read_iter(struct kiocb *iocb, 
struct iov_iter *to)
/*
 * Get exclusion from truncate and other inode operations.
 */
-   if (!inode_trylock_shared(inode)) {
-   if (iocb->ki_flags &

[RFC-v2 0/2] ext4: Improve ilock scalability to improve performance in DirectIO workloads

2019-09-22 Thread Ritesh Harjani

Hello All, 

These are ilock patches which helps improve the current inode lock scalabiliy
problem in ext4 DIO mixed-read/write workload case. The problem was first
reported by Joseph [1]. These patches are based upon upstream discussion
with Jan Kara & Joseph [2]. 

Since these patches introduce ext4_ilock/unlock API for inode locking/
unlocking in shared v/s exclusive mode to help achieve better
scalability in DIO writes case, it was better to take care of inode_trylock
for RWF_NOWAIT case in the same patch set.
Discussion about this can be found here [3].
Please let me know if otherwise.

These patches are based on the top of ext4 iomap for DIO patches by
Matthew [4].

Now, since this is a very interesting scalability problem w.r.t locking,
I thought of collecting a comprehensive set of performance numbers for
DIO sync mixed random read/write workload w.r.t number of threads (ext4).
This commit msg will present a performance benefit histogram using this
patch set v/s vanilla kernel (v5.3-rc5) in DIO mixed randrw workload.

Git tree for this can be found at-
https://github.com/riteshharjani/linux/tree/ext4-ilock-RFC-v2

Background
==

dioread_nolock feature in ext4, is to allow multiple DIO reads in parallel.
Also, in case of overwrites DIO since there is no block allocation needed,
those I/Os too can run in parallel. This gives a good scalable I/O
performance w.r.t multiple DIO read/writes threads.
In case of buffer write with dioread_nolock feature, ext4 will allocate
uninitialized extent and once the IO is completed will convert it back
to initialized.

But recently DIO reads got changed to be under shared inode rwsem.
commit 16c54688592c ("ext4: Allow parallel DIO reads").
It did allow parallel DIO reads to work fine, but this degraded the mixed
DIO read/write workload.


Problem description
===

Currently in DIO writes case, we start with an exclusive inode rwsem
and only once we know that it is a overwrite (with dioread_nolock mnt
option enabled),
we demote it to shared lock. Now with mixed DIO reads & writes, this
has a scalability problem. Which is, if we have any ongoing DIO reads,
then DIO writes (since it starts with exclusive) has to wait until the
shared lock is released (which only happens when DIO read is completed).
The same can be easily observed with perf-tools trace analysis.


Implementation
==

To fix this, we start with a shared lock (unless an exclusive lock is needed
in certain cases e.g. extending/unaligned writes) in DIO write case.
Then if it is a overwrite we continue with shared lock, if not then we
release the shared lock and switch to exclusive lock and
redo certain checks again.

This approach help in achieving good scalablity for mixed DIO read/writes
worloads [5-6].

Performance results
===

The performance numbers are collected using fio tool with dioread_nolock
mount option for DIO mixed random read/write(randrw) worload with 4K & 1M burst
sizes for increasing number of threads (1, 2, 4, 8, 12, 16, 24).
The performance historgram shown below is the percentage change in
performance by using this ilock patchset as compared to vanilla kernel
(v5.3-rc5).

To check absolute score comparison with some colorful graphs, refer [5-6].

FIO command:
fio -name=DIO-mixed-randrw -filename=./testfile -direct=1 -iodepth=1 -thread \
-rw=randrw -ioengine=psync -bs=$bs -size=10G -numjobs=$thread \
-group_reporting=1 -runtime=120

With dioread_nolock mount option


Below shows the performance benefit hist with this ilock patchset in (%)
w.r.t vanilla kernel(v5.3-rc5) with dioread_nolock mount option for
mixed randrw workload (for 4K block size).
If we notice, the percentage benefit increases with increasing number of
threads. So this patchset help achieve good scalability in the mentioned
workload. Also this gives upto ~70% perf improvement in 24 threads mixed randrw
workload with 4K burst size.
The performance difference can be even higher with high speed storage
devices, since bw speeds without the patch seems to flatten due to lock
contention problem in case of multiple threads.

Absolute graph comparison can be seen here - [5]

Performance benefit (%) data randrw (read)-4K
  80 +-+--++---++++---+--+-+   
 |++   ++++   +|   
  70 +-+   **+-+   
 |**   **  |   
  60 +-+  **  **   **+-+   
 |**  **   **  |   
  50 +-+ **   **  **   **+-+   
 |   **   **  **   **  |   
  40 +-+ **   **  **   **+-+   
  30 +-+

[RFC-v2 2/2] ext4: Improve DIO writes locking sequence

2019-09-22 Thread Ritesh Harjani

Earlier there was no shared lock in DIO read path.
But this patch (16c54688592ce: ext4: Allow parallel DIO reads)
simplified some of the locking mechanism while still allowing
for parallel DIO reads by adding shared lock in inode DIO
read path.

But this created problem with mixed read/write workload.
It is due to the fact that in DIO path, we first start with
exclusive lock and only when we determine that it is a ovewrite
IO, we downgrade the lock. This causes the problem, since
with above patch we have shared locking in DIO reads.

So, this patch tries to fix this issue by starting with
shared lock and then switching to exclusive lock only
when required based on ext4_dio_write_checks().

Other than that, it also simplifies below cases:-

1. Simplified ext4_unaligned_aio API to
ext4_unaligned_io.
Previous API was abused in the sense that it was
not really checking for AIO anywhere also it used to
check for extending writes.
So this API was renamed and simplified to ext4_unaligned_io()
which actully only checks if the IO is really unaligned.

This is because in all other cases inode_dio_wait()
will anyway become a no-op. So no need to over complicate
by checking for every condition here.

2. Added ext4_extending_io API. This checks if the IO
is extending the file.

Now we only check for
unaligned_io, extend, dioread_nolock & overwrite.

Signed-off-by: Ritesh Harjani 
---
 fs/ext4/file.c | 213 -
 1 file changed, 159 insertions(+), 54 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index ce1cecbae932..faa8dcf9c08d 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -166,19 +166,25 @@ static int ext4_release_file(struct inode *inode, struct 
file *filp)
  * threads are at work on the same unwritten block, they must be synchronized
  * or one thread will zero the other's data, causing corruption.
  */
-static int
-ext4_unaligned_aio(struct inode *inode, struct iov_iter *from, loff_t pos)
+static bool
+ext4_unaligned_io(struct inode *inode, struct iov_iter *from, loff_t pos)
 {
struct super_block *sb = inode->i_sb;
-   int blockmask = sb->s_blocksize - 1;
-
-   if (pos >= ALIGN(i_size_read(inode), sb->s_blocksize))
-   return 0;
+   unsigned long blockmask = sb->s_blocksize - 1;
 
if ((pos | iov_iter_alignment(from)) & blockmask)
-   return 1;
+   return true;
 
-   return 0;
+   return false;
+}
+
+static bool
+ext4_extending_io(struct inode *inode, loff_t offset, size_t len)
+{
+   if (offset + len > i_size_read(inode) ||
+   offset + len > EXT4_I(inode)->i_disksize)
+   return true;
+   return false;
 }
 
 /* Is IO overwriting allocated and initialized blocks? */
@@ -204,7 +210,9 @@ static bool ext4_overwrite_io(struct inode *inode, loff_t 
pos, loff_t len)
return err == blklen && (map.m_flags & EXT4_MAP_MAPPED);
 }
 
-static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
+
+static ssize_t ext4_generic_write_checks(struct kiocb *iocb,
+struct iov_iter *from)
 {
struct inode *inode = file_inode(iocb->ki_filp);
ssize_t ret;
@@ -216,10 +224,6 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, 
struct iov_iter *from)
if (ret <= 0)
return ret;
 
-   ret = file_modified(iocb->ki_filp);
-   if (ret)
-   return 0;
-
/*
 * If we have encountered a bitmap-format file, the size limit
 * is smaller than s_maxbytes, which is for extent-mapped files.
@@ -231,9 +235,26 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, 
struct iov_iter *from)
return -EFBIG;
iov_iter_truncate(from, sbi->s_bitmap_maxbytes - iocb->ki_pos);
}
+
return iov_iter_count(from);
 }
 
+static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
+{
+   ssize_t ret;
+   ssize_t count;
+
+   count = ext4_generic_write_checks(iocb, from);
+   if (count <= 0)
+   return count;
+
+   ret = file_modified(iocb->ki_filp);
+   if (ret)
+   return ret;
+
+   return count;
+}
+
 static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
struct iov_iter *from)
 {
@@ -336,6 +357,83 @@ static int ext4_handle_failed_inode_extension(struct inode 
*inode, loff_t size)
return 0;
 }
 
+/*
+ * The intention here is to start with shared lock acquired
+ * (except in unaligned IO & extending writes case),
+ * then see if any condition requires an exclusive inode
+ * lock. If yes, then we restart the whole operation by
+ * releasing the shared lock and acquiring exclusive lock.
+ *
+ * - For unaligned_io we never take exclusive lock as it
+ *   may cause data corruption when two unaligned IO tries
+ *   to modify the same block.
+ *
+ * - For extending wirtes case we don't take
+ *   the

[PATCH V8 0/2] arm64/mm: Enable memory hot remove

2019-09-22 Thread Anshuman Khandual

This series enables memory hot remove on arm64 after fixing a possible arm64
platform specific kernel page table race condition. This series is based on
linux-next (next-20190920).

Concurrent vmalloc() and hot-remove conflict:

As pointed out earlier on the v5 thread [2] there can be potential conflict
between concurrent vmalloc() and memory hot-remove operation. The problem here
is caused by inadequate locking in vmalloc() which protects installation of a
page table page but not the walk or the leaf entry modification.

Now free_empty_tables() and it's children functions take into account a maximum
possible range on which it operates as a floor-ceiling boundary. This along with
changed free_pxx_table() makes sure that no page table page is freed unless its
fully within the maximum possible range as decided by the caller.

Testing:

Memory hot remove has been tested on arm64 for 4K, 16K, 64K page config
options with all possible CONFIG_ARM64_VA_BITS and CONFIG_PGTABLE_LEVELS
combinations.

Changes in V8:

- Dropped the first patch (memblock_[free|remove] reorder) from the series which
  is no longer needed for arm64 hot-remove enablement and was posted separately
  as (https://patchwork.kernel.org/patch/11146361/)
- Dropped vmalloc-vmemmap detection and subsequent skipping of 
free_empty_tables()
- Changed free_empty_[pxx]_tables() functions which now accepts a possible 
maximum
  floor-ceiling address range on which it operates. Also changed 
free_pxx_table()
  functions to check against required alignment as well as maximum floor-ceiling
  range as another prerequisite before freeing the page table page.
- Dropped remove_pagetable(), instead call it's constituent functions directly

Changes in V7: (https://lkml.org/lkml/2019/9/3/326)

- vmalloc_vmemmap_overlap gets evaluated early during boot for a given config
- free_empty_tables() gets conditionally called based on vmalloc_vmemmap_overlap

Changes in V6: (https://lkml.org/lkml/2019/7/15/36)

- Implemented most of the suggestions from Mark Rutland
- Added  in ptdump
- remove_pagetable() now has two distinct passes over the kernel page table
- First pass unmap_hotplug_range() removes leaf level entries at all level
- Second pass free_empty_tables() removes empty page table pages
- Kernel page table lock has been dropped completely
- vmemmap_free() does not call freee_empty_tables() to avoid conflict with 
vmalloc()
- All address range scanning are converted to do {} while() loop
- Added 'unsigned long end' in __remove_pgd_mapping()
- Callers need not provide starting pointer argument to 
free_[pte|pmd|pud]_table() 
- Drop the starting pointer argument from free_[pte|pmd|pud]_table() functions
- Fetching pxxp[i] in free_[pte|pmd|pud]_table() is wrapped around in 
READ_ONCE()
- free_[pte|pmd|pud]_table() now computes starting pointer inside the function
- Fixed TLB handling while freeing huge page section mappings at PMD or PUD 
level
- Added WARN_ON(!page) in free_hotplug_page_range()
- Added WARN_ON(![pm|pud]_table(pud|pmd)) when there is no section mapping

- [PATCH 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in 
try_remove_memory()
- Request earlier for separate merger 
(https://patchwork.kernel.org/patch/10986599/)
- s/__remove_memory/try_remove_memory in the subject line
- s/arch_remove_memory/memblock_[free|remove] in the subject line
- A small change in the commit message as re-order happens now for memblock 
remove
  functions not for arch_remove_memory()

Changes in V5: (https://lkml.org/lkml/2019/5/29/218)

- Have some agreement [1] over using memory_hotplug_lock for arm64 ptdump
- Change 7ba36eccb3f8 ("arm64/mm: Inhibit huge-vmap with ptdump") already merged
- Dropped the above patch from this series
- Fixed indentation problem in arch_[add|remove]_memory() as per David
- Collected all new Acked-by tags
 
Changes in V4: (https://lkml.org/lkml/2019/5/20/19)

- Implemented most of the suggestions from Mark Rutland
- Interchanged patch [PATCH 2/4] <---> [PATCH 3/4] and updated commit message
- Moved CONFIG_PGTABLE_LEVELS inside free_[pud|pmd]_table()
- Used READ_ONCE() in missing instances while accessing page table entries
- s/p???_present()/p???_none() for checking valid kernel page table entries
- WARN_ON() when an entry is !p???_none() and !p???_present() at the same time
- Updated memory hot-remove commit message with additional details as suggested
- Rebased the series on 5.2-rc1 with hotplug changes from David and Michal Hocko
- Collected all new Acked-by tags

Changes in V3: (https://lkml.org/lkml/2019/5/14/197)
 
- Implemented most of the suggestions from Mark Rutland for remove_pagetable()
- Fixed applicable PGTABLE_LEVEL wrappers around pgtable page freeing functions
- Replaced 'direct' with 'sparse_vmap' in remove_pagetable() with inverted 
polarity
- Changed pointer names ('p' at end) and removed tmp from iterations
- Perform intermediate TLB invalidation while clearing pgtable entries
- Dropped flush_tlb_kernel_range()

[PATCH V8 2/2] arm64/mm: Enable memory hot remove

2019-09-22 Thread Anshuman Khandual

The arch code for hot-remove must tear down portions of the linear map and
vmemmap corresponding to memory being removed. In both cases the page
tables mapping these regions must be freed, and when sparse vmemmap is in
use the memory backing the vmemmap must also be freed.

This patch adds unmap_hotplug_range() and free_empty_tables() helpers which
can be used to tear down either region and calls it from vmemmap_free() and
___remove_pgd_mapping(). The sparse_vmap argument determines whether the
backing memory will be freed.

It makes two distinct passes over the kernel page table. In the first pass
with unmap_hotplug_range() it unmaps, invalidates applicable TLB cache and
frees backing memory if required (vmemmap) for each mapped leaf entry. In
the second pass with free_empty_tables() it looks for empty page table
sections whose page table page can be unmapped, TLB invalidated and freed.

While freeing intermediate level page table pages bail out if any of its
entries are still valid. This can happen for partially filled kernel page
table either from a previously attempted failed memory hot add or while
removing an address range which does not span the entire page table page
range.

The vmemmap region may share levels of table with the vmalloc region.
There can be conflicts between hot remove freeing page table pages with
a concurrent vmalloc() walking the kernel page table. This conflict can
not just be solved by taking the init_mm ptl because of existing locking
scheme in vmalloc(). So free_empty_tables() implements a floor and ceiling
method which is borrowed from user page table tear with free_pgd_range()
which skips freeing page table pages if intermediate address range is not
aligned or maximum floor-ceiling might not own the entire page table page.

While here update arch_add_memory() to handle __add_pages() failures by
just unmapping recently added kernel linear mapping. Now enable memory hot
remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE.

This implementation is overall inspired from kernel page table tear down
procedure on X86 architecture and user page table tear down method.

Acked-by: Steve Capper 
Acked-by: David Hildenbrand 
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/Kconfig  |   3 +
 arch/arm64/include/asm/memory.h |   1 +
 arch/arm64/mm/mmu.c | 305 ++--
 3 files changed, 300 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 41a9b42..511249f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -274,6 +274,9 @@ config ZONE_DMA32
 config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y
 
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+   def_bool y
+
 config SMP
def_bool y
 
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index b61b50b..615dcd0 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -54,6 +54,7 @@
 #define MODULES_VADDR  (BPF_JIT_REGION_END)
 #define MODULES_VSIZE  (SZ_128M)
 #define VMEMMAP_START  (-VMEMMAP_SIZE - SZ_2M)
+#define VMEMMAP_END(VMEMMAP_START + VMEMMAP_SIZE)
 #define PCI_IO_END (VMEMMAP_START - SZ_2M)
 #define PCI_IO_START   (PCI_IO_END - PCI_IO_SIZE)
 #define FIXADDR_TOP(PCI_IO_START - SZ_2M)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 60c929f..6178837 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -725,6 +725,277 @@ int kern_addr_valid(unsigned long addr)
 
return pfn_valid(pte_pfn(pte));
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static void free_hotplug_page_range(struct page *page, size_t size)
+{
+   WARN_ON(!page || PageReserved(page));
+   free_pages((unsigned long)page_address(page), get_order(size));
+}
+
+static void free_hotplug_pgtable_page(struct page *page)
+{
+   free_hotplug_page_range(page, PAGE_SIZE);
+}
+
+static bool pgtable_range_aligned(unsigned long start, unsigned long end,
+ unsigned long floor, unsigned long ceiling,
+ unsigned long mask)
+{
+   start &= mask;
+   if (start < floor)
+   return false;
+
+   if (ceiling) {
+   ceiling &= mask;
+   if (!ceiling)
+   return false;
+   }
+
+   if (end - 1 > ceiling - 1)
+   return false;
+   return true;
+}
+
+static void free_pte_table(pmd_t *pmdp, unsigned long addr, unsigned long end,
+  unsigned long floor, unsigned long ceiling)
+{
+   struct page *page;
+   pte_t *ptep;
+   int i;
+
+   if (!pgtable_range_aligned(addr, end, floor, ceiling, PMD_MASK))
+   return;
+
+   ptep = pte_offset_kernel(pmdp, 0UL);
+   for (i = 0; i < PTRS_PER_PTE; i++) {
+   if (!pte_none(READ_ONCE(ptep[i])))
+   return;
+   }
+
+   page =

[PATCH V8 1/2] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump

2019-09-22 Thread Anshuman Khandual

The arm64 page table dump code can race with concurrent modification of the
kernel page tables. When a leaf entries are modified concurrently, the dump
code may log stale or inconsistent information for a VA range, but this is
otherwise not harmful.

When intermediate levels of table are freed, the dump code will continue to
use memory which has been freed and potentially reallocated for another
purpose. In such cases, the dump code may dereference bogus addresses,
leading to a number of potential problems.

Intermediate levels of table may by freed during memory hot-remove,
which will be enabled by a subsequent patch. To avoid racing with
this, take the memory hotplug lock when walking the kernel page table.

Acked-by: David Hildenbrand 
Acked-by: Mark Rutland 
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/mm/ptdump_debugfs.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 064163f..b5eebc8 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
+#include 
 #include 
 
 #include 
@@ -7,7 +8,10 @@
 static int ptdump_show(struct seq_file *m, void *v)
 {
struct ptdump_info *info = m->private;
+
+   get_online_mems();
ptdump_walk_pgd(m, info);
+   put_online_mems();
return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ptdump);
-- 
2.7.4

[PATCH -next] scsi: hisi_sas: Make three functions static

2019-09-22 Thread YueHaibing

Fix sparse warnings:

drivers/scsi/hisi_sas/hisi_sas_main.c:3686:6:
 warning: symbol 'hisi_sas_debugfs_release' was not declared. Should it be 
static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3708:5:
 warning: symbol 'hisi_sas_debugfs_alloc' was not declared. Should it be static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3799:6:
 warning: symbol 'hisi_sas_debugfs_bist_init' was not declared. Should it be 
static?

Reported-by: Hulk Robot 
Signed-off-by: YueHaibing 
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
b/drivers/scsi/hisi_sas/hisi_sas_main.c
index d1513fd..0847e68 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -3683,7 +3683,7 @@ void hisi_sas_debugfs_work_handler(struct work_struct 
*work)
 }
 EXPORT_SYMBOL_GPL(hisi_sas_debugfs_work_handler);
 
-void hisi_sas_debugfs_release(struct hisi_hba *hisi_hba)
+static void hisi_sas_debugfs_release(struct hisi_hba *hisi_hba)
 {
struct device *dev = hisi_hba->dev;
int i;
@@ -3705,7 +3705,7 @@ void hisi_sas_debugfs_release(struct hisi_hba *hisi_hba)
devm_kfree(dev, hisi_hba->debugfs_port_reg[i]);
 }
 
-int hisi_sas_debugfs_alloc(struct hisi_hba *hisi_hba)
+static int hisi_sas_debugfs_alloc(struct hisi_hba *hisi_hba)
 {
const struct hisi_sas_hw *hw = hisi_hba->hw;
struct device *dev = hisi_hba->dev;
@@ -3796,7 +3796,7 @@ int hisi_sas_debugfs_alloc(struct hisi_hba *hisi_hba)
return -ENOMEM;
 }
 
-void hisi_sas_debugfs_bist_init(struct hisi_hba *hisi_hba)
+static void hisi_sas_debugfs_bist_init(struct hisi_hba *hisi_hba)
 {
hisi_hba->debugfs_bist_dentry =
debugfs_create_dir("bist", hisi_hba->debugfs_dir);
-- 
2.7.4

Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect

2019-09-22 Thread Lukas Wunner

On Mon, Aug 12, 2019 at 05:31:33PM +0300, Mika Westerberg wrote:
> If there are more than one PCIe switch with hotplug downstream ports
> hot-removing them leads to a following deadlock:

For the record, I think my comments on v1 of this patch still apply:

https://patchwork.ozlabs.org/patch/1117870/#2230798

Re: [PATCH] smack: fix an compile error in smack_post_notification

2019-09-22 Thread kbuild test robot

Hi zhong,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3 next-20190920]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/zhong-jiang/smack-fix-an-compile-error-in-smack_post_notification/20190923-112403
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   In file included from security/smack/smack_lsm.c:45:0:
>> security/smack/smack.h:24:10: fatal error: linux/watch_queue.h: No such file 
>> or directory
#include 
 ^
   compilation terminated.

vim +24 security/smack/smack.h

11  
12  #include 
13  #include 
14  #include 
15  #include 
16  #if IS_ENABLED(CONFIG_IPV6)
17  #include 
18  #endif /* CONFIG_IPV6 */
19  #include 
20  #include 
21  #include 
22  #include 
23  #include 
  > 24  #include 
25  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Jason Wang




On 2019/9/23 上午11:18, Matt Cover wrote:

On Sun, Sep 22, 2019 at 7:34 PM Jason Wang  wrote:


On 2019/9/23 上午9:15, Matt Cover wrote:

On Sun, Sep 22, 2019 at 5:51 PM Jason Wang  wrote:

On 2019/9/23 上午6:30, Matt Cover wrote:

On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin  wrote:

On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote:

On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  wrote:

On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:

Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
to fallback to tun_automq_select_queue() for tx queue selection.

Compilation of this exact patch was tested.

For functional testing 3 additional printk()s were added.

Functional testing results (on 2 txq tap device):

 [Fri Sep 20 18:33:27 2019] == tun no prog ==
 [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
 [Fri Sep 20 18:33:27 2019] == tun prog -1 ==
 [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
 [Fri Sep 20 18:33:27 2019] == tun prog 0 ==
 [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
 [Fri Sep 20 18:33:27 2019] == tun prog 1 ==
 [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1'
 [Fri Sep 20 18:33:27 2019] == tun prog 2 ==
 [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'

Signed-off-by: Matthew Cover 

Could you add a bit more motivation data here?

Thank you for these questions Michael.

I'll plan on adding the below information to the
commit message and submitting a v2 of this patch
when net-next reopens. In the meantime, it would
be very helpful to know if these answers address
some of your concerns.


1. why is this a good idea

This change allows TUNSETSTEERINGEBPF progs to
do any of the following.
1. implement queue selection for a subset of
   traffic (e.g. special queue selection logic
   for ipv4, but return negative and use the
   default automq logic for ipv6)
2. determine there isn't sufficient information
   to do proper queue selection; return
   negative and use the default automq logic
   for the unknown
3. implement a noop prog (e.g. do
   bpf_trace_printk() then return negative and
   use the default automq logic for everything)


2. how do we know existing userspace does not rely on existing behaviour

Prior to this change a negative return from a
TUNSETSTEERINGEBPF prog would have been cast
into a u16 and traversed netdev_cap_txqueue().

In most cases netdev_cap_txqueue() would have
found this value to exceed real_num_tx_queues
and queue_index would be updated to 0.

It is possible that a TUNSETSTEERINGEBPF prog
return a negative value which when cast into a
u16 results in a positive queue_index less than
real_num_tx_queues. For example, on x86_64, a
return value of -65535 results in a queue_index
of 1; which is a valid queue for any multiqueue
device.

It seems unlikely, however as stated above is
unfortunately possible, that existing
TUNSETSTEERINGEBPF programs would choose to
return a negative value rather than return the
positive value which holds the same meaning.

It seems more likely that future
TUNSETSTEERINGEBPF programs would leverage a
negative return and potentially be loaded into
a kernel with the old behavior.

OK if we are returning a special
value, shouldn't we limit it? How about a special
value with this meaning?
If we are changing an ABI let's at least make it
extensible.


A special value with this meaning sounds
good to me. I'll plan on adding a define
set to -1 to cause the fallback to automq.

Can it really return -1?

I see:

static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
   struct sk_buff *skb)
...



The way I was initially viewing the old
behavior was that returning negative was
undefined; it happened to have the
outcomes I walked through, but not
necessarily by design.

Having such fallback may bring extra troubles, it requires the eBPF
program know the existence of the behavior which is not a part of kernel
ABI actually. And then some eBPF program may start to rely on that which
is pretty dangerous. Note, one important consideration is to have
macvtap support where does not have any stuffs like automq.

Thanks


How about we call this TUN_SSE_ABORT
instead of TUN_SSE_DO_AUTOMQ?

TUN_SSE_ABORT could be documented as
falling back

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Jason Wang




On 2019/9/23 上午11:00, Matt Cover wrote:

On Sun, Sep 22, 2019 at 7:32 PM Jason Wang  wrote:


On 2019/9/23 上午9:20, Matt Cover wrote:

On Sun, Sep 22, 2019 at 5:46 PM Jason Wang  wrote:

On 2019/9/23 上午1:43, Matt Cover wrote:

On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  wrote:

On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:

Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
to fallback to tun_automq_select_queue() for tx queue selection.

Compilation of this exact patch was tested.

For functional testing 3 additional printk()s were added.

Functional testing results (on 2 txq tap device):

 [Fri Sep 20 18:33:27 2019] == tun no prog ==
 [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
 [Fri Sep 20 18:33:27 2019] == tun prog -1 ==
 [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
 [Fri Sep 20 18:33:27 2019] == tun prog 0 ==
 [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
 [Fri Sep 20 18:33:27 2019] == tun prog 1 ==
 [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1'
 [Fri Sep 20 18:33:27 2019] == tun prog 2 ==
 [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2'
 [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'

Signed-off-by: Matthew Cover 

Could you add a bit more motivation data here?

Thank you for these questions Michael.

I'll plan on adding the below information to the
commit message and submitting a v2 of this patch
when net-next reopens. In the meantime, it would
be very helpful to know if these answers address
some of your concerns.


1. why is this a good idea

This change allows TUNSETSTEERINGEBPF progs to
do any of the following.
1. implement queue selection for a subset of
   traffic (e.g. special queue selection logic
   for ipv4, but return negative and use the
   default automq logic for ipv6)

Well, using ebpf means it need to take care of all the cases. E.g you
can easily implement the fallback through eBPF as well.


I really think there is value in being
able to implement a scoped special
case while leaving the rest of the
packets in the kernel's hands.


This is only work when some fucntion could not be done by eBPF itself
and then we can provide the function through eBPF helpers. But this is
not the case here.



Having to reimplement automq makes
this hookpoint less accessible to
beginners and experienced alike.


Note that automq itself is kind of complicated, it's best effort that is
hard to be documented accurately. It has several limitations (e.g flow
caches etc.) that may not work well in some conditions.

It's not hard to implement a user programmable steering policy through
maps which could have much deterministic behavior than automq. The goal
of steering ebpf is to get rid of automq completely not partially rely
on it.

And I don't see how relying on automq can simplify anything.

Thanks


I'm not suggesting that we document automq.

I'm suggesting that we add a return value
which is documented as signaling to the
kernel to implement whatever queue
selection method is used when there is no
ebpf prog attached.



Again, this only work when there's something that could not be done 
through eBPF. And then we can provide eBPF helper there.




  That behavior today is
automq.



Automq is not good, e.g tun_ebpf_select_queue() has already provided a 
fallback, anything that automq can do better than that?





There is nothing about this return value
which would harder to change the default
queue selection later. The default already
exists today when there is no program
loaded.



The patch depends on incorrect behavior of tuntap (updating flow caches 
when steering prog is set). I think it's wrong to update flow caches 
even when steering program is set which leads extra overhead. Will 
probably submit a patch to disable that behavior.


Thanks





2. determine there isn't sufficient information
   to do proper queue selection; return
   negative and use the default automq logic
   for the unknown

Same as above.



3. implement a noop prog (e.g. do
   bpf_trace_printk() then return negative and
   use the default automq logic for everything)

ditto.



2. how do we know existing userspace does not rely on existing behaviour

Prior to this change a negative return from a
TUNSETSTEERINGEBPF prog would have been cast
into a u16 and traversed netdev_cap_txqueue().

In

RE: [RFC PATCH v3 4/6] psci: Add hvc call service for ptp_kvm.

2019-09-22 Thread Jianyong Wu (Arm Technology China)

Hi Paolo, Marc

> -Original Message-
> From: Paolo Bonzini 
> Sent: Thursday, September 19, 2019 7:07 PM
> To: Jianyong Wu (Arm Technology China) ;
> net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org;
> t...@linutronix.de; sean.j.christopher...@intel.com; m...@kernel.org;
> richardcoch...@gmail.com; Mark Rutland ; Will
> Deacon ; Suzuki Poulose
> 
> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org; Steve Capper
> ; Kaly Xin (Arm Technology China)
> ; Justin He (Arm Technology China)
> ; nd ; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [RFC PATCH v3 4/6] psci: Add hvc call service for ptp_kvm.
> 
> On 19/09/19 11:46, Jianyong Wu (Arm Technology China) wrote:
> >> On 18/09/19 11:57, Jianyong Wu (Arm Technology China) wrote:
> >>> Paolo Bonzini wrote:
>  This is not Y2038-safe.  Please use ktime_get_real_ts64 instead,
>  and split the 64-bit seconds value between val[0] and val[1].
> >
> > Val[] should be long not u32 I think, so in arm64 I can avoid that
> > Y2038_safe, but also need rewrite for arm32.
> 
> I don't think there's anything inherently wrong with u32 val[], and as you
> notice it lets you reuse code between arm and arm64.  It's up to you and
> Marc to decide.
> 
To compatible 32-bit, Integrates second value and nanosecond value as a 
nanosecond value then split it into val[0] and val[1] and split cycle value 
into val[2] and val[3],
 In this way, time will overflow at Y2262.
WDYT?

Thanks
Jianyong Wu

Re: Kernel Concurrency Sanitizer (KCSAN)

2019-09-22 Thread Boqun Feng

On Fri, Sep 20, 2019 at 04:54:21PM +0100, Will Deacon wrote:
> Hi Marco,
> 
> On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
> > We would like to share a new data-race detector for the Linux kernel:
> > Kernel Concurrency Sanitizer (KCSAN) --
> > https://github.com/google/ktsan/wiki/KCSAN  (Details:
> > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
> > 
> > To those of you who we mentioned at LPC that we're working on a
> > watchpoint-based KTSAN inspired by DataCollider [1], this is it (we
> > renamed it to KCSAN to avoid confusion with KTSAN).
> > [1] http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf
> 
> Oh, spiffy!
> 
> > In the coming weeks we're planning to:
> > * Set up a syzkaller instance.
> > * Share the dashboard so that you can see the races that are found.
> > * Attempt to send fixes for some races upstream (if you find that the
> > kcsan-with-fixes branch contains an important fix, please feel free to
> > point it out and we'll prioritize that).
> 
> Curious: do you take into account things like alignment and/or access size
> when looking at READ_ONCE/WRITE_ONCE? Perhaps you could initially prune
> naturally aligned accesses for which __native_word() is true?
> 
> > There are a few open questions:
> > * The big one: most of the reported races are due to unmarked
> > accesses; prioritization or pruning of races to focus initial efforts
> > to fix races might be required. Comments on how best to proceed are
> > welcome. We're aware that these are issues that have recently received
> > attention in the context of the LKMM
> > (https://lwn.net/Articles/793253/).
> 
> This one is tricky. What I think we need to avoid is an onslaught of
> patches adding READ_ONCE/WRITE_ONCE without a concrete analysis of the
> code being modified. My worry is that Joe Developer is eager to get their
> first patch into the kernel, so runs this tool and starts spamming
> maintainers with these things to the point that they start ignoring KCSAN
> reports altogether because of the time they take up.
> 
> I suppose one thing we could do is to require each new READ_ONCE/WRITE_ONCE
> to have a comment describing the racy access, a bit like we do for memory
> barriers. Another possibility would be to use atomic_t more widely if
> there is genuine concurrency involved.
> 

Instead of commenting READ_ONCE/WRITE_ONCE()s, how about adding
anotations for data fields/variables that might be accessed without
holding a lock? Because if all accesses to a variable are protected by
proper locks, we mostly don't need to worry about data races caused by
not using READ_ONCE/WRITE_ONCE(). Bad things happen when we write to a
variable using locks but read it outside a lock critical section for
better performance, for example, rcu_node::qsmask. I'm thinking so maybe
we can introduce a new annotation similar to __rcu, maybe call it
__lockfree ;-) as follow:

struct rcu_node {
...
unsigned long __lockfree qsmask;
...
}

, and __lockfree indicates that by design the maintainer of this data
structure or variable believe there will be accesses outside lock
critical sections. Note that not all accesses to __lockfree field, need
to be READ_ONCE/WRITE_ONCE(), if the developer manages to build a
complex but working wake/wait state machine so that it could not be
accessed in the same time, READ_ONCE()/WRITE_ONCE() is not needed.

If we have such an annotation, I think it won't be hard for configuring
KCSAN to only examine accesses to variables with this annotation. Also 
this annotation could help other checkers in the future.

If KCSAN (at the least the upstream version) only check accesses with
such an anotation, "spamming with KCSAN warnings/fixes" will be the
choice of each maintainer ;-) 

Thoughts?

Regards,
Boqun

> > * How/when to upstream KCSAN?
> 
> Start by posting the patches :)
> 
> Will

signature.asc
Description: PGP signature

[PATCH v4 09/10] mm/gup: Allow to react to fatal signals

2019-09-22 Thread Peter Xu

The existing gup code does not react to the fatal signals in many code
paths.  For example, in one retry path of gup we're still using
down_read() rather than down_read_killable().  Also, when doing page
faults we don't pass in FAULT_FLAG_KILLABLE as well, which means that
within the faulting process we'll wait in non-killable way as well.
These were spotted by Linus during the code review of some other
patches.

Let's allow the gup code to react to fatal signals to improve the
responsiveness of threads when during gup and being killed.

Signed-off-by: Peter Xu 
---
 mm/gup.c | 12 +---
 mm/hugetlb.c |  3 ++-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index d2811bb15a25..4c638473db83 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -640,7 +640,7 @@ static int faultin_page(struct task_struct *tsk, struct 
vm_area_struct *vma,
if (*flags & FOLL_REMOTE)
fault_flags |= FAULT_FLAG_REMOTE;
if (locked)
-   fault_flags |= FAULT_FLAG_ALLOW_RETRY;
+   fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
if (*flags & FOLL_NOWAIT)
fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
if (*flags & FOLL_TRIED) {
@@ -973,7 +973,7 @@ int fixup_user_fault(struct task_struct *tsk, struct 
mm_struct *mm,
vm_fault_t ret, major = 0;
 
if (unlocked)
-   fault_flags |= FAULT_FLAG_ALLOW_RETRY;
+   fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 
 retry:
vma = find_extend_vma(mm, address);
@@ -1086,7 +1086,13 @@ static __always_inline long 
__get_user_pages_locked(struct task_struct *tsk,
break;
 
*locked = 1;
-   down_read(>mmap_sem);
+   ret = down_read_killable(>mmap_sem);
+   if (ret) {
+   BUG_ON(ret > 0);
+   if (!pages_done)
+   pages_done = ret;
+   break;
+   }
 
ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
   pages, NULL, locked);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d0c98cff5b0f..84034154d50e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4342,7 +4342,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
if (flags & FOLL_WRITE)
fault_flags |= FAULT_FLAG_WRITE;
if (locked)
-   fault_flags |= FAULT_FLAG_ALLOW_RETRY;
+   fault_flags |= FAULT_FLAG_ALLOW_RETRY |
+   FAULT_FLAG_KILLABLE;
if (flags & FOLL_NOWAIT)
fault_flags |= FAULT_FLAG_ALLOW_RETRY |
FAULT_FLAG_RETRY_NOWAIT;
-- 
2.21.0

[PATCH v4 07/10] mm: Allow VM_FAULT_RETRY for multiple times

2019-09-22 Thread Peter Xu

The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once.  We
achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time.  This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle
the page fault on a single page.  However that should hardly happen,
and after all for each code path to return a VM_FAULT_RETRY we'll
first wait for a condition (during which time we should possibly yield
the cpu) to happen before VM_FAULT_RETRY is really returned.

This patch removes the restriction by keeping the
FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY.  It means
that the page fault handler now can retry the page fault for multiple
times if necessary without the need to generate another page fault
event.  Meanwhile we still keep the FAULT_FLAG_TRIED flag so page
fault handler can still identify whether a page fault is the first
attempt or not.

Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):

  - ALLOW_RETRY and !TRIED:  this means the page fault allows to
 retry, and this is the first try

  - ALLOW_RETRY and TRIED:   this means the page fault allows to
 retry, and this is not the first try

  - !ALLOW_RETRY and !TRIED: this means the page fault does not allow
 to retry at all

  - !ALLOW_RETRY and TRIED:  this is forbidden and should never be used

In existing code we have multiple places that has taken special care
of the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY).  This patch introduces a simple helper to
detect the first retry of a page fault by checking against
both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flag &
FAULT_FLAG_TRIED) because now even the 2nd try will have the
ALLOW_RETRY set, then use that helper in all existing special paths.
One example is in __lock_page_or_retry(), now we'll drop the mmap_sem
only in the first attempt of page fault and we'll keep it in follow up
retries, so old locking behavior will be retained.

This will be a nice enhancement for current code [2] at the same time
a supporting material for the future userfaultfd-writeprotect work,
since in that work there will always be an explicit userfault
writeprotect retry for protected pages, and if that cannot resolve the
page fault (e.g., when userfaultfd-writeprotect is used in conjunction
with swapped pages) then we'll possibly need a 3rd retry of the page
fault.  It might also benefit other potential users who will have
similar requirement like userfault write-protection.

GUP code is not touched yet and will be covered in follow up patch.

Please read the thread below for more information.

[1] https://lore.kernel.org/lkml/20171102193644.gb22...@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.gb9...@redhat.com/

Suggested-by: Linus Torvalds 
Suggested-by: Andrea Arcangeli 
Reviewed-by: Jerome Glisse 
Signed-off-by: Peter Xu 
---
 arch/alpha/mm/fault.c   |  2 +-
 arch/arc/mm/fault.c |  1 -
 arch/arm/mm/fault.c |  3 ---
 arch/arm64/mm/fault.c   |  5 -
 arch/hexagon/mm/vm_fault.c  |  1 -
 arch/ia64/mm/fault.c|  1 -
 arch/m68k/mm/fault.c|  3 ---
 arch/microblaze/mm/fault.c  |  1 -
 arch/mips/mm/fault.c|  1 -
 arch/nds32/mm/fault.c   |  1 -
 arch/nios2/mm/fault.c   |  3 ---
 arch/openrisc/mm/fault.c|  1 -
 arch/parisc/mm/fault.c  |  4 +---
 arch/powerpc/mm/fault.c |  6 --
 arch/riscv/mm/fault.c   |  5 -
 arch/s390/mm/fault.c|  5 +
 arch/sh/mm/fault.c  |  1 -
 arch/sparc/mm/fault_32.c|  1 -
 arch/sparc/mm/fault_64.c|  1 -
 arch/um/kernel/trap.c   |  1 -
 arch/unicore32/mm/fault.c   |  4 +---
 arch/x86/mm/fault.c |  2 --
 arch/xtensa/mm/fault.c  |  1 -
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 ---
 include/linux/mm.h  | 37 +
 mm/filemap.c|  2 +-
 mm/shmem.c  |  2 +-
 27 files changed, 52 insertions(+), 55 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index ab1d4212d658..e032d2d03012 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -170,7 +170,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
-   flags &= ~FAULT_FLAG_ALLOW_RETRY;
+   flags |= FAULT_FLAG_TRIED;
 
 /* No need to up_read(>mmap_sem) as we would
 * have already released it in __lock_page_or_retry
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index

[PATCH v4 08/10] mm/gup: Allow VM_FAULT_RETRY for multiple times

2019-09-22 Thread Peter Xu

This is the gup counterpart of the change that allows the
VM_FAULT_RETRY to happen for more than once.  One thing to mention is
that we must check the fatal signal here before retry because the GUP
can be interrupted by that, otherwise we can loop forever.

Signed-off-by: Peter Xu 
---
 mm/gup.c | 27 +--
 mm/hugetlb.c |  6 --
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index e60d32f1674d..d2811bb15a25 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -644,7 +644,10 @@ static int faultin_page(struct task_struct *tsk, struct 
vm_area_struct *vma,
if (*flags & FOLL_NOWAIT)
fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
if (*flags & FOLL_TRIED) {
-   VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY);
+   /*
+* Note: FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_TRIED
+* can co-exist
+*/
fault_flags |= FAULT_FLAG_TRIED;
}
 
@@ -994,7 +997,6 @@ int fixup_user_fault(struct task_struct *tsk, struct 
mm_struct *mm,
down_read(>mmap_sem);
if (!(fault_flags & FAULT_FLAG_TRIED)) {
*unlocked = true;
-   fault_flags &= ~FAULT_FLAG_ALLOW_RETRY;
fault_flags |= FAULT_FLAG_TRIED;
goto retry;
}
@@ -1069,17 +1071,30 @@ static __always_inline long 
__get_user_pages_locked(struct task_struct *tsk,
if (likely(pages))
pages += ret;
start += ret << PAGE_SHIFT;
+   lock_dropped = true;
 
+retry:
/*
 * Repeat on the address that fired VM_FAULT_RETRY
-* without FAULT_FLAG_ALLOW_RETRY but with
-* FAULT_FLAG_TRIED.
+* with both FAULT_FLAG_ALLOW_RETRY and
+* FAULT_FLAG_TRIED.  Note that GUP can be interrupted
+* by fatal signals, so we need to check it before we
+* start trying again otherwise it can loop forever.
 */
+
+   if (fatal_signal_pending(current))
+   break;
+
*locked = 1;
-   lock_dropped = true;
down_read(>mmap_sem);
+
ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED,
-  pages, NULL, NULL);
+  pages, NULL, locked);
+   if (!*locked) {
+   /* Continue to retry until we succeeded */
+   BUG_ON(ret != 0);
+   goto retry;
+   }
if (ret != 1) {
BUG_ON(ret > 1);
if (!pages_done)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 31c2a6275023..d0c98cff5b0f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4347,8 +4347,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
fault_flags |= FAULT_FLAG_ALLOW_RETRY |
FAULT_FLAG_RETRY_NOWAIT;
if (flags & FOLL_TRIED) {
-   VM_WARN_ON_ONCE(fault_flags &
-   FAULT_FLAG_ALLOW_RETRY);
+   /*
+* Note: FAULT_FLAG_ALLOW_RETRY and
+* FAULT_FLAG_TRIED can co-exist
+*/
fault_flags |= FAULT_FLAG_TRIED;
}
ret = hugetlb_fault(mm, vma, vaddr, fault_flags);
-- 
2.21.0

[PATCH v4 10/10] mm/userfaultfd: Honor FAULT_FLAG_KILLABLE in fault path

2019-09-22 Thread Peter Xu

Userfaultfd fault path was by default killable even if the caller does
not have FAULT_FLAG_KILLABLE.  That makes sense before in that when
with gup we don't have FAULT_FLAG_KILLABLE properly set before.  Now
after previous patch we've got FAULT_FLAG_KILLABLE applied even for
gup code so it should also make sense to let userfaultfd to honor the
FAULT_FLAG_KILLABLE.

Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code
right now, this patch should have no functional change.  It also
cleaned the code a little bit by introducing some helpers.

Signed-off-by: Peter Xu 
---
 fs/userfaultfd.c | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 2b3b48e94ae4..8c5863ccbf0e 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -334,6 +334,30 @@ static inline bool userfaultfd_must_wait(struct 
userfaultfd_ctx *ctx,
return ret;
 }
 
+/* Should pair with userfaultfd_signal_pending() */
+static inline long userfaultfd_get_blocking_state(unsigned int flags)
+{
+   if (flags & FAULT_FLAG_INTERRUPTIBLE)
+   return TASK_INTERRUPTIBLE;
+
+   if (flags & FAULT_FLAG_KILLABLE)
+   return TASK_KILLABLE;
+
+   return TASK_UNINTERRUPTIBLE;
+}
+
+/* Should pair with userfaultfd_get_blocking_state() */
+static inline bool userfaultfd_signal_pending(unsigned int flags)
+{
+   if (flags & FAULT_FLAG_INTERRUPTIBLE)
+   return signal_pending(current);
+
+   if (flags & FAULT_FLAG_KILLABLE)
+   return fatal_signal_pending(current);
+
+   return false;
+}
+
 /*
  * The locking rules involved in returning VM_FAULT_RETRY depending on
  * FAULT_FLAG_ALLOW_RETRY, FAULT_FLAG_RETRY_NOWAIT and
@@ -355,7 +379,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
struct userfaultfd_ctx *ctx;
struct userfaultfd_wait_queue uwq;
vm_fault_t ret = VM_FAULT_SIGBUS;
-   bool must_wait, return_to_userland;
+   bool must_wait;
long blocking_state;
 
/*
@@ -462,9 +486,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
uwq.ctx = ctx;
uwq.waken = false;
 
-   return_to_userland = vmf->flags & FAULT_FLAG_INTERRUPTIBLE;
-   blocking_state = return_to_userland ? TASK_INTERRUPTIBLE :
-TASK_KILLABLE;
+   blocking_state = userfaultfd_get_blocking_state(vmf->flags);
 
spin_lock_irq(>fault_pending_wqh.lock);
/*
@@ -490,8 +512,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
up_read(>mmap_sem);
 
if (likely(must_wait && !READ_ONCE(ctx->released) &&
-  (return_to_userland ? !signal_pending(current) :
-   !fatal_signal_pending(current {
+  userfaultfd_signal_pending(vmf->flags))) {
wake_up_poll(>fd_wqh, EPOLLIN);
schedule();
ret |= VM_FAULT_MAJOR;
@@ -513,8 +534,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
set_current_state(blocking_state);
if (READ_ONCE(uwq.waken) ||
READ_ONCE(ctx->released) ||
-   (return_to_userland ? signal_pending(current) :
-fatal_signal_pending(current)))
+   userfaultfd_signal_pending(vmf->flags))
break;
schedule();
}
-- 
2.21.0

[PATCH v4 05/10] mm: Return faster for non-fatal signals in user mode faults

2019-09-22 Thread Peter Xu

The idea comes from the upstream discussion between Linus and Andrea:

  https://lore.kernel.org/lkml/20171102193644.gb22...@redhat.com/

A summary to the issue: there was a special path in handle_userfault()
in the past that we'll return a VM_FAULT_NOPAGE when we detected
non-fatal signals when waiting for userfault handling.  We did that by
reacquiring the mmap_sem before returning.  However that brings a risk
in that the vmas might have changed when we retake the mmap_sem and
even we could be holding an invalid vma structure.

This patch is a preparation of removing that special path by allowing
the page fault to return even faster if we were interrupted by a
non-fatal signal during a user-mode page fault handling routine.

Suggested-by: Linus Torvalds 
Suggested-by: Andrea Arcangeli 
Signed-off-by: Peter Xu 
---
 arch/alpha/mm/fault.c|  3 ++-
 arch/arc/mm/fault.c  |  5 +
 arch/arm/mm/fault.c  |  9 +
 arch/arm64/mm/fault.c|  9 +
 arch/hexagon/mm/vm_fault.c   |  3 ++-
 arch/ia64/mm/fault.c |  3 ++-
 arch/m68k/mm/fault.c |  5 +++--
 arch/microblaze/mm/fault.c   |  3 ++-
 arch/mips/mm/fault.c |  3 ++-
 arch/nds32/mm/fault.c|  9 +
 arch/nios2/mm/fault.c|  3 ++-
 arch/openrisc/mm/fault.c |  3 ++-
 arch/parisc/mm/fault.c   |  3 ++-
 arch/powerpc/mm/fault.c  |  2 ++
 arch/riscv/mm/fault.c|  5 +++--
 arch/s390/mm/fault.c |  4 ++--
 arch/sh/mm/fault.c   |  4 
 arch/sparc/mm/fault_32.c |  2 +-
 arch/sparc/mm/fault_64.c |  3 ++-
 arch/um/kernel/trap.c|  4 +++-
 arch/unicore32/mm/fault.c|  5 +++--
 arch/x86/mm/fault.c  |  2 ++
 arch/xtensa/mm/fault.c   |  3 ++-
 include/linux/sched/signal.h | 12 
 24 files changed, 75 insertions(+), 32 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index de4cc6936391..ab1d4212d658 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -150,7 +150,8 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
   the fault.  */
fault = handle_mm_fault(vma, address, flags);
 
-   if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+   if ((fault & VM_FAULT_RETRY) &&
+   fault_should_check_signal(user_mode(regs)))
return;
 
if (unlikely(fault & VM_FAULT_ERROR)) {
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 61919e4e4eec..27adf4e608e4 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -142,6 +142,11 @@ void do_page_fault(unsigned long address, struct pt_regs 
*regs)
goto no_context;
return;
}
+
+   /* Allow user to handle non-fatal signals first */
+   if (signal_pending(current) && user_mode(regs))
+   return;
+
/*
 * retry state machine
 */
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 2ae28ffec622..f00fb4eafe54 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -291,14 +291,15 @@ do_page_fault(unsigned long addr, unsigned int fsr, 
struct pt_regs *regs)
 
fault = __do_page_fault(mm, addr, fsr, flags, tsk);
 
-   /* If we need to retry but a fatal signal is pending, handle the
+   /* If we need to retry but a signal is pending, try to handle the
 * signal first. We do not need to release the mmap_sem because
 * it would already be released in __lock_page_or_retry in
 * mm/filemap.c. */
-   if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
-   if (!user_mode(regs))
+   if (unlikely(fault & VM_FAULT_RETRY && signal_pending(current))) {
+   if (fatal_signal_pending(current) && !user_mode(regs))
goto no_context;
-   return 0;
+   if (user_mode(regs))
+   return 0;
}
 
/*
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 613e7434c208..0d3fe0ea6a70 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -479,15 +479,16 @@ static int __kprobes do_page_fault(unsigned long addr, 
unsigned int esr,
 
if (fault & VM_FAULT_RETRY) {
/*
-* If we need to retry but a fatal signal is pending,
+* If we need to retry but a signal is pending, try to
 * handle the signal first. We do not need to release
 * the mmap_sem because it would already be released
 * in __lock_page_or_retry in mm/filemap.c.
 */
-   if (fatal_signal_pending(current)) {
-   if (!user_mode(regs))
+   if (signal_pending(current)) {
+   if (fatal_signal_pending(current) && !user_mode(regs))
goto no_context;
-

[PATCH v4 06/10] userfaultfd: Don't retake mmap_sem to emulate NOPAGE

2019-09-22 Thread Peter Xu

This patch removes the risk path in handle_userfault() then we will be
sure that the callers of handle_mm_fault() will know that the VMAs
might have changed.  Meanwhile with previous patch we don't lose
responsiveness as well since the core mm code now can handle the
nonfatal userspace signals even if we return VM_FAULT_RETRY.

Suggested-by: Andrea Arcangeli 
Suggested-by: Linus Torvalds 
Reviewed-by: Jerome Glisse 
Signed-off-by: Peter Xu 
---
 fs/userfaultfd.c | 24 
 1 file changed, 24 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 3c55ee64bcb1..2b3b48e94ae4 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -522,30 +522,6 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
 
__set_current_state(TASK_RUNNING);
 
-   if (return_to_userland) {
-   if (signal_pending(current) &&
-   !fatal_signal_pending(current)) {
-   /*
-* If we got a SIGSTOP or SIGCONT and this is
-* a normal userland page fault, just let
-* userland return so the signal will be
-* handled and gdb debugging works.  The page
-* fault code immediately after we return from
-* this function is going to release the
-* mmap_sem and it's not depending on it
-* (unlike gup would if we were not to return
-* VM_FAULT_RETRY).
-*
-* If a fatal signal is pending we still take
-* the streamlined VM_FAULT_RETRY failure path
-* and there's no need to retake the mmap_sem
-* in such case.
-*/
-   down_read(>mmap_sem);
-   ret = VM_FAULT_NOPAGE;
-   }
-   }
-
/*
 * Here we race with the list_del; list_add in
 * userfaultfd_ctx_read(), however because we don't ever run
-- 
2.21.0

[PATCH v4 01/10] mm/gup: Rename "nonblocking" to "locked" where proper

2019-09-22 Thread Peter Xu

There's plenty of places around __get_user_pages() that has a parameter
"nonblocking" which does not really mean that "it won't block" (because
it can really block) but instead it shows whether the mmap_sem is
released by up_read() during the page fault handling mostly when
VM_FAULT_RETRY is returned.

We have the correct naming in e.g. get_user_pages_locked() or
get_user_pages_remote() as "locked", however there're still many places
that are using the "nonblocking" as name.

Renaming the places to "locked" where proper to better suite the
functionality of the variable.  While at it, fixing up some of the
comments accordingly.

Reviewed-by: Mike Rapoport 
Reviewed-by: Jerome Glisse 
Reviewed-by: David Hildenbrand 
Signed-off-by: Peter Xu 
---
 mm/gup.c | 44 +---
 mm/hugetlb.c |  8 
 2 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 98f13ab37bac..eddbb95dcb8f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -622,12 +622,12 @@ static int get_gate_page(struct mm_struct *mm, unsigned 
long address,
 }
 
 /*
- * mmap_sem must be held on entry.  If @nonblocking != NULL and
- * *@flags does not include FOLL_NOWAIT, the mmap_sem may be released.
- * If it is, *@nonblocking will be set to 0 and -EBUSY returned.
+ * mmap_sem must be held on entry.  If @locked != NULL and *@flags
+ * does not include FOLL_NOWAIT, the mmap_sem may be released.  If it
+ * is, *@locked will be set to 0 and -EBUSY returned.
  */
 static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
-   unsigned long address, unsigned int *flags, int *nonblocking)
+   unsigned long address, unsigned int *flags, int *locked)
 {
unsigned int fault_flags = 0;
vm_fault_t ret;
@@ -639,7 +639,7 @@ static int faultin_page(struct task_struct *tsk, struct 
vm_area_struct *vma,
fault_flags |= FAULT_FLAG_WRITE;
if (*flags & FOLL_REMOTE)
fault_flags |= FAULT_FLAG_REMOTE;
-   if (nonblocking)
+   if (locked)
fault_flags |= FAULT_FLAG_ALLOW_RETRY;
if (*flags & FOLL_NOWAIT)
fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
@@ -665,8 +665,8 @@ static int faultin_page(struct task_struct *tsk, struct 
vm_area_struct *vma,
}
 
if (ret & VM_FAULT_RETRY) {
-   if (nonblocking && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
-   *nonblocking = 0;
+   if (locked && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
+   *locked = 0;
return -EBUSY;
}
 
@@ -743,7 +743,7 @@ static int check_vma_flags(struct vm_area_struct *vma, 
unsigned long gup_flags)
  * only intends to ensure the pages are faulted in.
  * @vmas:  array of pointers to vmas corresponding to each page.
  * Or NULL if the caller does not require them.
- * @nonblocking: whether waiting for disk IO or mmap_sem contention
+ * @locked: whether we're still with the mmap_sem held
  *
  * Returns number of pages pinned. This may be fewer than the number
  * requested. If nr_pages is 0 or negative, returns 0. If no pages
@@ -772,13 +772,11 @@ static int check_vma_flags(struct vm_area_struct *vma, 
unsigned long gup_flags)
  * appropriate) must be called after the page is finished with, and
  * before put_page is called.
  *
- * If @nonblocking != NULL, __get_user_pages will not wait for disk IO
- * or mmap_sem contention, and if waiting is needed to pin all pages,
- * *@nonblocking will be set to 0.  Further, if @gup_flags does not
- * include FOLL_NOWAIT, the mmap_sem will be released via up_read() in
- * this case.
+ * If @locked != NULL, *@locked will be set to 0 when mmap_sem is
+ * released by an up_read().  That can happen if @gup_flags does not
+ * have FOLL_NOWAIT.
  *
- * A caller using such a combination of @nonblocking and @gup_flags
+ * A caller using such a combination of @locked and @gup_flags
  * must therefore hold the mmap_sem for reading only, and recognize
  * when it's been released.  Otherwise, it must be held for either
  * reading or writing and will not be released.
@@ -790,7 +788,7 @@ static int check_vma_flags(struct vm_area_struct *vma, 
unsigned long gup_flags)
 static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
-   struct vm_area_struct **vmas, int *nonblocking)
+   struct vm_area_struct **vmas, int *locked)
 {
long ret = 0, i = 0;
struct vm_area_struct *vma = NULL;
@@ -834,7 +832,7 @@ static long __get_user_pages(struct task_struct *tsk, 
struct mm_struct *mm,
if (is_vm_hugetlb_page(vma)) {
i = follow_hugetlb_page(mm, vma, pages, vmas,
,

[PATCH v4 02/10] mm/gup: Fix __get_user_pages() on fault retry of hugetlb

2019-09-22 Thread Peter Xu

When follow_hugetlb_page() returns with *locked==0, it means we've got
a VM_FAULT_RETRY within the fauling process and we've released the
mmap_sem.  When that happens, we should stop and bail out.

Signed-off-by: Peter Xu 
---
 mm/gup.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index eddbb95dcb8f..e60d32f1674d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -833,6 +833,16 @@ static long __get_user_pages(struct task_struct *tsk, 
struct mm_struct *mm,
i = follow_hugetlb_page(mm, vma, pages, vmas,
, _pages, i,
gup_flags, locked);
+   if (locked && *locked == 0) {
+   /*
+* We've got a VM_FAULT_RETRY
+* and we've lost mmap_sem.
+* We must stop here.
+*/
+   BUG_ON(gup_flags & FOLL_NOWAIT);
+   BUG_ON(ret != 0);
+   goto out;
+   }
continue;
}
}
-- 
2.21.0

[PATCH v4 00/10] mm: Page fault enhancements

2019-09-22 Thread Peter Xu

This is the 4th version of the PF enhancement series on signal
handlings and fault retries.  This new version does not change
existing patches in v3 but added two more patches to address the
current gup issue on not responding to SIGKILL.  A 3rd new patch is
also added to allow handle_userfaultfd to respect FAULT_FLAG_KILLABLE
though should have no functional change when with the two new patches
above.

I would really appreciate any review comments for the series,
especially for the first two patches which IMHO are even not related
to this patchset and they should either cleanup or fix things.

v4:
- use lore.kernel.org for all the links in commit messages [Kirill]
- one more patch ("mm/gup: Fix __get_user_pages() on fault retry of
  hugetlb") to fix hugetlb path on fault retry
- one more patch ("mm/gup: Allow to react to fatal signals") to:
  - use down_read_killable() properly [Linus]
  - pass in FAULT_FLAG_KILLABLE for all GUP [Linus]
- one more patch ("mm/userfaultfd: Honor FAULT_FLAG_KILLABLE in fault
  path") to let handle_userfaultfd() respect FAULT_FLAG_KILLABLE.
  Should have no functional change after previous two new patches.

v3:
- check fatal signals in __get_user_page_locked() [Linus]
- add r-bs

v2:
- resent previous version, rebase only

=== v1 cover letter ==

This series is split out of userfaultfd-wp series to only cover the
general page fault changes, since it seems to make sense itself.

Basically it does two things:

  (a) Allows the page fault handlers to be more interactive on not
  only SIGKILL, but also the rest of userspace signals (especially
  for user-mode faults), and,

  (b) Allows the page fault retry (VM_FAULT_RETRY) to happen for more
  than once.

I'm keeping the CC list as in uffd-wp v5, hopefully I'm not sending
too much spams...

And, instead of writting again the cover letter, I'm just copy-pasting
my previous link here which has more details on why we do this:

  https://patchwork.kernel.org/cover/10691991/

The major change from that latest version should be that we introduced
a new page fault flag FAULT_FLAG_INTERRUPTIBLE as suggested by Linus
[1] to represents that we would like the fault handler to respond to
non-fatal signals.  Also, we're more careful now on when to do the
immediate return of the page fault for such signals.  For example, now
we'll only check against signal_pending() for user-mode page faults
and we keep the kernel-mode page fault patch untouched for it.  More
information can be found in separate patches.

The patchset is only lightly tested on x86.

All comments are greatly welcomed.  Thanks,

[1] https://lkml.org/lkml/2019/6/25/1382

Peter Xu (10):
  mm/gup: Rename "nonblocking" to "locked" where proper
  mm/gup: Fix __get_user_pages() on fault retry of hugetlb
  mm: Introduce FAULT_FLAG_DEFAULT
  mm: Introduce FAULT_FLAG_INTERRUPTIBLE
  mm: Return faster for non-fatal signals in user mode faults
  userfaultfd: Don't retake mmap_sem to emulate NOPAGE
  mm: Allow VM_FAULT_RETRY for multiple times
  mm/gup: Allow VM_FAULT_RETRY for multiple times
  mm/gup: Allow to react to fatal signals
  mm/userfaultfd: Honor FAULT_FLAG_KILLABLE in fault path

 arch/alpha/mm/fault.c   |  7 +--
 arch/arc/mm/fault.c |  8 ++-
 arch/arm/mm/fault.c | 14 +++--
 arch/arm64/mm/fault.c   | 16 +++---
 arch/hexagon/mm/vm_fault.c  |  6 +--
 arch/ia64/mm/fault.c|  6 +--
 arch/m68k/mm/fault.c| 10 ++--
 arch/microblaze/mm/fault.c  |  6 +--
 arch/mips/mm/fault.c|  6 +--
 arch/nds32/mm/fault.c   | 12 ++---
 arch/nios2/mm/fault.c   |  8 ++-
 arch/openrisc/mm/fault.c|  6 +--
 arch/parisc/mm/fault.c  |  9 ++--
 arch/powerpc/mm/fault.c | 10 ++--
 arch/riscv/mm/fault.c   | 12 ++---
 arch/s390/mm/fault.c| 11 ++--
 arch/sh/mm/fault.c  |  7 ++-
 arch/sparc/mm/fault_32.c|  5 +-
 arch/sparc/mm/fault_64.c|  6 +--
 arch/um/kernel/trap.c   |  7 +--
 arch/unicore32/mm/fault.c   | 11 ++--
 arch/x86/mm/fault.c |  6 +--
 arch/xtensa/mm/fault.c  |  6 +--
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 +++--
 fs/userfaultfd.c| 62 ++
 include/linux/mm.h  | 81 
 include/linux/sched/signal.h| 12 +
 mm/filemap.c|  2 +-
 mm/gup.c| 93 +
 mm/hugetlb.c| 17 +++---
 mm/shmem.c  |  2 +-
 31 files changed, 283 insertions(+), 193 deletions(-)

-- 
2.21.0

[PATCH v4 04/10] mm: Introduce FAULT_FLAG_INTERRUPTIBLE

2019-09-22 Thread Peter Xu

handle_userfaultfd() is currently the only one place in the kernel
page fault procedures that can respond to non-fatal userspace signals.
It was trying to detect such an allowance by checking against USER &
KILLABLE flags, which was "un-official".

In this patch, we introduced a new flag (FAULT_FLAG_INTERRUPTIBLE) to
show that the fault handler allows the fault procedure to respond even
to non-fatal signals.  Meanwhile, add this new flag to the default
fault flags so that all the page fault handlers can benefit from the
new flag.  With that, replacing the userfault check to this one.

Since the line is getting even longer, clean up the fault flags a bit
too to ease TTY users.

Although we've got a new flag and applied it, we shouldn't have any
functional change with this patch so far.

Suggested-by: Linus Torvalds 
Reviewed-by: David Hildenbrand 
Signed-off-by: Peter Xu 
---
 fs/userfaultfd.c   |  4 +---
 include/linux/mm.h | 39 ---
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index fe6d804a38dc..3c55ee64bcb1 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -462,9 +462,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned 
long reason)
uwq.ctx = ctx;
uwq.waken = false;
 
-   return_to_userland =
-   (vmf->flags & (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE)) ==
-   (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE);
+   return_to_userland = vmf->flags & FAULT_FLAG_INTERRUPTIBLE;
blocking_state = return_to_userland ? TASK_INTERRUPTIBLE :
 TASK_KILLABLE;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 57fb5c535f8e..53ec7abb8472 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -383,22 +383,38 @@ extern unsigned int kobjsize(const void *objp);
  */
 extern pgprot_t protection_map[16];
 
-#define FAULT_FLAG_WRITE   0x01/* Fault was a write access */
-#define FAULT_FLAG_MKWRITE 0x02/* Fault was mkwrite of existing pte */
-#define FAULT_FLAG_ALLOW_RETRY 0x04/* Retry fault if blocking */
-#define FAULT_FLAG_RETRY_NOWAIT0x08/* Don't drop mmap_sem and wait 
when retrying */
-#define FAULT_FLAG_KILLABLE0x10/* The fault task is in SIGKILL 
killable region */
-#define FAULT_FLAG_TRIED   0x20/* Second try */
-#define FAULT_FLAG_USER0x40/* The fault originated in 
userspace */
-#define FAULT_FLAG_REMOTE  0x80/* faulting for non current tsk/mm */
-#define FAULT_FLAG_INSTRUCTION  0x100  /* The fault was during an instruction 
fetch */
+/**
+ * Fault flag definitions.
+ *
+ * @FAULT_FLAG_WRITE: Fault was a write fault.
+ * @FAULT_FLAG_MKWRITE: Fault was mkwrite of existing PTE.
+ * @FAULT_FLAG_ALLOW_RETRY: Allow to retry the fault if blocked.
+ * @FAULT_FLAG_RETRY_NOWAIT: Don't drop mmap_sem and wait when retrying.
+ * @FAULT_FLAG_KILLABLE: The fault task is in SIGKILL killable region.
+ * @FAULT_FLAG_TRIED: The fault has been tried once.
+ * @FAULT_FLAG_USER: The fault originated in userspace.
+ * @FAULT_FLAG_REMOTE: The fault is not for current task/mm.
+ * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch.
+ * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal 
signals.
+ */
+#define FAULT_FLAG_WRITE   0x01
+#define FAULT_FLAG_MKWRITE 0x02
+#define FAULT_FLAG_ALLOW_RETRY 0x04
+#define FAULT_FLAG_RETRY_NOWAIT0x08
+#define FAULT_FLAG_KILLABLE0x10
+#define FAULT_FLAG_TRIED   0x20
+#define FAULT_FLAG_USER0x40
+#define FAULT_FLAG_REMOTE  0x80
+#define FAULT_FLAG_INSTRUCTION 0x100
+#define FAULT_FLAG_INTERRUPTIBLE   0x200
 
 /*
  * The default fault flags that should be used by most of the
  * arch-specific page fault handlers.
  */
 #define FAULT_FLAG_DEFAULT  (FAULT_FLAG_ALLOW_RETRY | \
-FAULT_FLAG_KILLABLE)
+FAULT_FLAG_KILLABLE | \
+FAULT_FLAG_INTERRUPTIBLE)
 
 #define FAULT_FLAG_TRACE \
{ FAULT_FLAG_WRITE, "WRITE" }, \
@@ -409,7 +425,8 @@ extern pgprot_t protection_map[16];
{ FAULT_FLAG_TRIED, "TRIED" }, \
{ FAULT_FLAG_USER,  "USER" }, \
{ FAULT_FLAG_REMOTE,"REMOTE" }, \
-   { FAULT_FLAG_INSTRUCTION,   "INSTRUCTION" }
+   { FAULT_FLAG_INSTRUCTION,   "INSTRUCTION" }, \
+   { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" }
 
 /*
  * vm_fault is filled by the the pagefault handler and passed to the vma's
-- 
2.21.0

[PATCH v4 03/10] mm: Introduce FAULT_FLAG_DEFAULT

2019-09-22 Thread Peter Xu

Although there're tons of arch-specific page fault handlers, most of
them are still sharing the same initial value of the page fault flags.
Say, merely all of the page fault handlers would allow the fault to be
retried, and they also allow the fault to respond to SIGKILL.

Let's define a default value for the fault flags to replace those
initial page fault flags that were copied over.  With this, it'll be
far easier to introduce new fault flag that can be used by all the
architectures instead of touching all the archs.

Reviewed-by: David Hildenbrand 
Signed-off-by: Peter Xu 
---
 arch/alpha/mm/fault.c  | 2 +-
 arch/arc/mm/fault.c| 2 +-
 arch/arm/mm/fault.c| 2 +-
 arch/arm64/mm/fault.c  | 2 +-
 arch/hexagon/mm/vm_fault.c | 2 +-
 arch/ia64/mm/fault.c   | 2 +-
 arch/m68k/mm/fault.c   | 2 +-
 arch/microblaze/mm/fault.c | 2 +-
 arch/mips/mm/fault.c   | 2 +-
 arch/nds32/mm/fault.c  | 2 +-
 arch/nios2/mm/fault.c  | 2 +-
 arch/openrisc/mm/fault.c   | 2 +-
 arch/parisc/mm/fault.c | 2 +-
 arch/powerpc/mm/fault.c| 2 +-
 arch/riscv/mm/fault.c  | 2 +-
 arch/s390/mm/fault.c   | 2 +-
 arch/sh/mm/fault.c | 2 +-
 arch/sparc/mm/fault_32.c   | 2 +-
 arch/sparc/mm/fault_64.c   | 2 +-
 arch/um/kernel/trap.c  | 2 +-
 arch/unicore32/mm/fault.c  | 2 +-
 arch/x86/mm/fault.c| 2 +-
 arch/xtensa/mm/fault.c | 2 +-
 include/linux/mm.h | 7 +++
 24 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c
index 741e61ef9d3f..de4cc6936391 100644
--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -89,7 +89,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
const struct exception_table_entry *fixup;
int si_code = SEGV_MAPERR;
vm_fault_t fault;
-   unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+   unsigned int flags = FAULT_FLAG_DEFAULT;
 
/* As of EV6, a load into $31/$f31 is a prefetch, and never faults
   (or is suppressed by the PALcode).  Support that for older CPUs
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index 3861543b66a0..61919e4e4eec 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -94,7 +94,7 @@ void do_page_fault(unsigned long address, struct pt_regs 
*regs)
 (regs->ecr_cause == ECR_C_PROTV_INST_FETCH))
exec = 1;
 
-   flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+   flags = FAULT_FLAG_DEFAULT;
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
if (write)
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 890eeaac3cbb..2ae28ffec622 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -241,7 +241,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct 
pt_regs *regs)
struct mm_struct *mm;
int sig, code;
vm_fault_t fault;
-   unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+   unsigned int flags = FAULT_FLAG_DEFAULT;
 
if (kprobe_page_fault(regs, fsr))
return 0;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index cfd65b63f36f..613e7434c208 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -410,7 +410,7 @@ static int __kprobes do_page_fault(unsigned long addr, 
unsigned int esr,
struct mm_struct *mm = current->mm;
vm_fault_t fault, major = 0;
unsigned long vm_flags = VM_READ | VM_WRITE;
-   unsigned int mm_flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+   unsigned int mm_flags = FAULT_FLAG_DEFAULT;
 
if (kprobe_page_fault(regs, esr))
return 0;
diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c
index b3bc71680ae4..223787e01bdd 100644
--- a/arch/hexagon/mm/vm_fault.c
+++ b/arch/hexagon/mm/vm_fault.c
@@ -41,7 +41,7 @@ void do_page_fault(unsigned long address, long cause, struct 
pt_regs *regs)
int si_code = SEGV_MAPERR;
vm_fault_t fault;
const struct exception_table_entry *fixup;
-   unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+   unsigned int flags = FAULT_FLAG_DEFAULT;
 
/*
 * If we're in an interrupt or have no user context,
diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c
index c2f299fe9e04..d039b846f671 100644
--- a/arch/ia64/mm/fault.c
+++ b/arch/ia64/mm/fault.c
@@ -65,7 +65,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, 
struct pt_regs *re
struct mm_struct *mm = current->mm;
unsigned long mask;
vm_fault_t fault;
-   unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+   unsigned int flags = FAULT_FLAG_DEFAULT;
 
mask = isr >> IA64_ISR_X_BIT) & 1UL) << VM_EXEC_BIT)
| (((isr >> IA64_ISR_W_BIT) & 1UL) << VM_WRITE_BIT));
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index e9b1d7585b43..8e734309ace9 100644

Re: [PATCH] tty:vt: Add check the return value of kzalloc to avoid oops

2019-09-22 Thread Nicolas Pitre

On Sat, 21 Sep 2019, Xiaoming Ni wrote:

> @ Nicolas Pitre
> Can I make a v2 patch based on your advice ?
> Or you will submit a patch for "GFP_WONTFAIL" yourself ?

Here's a patch implementing what I had in mind. This is compile tested 
only.

- >8

Subject: [PATCH] mm: add __GFP_WONTFAIL and GFP_ONBOOT

Some memory allocations are very unlikely to fail during system boot.
Because of that, the code often doesn't bother to check for allocation
failure, but this gets reported anyway.

As an alternative to adding code to check for NULL that has almost no
chance of ever being exercised, let's use a GFP flag to identify those
cases and panic the kernel if allocation failure ever occurs.

Conversion of one such instance is also included.

Signed-off-by: Nicolas Pitre 

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 34aa39d1ae..bd0a0e4807 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3356,7 +3356,7 @@ static int __init con_init(void)
}
 
for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) {
-   vc_cons[currcons].d = vc = kzalloc(sizeof(struct vc_data), 
GFP_NOWAIT);
+   vc_cons[currcons].d = vc = kzalloc(sizeof(struct vc_data), 
GFP_ONBOOT);
INIT_WORK(_cons[currcons].SAK_work, vc_SAK);
tty_port_init(>port);
visual_init(vc, currcons, 1);
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f33881688f..6f33575cd6 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -39,8 +39,9 @@ struct vm_area_struct;
 #define ___GFP_HARDWALL0x10u
 #define ___GFP_THISNODE0x20u
 #define ___GFP_ACCOUNT 0x40u
+#define ___GFP_WONTFAIL0x80u
 #ifdef CONFIG_LOCKDEP
-#define ___GFP_NOLOCKDEP   0x80u
+#define ___GFP_NOLOCKDEP   0x100u
 #else
 #define ___GFP_NOLOCKDEP   0
 #endif
@@ -187,6 +188,12 @@ struct vm_area_struct;
  * definitely preferable to use the flag rather than opencode endless
  * loop around allocator.
  * Using this flag for costly allocations is _highly_ discouraged.
+ *
+ * %__GFP_WONTFAIL: No allocation error is expected what so ever. The
+ * caller presumes allocation will always succeed (e.g. the system is still
+ * booting, the allocation size is relatively small and memory should be
+ * plentiful) so testing for failure is skipped. If allocation ever fails
+ * then the kernel will simply panic.
  */
 #define __GFP_IO   ((__force gfp_t)___GFP_IO)
 #define __GFP_FS   ((__force gfp_t)___GFP_FS)
@@ -196,6 +203,7 @@ struct vm_area_struct;
 #define __GFP_RETRY_MAYFAIL((__force gfp_t)___GFP_RETRY_MAYFAIL)
 #define __GFP_NOFAIL   ((__force gfp_t)___GFP_NOFAIL)
 #define __GFP_NORETRY  ((__force gfp_t)___GFP_NORETRY)
+#define __GFP_WONTFAIL ((__force gfp_t)___GFP_WONTFAIL)
 
 /**
  * DOC: Action modifiers
@@ -217,7 +225,7 @@ struct vm_area_struct;
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (24 + IS_ENABLED(CONFIG_LOCKDEP))
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /**
@@ -285,6 +293,9 @@ struct vm_area_struct;
  * available and will not wake kswapd/kcompactd on failure. The _LIGHT
  * version does not attempt reclaim/compaction at all and is by default used
  * in page fault path, while the non-light is used by khugepaged.
+ *
+ * %GFP_ONBOOT is for relatively small allocations that are not expected
+ * to fail while the system is booting.
  */
 #define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
 #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
@@ -300,6 +311,7 @@ struct vm_area_struct;
 #define GFP_TRANSHUGE_LIGHT((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
 #define GFP_TRANSHUGE  (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
+#define GFP_ONBOOT (GFP_NOWAIT | __GFP_WONTFAIL)
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ff5484fdbd..36dee09f7f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4625,6 +4625,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
order,
 fail:
warn_alloc(gfp_mask, ac->nodemask,
"page allocation failure: order:%u", order);
+   if (gfp_mask & __GFP_WONTFAIL) {
+   /*
+* The assumption was wrong. This is never supposed to happen.
+* Caller most likely won't check for a returned NULL either.
+* So the only reasonable thing to do is to pannic.
+*/
+   panic("Failed to allocate memory despite GFP_WONTFAIL\n");
+   }
 got_pg:
return page;
 }

Re: [PATCH v7 08/21] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls

2019-09-22 Thread Anup Patel

On Wed, Sep 4, 2019 at 9:44 PM Anup Patel  wrote:
>
> For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
> VCPU config and registers from user-space.
>
> We have three types of VCPU registers:
> 1. CONFIG - these are VCPU config and capabilities
> 2. CORE   - these are VCPU general purpose registers
> 3. CSR- these are VCPU control and status registers
>
> The CONFIG registers available to user-space are ISA and TIMEBASE. Out
> of these, TIMEBASE is a read-only register which inform user-space about
> VCPU timer base frequency. The ISA register is a read and write register
> where user-space can only write the desired VCPU ISA capabilities before
> running the VCPU.
>
> The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
> T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except
> PC and MODE. The PC register represents program counter whereas the MODE
> register represent VCPU privilege mode (i.e. S/U-mode).
>
> The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
> SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.
>
> In future, more VCPU register types will be added (such as FP) for the
> KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.
>
> Signed-off-by: Anup Patel 
> Acked-by: Paolo Bonzini 
> Reviewed-by: Paolo Bonzini 
> ---
>  arch/riscv/include/uapi/asm/kvm.h |  46 +-
>  arch/riscv/kvm/vcpu.c | 235 +-
>  2 files changed, 278 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/include/uapi/asm/kvm.h 
> b/arch/riscv/include/uapi/asm/kvm.h
> index 6dbc056d58ba..08c4515ad71b 100644
> --- a/arch/riscv/include/uapi/asm/kvm.h
> +++ b/arch/riscv/include/uapi/asm/kvm.h
> @@ -23,8 +23,15 @@
>
>  /* for KVM_GET_REGS and KVM_SET_REGS */
>  struct kvm_regs {
> +   /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
> +   struct user_regs_struct regs;
> +   unsigned long mode;
>  };

As discussed in LPC 2019 with Alex Graf, I will add separate
struct for CORE registers instead of re-using "struct kvm_regs".

>
> +/* Possible privilege modes for kvm_regs */
> +#define KVM_RISCV_MODE_S   1
> +#define KVM_RISCV_MODE_U   0
> +
>  /* for KVM_GET_FPU and KVM_SET_FPU */
>  struct kvm_fpu {
>  };
> @@ -41,10 +48,47 @@ struct kvm_guest_debug_arch {
>  struct kvm_sync_regs {
>  };
>
> -/* dummy definition */
> +/* for KVM_GET_SREGS and KVM_SET_SREGS */
>  struct kvm_sregs {
> +   unsigned long sstatus;
> +   unsigned long sie;
> +   unsigned long stvec;
> +   unsigned long sscratch;
> +   unsigned long sepc;
> +   unsigned long scause;
> +   unsigned long stval;
> +   unsigned long sip;
> +   unsigned long satp;
> +};

Same as above, I will add separate struct for CSR registers instead
of re-using "struct kvm_sregs".

> +
> +/* for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
> +struct kvm_riscv_config {
> +   unsigned long isa;
> +   unsigned long tbfreq;
>  };
>
> +#define KVM_REG_SIZE(id)   \
> +   (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> +
> +/* If you need to interpret the index values, here is the key: */
> +#define KVM_REG_RISCV_TYPE_MASK0xFF00
> +#define KVM_REG_RISCV_TYPE_SHIFT   24
> +
> +/* Config registers are mapped as type 1 */
> +#define KVM_REG_RISCV_CONFIG   (0x01 << KVM_REG_RISCV_TYPE_SHIFT)
> +#define KVM_REG_RISCV_CONFIG_REG(name) \
> +   (offsetof(struct kvm_riscv_config, name) / sizeof(unsigned long))
> +
> +/* Core registers are mapped as type 2 */
> +#define KVM_REG_RISCV_CORE (0x02 << KVM_REG_RISCV_TYPE_SHIFT)
> +#define KVM_REG_RISCV_CORE_REG(name)   \
> +   (offsetof(struct kvm_regs, name) / sizeof(unsigned long))
> +
> +/* Control and status registers are mapped as type 3 */
> +#define KVM_REG_RISCV_CSR  (0x03 << KVM_REG_RISCV_TYPE_SHIFT)
> +#define KVM_REG_RISCV_CSR_REG(name)\
> +   (offsetof(struct kvm_sregs, name) / sizeof(unsigned long))
> +
>  #endif
>
>  #endif /* __LINUX_KVM_RISCV_H */
> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> index 3223f723f79e..b95dfc959009 100644
> --- a/arch/riscv/kvm/vcpu.c
> +++ b/arch/riscv/kvm/vcpu.c
> @@ -165,6 +165,215 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, 
> struct vm_fault *vmf)
> return VM_FAULT_SIGBUS;
>  }
>
> +static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
> +const struct kvm_one_reg *reg)
> +{
> +   unsigned long __user *uaddr =
> +   (unsigned long __user *)(unsigned long)reg->addr;
> +   unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
> +   KVM_REG_SIZE_MASK |
> +   KVM_REG_RISCV_CONFIG);
> +   unsigned long reg_val;
> +
> +   if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
> +   return

Re: [PATCH v7 02/21] RISC-V: Add bitmap reprensenting ISA features common across CPUs

2019-09-22 Thread Anup Patel

On Sat, Sep 21, 2019 at 3:31 PM Paul Walmsley  wrote:
>
> Hi Anup,
>
> Thanks for changing this to use a bitmap.  A few comments below -
>
> On Wed, 4 Sep 2019, Anup Patel wrote:
>
> > This patch adds riscv_isa bitmap which represents Host ISA features
> > common across all Host CPUs. The riscv_isa is not same as elf_hwcap
> > because elf_hwcap will only have ISA features relevant for user-space
> > apps whereas riscv_isa will have ISA features relevant to both kernel
> > and user-space apps.
> >
> > One of the use-case for riscv_isa bitmap is in KVM hypervisor where
> > we will use it to do following operations:
> >
> > 1. Check whether hypervisor extension is available
> > 2. Find ISA features that need to be virtualized (e.g. floating
> >point support, vector extension, etc.)
> >
> > Signed-off-by: Anup Patel 
> > Signed-off-by: Atish Patra 
> > Reviewed-by: Alexander Graf 
> > ---
> >  arch/riscv/include/asm/hwcap.h | 26 +++
> >  arch/riscv/kernel/cpufeature.c | 79 --
> >  2 files changed, 102 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > index 7ecb7c6a57b1..9b657375aa51 100644
> > --- a/arch/riscv/include/asm/hwcap.h
> > +++ b/arch/riscv/include/asm/hwcap.h
> > @@ -8,6 +8,7 @@
> >  #ifndef __ASM_HWCAP_H
> >  #define __ASM_HWCAP_H
> >
> > +#include 
> >  #include 
> >
> >  #ifndef __ASSEMBLY__
> > @@ -22,5 +23,30 @@ enum {
> >  };
> >
> >  extern unsigned long elf_hwcap;
> > +
> > +#define RISCV_ISA_EXT_a  ('a' - 'a')
> > +#define RISCV_ISA_EXT_c  ('c' - 'a')
> > +#define RISCV_ISA_EXT_d  ('d' - 'a')
> > +#define RISCV_ISA_EXT_f  ('f' - 'a')
> > +#define RISCV_ISA_EXT_h  ('h' - 'a')
> > +#define RISCV_ISA_EXT_i  ('i' - 'a')
> > +#define RISCV_ISA_EXT_m  ('m' - 'a')
> > +#define RISCV_ISA_EXT_s  ('s' - 'a')
> > +#define RISCV_ISA_EXT_u  ('u' - 'a')
> > +#define RISCV_ISA_EXT_zicsr  (('z' - 'a') + 1)
> > +#define RISCV_ISA_EXT_zifencei   (('z' - 'a') + 2)
> > +#define RISCV_ISA_EXT_zam(('z' - 'a') + 3)
> > +#define RISCV_ISA_EXT_ztso   (('z' - 'a') + 4)
>
> If we add the Z extensions here, it's probably best if we drop Zam from
> this list.  The rationale is, as maintainers, we're planning to hold off
> on merging any support for extensions or modules that aren't in the
> "frozen" or "ratified" states, and according to the RISC-V specs, Zicsr,
> Zifencei, and Ztso are all either frozen or ratified.  However, see
> below -

No problem, I will remove Zam define.

Please add documentation under Documentation/riscv regarding the
policy of not merging support for RISC-V extensions which aren't in
"frozen" or "ratified" state.

>
> > +
> > +#define RISCV_ISA_EXT_MAX256
> > +
> > +unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap);
> > +
> > +#define riscv_isa_extension_mask(ext) BIT_MASK(RISCV_ISA_EXT_##ext)
> > +
> > +bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int 
> > bit);
> > +#define riscv_isa_extension_available(isa_bitmap, ext)   \
> > + __riscv_isa_extension_available(isa_bitmap, RISCV_ISA_EXT_##ext)
> > +
> >  #endif
> >  #endif
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index b1ade9a49347..4ce71ce5e290 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -6,21 +6,64 @@
> >   * Copyright (C) 2017 SiFive
> >   */
> >
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >
> >  unsigned long elf_hwcap __read_mostly;
> > +
> > +/* Host ISA bitmap */
> > +static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
> > +
> >  #ifdef CONFIG_FPU
> >  bool has_fpu __read_mostly;
> >  #endif
> >
> > +/**
> > + * riscv_isa_extension_base - Get base extension word
> > + *
> > + * @isa_bitmap ISA bitmap to use
> > + * @returns base extension word as unsigned long value
> > + *
> > + * NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used.
> > + */
>
> Am happy to see comments that can be automatically parsed, but could you
> reformat them into kernel-doc format?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/doc-guide/kernel-doc.rst

Sure, I will update comments as-per kernel-doc.rst.

>
> > +unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap)
> > +{
> > + if (!isa_bitmap)
> > + return riscv_isa[0];
> > + return isa_bitmap[0];
> > +}
> > +EXPORT_SYMBOL_GPL(riscv_isa_extension_base);
> > +
> > +/**
> > + * __riscv_isa_extension_available - Check whether given extension
> > + * is available or not
> > + *
> > + * @isa_bitmap ISA bitmap to use
> > + * @bit bit position of the desired extension
> > + * @returns true or false
> > + *
> > + * NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used.
> > + */
>
> Same

Re: [PATCH v1 0/2] pinctrl: Add new pinctrl/GPIO driver

2019-09-22 Thread Tanwar, Rahul



Hi Mika,

On 13/9/2019 4:18 PM, Mika Westerberg wrote:
> On Thu, Sep 12, 2019 at 11:11:32AM +0100, Linus Walleij wrote:
>> Hi Rahul,
>>
>> thanks for your patches!
>>
>> On Thu, Sep 12, 2019 at 8:59 AM Rahul Tanwar
>>  wrote:
>>
>>> This series is to add pinctrl & GPIO controller driver for a new SoC.
>>> Patch 1 adds pinmux & GPIO controller driver.
>>> Patch 2 adds the dt bindings document & include file.
>>>
>>> Patches are against Linux 5.3-rc5 at below Git tree:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git
>> OK nice, I think you need to include Mika Westerberg on this review
>> as well, because I think he likes to stay on top of all things intel
>> in pin control. (Also included two other Intel folks in Finland who usually
>> take an interest in these things.)
> Thanks Linus for looping me in.
>
> Even if this is not directly based on the stuff we have under
> drivers/pinctrl/intel/*, I have a couple of comments. I don't have this
> patch series in my inbox so I'm commenting here.
>
> Since the driver name is equilibrium I suggest you to name
> intel_pinctrl_driver and the like (probe, remove) to follow that
> convention to avoid confusing this with the Intel pinctrl drivers under
> drivers/pinctrl/intel/*.
>
> Maybe use eqbr prefix so then intel_pinctrl_driver becomes
> eqbr_pinctrl_driver and so on. Also all the structures like
> intel_pinctrl_drv_data should be changed accordingly.
>
> Ditto for:
>
> MODULE_DESCRIPTION("Intel Pinctrl Driver for LGM SoC");
>
> I think better would be:
>
> MODULE_DESCRIPTION("Pinctrl Driver for LGM SoC (Equilibrium)");
>
> Anyway you get the idea :)

Yes, i understand your point. Will update in v2. Thanks.

Regards,
Rahul

[RFC v3] zswap: Add CONFIG_ZSWAP_IO_SWITCH to handle swap IO issue

2019-09-22 Thread Hui Zhu

This is the third version of this patch.  The first and second version
is in [1] and [2].
This verion is updated according to the comments from Randy Dunlap
in [3].

Currently, I use a VM that has 2 CPUs, 4G memory and 4G swap file.
I found that swap will affect the IO performance when it is running.
So I open zswap to handle it because it just use CPU cycles but not
disk IO.

It work OK but I found that zswap is slower than normal swap in this
VM.  zswap is about 300M/s and normal swap is about 500M/s. (The reason
is disk inside VM has fscache in host machine.)
So open zswap is make memory shrinker slower but good for IO performance
in this VM.
So I just want zswap work when the disk of the swap file is under high
IO load.

This commit is designed for this idea.
It add two parameters read_in_flight_limit and write_in_flight_limit to
zswap.
In zswap_frontswap_store, pages will be stored to zswap only when
the IO in flight number of swap device is bigger than
zswap_read_in_flight_limit or zswap_write_in_flight_limit
when zswap is enabled.
Then the zswap just work when the IO in flight number of swap device
is low.

[1] https://lkml.org/lkml/2019/9/11/935
[2] https://lkml.org/lkml/2019/9/20/90
[3] https://lkml.org/lkml/2019/9/20/1076

Signed-off-by: Hui Zhu 
---
 include/linux/swap.h |  3 +++
 mm/Kconfig   | 18 
 mm/page_io.c | 16 +++
 mm/zswap.c   | 58 
 4 files changed, 95 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index de2c67a..82b621f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -389,6 +389,9 @@ extern void end_swap_bio_write(struct bio *bio);
 extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
bio_end_io_t end_write_func);
 extern int swap_set_page_dirty(struct page *page);
+#ifdef CONFIG_ZSWAP_IO_SWITCH
+extern void swap_io_in_flight(struct page *page, unsigned int inflight[2]);
+#endif
 
 int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
unsigned long nr_pages, sector_t start_block);
diff --git a/mm/Kconfig b/mm/Kconfig
index 56cec63..387c3b5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -546,6 +546,24 @@ config ZSWAP
  they have not be fully explored on the large set of potential
  configurations and workloads that exist.
 
+config ZSWAP_IO_SWITCH
+   bool "Compressed cache for swap pages according to the IO status"
+   depends on ZSWAP
+   help
+ This function helps the system that normal swap speed is higher
+ than zswap speed to handle the swap IO issue.
+ For example, a VM where the disk device is not set cache config or
+ set cache=writeback.
+
+ This function makes zswap just work when the disk of the swap file
+ is under high IO load.
+ It add two parameters (read_in_flight_limit and
+ write_in_flight_limit) to zswap.  When zswap is enabled, pages will
+ be stored to zswap only when the IO in flight number of swap device
+ is bigger than zswap_read_in_flight_limit or
+ zswap_write_in_flight_limit.
+ If unsure, say "n".
+
 config ZPOOL
tristate "Common API for compressed memory storage"
help
diff --git a/mm/page_io.c b/mm/page_io.c
index 24ee600..e66b050 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -434,3 +434,19 @@ int swap_set_page_dirty(struct page *page)
return __set_page_dirty_no_writeback(page);
}
 }
+
+#ifdef CONFIG_ZSWAP_IO_SWITCH
+void swap_io_in_flight(struct page *page, unsigned int inflight[2])
+{
+   struct swap_info_struct *sis = page_swap_info(page);
+
+   if (!sis->bdev) {
+   inflight[0] = 0;
+   inflight[1] = 0;
+   return;
+   }
+
+   part_in_flight_rw(bdev_get_queue(sis->bdev), sis->bdev->bd_part,
+ inflight);
+}
+#endif
diff --git a/mm/zswap.c b/mm/zswap.c
index 0e22744..0190b2d 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -62,6 +62,14 @@ static u64 zswap_reject_compress_poor;
 static u64 zswap_reject_alloc_fail;
 /* Store failed because the entry metadata could not be allocated (rare) */
 static u64 zswap_reject_kmemcache_fail;
+#ifdef CONFIG_ZSWAP_IO_SWITCH
+/*
+ * Store failed because zswap_read_in_flight_limit or
+ * zswap_write_in_flight_limit is bigger than IO in flight number of
+ * swap device
+ */
+static u64 zswap_reject_io;
+#endif
 /* Duplicate store was encountered (rare) */
 static u64 zswap_duplicate_entry;
 
@@ -114,6 +122,24 @@ static bool zswap_same_filled_pages_enabled = true;
 module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
   bool, 0644);
 
+#ifdef CONFIG_ZSWAP_IO_SWITCH
+/*
+ * zswap will not try to store the page if zswap_read_in_flight_limit is
+ * bigger than IO read in flight number of swap device
+ */
+static unsigned int

Re: [PATCH v9 07/11] dt-bindings: pwm: pwm-mediatek: add a property "num-pwms"

2019-09-22 Thread Sam Shih

On Sat, 2019-09-21 at 02:21 +0200, Thierry Reding wrote:
> On Fri, Sep 20, 2019 at 06:49:07AM +0800, Sam Shih wrote:
> > From: Ryder Lee 
> > 
> > This adds a property "num-pwms" in example so that we could
> > specify the number of PWM channels via device tree.
> > 
> > Signed-off-by: Ryder Lee 
> > Signed-off-by: Sam Shih 
> > Reviewed-by: Matthias Brugger 
> > Acked-by: Uwe Kleine-König 
> > ---
> > Changes since v6:
> > Follow reviewers's comments:
> > - The subject should indicate this is for Mediatek
> > 
> > Changes since v5:
> > - Add an Acked-by tag
> > - This file is original v4 patch 5/10
> > (https://patchwork.kernel.org/patch/11102577/)
> > 
> > ---
> >  Documentation/devicetree/bindings/pwm/pwm-mediatek.txt | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> You failed to address Rob's questions repeatedly and I agree with him
> that you can just as easily derive the number of PWMs from the specific
> compatible string. I won't be applying this and none of the patches that
> depend on it.
> 

Hi, 

Thanks for getting back to me.

New pwm driver (patch 04/11 : "pwm: mediatek: allocate the clks array
dynamically") can support different variants with different number of
PWMs by the new property 

For example:
1. Use "num-pwms" = <2> and assign clocks pwm1, pwm2 for mt7622
2. Use "num-pwms" = <6> and assign clocks pwm1, pwm2, pwm3, pwm4, pwm5,
pwm6 for mt7622.

If we just as easily derive the number of PWMs from the specific
compatible string in this document:

   - "pwm1-6": the six per PWM clocks for mt7622
   
This looks like all "pwm1", "pwm2", "pwm3", "pwm4", "pwm5", "pwm6" is
required property in DT, It doesn't make sense.
   
So we removed those descriptions and added  

   - "pwm1-N": the PWM clocks for each channel 
   
   
But the max number of clocks from the compatible string are still
important information that should be provide in this document.


What do you think of this?

   - "pwm1-N": per PWM clocks for mt2712, the max number of PWM channels
is 8

   - "pwm1-N": per PWM clocks for mt7622, the max number of PWM channels
is 6

   - "pwm1-N": per PWM clocks for mt7623, the max number of PWM channels
is 5

   where N starting from 1 to the maximum number of PWM channels
   - num-pwms: the number of PWM channels.

   
Thanks
Best Regards
Sam

[PATCH] smack: fix an compile error in smack_post_notification

2019-09-22 Thread zhong jiang

I hit the following error when compile the kernel.

security/smack/smack_lsm.c: In function smack_post_notification:
security/smack/smack_lsm.c:4383:7: error: dereferencing pointer to incomplete 
type struct watch_notification
  if (n->type == WATCH_TYPE_META)
   ^~
security/smack/smack_lsm.c:4383:17: error: WATCH_TYPE_META undeclared (first 
use in this function); did you mean TCA_PIE_BETA?
  if (n->type == WATCH_TYPE_META)
 ^~~
 TCA_PIE_BETA
security/smack/smack_lsm.c:4383:17: note: each undeclared identifier is 
reported only once for each function it appears in

Signed-off-by: zhong jiang 
---
 security/smack/smack.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/smack/smack.h b/security/smack/smack.h
index 62529f3..02b05a2 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Use IPv6 port labeling if IPv6 is enabled and secmarks
-- 
1.7.12.4

RE: [RFC PATCH v3 4/6] psci: Add hvc call service for ptp_kvm.

2019-09-22 Thread Jianyong Wu (Arm Technology China)

Hi Marc, Paolo,

> -Original Message-
> From: Paolo Bonzini 
> Sent: Thursday, September 19, 2019 8:13 PM
> To: Marc Zyngier ; Jianyong Wu (Arm Technology China)
> ; net...@vger.kernel.org; yangbo...@nxp.com;
> john.stu...@linaro.org; t...@linutronix.de; sean.j.christopher...@intel.com;
> richardcoch...@gmail.com; Mark Rutland ; Will
> Deacon ; Suzuki Poulose
> 
> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org; Steve Capper
> ; Kaly Xin (Arm Technology China)
> ; Justin He (Arm Technology China)
> ; nd ; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [RFC PATCH v3 4/6] psci: Add hvc call service for ptp_kvm.
> 
> On 19/09/19 13:39, Marc Zyngier wrote:
> >> I don't think it's ugly but more important, using tk->tkr_mono.clock
> >> is incorrect.  See how the x86 code hardcodes _clock, it's the
> >> same for ARM.
> > Not really. The guest kernel is free to use any clocksource it wishes.
> 
> Understood, in fact it's the same on x86.
> 
> However, for PTP to work, the cycles value returned by the clocksource must
> match the one returned by the hypercall.  So for ARM
> get_device_system_crosststamp must receive the arch timer clocksource, so
> that it will return -ENODEV if the active clocksource is anything else.
> 

After day's reflection on this, I'm a little clear about this issue, let me 
clarify it.
I think there is three principles for this issue:
(1): guest and host can use different clocksouces as their current clocksouce 
at the same time
and we can't  or it's not easy to probe that if they use the same one.
(2): the cycle value and the clocksouce which passed to 
get_device_system_crosststamp must be match.
(3): ptp_kvm target to increase the time sync precision so we should choose a 
high rate clocksource as ptp_kvm clocksource.
Base on (1) and (2) we can deduce that the ptp_kvm clocksource should be better 
a fixed one. So we can test if the cycle and
clocksource is match. 
Base on (3) the arch_sys_timer should be chosen, as it's the best clocksource 
by far for arm.

Thanks
Jianyong Wu

> Paolo
> 
> > In some cases, it is actually desirable (like these broken systems
> > that cannot use an in-kernel irqchip...). Maybe it is that on x86 the
> > guest only uses the kvm_clock, but that's a much harder sell on ARM.
> > The fact that ptp_kvm assumes that the clocksource is fixed doesn't
> > seem correct in that case.

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Matt Cover

On Sun, Sep 22, 2019 at 7:34 PM Jason Wang  wrote:
>
>
> On 2019/9/23 上午9:15, Matt Cover wrote:
> > On Sun, Sep 22, 2019 at 5:51 PM Jason Wang  wrote:
> >>
> >> On 2019/9/23 上午6:30, Matt Cover wrote:
> >>> On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin  
> >>> wrote:
>  On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote:
> > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  
> > wrote:
> >> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:
> >>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
> >>> to fallback to tun_automq_select_queue() for tx queue selection.
> >>>
> >>> Compilation of this exact patch was tested.
> >>>
> >>> For functional testing 3 additional printk()s were added.
> >>>
> >>> Functional testing results (on 2 txq tap device):
> >>>
> >>> [Fri Sep 20 18:33:27 2019] == tun no prog ==
> >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() 
> >>> returned '-1'
> >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> >>> [Fri Sep 20 18:33:27 2019] == tun prog -1 ==
> >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() 
> >>> returned '-1'
> >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() 
> >>> returned '-1'
> >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> >>> [Fri Sep 20 18:33:27 2019] == tun prog 0 ==
> >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() 
> >>> returned '0'
> >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() 
> >>> returned '0'
> >>> [Fri Sep 20 18:33:27 2019] == tun prog 1 ==
> >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() 
> >>> returned '1'
> >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() 
> >>> returned '1'
> >>> [Fri Sep 20 18:33:27 2019] == tun prog 2 ==
> >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() 
> >>> returned '2'
> >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() 
> >>> returned '0'
> >>>
> >>> Signed-off-by: Matthew Cover 
> >> Could you add a bit more motivation data here?
> > Thank you for these questions Michael.
> >
> > I'll plan on adding the below information to the
> > commit message and submitting a v2 of this patch
> > when net-next reopens. In the meantime, it would
> > be very helpful to know if these answers address
> > some of your concerns.
> >
> >> 1. why is this a good idea
> > This change allows TUNSETSTEERINGEBPF progs to
> > do any of the following.
> >1. implement queue selection for a subset of
> >   traffic (e.g. special queue selection logic
> >   for ipv4, but return negative and use the
> >   default automq logic for ipv6)
> >2. determine there isn't sufficient information
> >   to do proper queue selection; return
> >   negative and use the default automq logic
> >   for the unknown
> >3. implement a noop prog (e.g. do
> >   bpf_trace_printk() then return negative and
> >   use the default automq logic for everything)
> >
> >> 2. how do we know existing userspace does not rely on existing 
> >> behaviour
> > Prior to this change a negative return from a
> > TUNSETSTEERINGEBPF prog would have been cast
> > into a u16 and traversed netdev_cap_txqueue().
> >
> > In most cases netdev_cap_txqueue() would have
> > found this value to exceed real_num_tx_queues
> > and queue_index would be updated to 0.
> >
> > It is possible that a TUNSETSTEERINGEBPF prog
> > return a negative value which when cast into a
> > u16 results in a positive queue_index less than
> > real_num_tx_queues. For example, on x86_64, a
> > return value of -65535 results in a queue_index
> > of 1; which is a valid queue for any multiqueue
> > device.
> >
> > It seems unlikely, however as stated above is
> > unfortunately possible, that existing
> > TUNSETSTEERINGEBPF programs would choose to
> > return a negative value rather than return the
> > positive value which holds the same meaning.
> >
> > It seems more likely that future
> > TUNSETSTEERINGEBPF programs would leverage a
> > negative return and potentially be loaded into
> > a kernel with the old behavior.
>  OK if we are returning a special
>  value, shouldn't we limit it? How about a special
>  value with this meaning?
>  If we are changing an ABI let's at least make it
>  extensible.
> 
> >>> A special value with this meaning sounds
> >>> good to me. I'll plan on adding a define
> >>> set to -1 to cause the fallback to automq.
>

[PATCH] futex: robust futex maybe never be awaked, on rare situation.

2019-09-22 Thread Yunfeng Cui

I use model checker find a issue of robust and pi futex. On below
situation, the owner can't find something in pi_state_list, while
the requester will be blocked, never be awaked.

CPU0   CPU1
   futex_lock_pi
   /*some cs code*/
futex_lock_pi
  futex_lock_pi_atomic
...
newval = uval | FUTEX_WAITERS;
ret = lock_pi_update_atomic(uaddr, uval, newval);
...
attach_to_pi_owner
 
 p = find_get_task_by_vpid(pid);
 if (!p)
   return handle_exit_race(uaddr, uval, NULL);
   
   raw_spin_lock_irq(>pi_lock);
   
   pi_state = alloc_pi_state();
   
   do_exit->mm_release
   if (unlikely(tsk->robust_list)) {
 exit_robust_list(tsk);
 tsk->robust_list = NULL;
   }
   if (unlikely(!list_empty(>pi_state_list)))
 exit_pi_state_list(tsk); /*WILL MISS*/
  list_add(_state->list, >pi_state_list);
WILL BLOCKED, NEVER WAKEUP!

Signed-off-by: Yunfeng Cui 
Reviewed-by: Bo Wang 
Reviewed-by: Yi Wang 
---
 kernel/fork.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 53e780748fe3..58b90f21dac4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1277,15 +1277,16 @@ void mm_release(struct task_struct *tsk, struct 
mm_struct *mm)
if (unlikely(tsk->robust_list)) {
exit_robust_list(tsk);
tsk->robust_list = NULL;
+   /*Check pi_state_list of task on pi_lock be acquired*/
+   exit_pi_state_list(tsk);
}
 #ifdef CONFIG_COMPAT
if (unlikely(tsk->compat_robust_list)) {
compat_exit_robust_list(tsk);
tsk->compat_robust_list = NULL;
+   exit_pi_state_list(tsk);
}
 #endif
-   if (unlikely(!list_empty(>pi_state_list)))
-   exit_pi_state_list(tsk);
 #endif
 
uprobe_free_utask(tsk);
-- 
2.15.2

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Matt Cover

On Sun, Sep 22, 2019 at 7:32 PM Jason Wang  wrote:
>
>
> On 2019/9/23 上午9:20, Matt Cover wrote:
> > On Sun, Sep 22, 2019 at 5:46 PM Jason Wang  wrote:
> >>
> >> On 2019/9/23 上午1:43, Matt Cover wrote:
> >>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  
> >>> wrote:
>  On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:
> > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
> > to fallback to tun_automq_select_queue() for tx queue selection.
> >
> > Compilation of this exact patch was tested.
> >
> > For functional testing 3 additional printk()s were added.
> >
> > Functional testing results (on 2 txq tap device):
> >
> > [Fri Sep 20 18:33:27 2019] == tun no prog ==
> > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > '-1'
> > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> > [Fri Sep 20 18:33:27 2019] == tun prog -1 ==
> > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > '-1'
> > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > '-1'
> > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> > [Fri Sep 20 18:33:27 2019] == tun prog 0 ==
> > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > '0'
> > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > '0'
> > [Fri Sep 20 18:33:27 2019] == tun prog 1 ==
> > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > '1'
> > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > '1'
> > [Fri Sep 20 18:33:27 2019] == tun prog 2 ==
> > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > '2'
> > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > '0'
> >
> > Signed-off-by: Matthew Cover 
>  Could you add a bit more motivation data here?
> >>> Thank you for these questions Michael.
> >>>
> >>> I'll plan on adding the below information to the
> >>> commit message and submitting a v2 of this patch
> >>> when net-next reopens. In the meantime, it would
> >>> be very helpful to know if these answers address
> >>> some of your concerns.
> >>>
>  1. why is this a good idea
> >>> This change allows TUNSETSTEERINGEBPF progs to
> >>> do any of the following.
> >>>1. implement queue selection for a subset of
> >>>   traffic (e.g. special queue selection logic
> >>>   for ipv4, but return negative and use the
> >>>   default automq logic for ipv6)
> >>
> >> Well, using ebpf means it need to take care of all the cases. E.g you
> >> can easily implement the fallback through eBPF as well.
> >>
> > I really think there is value in being
> > able to implement a scoped special
> > case while leaving the rest of the
> > packets in the kernel's hands.
>
>
> This is only work when some fucntion could not be done by eBPF itself
> and then we can provide the function through eBPF helpers. But this is
> not the case here.
>
>
> >
> > Having to reimplement automq makes
> > this hookpoint less accessible to
> > beginners and experienced alike.
>
>
> Note that automq itself is kind of complicated, it's best effort that is
> hard to be documented accurately. It has several limitations (e.g flow
> caches etc.) that may not work well in some conditions.
>
> It's not hard to implement a user programmable steering policy through
> maps which could have much deterministic behavior than automq. The goal
> of steering ebpf is to get rid of automq completely not partially rely
> on it.
>
> And I don't see how relying on automq can simplify anything.
>
> Thanks
>

I'm not suggesting that we document automq.

I'm suggesting that we add a return value
which is documented as signaling to the
kernel to implement whatever queue
selection method is used when there is no
ebpf prog attached. That behavior today is
automq.

There is nothing about this return value
which would harder to change the default
queue selection later. The default already
exists today when there is no program
loaded.

>
> >
> >>>2. determine there isn't sufficient information
> >>>   to do proper queue selection; return
> >>>   negative and use the default automq logic
> >>>   for the unknown
> >>
> >> Same as above.
> >>
> >>
> >>>3. implement a noop prog (e.g. do
> >>>   bpf_trace_printk() then return negative and
> >>>   use the default automq logic for everything)
> >>
> >> ditto.
> >>
> >>
>  2. how do we know existing userspace does not rely on existing behaviour
> >>> Prior to this change a negative return from a
> >>> TUNSETSTEERINGEBPF prog would have been cast
> >>> into a u16 and traversed netdev_cap_txqueue().
> >>>
> >>> In most

RE: [PATCH V4 2/5] input: keyboard: imx_sc: Add i.MX system controller key support

2019-09-22 Thread Anson Huang

Hi, Pavel

> Subject: Re: [PATCH V4 2/5] input: keyboard: imx_sc: Add i.MX system
> controller key support
> 
> Hi!
> 
> > > > +   ret = imx_scu_call_rpc(priv->key_ipc_handle, , true);
> > > > +   if (ret) {
> > > > +   dev_err(>dev, "read imx sc key failed, ret 
> > > > %d\n", ret);
> > > > +   return;
> > > > +   }
> > > > +
> > > > +   state = (bool)msg.state;
> > > > +
> > > > +   if (!state && !priv->keystate)
> > > > +   state = true;
> > >
> > > This needs an explanation please.
> >
> > This is to handle the quick press of button, e.g., when button is
> > pressed and released very quickly, when the delay work is scheduled,
> > the button state read from SCU FW is actually a release state (0), the press
> state is (1), so the quick press/release will be ignored.
> >
> > However, after double check and test, I think this should be handled
> > by debounce time, if the button is pressed/release very quickly, the
> > event should be ignored, I will remove it and reduce the debounce time to
> 30mS, previous 100mS is too long, using 30mS as debounce time, I did NOT
> see similar issue no matter how quick I press/release the button.
> 
> Are you sure this is expected behaviour?
> 
> AFAIK microswitches can bounce when the button is pressed and released,
> but will not generate glitches when the button was not pressed, so even
> short presses are real and should be propagated...

Latest version of patch handle the quick press as below:

- When the button is pressed, the delayed work will be scheduled;
- In the delayed work, the button state will be read, once the state is 
different as previous
  state, it will be propagated;
- If the state is a press state, another delayed work will be scheduled for 
handling release
  event.

So there could be a small window of DEBOUNCE_TIME (30mS) between button press 
and the delayed
work scheduled for reading the button state, if the button is released in less 
than 30ms, then the
state read could be a release state which is same as previous state and it will 
be ignored.

Do you think in this case, we should also propagated the press event? If yes, 
then what about the
release event? Although from what I tested, no matter how quick I press the 
button, the driver
can always receive both the press and release event, as the 30mS is very short.

Thanks,
Anson

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Jason Wang




On 2019/9/23 上午9:15, Matt Cover wrote:

On Sun, Sep 22, 2019 at 5:51 PM Jason Wang  wrote:


On 2019/9/23 上午6:30, Matt Cover wrote:

On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin  wrote:

On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote:

On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  wrote:

On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:

Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
to fallback to tun_automq_select_queue() for tx queue selection.

Compilation of this exact patch was tested.

For functional testing 3 additional printk()s were added.

Functional testing results (on 2 txq tap device):

[Fri Sep 20 18:33:27 2019] == tun no prog ==
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
[Fri Sep 20 18:33:27 2019] == tun prog -1 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
[Fri Sep 20 18:33:27 2019] == tun prog 0 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
[Fri Sep 20 18:33:27 2019] == tun prog 1 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1'
[Fri Sep 20 18:33:27 2019] == tun prog 2 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'

Signed-off-by: Matthew Cover 

Could you add a bit more motivation data here?

Thank you for these questions Michael.

I'll plan on adding the below information to the
commit message and submitting a v2 of this patch
when net-next reopens. In the meantime, it would
be very helpful to know if these answers address
some of your concerns.


1. why is this a good idea

This change allows TUNSETSTEERINGEBPF progs to
do any of the following.
   1. implement queue selection for a subset of
  traffic (e.g. special queue selection logic
  for ipv4, but return negative and use the
  default automq logic for ipv6)
   2. determine there isn't sufficient information
  to do proper queue selection; return
  negative and use the default automq logic
  for the unknown
   3. implement a noop prog (e.g. do
  bpf_trace_printk() then return negative and
  use the default automq logic for everything)


2. how do we know existing userspace does not rely on existing behaviour

Prior to this change a negative return from a
TUNSETSTEERINGEBPF prog would have been cast
into a u16 and traversed netdev_cap_txqueue().

In most cases netdev_cap_txqueue() would have
found this value to exceed real_num_tx_queues
and queue_index would be updated to 0.

It is possible that a TUNSETSTEERINGEBPF prog
return a negative value which when cast into a
u16 results in a positive queue_index less than
real_num_tx_queues. For example, on x86_64, a
return value of -65535 results in a queue_index
of 1; which is a valid queue for any multiqueue
device.

It seems unlikely, however as stated above is
unfortunately possible, that existing
TUNSETSTEERINGEBPF programs would choose to
return a negative value rather than return the
positive value which holds the same meaning.

It seems more likely that future
TUNSETSTEERINGEBPF programs would leverage a
negative return and potentially be loaded into
a kernel with the old behavior.

OK if we are returning a special
value, shouldn't we limit it? How about a special
value with this meaning?
If we are changing an ABI let's at least make it
extensible.


A special value with this meaning sounds
good to me. I'll plan on adding a define
set to -1 to cause the fallback to automq.


Can it really return -1?

I see:

static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
  struct sk_buff *skb)
...



The way I was initially viewing the old
behavior was that returning negative was
undefined; it happened to have the
outcomes I walked through, but not
necessarily by design.


Having such fallback may bring extra troubles, it requires the eBPF
program know the existence of the behavior which is not a part of kernel
ABI actually. And then some eBPF program may start to rely on that which
is pretty dangerous. Note, one important consideration is to have
macvtap support where does not have any stuffs like automq.

Thanks


How about we call this TUN_SSE_ABORT
instead of TUN_SSE_DO_AUTOMQ?

TUN_SSE_ABORT could be documented as
falling back to the default queue
selection method in either space
(presumably macvtap has some queue
selection method when there is

RE: [PATCH V2 1/5] dt-bindings: fsl: scu: add scu power key binding

2019-09-22 Thread Anson Huang

Hi, Pavel

> Subject: Re: [PATCH V2 1/5] dt-bindings: fsl: scu: add scu power key binding
> 
> On Tue 2019-09-03 10:03:40, Anson Huang wrote:
> > NXP i.MX8QXP is an ARMv8 SoC with a Cortex-M4 core inside as system
> > controller, the system controller is in charge of system power, clock
> > and power key event etc. management, Linux kernel has to communicate
> > with system controller via MU (message unit) IPC to get power key
> > event, add binding doc for i.MX system controller power key driver.
> >
> > Signed-off-by: Anson Huang 
> > ---
> > Changes since V1:
> > - remove "wakeup-source" property, as it is NOT needed for SCU
> interrupt;
> > - remove "status" in example.
> > ---
> >  .../devicetree/bindings/arm/freescale/fsl,scu.txt  | 14
> ++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git
> > a/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > b/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > index c149fad..f93e2e4 100644
> > --- a/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > +++ b/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > @@ -157,6 +157,15 @@ Required properties:
> >  Optional properties:
> >  - timeout-sec: contains the watchdog timeout in seconds.
> >
> > +Power key bindings based on SCU Message Protocol
> > +
> > +
> > +Required properties:
> > +- compatible: should be:
> > +  "fsl,imx8qxp-sc-pwrkey"
> > +  followed by "fsl,imx-sc-pwrkey";
> > +- linux,keycodes: See
> > +Documentation/devicetree/bindings/input/keys.txt
> 
> Note you have keycode_s_ here, but keycode in the example and in the dts
> patch.

NOT quite understand your point, could you please provide more details?

Thanks,
Anson

RE: [PATCH] ARM: Add support for Realtek SOC

2019-09-22 Thread James Tai

> Subject: Re: [PATCH] ARM: Add support for Realtek SOC
> 
> On Wed, Sep 11, 2019 at 5:17 PM Arnd Bergmann  wrote:
> >
> > On Wed, Sep 11, 2019 at 9:46 AM James Tai[戴志峰]
>  wrote:
> > > > Subject: Re: [PATCH] ARM: Add support for Realtek SOC
> >
> > > > > @@ -148,6 +148,7 @@ endif
> > > > >  textofs-$(CONFIG_ARCH_MSM8X60) := 0x00208000
> > > > >  textofs-$(CONFIG_ARCH_MSM8960) := 0x00208000
> > > > >  textofs-$(CONFIG_ARCH_MESON) := 0x00208000
> > > > > +textofs-$(CONFIG_ARCH_REALTEK) := 0x00208000
> > > > >  textofs-$(CONFIG_ARCH_AXXIA) := 0x00308000
> > > >
> > > > Can you explain why this is needed for your platform?
> > > >
> > > We need to reserve memory (0x ~ 0x001B) for rom and boot
> code.
> >
> > Ok.
> 
> 
> I do not like this much.
> 
> This platform is ARCH_MULTI_V7.
> 
> ARM_PATCH_PHYS_VIRT allows you to place the kernel image anywhere in
> memory as long as the base is aligned at 16MB.
> 
> The minimum 'textofs-y := 0x0008000' + extra 16MB offset will create a space
> (0x ~ 0x01008000).
> 
> This is more than needed, but it is not a big deal to waste some megabytes of
> memory.
> 

OK. I understand.

> --
> Best Regards
> Masahiro Yamada
> 
> --Please consider the environment before printing this e-mail.

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Jason Wang




On 2019/9/23 上午9:20, Matt Cover wrote:

On Sun, Sep 22, 2019 at 5:46 PM Jason Wang  wrote:


On 2019/9/23 上午1:43, Matt Cover wrote:

On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  wrote:

On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:

Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
to fallback to tun_automq_select_queue() for tx queue selection.

Compilation of this exact patch was tested.

For functional testing 3 additional printk()s were added.

Functional testing results (on 2 txq tap device):

[Fri Sep 20 18:33:27 2019] == tun no prog ==
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
[Fri Sep 20 18:33:27 2019] == tun prog -1 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
[Fri Sep 20 18:33:27 2019] == tun prog 0 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
[Fri Sep 20 18:33:27 2019] == tun prog 1 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1'
[Fri Sep 20 18:33:27 2019] == tun prog 2 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'

Signed-off-by: Matthew Cover 

Could you add a bit more motivation data here?

Thank you for these questions Michael.

I'll plan on adding the below information to the
commit message and submitting a v2 of this patch
when net-next reopens. In the meantime, it would
be very helpful to know if these answers address
some of your concerns.


1. why is this a good idea

This change allows TUNSETSTEERINGEBPF progs to
do any of the following.
   1. implement queue selection for a subset of
  traffic (e.g. special queue selection logic
  for ipv4, but return negative and use the
  default automq logic for ipv6)


Well, using ebpf means it need to take care of all the cases. E.g you
can easily implement the fallback through eBPF as well.


I really think there is value in being
able to implement a scoped special
case while leaving the rest of the
packets in the kernel's hands.



This is only work when some fucntion could not be done by eBPF itself 
and then we can provide the function through eBPF helpers. But this is 
not the case here.





Having to reimplement automq makes
this hookpoint less accessible to
beginners and experienced alike.



Note that automq itself is kind of complicated, it's best effort that is 
hard to be documented accurately. It has several limitations (e.g flow 
caches etc.) that may not work well in some conditions.


It's not hard to implement a user programmable steering policy through 
maps which could have much deterministic behavior than automq. The goal 
of steering ebpf is to get rid of automq completely not partially rely 
on it.


And I don't see how relying on automq can simplify anything.

Thanks





   2. determine there isn't sufficient information
  to do proper queue selection; return
  negative and use the default automq logic
  for the unknown


Same as above.



   3. implement a noop prog (e.g. do
  bpf_trace_printk() then return negative and
  use the default automq logic for everything)


ditto.



2. how do we know existing userspace does not rely on existing behaviour

Prior to this change a negative return from a
TUNSETSTEERINGEBPF prog would have been cast
into a u16 and traversed netdev_cap_txqueue().

In most cases netdev_cap_txqueue() would have
found this value to exceed real_num_tx_queues
and queue_index would be updated to 0.

It is possible that a TUNSETSTEERINGEBPF prog
return a negative value which when cast into a
u16 results in a positive queue_index less than
real_num_tx_queues. For example, on x86_64, a
return value of -65535 results in a queue_index
of 1; which is a valid queue for any multiqueue
device.

It seems unlikely, however as stated above is
unfortunately possible, that existing
TUNSETSTEERINGEBPF programs would choose to
return a negative value rather than return the
positive value which holds the same meaning.

It seems more likely that future
TUNSETSTEERINGEBPF programs would leverage a
negative return and potentially be loaded into
a kernel with the old behavior.


Yes, eBPF can return probably wrong value, but what kernel did is just
to make sure it doesn't harm anything.

I would rather just drop the packet in this case.


In addition to TUN_SSE_ABORT, we can
add TUN_SSE_DROP. That could be made the
default for any undefined negative

RE: [PATCH V2 1/5] dt-bindings: fsl: scu: add scu power key binding

2019-09-22 Thread Anson Huang

Hi, Pavel

> Subject: Re: [PATCH V2 1/5] dt-bindings: fsl: scu: add scu power key binding
> 
> Hi!
> 
> > NXP i.MX8QXP is an ARMv8 SoC with a Cortex-M4 core inside as system
> > controller, the system controller is in charge of system power, clock
> > and power key event etc. management, Linux kernel has to communicate
> > with system controller via MU (message unit) IPC to get power key
> > event, add binding doc for i.MX system controller power key driver.
> >
> > Signed-off-by: Anson Huang 
> 
> > +++ b/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > @@ -157,6 +157,15 @@ Required properties:
> >  Optional properties:
> >  - timeout-sec: contains the watchdog timeout in seconds.
> >
> > +Power key bindings based on SCU Message Protocol
> > +
> > +
> > +Required properties:
> > +- compatible: should be:
> > +  "fsl,imx8qxp-sc-pwrkey"
> > +  followed by "fsl,imx-sc-pwrkey";
> > +- linux,keycodes: See
> > +Documentation/devicetree/bindings/input/keys.txt
> 
> Actually there's no reason for having "linux,keycodes" property when it is
> always a power button.
The latest version of patch already change the compatible name to *-sc-key which
is more general as key driver, so the "linux,keycodes" is needed now for driver
to define the key function.

Thanks,
Anson

RE: [PATCH] ARM: Add support for Realtek SOC

2019-09-22 Thread James Tai

>  wrote:
> > > Subject: Re: [PATCH] ARM: Add support for Realtek SOC
> 
> > > > @@ -148,6 +148,7 @@ endif
> > > >  textofs-$(CONFIG_ARCH_MSM8X60) := 0x00208000
> > > >  textofs-$(CONFIG_ARCH_MSM8960) := 0x00208000
> > > >  textofs-$(CONFIG_ARCH_MESON) := 0x00208000
> > > > +textofs-$(CONFIG_ARCH_REALTEK) := 0x00208000
> > > >  textofs-$(CONFIG_ARCH_AXXIA) := 0x00308000
> > >
> > > Can you explain why this is needed for your platform?
> > >
> > We need to reserve memory (0x ~ 0x001B) for rom and boot
> code.
> 
> Ok.
> 
> > > > +config ARCH_RTD16XX
> > > > +   bool "Enable support for RTD1619"
> > > > +   depends on ARCH_REALTEK
> > > > +   select ARM_GIC_V3
> > > > +   select ARM_PSCI
> > >
> > > As I understand, this chip uses a Cortex-A55. What is the reason for
> > > adding support only to the 32-bit ARM architecture rather than 64-bit?
> >
> > The RTD16XX platform also support the 64-bit ARM architecture.
> > I will add the 64-bit ARM architecture in new version patch.
> >
> > > Most 64-bit SoCs are only supported with arch/arm64, but generally
> > > speaking that is not a requirement. My rule of thumb is that on
> > > systems with 1GB of RAM or more, one would want to run a 64-bit
> > > kernel, while systems with less than that are better off with a
> > > 32-bit one, but that is clearly not the only reason for picking one over 
> > > the
> other.
> > >
> > Support 32-bit ARM architecture is for application compatibility.
> 
> Generally speaking, a 64-bit kernel should work better on 64-bit hardware
> even when you are running mostly 32-bit applications. However, you may have
> device drivers that do not correctly set compat_ioctl handlers.
> 
> As I said, it's no problem to allow both, just explain this in the changelog 
> text
> for the driver, along with the need for the textofs setting.
> 
OK.

> > > It's very unusual to see custom smp operations on an ARMv8 system,
> > > as we normally use PSCI here. Can you explain what is going on here?
> > > Are you able to use a boot wrapper that implements these in psci instead?
> > >
> > The smp operations is porting form other Realtek platform.
> >
> > Currently, The RTD16XX platform can use the PSCI method.
> > I will add PSCI method in new version patch.
> 
> Ok, perfect!
> 
> > > > +   timer_probe();
> > > > +   tick_setup_hrtimer_broadcast(); }
> > >
> > > What do you need tick_setup_hrtimer_broadcast() for? I don't see any
> > > other platform calling this.
> > >
> > I want to initialize the HR timer.
> 
> I'm still unsure about this one. My feeling is that it should not be in the
> platform code, but I don't quite understand which hardware needs it. I see 
> that
> Lorenzo Pieralisi added the same code to arm64 in commit 9358d755bd5c
> ("arm64: kernel: initialize broadcast hrtimer based clock event device"), but
> nothing ever calls it on 32-bit arch/arm even though the code does get built 
> in
> to the kernel.

I will add the 'hrtimer' initialization flow in related devices drivers.

> My feeling is that either you don't really need it, or this is something that 
> other
> platforms should really do as well, and it should be called from the generic
> time_init() function in arch/arm/kernel/time.c instead.
> 
OK. I understand.

> Can you try to find out more of the background here, and then move the call to
> time_init() assuming it is indeed useful?

I agree with you. It is not necessary to call 'time_init()' function in 
platform code.

>Arnd
> 
> --Please consider the environment before printing this e-mail.

Re: [PATCH 1/4] MIPS: CI20: DTS: Add I2C nodes

2019-09-22 Thread kbuild test robot

Hi Alexandre,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3 next-20190920]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Alexandre-GRIVEAUX/MIPS-CI20-DTS-Add-nodes-to-Creator-CI20-board/20190923-041656
config: mips-allmodconfig (attached as .config)
compiler: mips-linux-gcc (GCC) 7.4.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.4.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

>> Error: arch/mips/boot/dts/ingenic/ci20.dts:90.1-6 Label or path i2c0 not 
>> found
>> Error: arch/mips/boot/dts/ingenic/ci20.dts:168.1-6 Label or path i2c1 not 
>> found
>> Error: arch/mips/boot/dts/ingenic/ci20.dts:176.1-6 Label or path i2c2 not 
>> found
>> Error: arch/mips/boot/dts/ingenic/ci20.dts:184.1-6 Label or path i2c3 not 
>> found
>> Error: arch/mips/boot/dts/ingenic/ci20.dts:192.1-6 Label or path i2c4 not 
>> found
   FATAL ERROR: Syntax error parsing input tree

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

[PATCH V2 4/7] USB: serial: f81232: Add F81534A support

2019-09-22 Thread Ji-Ze Hong (Peter Hong)

The Fintek F81532A/534A/535/536 is USB-to-2/4/8/12 serial ports device
and the serial port is default disabled when plugin computer.

The IC is contains devices as following:
1. HUB (all devices is connected with this hub)
2. GPIO/Control device. (enable serial port and control GPIOs)
3. serial port 1 to x (2/4/8/12)

It's most same with F81232, the UART device is difference as follow:
1. TX/RX bulk size is 128/512bytes
2. RX bulk layout change:
F81232: [LSR(1Byte)+DATA(1Byte)][LSR(1Byte)+DATA(1Byte)]...
F81534A:[LEN][Data.][LSR]

Signed-off-by: Ji-Ze Hong (Peter Hong) 
---
 drivers/usb/serial/f81232.c | 131 ++--
 1 file changed, 127 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index e4db0aec9af0..36a17aedc2ae 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Fintek F81232 USB to serial adaptor driver
+ * Fintek F81532A/534A/535/536 USB to 2/4/8/12 serial adaptor driver
  *
  * Copyright (C) 2012 Greg Kroah-Hartman (gre...@linuxfoundation.org)
  * Copyright (C) 2012 Linux Foundation
@@ -21,11 +22,36 @@
 #include 
 #include 
 
+#define F81232_ID  \
+   { USB_DEVICE(0x1934, 0x0706) }  /* 1 port UART device */
+
+#define F81534A_SERIES_ID  \
+   { USB_DEVICE(0x2c42, 0x1602) }, /* In-Box 2 port UART device */ \
+   { USB_DEVICE(0x2c42, 0x1604) }, /* In-Box 4 port UART device */ \
+   { USB_DEVICE(0x2c42, 0x1605) }, /* In-Box 8 port UART device */ \
+   { USB_DEVICE(0x2c42, 0x1606) }, /* In-Box 12 port UART device */ \
+   { USB_DEVICE(0x2c42, 0x1608) }, /* Non-Flash type */ \
+   { USB_DEVICE(0x2c42, 0x1632) }, /* 2 port UART device */ \
+   { USB_DEVICE(0x2c42, 0x1634) }, /* 4 port UART device */ \
+   { USB_DEVICE(0x2c42, 0x1635) }, /* 8 port UART device */ \
+   { USB_DEVICE(0x2c42, 0x1636) }  /* 12 port UART device */
+
 static const struct usb_device_id id_table[] = {
-   { USB_DEVICE(0x1934, 0x0706) },
+   F81232_ID,
+   { } /* Terminating entry */
+};
+
+static const struct usb_device_id f81534a_id_table[] = {
+   F81534A_SERIES_ID,
+   { } /* Terminating entry */
+};
+
+static const struct usb_device_id all_serial_id_table[] = {
+   F81232_ID,
+   F81534A_SERIES_ID,
{ } /* Terminating entry */
 };
-MODULE_DEVICE_TABLE(usb, id_table);
+MODULE_DEVICE_TABLE(usb, all_serial_id_table);
 
 /* Maximum baudrate for F81232 */
 #define F81232_MAX_BAUDRATE150
@@ -35,6 +61,10 @@ MODULE_DEVICE_TABLE(usb, id_table);
 #define F81232_REGISTER_REQUEST0xa0
 #define F81232_GET_REGISTER0xc0
 #define F81232_SET_REGISTER0x40
+#define F81534A_REGISTER_REQUEST   F81232_REGISTER_REQUEST
+#define F81534A_GET_REGISTER   F81232_GET_REGISTER
+#define F81534A_SET_REGISTER   F81232_SET_REGISTER
+#define F81534A_ACCESS_REG_RETRY   2
 
 #define SERIAL_BASE_ADDRESS0x0120
 #define RECEIVE_BUFFER_REGISTER(0x00 + SERIAL_BASE_ADDRESS)
@@ -61,6 +91,11 @@ MODULE_DEVICE_TABLE(usb, id_table);
 #define F81232_CLK_14_77_MHZ   (BIT(1) | BIT(0))
 #define F81232_CLK_MASKGENMASK(1, 0)
 
+#define F81534A_MODE_CONF_REG  0x107
+#define F81534A_TRIGGER_MASK   GENMASK(3, 2)
+#define F81534A_TRIGGER_MULTPILE_4XBIT(3)
+#define F81534A_FIFO_128BYTE   (BIT(1) | BIT(0))
+
 struct f81232_private {
struct mutex lock;
u8 modem_control;
@@ -383,6 +418,46 @@ static void f81232_process_read_urb(struct urb *urb)
tty_flip_buffer_push(>port);
 }
 
+static void f81534a_process_read_urb(struct urb *urb)
+{
+   struct usb_serial_port *port = urb->context;
+   unsigned char *data = urb->transfer_buffer;
+   char tty_flag;
+   unsigned int i;
+   u8 lsr;
+   u8 len;
+
+   if (urb->actual_length < 3) {
+   dev_err(>dev, "error actual_length: %d\n",
+   urb->actual_length);
+   return;
+   }
+
+   len = data[0];
+   if (len != urb->actual_length) {
+   dev_err(>dev, "len(%d) != urb->actual_length(%d)\n", len,
+   urb->actual_length);
+   return;
+   }
+
+   /* bulk-in data: [LEN][Data.][LSR] */
+   lsr = data[len - 1];
+   tty_flag = f81232_handle_lsr(port, lsr);
+
+   if (port->port.console && port->sysrq) {
+   for (i = 1; i < urb->actual_length - 1; ++i)
+   if (!usb_serial_handle_sysrq_char(port, data[i]))
+   tty_insert_flip_char(>port, data[i],
+   tty_flag);
+   }

[PATCH V2 7/7] USB: serial: f81232: Add gpiolib to GPIO device

2019-09-22 Thread Ji-Ze Hong (Peter Hong)

The Fintek F81534A series contains 3 GPIOs per UART and The max GPIOs
is 12x3 = 36 GPIOs and this patch will implements GPIO device as a
gpiochip to control all GPIO pins even transforms to transceiver pins.

Signed-off-by: Ji-Ze Hong (Peter Hong) 
---
 drivers/usb/serial/f81232.c | 249 
 1 file changed, 249 insertions(+)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index 82cc1e6cff62..dc9b28738b80 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -104,6 +105,8 @@ MODULE_DEVICE_TABLE(usb, all_serial_id_table);
 #define F81534A_TRIGGER_MULTPILE_4XBIT(3)
 #define F81534A_FIFO_128BYTE   (BIT(1) | BIT(0))
 
+#define F81534A_MAX_PORT   12
+
 /* Serial port self GPIO control, 2bytes [control data][input data] */
 #define F81534A_GPIO_REG   0x10e
 #define F81534A_GPIO_MODE2_DIR BIT(6) /* 1: input, 0: output */
@@ -115,6 +118,13 @@ MODULE_DEVICE_TABLE(usb, all_serial_id_table);
 
 #define F81534A_CMD_ENABLE_PORT0x116
 
+/*
+ * Control device global GPIO control,
+ * 2bytes [control data][input data]
+ */
+#define F81534A_CTRL_GPIO_REG  0x1601
+#define F81534A_CTRL_GPIO_MAX_PIN  3
+
 struct f81232_private {
struct mutex lock;
u8 modem_control;
@@ -126,6 +136,12 @@ struct f81232_private {
struct usb_serial_port *port;
 };
 
+struct f81534a_ctrl_private {
+   struct usb_interface *intf;
+   struct gpio_chip chip;
+   struct mutex lock;
+};
+
 static u32 const baudrate_table[] = { 115200, 921600, 1152000, 150 };
 static u8 const clock_table[] = { F81232_CLK_1_846_MHZ, F81232_CLK_14_77_MHZ,
F81232_CLK_18_46_MHZ, F81232_CLK_24_MHZ };
@@ -863,6 +879,50 @@ static void f81232_lsr_worker(struct work_struct *work)
dev_warn(>dev, "read LSR failed: %d\n", status);
 }
 
+static int f81534a_ctrl_get_register(struct usb_device *dev, u16 reg, u16 size,
+   void *val)
+{
+   int retry = F81534A_ACCESS_REG_RETRY;
+   int status;
+   u8 *tmp;
+
+   tmp = kmalloc(size, GFP_KERNEL);
+   if (!tmp)
+   return -ENOMEM;
+
+   while (retry--) {
+   status = usb_control_msg(dev,
+   usb_rcvctrlpipe(dev, 0),
+   F81534A_REGISTER_REQUEST,
+   F81534A_GET_REGISTER,
+   reg,
+   0,
+   tmp,
+   size,
+   USB_CTRL_GET_TIMEOUT);
+   if (status != size) {
+   status = usb_translate_errors(status);
+   if (status == -EIO)
+   continue;
+
+   status = -EIO;
+   } else {
+   status = 0;
+   memcpy(val, tmp, size);
+   }
+
+   break;
+   }
+
+   if (status) {
+   dev_err(>dev, "get reg: %x, failed status: %d\n", reg,
+   status);
+   }
+
+   kfree(tmp);
+   return status;
+}
+
 static int f81534a_ctrl_set_register(struct usb_device *dev, u16 reg, u16 size,
void *val)
 {
@@ -908,6 +968,182 @@ static int f81534a_ctrl_set_register(struct usb_device 
*dev, u16 reg, u16 size,
return status;
 }
 
+#ifdef CONFIG_GPIOLIB
+static int f81534a_ctrl_set_mask_register(struct usb_device *dev, u16 reg,
+   u8 mask, u8 val)
+{
+   int status;
+   u8 tmp;
+
+   status = f81534a_ctrl_get_register(dev, reg, 1, );
+   if (status)
+   return status;
+
+
+   tmp = (tmp & ~mask) | (val & mask);
+
+   status = f81534a_ctrl_set_register(dev, reg, 1, );
+   if (status)
+   return status;
+
+   return 0;
+}
+
+static int f81534a_gpio_get(struct gpio_chip *chip, unsigned int gpio_num)
+{
+   struct f81534a_ctrl_private *priv = gpiochip_get_data(chip);
+   struct usb_interface *intf = priv->intf;
+   struct usb_device *dev = interface_to_usbdev(intf);
+   int status;
+   u8 tmp[2];
+   int set;
+   int idx;
+   int reg;
+
+   set = gpio_num / F81534A_CTRL_GPIO_MAX_PIN;
+   idx = gpio_num % F81534A_CTRL_GPIO_MAX_PIN;
+   reg = F81534A_CTRL_GPIO_REG + set;
+
+   mutex_lock(>lock);
+
+   status = f81534a_ctrl_get_register(dev, reg, sizeof(tmp), tmp);
+   if (status) {
+   mutex_unlock(>lock);
+   return status;
+   }
+
+   mutex_unlock(>lock);
+
+   return !!(tmp[1] & BIT(idx));
+}
+
+static int f81534a_gpio_direction_in(struct gpio_chip

[PATCH V2 3/7] USB: serial: f81232: Use devm_kzalloc

2019-09-22 Thread Ji-Ze Hong (Peter Hong)

Use devm_kzalloc() to replace kzalloc().

Signed-off-by: Ji-Ze Hong (Peter Hong) 
---
 drivers/usb/serial/f81232.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index b42b3738a768..e4db0aec9af0 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -756,7 +756,7 @@ static int f81232_port_probe(struct usb_serial_port *port)
 {
struct f81232_private *priv;
 
-   priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+   priv = devm_kzalloc(>dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
 
@@ -772,16 +772,6 @@ static int f81232_port_probe(struct usb_serial_port *port)
return 0;
 }
 
-static int f81232_port_remove(struct usb_serial_port *port)
-{
-   struct f81232_private *priv;
-
-   priv = usb_get_serial_port_data(port);
-   kfree(priv);
-
-   return 0;
-}
-
 static int f81232_suspend(struct usb_serial *serial, pm_message_t message)
 {
struct usb_serial_port *port = serial->port[0];
@@ -841,7 +831,6 @@ static struct usb_serial_driver f81232_device = {
.process_read_urb = f81232_process_read_urb,
.read_int_callback =f81232_read_int_callback,
.port_probe =   f81232_port_probe,
-   .port_remove =  f81232_port_remove,
.suspend =  f81232_suspend,
.resume =   f81232_resume,
 };
-- 
2.17.1

[PATCH V2 1/7] USB: serial: f81232: Extract LSR handler

2019-09-22 Thread Ji-Ze Hong (Peter Hong)

Extract LSR handler to function.

Signed-off-by: Ji-Ze Hong (Peter Hong) 
---
 drivers/usb/serial/f81232.c | 53 +
 1 file changed, 30 insertions(+), 23 deletions(-)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index 43fa1f0716b7..c07d376c743d 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -322,10 +322,38 @@ static void f81232_read_int_callback(struct urb *urb)
__func__, retval);
 }
 
+static char f81232_handle_lsr(struct usb_serial_port *port, u8 lsr)
+{
+   struct f81232_private *priv = usb_get_serial_port_data(port);
+   char tty_flag = TTY_NORMAL;
+
+   if (!(lsr & UART_LSR_BRK_ERROR_BITS))
+   return tty_flag;
+
+   if (lsr & UART_LSR_BI) {
+   tty_flag = TTY_BREAK;
+   port->icount.brk++;
+   usb_serial_handle_break(port);
+   } else if (lsr & UART_LSR_PE) {
+   tty_flag = TTY_PARITY;
+   port->icount.parity++;
+   } else if (lsr & UART_LSR_FE) {
+   tty_flag = TTY_FRAME;
+   port->icount.frame++;
+   }
+
+   if (lsr & UART_LSR_OE) {
+   port->icount.overrun++;
+   schedule_work(>lsr_work);
+   tty_insert_flip_char(>port, 0, TTY_OVERRUN);
+   }
+
+   return tty_flag;
+}
+
 static void f81232_process_read_urb(struct urb *urb)
 {
struct usb_serial_port *port = urb->context;
-   struct f81232_private *priv = usb_get_serial_port_data(port);
unsigned char *data = urb->transfer_buffer;
char tty_flag;
unsigned int i;
@@ -341,29 +369,8 @@ static void f81232_process_read_urb(struct urb *urb)
/* bulk-in data: [LSR(1Byte)+DATA(1Byte)][LSR(1Byte)+DATA(1Byte)]... */
 
for (i = 0; i < urb->actual_length; i += 2) {
-   tty_flag = TTY_NORMAL;
lsr = data[i];
-
-   if (lsr & UART_LSR_BRK_ERROR_BITS) {
-   if (lsr & UART_LSR_BI) {
-   tty_flag = TTY_BREAK;
-   port->icount.brk++;
-   usb_serial_handle_break(port);
-   } else if (lsr & UART_LSR_PE) {
-   tty_flag = TTY_PARITY;
-   port->icount.parity++;
-   } else if (lsr & UART_LSR_FE) {
-   tty_flag = TTY_FRAME;
-   port->icount.frame++;
-   }
-
-   if (lsr & UART_LSR_OE) {
-   port->icount.overrun++;
-   schedule_work(>lsr_work);
-   tty_insert_flip_char(>port, 0,
-   TTY_OVERRUN);
-   }
-   }
+   tty_flag = f81232_handle_lsr(port, lsr);
 
if (port->port.console && port->sysrq) {
if (usb_serial_handle_sysrq_char(port, data[i + 1]))
-- 
2.17.1

[PATCH V2 0/7] Add Fintek F81534A series usb-to-serial driver

2019-09-22 Thread Ji-Ze Hong (Peter Hong)

The Fintek F81532A/534A/535/536 is USB-to-2/4/8/12 serial ports device
and the serial port is default disabled when plugin computer.

The part number is a bit same with F81532/534, but F81534A series UART
core is enhanced from F81232, not F81532/534.  

The IC is contains devices as following:
1. HUB (all devices is connected with this hub)
2. GPIO/Control device. (enable serial port and control all GPIOs)
3. serial port 1 to x (2/4/8/12)

It's most same with F81232, the UART device is difference as follow:
1. TX/RX bulk size is 128/512bytes
2. RX bulk layout change:
F81232: [LSR(1Byte)+DATA(1Byte)][LSR(1Byte)+DATA(1Byte)]...
F81534A:[LEN][Data.][LSR]

We'll try to do some code refacting before add F81534A series.

Ji-Ze Hong (Peter Hong) (7):
  USB: serial: f81232: Extract LSR handler
  USB: serial: f81232: Add tx_empty function
  USB: serial: f81232: Use devm_kzalloc
  USB: serial: f81232: Add F81534A support
  USB: serial: f81232: Set F81534A serial port with RS232 mode
  USB: serial: f81232: Add generator for F81534A
  USB: serial: f81232: Add gpiolib to GPIO device

 drivers/usb/serial/f81232.c | 604 ++--
 1 file changed, 570 insertions(+), 34 deletions(-)

-- 
2.17.1

[PATCH V2 6/7] USB: serial: f81232: Add generator for F81534A

2019-09-22 Thread Ji-Ze Hong (Peter Hong)

The Fintek F81534A series is contains 1 HUB / 1 GPIO device / n UARTs,
but the UART is default disable and need enabled by GPIO device(2c42/16F8).

When F81534A plug to host, we can only see 1 HUB & 1 GPIO device and we
need write 0x8fff to GPIO device register F81534A_CMD_ENABLE_PORT(116h)
to enable all available serial ports.

Signed-off-by: Ji-Ze Hong (Peter Hong) 
---
 drivers/usb/serial/f81232.c | 135 +++-
 1 file changed, 134 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index 01cb5a5ea1d2..82cc1e6cff62 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -36,6 +36,9 @@
{ USB_DEVICE(0x2c42, 0x1635) }, /* 8 port UART device */ \
{ USB_DEVICE(0x2c42, 0x1636) }  /* 12 port UART device */
 
+#define F81534A_CTRL_ID\
+   { USB_DEVICE(0x2c42, 0x16f8) }  /* Global control device */
+
 static const struct usb_device_id id_table[] = {
F81232_ID,
{ } /* Terminating entry */
@@ -46,6 +49,11 @@ static const struct usb_device_id f81534a_id_table[] = {
{ } /* Terminating entry */
 };
 
+static const struct usb_device_id f81534a_ctrl_id_table[] = {
+   F81534A_CTRL_ID,
+   { } /* Terminating entry */
+};
+
 static const struct usb_device_id all_serial_id_table[] = {
F81232_ID,
F81534A_SERIES_ID,
@@ -105,6 +113,8 @@ MODULE_DEVICE_TABLE(usb, all_serial_id_table);
 #define F81534A_GPIO_MODE1_OUTPUT  BIT(1)
 #define F81534A_GPIO_MODE0_OUTPUT  BIT(0)
 
+#define F81534A_CMD_ENABLE_PORT0x116
+
 struct f81232_private {
struct mutex lock;
u8 modem_control;
@@ -853,6 +863,95 @@ static void f81232_lsr_worker(struct work_struct *work)
dev_warn(>dev, "read LSR failed: %d\n", status);
 }
 
+static int f81534a_ctrl_set_register(struct usb_device *dev, u16 reg, u16 size,
+   void *val)
+{
+   int retry = F81534A_ACCESS_REG_RETRY;
+   int status;
+   u8 *tmp;
+
+   tmp = kmalloc(size, GFP_KERNEL);
+   if (!tmp)
+   return -ENOMEM;
+
+   memcpy(tmp, val, size);
+
+   while (retry--) {
+   status = usb_control_msg(dev,
+   usb_sndctrlpipe(dev, 0),
+   F81534A_REGISTER_REQUEST,
+   F81534A_SET_REGISTER,
+   reg,
+   0,
+   tmp,
+   size,
+   USB_CTRL_SET_TIMEOUT);
+   if (status != size) {
+   status = usb_translate_errors(status);
+   if (status == -EIO)
+   continue;
+
+   status = -EIO;
+   } else {
+   status = 0;
+   }
+
+   break;
+   }
+
+   if (status) {
+   dev_err(>dev, "set ctrl reg: %x, failed status: %d\n", reg,
+   status);
+   }
+
+   kfree(tmp);
+   return status;
+}
+
+static int f81534a_ctrl_enable_all_ports(struct usb_interface *intf)
+{
+   struct usb_device *dev = interface_to_usbdev(intf);
+   unsigned char enable[2];
+   int status;
+
+   /* enable all available serial ports */
+   enable[0] = 0xff;
+   enable[1] = 0x8f;
+
+   status = f81534a_ctrl_set_register(dev, F81534A_CMD_ENABLE_PORT,
+   sizeof(enable), enable);
+   if (status)
+   dev_warn(>dev, "set CMD_ENABLE_PORT failed: %d\n", status);
+
+   return status;
+}
+
+static int f81534a_ctrl_probe(struct usb_interface *intf,
+   const struct usb_device_id *id)
+{
+   struct usb_device *dev = interface_to_usbdev(intf);
+   int status;
+
+   status = f81534a_ctrl_enable_all_ports(intf);
+   if (status)
+   return status;
+
+   dev = usb_get_dev(dev);
+   return 0;
+}
+
+static void f81534a_ctrl_disconnect(struct usb_interface *intf)
+{
+   struct usb_device *dev = interface_to_usbdev(intf);
+
+   usb_put_dev(dev);
+}
+
+static int f81534a_ctrl_resume(struct usb_interface *intf)
+{
+   return f81534a_ctrl_enable_all_ports(intf);
+}
+
 static int f81232_port_probe(struct usb_serial_port *port)
 {
struct f81232_private *priv;
@@ -980,7 +1079,41 @@ static struct usb_serial_driver * const serial_drivers[] 
= {
NULL,
 };
 
-module_usb_serial_driver(serial_drivers, all_serial_id_table);
+static struct usb_driver f81534a_ctrl_driver = {
+   .name = "f81534a_ctrl",
+   .id_table = f81534a_ctrl_id_table,
+   .probe =f81534a_ctrl_probe,
+

[PATCH V2 5/7] USB: serial: f81232: Set F81534A serial port with RS232 mode

2019-09-22 Thread Ji-Ze Hong (Peter Hong)

The Fintek F81532A/534A/535/536 is USB-to-2/4/8/12 serial ports device
and the serial ports are default disabled. Each port contains max 3 pins
GPIO and the 3 pins are default pull high with input mode.

When the serial port had activated (running probe()), we'll transform the
3 pins from GPIO function publicly to control Tranceiver privately use.
We'll default set to 0/0/1 for control transceiver to RS232 mode.

Otherwise, If the serial port is not active, the 3 pins is in GPIO mode
and controlled by global GPIO device with VID/PID: 2c42/16f8.

Signed-off-by: Ji-Ze Hong (Peter Hong) 
---
 drivers/usb/serial/f81232.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index 36a17aedc2ae..01cb5a5ea1d2 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -96,6 +96,15 @@ MODULE_DEVICE_TABLE(usb, all_serial_id_table);
 #define F81534A_TRIGGER_MULTPILE_4XBIT(3)
 #define F81534A_FIFO_128BYTE   (BIT(1) | BIT(0))
 
+/* Serial port self GPIO control, 2bytes [control data][input data] */
+#define F81534A_GPIO_REG   0x10e
+#define F81534A_GPIO_MODE2_DIR BIT(6) /* 1: input, 0: output */
+#define F81534A_GPIO_MODE1_DIR BIT(5)
+#define F81534A_GPIO_MODE0_DIR BIT(4)
+#define F81534A_GPIO_MODE2_OUTPUT  BIT(2)
+#define F81534A_GPIO_MODE1_OUTPUT  BIT(1)
+#define F81534A_GPIO_MODE0_OUTPUT  BIT(0)
+
 struct f81232_private {
struct mutex lock;
u8 modem_control;
@@ -866,6 +875,14 @@ static int f81232_port_probe(struct usb_serial_port *port)
 
 static int f81534a_port_probe(struct usb_serial_port *port)
 {
+   int status;
+
+   /* tri-state with pull-high, default RS232 Mode */
+   status = f81232_set_register(port, F81534A_GPIO_REG,
+   F81534A_GPIO_MODE2_DIR);
+   if (status)
+   return status;
+
return f81232_port_probe(port);
 }
 
-- 
2.17.1

[PATCH V2 2/7] USB: serial: f81232: Add tx_empty function

2019-09-22 Thread Ji-Ze Hong (Peter Hong)

Add tx_empty() function for F81232.

Signed-off-by: Ji-Ze Hong (Peter Hong) 
---
 drivers/usb/serial/f81232.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index c07d376c743d..b42b3738a768 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -685,6 +685,23 @@ static void f81232_dtr_rts(struct usb_serial_port *port, 
int on)
f81232_set_mctrl(port, 0, TIOCM_DTR | TIOCM_RTS);
 }
 
+static bool f81232_tx_empty(struct usb_serial_port *port)
+{
+   int status;
+   u8 tmp;
+
+   status = f81232_get_register(port, LINE_STATUS_REGISTER, );
+   if (status) {
+   dev_err(>dev, "get LSR status failed: %d\n", status);
+   return false;
+   }
+
+   if ((tmp & UART_LSR_TEMT) != UART_LSR_TEMT)
+   return false;
+
+   return true;
+}
+
 static int f81232_carrier_raised(struct usb_serial_port *port)
 {
u8 msr;
@@ -820,6 +837,7 @@ static struct usb_serial_driver f81232_device = {
.tiocmget = f81232_tiocmget,
.tiocmset = f81232_tiocmset,
.tiocmiwait =   usb_serial_generic_tiocmiwait,
+   .tx_empty = f81232_tx_empty,
.process_read_urb = f81232_process_read_urb,
.read_int_callback =f81232_read_int_callback,
.port_probe =   f81232_port_probe,
-- 
2.17.1

[PATCH] iio: Fix an undefied reference error in noa1305_probe

2019-09-22 Thread zhong jiang

I hit the following error when compile the kernel.

drivers/iio/light/noa1305.o: In function `noa1305_probe':
noa1305.c:(.text+0x65): undefined reference to `__devm_regmap_init_i2c'
make: *** [vmlinux] Error 1

Signed-off-by: zhong jiang 
---
 drivers/iio/light/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iio/light/Kconfig b/drivers/iio/light/Kconfig
index 08d7e1e..4a1a883 100644
--- a/drivers/iio/light/Kconfig
+++ b/drivers/iio/light/Kconfig
@@ -314,6 +314,7 @@ config MAX44009
 config NOA1305
tristate "ON Semiconductor NOA1305 ambient light sensor"
depends on I2C
+   select REGMAP_I2C
help
 Say Y here if you want to build support for the ON Semiconductor
 NOA1305 ambient light sensor.
-- 
1.7.12.4

Re: [PATCH 4.19 32/79] fpga: altera-ps-spi: Fix getting of optional confd gpio

2019-09-22 Thread Phil Reid


On 22/09/2019 04:46, Pavel Machek wrote:

Hi!


Currently the driver does not handle EPROBE_DEFER for the confd gpio.
Use devm_gpiod_get_optional() instead of devm_gpiod_get() and return
error codes from altera_ps_probe().



@@ -265,10 +265,13 @@ static int altera_ps_probe(struct spi_device *spi)
return PTR_ERR(conf->status);
}
  
-	conf->confd = devm_gpiod_get(>dev, "confd", GPIOD_IN);

+   conf->confd = devm_gpiod_get_optional(>dev, "confd", GPIOD_IN);
if (IS_ERR(conf->confd)) {
-   dev_warn(>dev, "Not using confd gpio: %ld\n",
-PTR_ERR(conf->confd));
+   dev_err(>dev, "Failed to get confd gpio: %ld\n",
+   PTR_ERR(conf->confd));
+   return PTR_ERR(conf->confd);


Will this result in repeated errors in dmesg in case of EPROBE_DEFER?
We often avoid printing such messages in EPROBE_DEFER case.


Yes it will. I can submit a patch for that if required.




+   } else if (!conf->confd) {
+   dev_warn(>dev, "Not using confd gpio");
}
  
  	/* Register manager with unique name */


Best regards,
Pavel




--
Regards
Phil Reid

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Matt Cover

On Sun, Sep 22, 2019 at 5:46 PM Jason Wang  wrote:
>
>
> On 2019/9/23 上午1:43, Matt Cover wrote:
> > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  wrote:
> >> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:
> >>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
> >>> to fallback to tun_automq_select_queue() for tx queue selection.
> >>>
> >>> Compilation of this exact patch was tested.
> >>>
> >>> For functional testing 3 additional printk()s were added.
> >>>
> >>> Functional testing results (on 2 txq tap device):
> >>>
> >>>[Fri Sep 20 18:33:27 2019] == tun no prog ==
> >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> >>> '-1'
> >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> >>>[Fri Sep 20 18:33:27 2019] == tun prog -1 ==
> >>>[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> >>> '-1'
> >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> >>> '-1'
> >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> >>>[Fri Sep 20 18:33:27 2019] == tun prog 0 ==
> >>>[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0'
> >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
> >>>[Fri Sep 20 18:33:27 2019] == tun prog 1 ==
> >>>[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1'
> >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1'
> >>>[Fri Sep 20 18:33:27 2019] == tun prog 2 ==
> >>>[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2'
> >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
> >>>
> >>> Signed-off-by: Matthew Cover 
> >>
> >> Could you add a bit more motivation data here?
> > Thank you for these questions Michael.
> >
> > I'll plan on adding the below information to the
> > commit message and submitting a v2 of this patch
> > when net-next reopens. In the meantime, it would
> > be very helpful to know if these answers address
> > some of your concerns.
> >
> >> 1. why is this a good idea
> > This change allows TUNSETSTEERINGEBPF progs to
> > do any of the following.
> >   1. implement queue selection for a subset of
> >  traffic (e.g. special queue selection logic
> >  for ipv4, but return negative and use the
> >  default automq logic for ipv6)
>
>
> Well, using ebpf means it need to take care of all the cases. E.g you
> can easily implement the fallback through eBPF as well.
>

I really think there is value in being
able to implement a scoped special
case while leaving the rest of the
packets in the kernel's hands.

Having to reimplement automq makes
this hookpoint less accessible to
beginners and experienced alike.

>
> >   2. determine there isn't sufficient information
> >  to do proper queue selection; return
> >  negative and use the default automq logic
> >  for the unknown
>
>
> Same as above.
>
>
> >   3. implement a noop prog (e.g. do
> >  bpf_trace_printk() then return negative and
> >  use the default automq logic for everything)
>
>
> ditto.
>
>
> >
> >> 2. how do we know existing userspace does not rely on existing behaviour
> > Prior to this change a negative return from a
> > TUNSETSTEERINGEBPF prog would have been cast
> > into a u16 and traversed netdev_cap_txqueue().
> >
> > In most cases netdev_cap_txqueue() would have
> > found this value to exceed real_num_tx_queues
> > and queue_index would be updated to 0.
> >
> > It is possible that a TUNSETSTEERINGEBPF prog
> > return a negative value which when cast into a
> > u16 results in a positive queue_index less than
> > real_num_tx_queues. For example, on x86_64, a
> > return value of -65535 results in a queue_index
> > of 1; which is a valid queue for any multiqueue
> > device.
> >
> > It seems unlikely, however as stated above is
> > unfortunately possible, that existing
> > TUNSETSTEERINGEBPF programs would choose to
> > return a negative value rather than return the
> > positive value which holds the same meaning.
> >
> > It seems more likely that future
> > TUNSETSTEERINGEBPF programs would leverage a
> > negative return and potentially be loaded into
> > a kernel with the old behavior.
>
>
> Yes, eBPF can return probably wrong value, but what kernel did is just
> to make sure it doesn't harm anything.
>
> I would rather just drop the packet in this case.
>

In addition to TUN_SSE_ABORT, we can
add TUN_SSE_DROP. That could be made the
default for any undefined negative
return as well.

> Thanks
>
>
> >
> >> 3. why doesn't userspace need a way to figure out whether it runs on a 
> >> kernel with and
> >> without this patch
> > There may be some value in exposing this fact
> > to the ebpf prog loader. What is the standard
> > practice here, a define?
> >
> >>
> >> thanks,
> >> MST
> >>
>

RE: [PATCH 2/2] nvmem: imx: scu: support write

2019-09-22 Thread Peng Fan

Hi Srinivas,

> >> Subject: [PATCH 2/2] nvmem: imx: scu: support write
> >
> > Ping..
> >
> Thanks for your patience!
> I normally do not take patches after rc5 for nvmem.
> These will be applied after rc1 is released!

Sorry to ping again. Will you pick up since merge window is open?

Thanks,
Peng.

> 
> Thanks,
> srini
> > Thanks,
> > Peng.
> >
> >>
> >> From: Peng Fan 
> >>
> >> The fuse programming from non-secure world is blocked, so we could
> >> only use Arm Trusted Firmware SIP call to let ATF program fuse.
> >>
> >> Because there is ECC region that could only be programmed once, so
> >> add a heler in_ecc to check the ecc region.
> >>
> >> Signed-off-by: Peng Fan 
> >> ---
> >>
> >> The ATF patch will soon be posted to ATF community.
> >>
> >>   drivers/nvmem/imx-ocotp-scu.c | 73
> >> ++-
> >>   1 file changed, 72 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/nvmem/imx-ocotp-scu.c
> >> b/drivers/nvmem/imx-ocotp-scu.c index 2f339d7432e6..0f064f2e74a8
> >> 100644
> >> --- a/drivers/nvmem/imx-ocotp-scu.c
> >> +++ b/drivers/nvmem/imx-ocotp-scu.c
> >> @@ -7,6 +7,7 @@
> >>* Peng Fan 
> >>*/
> >>
> >> +#include 
> >>   #include 
> >>   #include 
> >>   #include 
> >> @@ -14,6 +15,9 @@
> >>   #include 
> >>   #include 
> >>
> >> +#define IMX_SIP_OTP   0xC20A
> >> +#define IMX_SIP_OTP_WRITE 0x2
> >> +
> >>   enum ocotp_devtype {
> >>IMX8QXP,
> >>   };
> >> @@ -45,6 +49,8 @@ struct imx_sc_msg_misc_fuse_read {
> >>u32 word;
> >>   } __packed;
> >>
> >> +static DEFINE_MUTEX(scu_ocotp_mutex);
> >> +
> >>   static struct ocotp_devtype_data imx8qxp_data = {
> >>.devtype = IMX8QXP,
> >>.nregs = 800,
> >> @@ -73,6 +79,23 @@ static bool in_hole(void *context, u32 index)
> >>return false;
> >>   }
> >>
> >> +static bool in_ecc(void *context, u32 index) {
> >> +  struct ocotp_priv *priv = context;
> >> +  const struct ocotp_devtype_data *data = priv->data;
> >> +  int i;
> >> +
> >> +  for (i = 0; i < data->num_region; i++) {
> >> +  if (data->region[i].flag & ECC_REGION) {
> >> +  if ((index >= data->region[i].start) &&
> >> +  (index <= data->region[i].end))
> >> +  return true;
> >> +  }
> >> +  }
> >> +
> >> +  return false;
> >> +}
> >> +
> >>   static int imx_sc_misc_otp_fuse_read(struct imx_sc_ipc *ipc, u32
> word,
> >> u32 *val)
> >>   {
> >> @@ -116,6 +139,8 @@ static int imx_scu_ocotp_read(void *context,
> >> unsigned int offset,
> >>if (!p)
> >>return -ENOMEM;
> >>
> >> +  mutex_lock(_ocotp_mutex);
> >> +
> >>buf = p;
> >>
> >>for (i = index; i < (index + count); i++) { @@ -126,6 +151,7 @@
> >> static int imx_scu_ocotp_read(void *context, unsigned int offset,
> >>
> >>ret = imx_sc_misc_otp_fuse_read(priv->nvmem_ipc, i, buf);
> >>if (ret) {
> >> +  mutex_unlock(_ocotp_mutex);
> >>kfree(p);
> >>return ret;
> >>}
> >> @@ -134,18 +160,63 @@ static int imx_scu_ocotp_read(void *context,
> >> unsigned int offset,
> >>
> >>memcpy(val, (u8 *)p + offset % 4, bytes);
> >>
> >> +  mutex_unlock(_ocotp_mutex);
> >> +
> >>kfree(p);
> >>
> >>return 0;
> >>   }
> >>
> >> +static int imx_scu_ocotp_write(void *context, unsigned int offset,
> >> + void *val, size_t bytes)
> >> +{
> >> +  struct ocotp_priv *priv = context;
> >> +  struct arm_smccc_res res;
> >> +  u32 *buf = val;
> >> +  u32 tmp;
> >> +  u32 index;
> >> +  int ret;
> >> +
> >> +  /* allow only writing one complete OTP word at a time */
> >> +  if ((bytes != 4) || (offset % 4))
> >> +  return -EINVAL;
> >> +
> >> +  index = offset >> 2;
> >> +
> >> +  if (in_hole(context, index))
> >> +  return -EINVAL;
> >> +
> >> +  if (in_ecc(context, index)) {
> >> +  pr_warn("ECC region, only program once\n");
> >> +  mutex_lock(_ocotp_mutex);
> >> +  ret = imx_sc_misc_otp_fuse_read(priv->nvmem_ipc, index, );
> >> +  mutex_unlock(_ocotp_mutex);
> >> +  if (ret)
> >> +  return ret;
> >> +  if (tmp) {
> >> +  pr_warn("ECC region, already has value: %x\n", tmp);
> >> +  return -EIO;
> >> +  }
> >> +  }
> >> +
> >> +  mutex_lock(_ocotp_mutex);
> >> +
> >> +  arm_smccc_smc(IMX_SIP_OTP, IMX_SIP_OTP_WRITE, index, *buf,
> >> +0, 0, 0, 0, );
> >> +
> >> +  mutex_unlock(_ocotp_mutex);
> >> +
> >> +  return res.a0;
> >> +}
> >> +
> >>   static struct nvmem_config imx_scu_ocotp_nvmem_config = {
> >>.name = "imx-scu-ocotp",
> >> -  .read_only = true,
> >> +  .read_only = false,
> >>.word_size = 4,
> >>.stride = 1,
> >>.owner = THIS_MODULE,
> >>.reg_read = imx_scu_ocotp_read,
> >> +  .reg_write = imx_scu_ocotp_write,
> >>   };
> >>
> >>   static const struct

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Matt Cover

On Sun, Sep 22, 2019 at 5:51 PM Jason Wang  wrote:
>
>
> On 2019/9/23 上午6:30, Matt Cover wrote:
> > On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin  wrote:
> >> On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote:
> >>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  
> >>> wrote:
>  On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:
> > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
> > to fallback to tun_automq_select_queue() for tx queue selection.
> >
> > Compilation of this exact patch was tested.
> >
> > For functional testing 3 additional printk()s were added.
> >
> > Functional testing results (on 2 txq tap device):
> >
> >[Fri Sep 20 18:33:27 2019] == tun no prog ==
> >[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > '-1'
> >[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> >[Fri Sep 20 18:33:27 2019] == tun prog -1 ==
> >[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > '-1'
> >[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > '-1'
> >[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> >[Fri Sep 20 18:33:27 2019] == tun prog 0 ==
> >[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > '0'
> >[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > '0'
> >[Fri Sep 20 18:33:27 2019] == tun prog 1 ==
> >[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > '1'
> >[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > '1'
> >[Fri Sep 20 18:33:27 2019] == tun prog 2 ==
> >[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > '2'
> >[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > '0'
> >
> > Signed-off-by: Matthew Cover 
> 
>  Could you add a bit more motivation data here?
> >>> Thank you for these questions Michael.
> >>>
> >>> I'll plan on adding the below information to the
> >>> commit message and submitting a v2 of this patch
> >>> when net-next reopens. In the meantime, it would
> >>> be very helpful to know if these answers address
> >>> some of your concerns.
> >>>
>  1. why is this a good idea
> >>> This change allows TUNSETSTEERINGEBPF progs to
> >>> do any of the following.
> >>>   1. implement queue selection for a subset of
> >>>  traffic (e.g. special queue selection logic
> >>>  for ipv4, but return negative and use the
> >>>  default automq logic for ipv6)
> >>>   2. determine there isn't sufficient information
> >>>  to do proper queue selection; return
> >>>  negative and use the default automq logic
> >>>  for the unknown
> >>>   3. implement a noop prog (e.g. do
> >>>  bpf_trace_printk() then return negative and
> >>>  use the default automq logic for everything)
> >>>
>  2. how do we know existing userspace does not rely on existing behaviour
> >>> Prior to this change a negative return from a
> >>> TUNSETSTEERINGEBPF prog would have been cast
> >>> into a u16 and traversed netdev_cap_txqueue().
> >>>
> >>> In most cases netdev_cap_txqueue() would have
> >>> found this value to exceed real_num_tx_queues
> >>> and queue_index would be updated to 0.
> >>>
> >>> It is possible that a TUNSETSTEERINGEBPF prog
> >>> return a negative value which when cast into a
> >>> u16 results in a positive queue_index less than
> >>> real_num_tx_queues. For example, on x86_64, a
> >>> return value of -65535 results in a queue_index
> >>> of 1; which is a valid queue for any multiqueue
> >>> device.
> >>>
> >>> It seems unlikely, however as stated above is
> >>> unfortunately possible, that existing
> >>> TUNSETSTEERINGEBPF programs would choose to
> >>> return a negative value rather than return the
> >>> positive value which holds the same meaning.
> >>>
> >>> It seems more likely that future
> >>> TUNSETSTEERINGEBPF programs would leverage a
> >>> negative return and potentially be loaded into
> >>> a kernel with the old behavior.
> >> OK if we are returning a special
> >> value, shouldn't we limit it? How about a special
> >> value with this meaning?
> >> If we are changing an ABI let's at least make it
> >> extensible.
> >>
> > A special value with this meaning sounds
> > good to me. I'll plan on adding a define
> > set to -1 to cause the fallback to automq.
>
>
> Can it really return -1?
>
> I see:
>
> static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
>  struct sk_buff *skb)
> ...
>
>
> >
> > The way I was initially viewing the old
> > behavior was that returning negative was
> > undefined; it happened to have the
> > outcomes I walked

[PATCH v2] soc: ti: move 2 driver config options into the TI SOC drivers menu

2019-09-22 Thread Randy Dunlap

From: Randy Dunlap 

Move the AM654 and J721E SOC config options inside the "TI SOC drivers"
menu with the other TI SOC drivers.

Fixes: a869b7b30dac ("soc: ti: Add Support for AM654 SoC config option")
Fixes: cff377f7897a ("soc: ti: Add Support for J721E SoC config option")
Signed-off-by: Randy Dunlap 
Cc: Santosh Shilimkar 
Cc: Olof Johansson 
Cc: Nishanth Menon 
#Cc: Benjamin Fair 
Cc: Tony Lindgren 
Cc: Tero Kristo 
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
---
v2: add Santosh (maintainer) for merging
drop Benjamin Fair (email address bounces)

 drivers/soc/ti/Kconfig |   20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

--- lnx-53.orig/drivers/soc/ti/Kconfig
+++ lnx-53/drivers/soc/ti/Kconfig
@@ -1,4 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0-only
+
+# TI SOC drivers
+#
+menuconfig SOC_TI
+   bool "TI SOC drivers support"
+
+if SOC_TI
+
 # 64-bit ARM SoCs from TI
 if ARM64
 
@@ -14,17 +22,9 @@ config ARCH_K3_J721E_SOC
help
  Enable support for TI's J721E SoC Family.
 
-endif
+endif # ARCH_K3
 
-endif
-
-#
-# TI SOC drivers
-#
-menuconfig SOC_TI
-   bool "TI SOC drivers support"
-
-if SOC_TI
+endif # ARM64
 
 config KEYSTONE_NAVIGATOR_QMSS
tristate "Keystone Queue Manager Subsystem"

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Jason Wang




On 2019/9/23 上午6:30, Matt Cover wrote:

On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin  wrote:

On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote:

On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  wrote:

On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:

Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
to fallback to tun_automq_select_queue() for tx queue selection.

Compilation of this exact patch was tested.

For functional testing 3 additional printk()s were added.

Functional testing results (on 2 txq tap device):

   [Fri Sep 20 18:33:27 2019] == tun no prog ==
   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
   [Fri Sep 20 18:33:27 2019] == tun prog -1 ==
   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
   [Fri Sep 20 18:33:27 2019] == tun prog 0 ==
   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
   [Fri Sep 20 18:33:27 2019] == tun prog 1 ==
   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1'
   [Fri Sep 20 18:33:27 2019] == tun prog 2 ==
   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'

Signed-off-by: Matthew Cover 


Could you add a bit more motivation data here?

Thank you for these questions Michael.

I'll plan on adding the below information to the
commit message and submitting a v2 of this patch
when net-next reopens. In the meantime, it would
be very helpful to know if these answers address
some of your concerns.


1. why is this a good idea

This change allows TUNSETSTEERINGEBPF progs to
do any of the following.
  1. implement queue selection for a subset of
 traffic (e.g. special queue selection logic
 for ipv4, but return negative and use the
 default automq logic for ipv6)
  2. determine there isn't sufficient information
 to do proper queue selection; return
 negative and use the default automq logic
 for the unknown
  3. implement a noop prog (e.g. do
 bpf_trace_printk() then return negative and
 use the default automq logic for everything)


2. how do we know existing userspace does not rely on existing behaviour

Prior to this change a negative return from a
TUNSETSTEERINGEBPF prog would have been cast
into a u16 and traversed netdev_cap_txqueue().

In most cases netdev_cap_txqueue() would have
found this value to exceed real_num_tx_queues
and queue_index would be updated to 0.

It is possible that a TUNSETSTEERINGEBPF prog
return a negative value which when cast into a
u16 results in a positive queue_index less than
real_num_tx_queues. For example, on x86_64, a
return value of -65535 results in a queue_index
of 1; which is a valid queue for any multiqueue
device.

It seems unlikely, however as stated above is
unfortunately possible, that existing
TUNSETSTEERINGEBPF programs would choose to
return a negative value rather than return the
positive value which holds the same meaning.

It seems more likely that future
TUNSETSTEERINGEBPF programs would leverage a
negative return and potentially be loaded into
a kernel with the old behavior.

OK if we are returning a special
value, shouldn't we limit it? How about a special
value with this meaning?
If we are changing an ABI let's at least make it
extensible.


A special value with this meaning sounds
good to me. I'll plan on adding a define
set to -1 to cause the fallback to automq.



Can it really return -1?

I see:

static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
    struct sk_buff *skb)
...




The way I was initially viewing the old
behavior was that returning negative was
undefined; it happened to have the
outcomes I walked through, but not
necessarily by design.



Having such fallback may bring extra troubles, it requires the eBPF 
program know the existence of the behavior which is not a part of kernel 
ABI actually. And then some eBPF program may start to rely on that which 
is pretty dangerous. Note, one important consideration is to have 
macvtap support where does not have any stuffs like automq.


Thanks




In order to keep the new behavior
extensible, how should we state that a
negative return other than -1 is
undefined and therefore subject to
change. Is something like this
sufficient?

   Documentation/networking/tc-actions-env-rules.txt

Additionally, what should the new
behavior implement when a negative other
than -1 is returned? I would like

[PATCH 1/4] riscv: avoid kernel hangs when trapped in BUG()

2019-09-22 Thread Vincent Chen

When the CONFIG_GENERIC_BUG is disabled by disabling CONFIG_BUG, if a
kernel thread is trapped by BUG(), the whole system will be in the
loop that infinitely handles the ebreak exception instead of entering the
die function. To fix this problem, the do_trap_break() will always call
the die() to deal with the break exception as the type of break is
BUG_TRAP_TYPE_BUG.

Signed-off-by: Vincent Chen 
---
 arch/riscv/kernel/traps.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 424eb72d56b1..055a937aca70 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -124,23 +124,23 @@ static inline unsigned long 
get_break_insn_length(unsigned long pc)
 
 asmlinkage void do_trap_break(struct pt_regs *regs)
 {
-#ifdef CONFIG_GENERIC_BUG
if (!user_mode(regs)) {
enum bug_trap_type type;
 
type = report_bug(regs->sepc, regs);
switch (type) {
+#ifdef CONFIG_GENERIC_BUG
case BUG_TRAP_TYPE_NONE:
break;
case BUG_TRAP_TYPE_WARN:
regs->sepc += get_break_insn_length(regs->sepc);
break;
case BUG_TRAP_TYPE_BUG:
+#endif /* CONFIG_GENERIC_BUG */
+   default:
die(regs, "Kernel BUG");
}
}
-#endif /* CONFIG_GENERIC_BUG */
-
force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)(regs->sepc));
 }
 
-- 
2.7.4

[PATCH 0/4] riscv: correct the do_trap_break()

2019-09-22 Thread Vincent Chen



The following three situations may occur in the current implementation of
do_trap_break().
1. When the CONFIG_GENERIC_BUG is disabled, if a kernel thread is trapped
   by BUG(), the whole system will be in the loop that infinitely handles
   the break exception instead of entering the die function.
2. When the kernel runs code on behalf of a user thread, and the kernel
   executes a WARN() or WARN_ON(), the user thread will be sent a bogus
   SIGTRAP.
3. Handling the unexpected ebreak instructions is to send a SIGTRAP
   to the trapped thread. However, if a kernel executes an unexpected
   ebreak, it may cause the kernel thread to be stuck in the ebreak
   instruction.

This patch set will solve the above problems by adjusting the
implementations of the do_trap_break().


Vincent Chen (4):
  riscv: avoid kernel hangs when trapped in BUG()
  rsicv: avoid sending a SIGTRAP to a user thread trapped in WARN()
  riscv: Correct the handling of unexpected ebreak in do_trap_break()
  riscv: remove the switch statement in do_trap_break()

 arch/riscv/kernel/traps.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

-- 
2.7.4

[PATCH 2/4] rsicv: avoid sending a SIGTRAP to a user thread trapped in WARN()

2019-09-22 Thread Vincent Chen

On RISC-V, when the kernel runs code on behalf of a user thread, and the
kernel executes a WARN() or WARN_ON(), the user thread will be sent
a bogus SIGTRAP.  Fix the RISC-V kernel code to not send a SIGTRAP when
a WARN()/WARN_ON() is executed.

Signed-off-by: Vincent Chen 
---
 arch/riscv/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 055a937aca70..82f42a55451e 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -134,7 +134,7 @@ asmlinkage void do_trap_break(struct pt_regs *regs)
break;
case BUG_TRAP_TYPE_WARN:
regs->sepc += get_break_insn_length(regs->sepc);
-   break;
+   return;
case BUG_TRAP_TYPE_BUG:
 #endif /* CONFIG_GENERIC_BUG */
default:
-- 
2.7.4

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Jason Wang




On 2019/9/23 上午1:43, Matt Cover wrote:

On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  wrote:

On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:

Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
to fallback to tun_automq_select_queue() for tx queue selection.

Compilation of this exact patch was tested.

For functional testing 3 additional printk()s were added.

Functional testing results (on 2 txq tap device):

   [Fri Sep 20 18:33:27 2019] == tun no prog ==
   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
   [Fri Sep 20 18:33:27 2019] == tun prog -1 ==
   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
   [Fri Sep 20 18:33:27 2019] == tun prog 0 ==
   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
   [Fri Sep 20 18:33:27 2019] == tun prog 1 ==
   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1'
   [Fri Sep 20 18:33:27 2019] == tun prog 2 ==
   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2'
   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'

Signed-off-by: Matthew Cover 


Could you add a bit more motivation data here?

Thank you for these questions Michael.

I'll plan on adding the below information to the
commit message and submitting a v2 of this patch
when net-next reopens. In the meantime, it would
be very helpful to know if these answers address
some of your concerns.


1. why is this a good idea

This change allows TUNSETSTEERINGEBPF progs to
do any of the following.
  1. implement queue selection for a subset of
 traffic (e.g. special queue selection logic
 for ipv4, but return negative and use the
 default automq logic for ipv6)



Well, using ebpf means it need to take care of all the cases. E.g you 
can easily implement the fallback through eBPF as well.




  2. determine there isn't sufficient information
 to do proper queue selection; return
 negative and use the default automq logic
 for the unknown



Same as above.



  3. implement a noop prog (e.g. do
 bpf_trace_printk() then return negative and
 use the default automq logic for everything)



ditto.





2. how do we know existing userspace does not rely on existing behaviour

Prior to this change a negative return from a
TUNSETSTEERINGEBPF prog would have been cast
into a u16 and traversed netdev_cap_txqueue().

In most cases netdev_cap_txqueue() would have
found this value to exceed real_num_tx_queues
and queue_index would be updated to 0.

It is possible that a TUNSETSTEERINGEBPF prog
return a negative value which when cast into a
u16 results in a positive queue_index less than
real_num_tx_queues. For example, on x86_64, a
return value of -65535 results in a queue_index
of 1; which is a valid queue for any multiqueue
device.

It seems unlikely, however as stated above is
unfortunately possible, that existing
TUNSETSTEERINGEBPF programs would choose to
return a negative value rather than return the
positive value which holds the same meaning.

It seems more likely that future
TUNSETSTEERINGEBPF programs would leverage a
negative return and potentially be loaded into
a kernel with the old behavior.



Yes, eBPF can return probably wrong value, but what kernel did is just 
to make sure it doesn't harm anything.


I would rather just drop the packet in this case.

Thanks





3. why doesn't userspace need a way to figure out whether it runs on a kernel 
with and
without this patch

There may be some value in exposing this fact
to the ebpf prog loader. What is the standard
practice here, a define?



thanks,
MST


---
  drivers/net/tun.c | 20 +++-
  1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index aab0be4..173d159 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct 
*tun, struct sk_buff *skb)
   return txq;
  }

-static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb)
+static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb)
  {
   struct tun_prog *prog;
   u32 numqueues;
- u16 ret = 0;
+ int ret = -1;

   numqueues = READ_ONCE(tun->numqueues);
   if (!numqueues)
   return 0;

+ rcu_read_lock();
   prog = rcu_dereference(tun->steering_prog);
   if (prog)
   ret = bpf_prog_run_clear_cb(prog->prog, skb);
+

[PATCH 4/4] riscv: remove the switch statement in do_trap_break()

2019-09-22 Thread Vincent Chen

To make the code more straightforward, replacing the switch statement
with if statement.

Suggested-by: Paul Walmsley 
Signed-off-by: Vincent Chen 
---
 arch/riscv/kernel/traps.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index dd13bc90aeb6..1ac75f7d0bff 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -124,23 +124,24 @@ static inline unsigned long 
get_break_insn_length(unsigned long pc)
 
 asmlinkage void do_trap_break(struct pt_regs *regs)
 {
-   if (!user_mode(regs)) {
+   if (user_mode(regs)) {
+   force_sig_fault(SIGTRAP, TRAP_BRKPT,
+   (void __user *)(regs->sepc));
+   return;
+   }
+#ifdef CONFIG_GENERIC_BUG
+   {
enum bug_trap_type type;
 
type = report_bug(regs->sepc, regs);
-   switch (type) {
-#ifdef CONFIG_GENERIC_BUG
-   case BUG_TRAP_TYPE_WARN:
+   if (type == BUG_TRAP_TYPE_WARN) {
regs->sepc += get_break_insn_length(regs->sepc);
return;
-   case BUG_TRAP_TYPE_BUG:
-#endif /* CONFIG_GENERIC_BUG */
-   default:
-   die(regs, "Kernel BUG");
}
-   } else
-   force_sig_fault(SIGTRAP, TRAP_BRKPT,
-   (void __user *)(regs->sepc));
+   }
+#endif /* CONFIG_GENERIC_BUG */
+
+   die(regs, "Kernel BUG");
 }
 
 #ifdef CONFIG_GENERIC_BUG
-- 
2.7.4

[PATCH 3/4] riscv: Correct the handling of unexpected ebreak in do_trap_break()

2019-09-22 Thread Vincent Chen

For the kernel space, all ebreak instructions are determined at compile
time because the kernel space debugging module is currently unsupported.
Hence, it should be treated as a bug if an ebreak instruction which does
not belong to BUG_TRAP_TYPE_WARN or BUG_TRAP_TYPE_BUG is executed in
kernel space. For the userspace, debugging module or user problem may
intentionally insert an ebreak instruction to trigger a SIGTRAP signal.
To approach the above two situations, the do_trap_break() will direct
the BUG_TRAP_TYPE_NONE ebreak exception issued in kernel space to die()
and will send a SIGTRAP to the trapped process only when the ebreak is
in userspace.

Signed-off-by: Vincent Chen 
---
 arch/riscv/kernel/traps.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 82f42a55451e..dd13bc90aeb6 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -130,8 +130,6 @@ asmlinkage void do_trap_break(struct pt_regs *regs)
type = report_bug(regs->sepc, regs);
switch (type) {
 #ifdef CONFIG_GENERIC_BUG
-   case BUG_TRAP_TYPE_NONE:
-   break;
case BUG_TRAP_TYPE_WARN:
regs->sepc += get_break_insn_length(regs->sepc);
return;
@@ -140,8 +138,9 @@ asmlinkage void do_trap_break(struct pt_regs *regs)
default:
die(regs, "Kernel BUG");
}
-   }
-   force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)(regs->sepc));
+   } else
+   force_sig_fault(SIGTRAP, TRAP_BRKPT,
+   (void __user *)(regs->sepc));
 }
 
 #ifdef CONFIG_GENERIC_BUG
-- 
2.7.4

[mm, slab_common] c5e1edaa1a: BUG:unable_to_handle_page_fault_for_address

2019-09-22 Thread kernel test robot

FYI, we noticed the following commit (built with gcc-7):

commit: c5e1edaa1a52581b350315e4b163cdcc0fd7605d ("[PATCH v5 7/7] mm, 
slab_common: Modify kmalloc_caches[type][idx] to kmalloc_caches[idx][type]")
url: 
https://github.com/0day-ci/linux/commits/Pengfei-Li/mm-slab-Make-kmalloc_info-contain-all-types-of-names/20190916-224827


in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


+-+++
| | 4e0d9effb5 | c5e1edaa1a |
+-+++
| boot_successes  | 4  | 0  |
| boot_failures   | 0  | 4  |
| BUG:unable_to_handle_page_fault_for_address | 0  | 4  |
| Oops:#[##]  | 0  | 4  |
| RIP:calculate_slab_order| 0  | 4  |
| Kernel_panic-not_syncing:Fatal_exception| 0  | 4  |
+-+++


If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


[0.417634] BUG: unable to handle page fault for address: 00301171
[0.418931] #PF: supervisor read access in kernel mode
[0.419879] #PF: error_code(0x) - not-present page
[0.420827] PGD 0 P4D 0 
[0.421296] Oops:  [#1] DEBUG_PAGEALLOC
[0.422073] CPU: 0 PID: 0 Comm: swapper Tainted: GT 
5.3.0-7-gc5e1edaa1a525 #2
[0.423725] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.10.2-1 04/01/2014
[0.425265] RIP: 0010:calculate_slab_order+0x81/0xd9
[0.426194] Code: ba e4 11 41 89 5f 24 41 89 4f 28 73 35 eb 61 31 f6 89 c7 
89 4c 24 04 48 89 54 24 08 e8 95 8a fd ff 48 85 c0 8b 4c 24 04 74 38 <83> 78 20 
00 78 32 41 8b 77 14 48 8b 54 24 08 d1 ee 39 70 14 76 be
[0.429675] RSP: :82603e50 EFLAGS: 00010002
[0.430648] RAX: 00301151 RBX: 0001 RCX: 
[0.431966] RDX:  RSI:  RDI: 0001
[0.433282] RBP: 1000 R08:  R09: 
[0.434617] R10: 2000 R11: 0005 R12: 80002800
[0.435936] R13: 8000 R14:  R15: 82853bc0
[0.437259] FS:  () GS:8264e000() 
knlGS:
[0.438765] CS:  0010 DS:  ES:  CR0: 80050033
[0.439840] CR2: 00301171 CR3: 02622000 CR4: 000406b0
[0.441157] Call Trace:
[0.441614]  set_off_slab_cache+0x39/0x59
[0.442357]  __kmem_cache_create+0x11a/0x26e
[0.443171]  create_boot_cache+0x46/0x69
[0.443900]  kmem_cache_init+0x72/0x115
[0.444612]  start_kernel+0x204/0x454
[0.445301]  secondary_startup_64+0xa4/0xb0
[0.446078] Modules linked in:
[0.446696] CR2: 00301171
[0.447323] random: get_random_bytes called from init_oops_id+0x1d/0x2c with 
crng_init=0
[0.448820] ---[ end trace b2352b5ced6eae1c ]---


To reproduce:

# build kernel
cd linux
cp config-5.3.0-7-gc5e1edaa1a525 .config
make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 olddefconfig prepare 
modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k  job-script # job-script is attached in this 
email



Thanks,
lkp

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 5.3.0 Kernel Configuration
#

#
# Compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=70400
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_WARN_MAYBE_UNINITIALIZED=y
CONFIG_CC_DISABLE_WARN_MAYBE_UNINITIALIZED=y
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_HEADER_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_SIM=y

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Matt Cover

On Sun, Sep 22, 2019 at 3:46 PM Matt Cover  wrote:
>
> On Sun, Sep 22, 2019 at 3:30 PM Matt Cover  wrote:
> >
> > On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin  wrote:
> > >
> > > On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote:
> > > > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:
> > > > > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a 
> > > > > > signal
> > > > > > to fallback to tun_automq_select_queue() for tx queue selection.
> > > > > >
> > > > > > Compilation of this exact patch was tested.
> > > > > >
> > > > > > For functional testing 3 additional printk()s were added.
> > > > > >
> > > > > > Functional testing results (on 2 txq tap device):
> > > > > >
> > > > > >   [Fri Sep 20 18:33:27 2019] == tun no prog ==
> > > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() 
> > > > > > returned '-1'
> > > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> > > > > >   [Fri Sep 20 18:33:27 2019] == tun prog -1 ==
> > > > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() 
> > > > > > returned '-1'
> > > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() 
> > > > > > returned '-1'
> > > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> > > > > >   [Fri Sep 20 18:33:27 2019] == tun prog 0 ==
> > > > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() 
> > > > > > returned '0'
> > > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() 
> > > > > > returned '0'
> > > > > >   [Fri Sep 20 18:33:27 2019] == tun prog 1 ==
> > > > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() 
> > > > > > returned '1'
> > > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() 
> > > > > > returned '1'
> > > > > >   [Fri Sep 20 18:33:27 2019] == tun prog 2 ==
> > > > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() 
> > > > > > returned '2'
> > > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() 
> > > > > > returned '0'
> > > > > >
> > > > > > Signed-off-by: Matthew Cover 
> > > > >
> > > > >
> > > > > Could you add a bit more motivation data here?
> > > >
> > > > Thank you for these questions Michael.
> > > >
> > > > I'll plan on adding the below information to the
> > > > commit message and submitting a v2 of this patch
> > > > when net-next reopens. In the meantime, it would
> > > > be very helpful to know if these answers address
> > > > some of your concerns.
> > > >
> > > > > 1. why is this a good idea
> > > >
> > > > This change allows TUNSETSTEERINGEBPF progs to
> > > > do any of the following.
> > > >  1. implement queue selection for a subset of
> > > > traffic (e.g. special queue selection logic
> > > > for ipv4, but return negative and use the
> > > > default automq logic for ipv6)
> > > >  2. determine there isn't sufficient information
> > > > to do proper queue selection; return
> > > > negative and use the default automq logic
> > > > for the unknown
> > > >  3. implement a noop prog (e.g. do
> > > > bpf_trace_printk() then return negative and
> > > > use the default automq logic for everything)
> > > >
> > > > > 2. how do we know existing userspace does not rely on existing 
> > > > > behaviour
> > > >
> > > > Prior to this change a negative return from a
> > > > TUNSETSTEERINGEBPF prog would have been cast
> > > > into a u16 and traversed netdev_cap_txqueue().
> > > >
> > > > In most cases netdev_cap_txqueue() would have
> > > > found this value to exceed real_num_tx_queues
> > > > and queue_index would be updated to 0.
> > > >
> > > > It is possible that a TUNSETSTEERINGEBPF prog
> > > > return a negative value which when cast into a
> > > > u16 results in a positive queue_index less than
> > > > real_num_tx_queues. For example, on x86_64, a
> > > > return value of -65535 results in a queue_index
> > > > of 1; which is a valid queue for any multiqueue
> > > > device.
> > > >
> > > > It seems unlikely, however as stated above is
> > > > unfortunately possible, that existing
> > > > TUNSETSTEERINGEBPF programs would choose to
> > > > return a negative value rather than return the
> > > > positive value which holds the same meaning.
> > > >
> > > > It seems more likely that future
> > > > TUNSETSTEERINGEBPF programs would leverage a
> > > > negative return and potentially be loaded into
> > > > a kernel with the old behavior.
> > >
> > > OK if we are returning a special
> > > value, shouldn't we limit it? How about a special
> > > value with this meaning?
> > > If we are changing an ABI let's at least make it
> > > extensible.
> > >
> >
> > A special value with this meaning sounds
> > good to me. I'll plan on adding a define
> > set to -1 to cause the fallback to automq.
> >
> >

Re: [PATCH 2/2] soc: ti: move 2 driver config options into the TI SOC drivers menu

2019-09-22 Thread santosh . shilimkar





On 9/21/19 1:46 PM, Randy Dunlap wrote:

Hi Santosh,

Would you also pick up patch 2/2, which I didn't Cc: you on?:(

Do I need to resend it?


Yes please. I don't have your 2/2

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Matt Cover

On Sun, Sep 22, 2019 at 3:30 PM Matt Cover  wrote:
>
> On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin  wrote:
> >
> > On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote:
> > > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:
> > > > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
> > > > > to fallback to tun_automq_select_queue() for tx queue selection.
> > > > >
> > > > > Compilation of this exact patch was tested.
> > > > >
> > > > > For functional testing 3 additional printk()s were added.
> > > > >
> > > > > Functional testing results (on 2 txq tap device):
> > > > >
> > > > >   [Fri Sep 20 18:33:27 2019] == tun no prog ==
> > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > > > > '-1'
> > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> > > > >   [Fri Sep 20 18:33:27 2019] == tun prog -1 ==
> > > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > > > > '-1'
> > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > > > > '-1'
> > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> > > > >   [Fri Sep 20 18:33:27 2019] == tun prog 0 ==
> > > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > > > > '0'
> > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > > > > '0'
> > > > >   [Fri Sep 20 18:33:27 2019] == tun prog 1 ==
> > > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > > > > '1'
> > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > > > > '1'
> > > > >   [Fri Sep 20 18:33:27 2019] == tun prog 2 ==
> > > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > > > > '2'
> > > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > > > > '0'
> > > > >
> > > > > Signed-off-by: Matthew Cover 
> > > >
> > > >
> > > > Could you add a bit more motivation data here?
> > >
> > > Thank you for these questions Michael.
> > >
> > > I'll plan on adding the below information to the
> > > commit message and submitting a v2 of this patch
> > > when net-next reopens. In the meantime, it would
> > > be very helpful to know if these answers address
> > > some of your concerns.
> > >
> > > > 1. why is this a good idea
> > >
> > > This change allows TUNSETSTEERINGEBPF progs to
> > > do any of the following.
> > >  1. implement queue selection for a subset of
> > > traffic (e.g. special queue selection logic
> > > for ipv4, but return negative and use the
> > > default automq logic for ipv6)
> > >  2. determine there isn't sufficient information
> > > to do proper queue selection; return
> > > negative and use the default automq logic
> > > for the unknown
> > >  3. implement a noop prog (e.g. do
> > > bpf_trace_printk() then return negative and
> > > use the default automq logic for everything)
> > >
> > > > 2. how do we know existing userspace does not rely on existing behaviour
> > >
> > > Prior to this change a negative return from a
> > > TUNSETSTEERINGEBPF prog would have been cast
> > > into a u16 and traversed netdev_cap_txqueue().
> > >
> > > In most cases netdev_cap_txqueue() would have
> > > found this value to exceed real_num_tx_queues
> > > and queue_index would be updated to 0.
> > >
> > > It is possible that a TUNSETSTEERINGEBPF prog
> > > return a negative value which when cast into a
> > > u16 results in a positive queue_index less than
> > > real_num_tx_queues. For example, on x86_64, a
> > > return value of -65535 results in a queue_index
> > > of 1; which is a valid queue for any multiqueue
> > > device.
> > >
> > > It seems unlikely, however as stated above is
> > > unfortunately possible, that existing
> > > TUNSETSTEERINGEBPF programs would choose to
> > > return a negative value rather than return the
> > > positive value which holds the same meaning.
> > >
> > > It seems more likely that future
> > > TUNSETSTEERINGEBPF programs would leverage a
> > > negative return and potentially be loaded into
> > > a kernel with the old behavior.
> >
> > OK if we are returning a special
> > value, shouldn't we limit it? How about a special
> > value with this meaning?
> > If we are changing an ABI let's at least make it
> > extensible.
> >
>
> A special value with this meaning sounds
> good to me. I'll plan on adding a define
> set to -1 to cause the fallback to automq.
>
> The way I was initially viewing the old
> behavior was that returning negative was
> undefined; it happened to have the
> outcomes I walked through, but not
> necessarily by design.
>
> In order to keep the new behavior
> extensible, how should we state that a
> negative return other than -1 is
>

Re: [PATCH v4 1/2] soc/tegra: pmc: Query PCLK clock rate at probe time

2019-09-22 Thread Dmitry Osipenko

04.08.2019 23:29, Dmitry Osipenko пишет:
> It is possible to get a lockup if kernel decides to enter LP2 cpuidle
> from some clk-notifier, in that case CCF's "prepare" mutex is kept locked
> and thus clk_get_rate(pclk) blocks on the same mutex with interrupts being
> disabled.
> 
> Signed-off-by: Dmitry Osipenko 
> ---
> 
> Changelog:
> 
> v4: Added clk-notifier to track PCLK rate-changes, which may become useful
> in the future. That's done in response to v3 review comment from Peter
> De Schrijver.
> 
> Now properly handling case where clk pointer is intentionally NULL on
> the driver's probe.
> 
> v3: Changed commit's message because I actually recalled what was the
> initial reason for the patch, since the problem reoccurred once again.
> 
> v2: Addressed review comments that were made by Jon Hunter to v1 by
> not moving the memory barrier, replacing one missed clk_get_rate()
> with pmc->rate, handling possible clk_get_rate() error on probe and
> slightly adjusting the commits message.
> 
>  drivers/soc/tegra/pmc.c | 71 ++---
>  1 file changed, 53 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
> index 9f9c1c677cf4..4e44943d0b26 100644
> --- a/drivers/soc/tegra/pmc.c
> +++ b/drivers/soc/tegra/pmc.c
> @@ -309,6 +309,7 @@ static const char * const tegra210_reset_sources[] = {
>   * @pctl_dev: pin controller exposed by the PMC
>   * @domain: IRQ domain provided by the PMC
>   * @irq: chip implementation for the IRQ domain
> + * @clk_nb: pclk clock changes handler
>   */
>  struct tegra_pmc {
>   struct device *dev;
> @@ -344,6 +345,8 @@ struct tegra_pmc {
>  
>   struct irq_domain *domain;
>   struct irq_chip irq;
> +
> + struct notifier_block clk_nb;
>  };
>  
>  static struct tegra_pmc *pmc = &(struct tegra_pmc) {
> @@ -1192,7 +1195,7 @@ static int tegra_io_pad_prepare(struct tegra_pmc *pmc, 
> enum tegra_io_pad id,
>   return err;
>  
>   if (pmc->clk) {
> - rate = clk_get_rate(pmc->clk);
> + rate = pmc->rate;
>   if (!rate) {
>   dev_err(pmc->dev, "failed to get clock rate\n");
>   return -ENODEV;
> @@ -1433,6 +1436,7 @@ void tegra_pmc_set_suspend_mode(enum tegra_suspend_mode 
> mode)
>  void tegra_pmc_enter_suspend_mode(enum tegra_suspend_mode mode)
>  {
>   unsigned long long rate = 0;
> + u64 ticks;
>   u32 value;
>  
>   switch (mode) {
> @@ -1441,31 +1445,22 @@ void tegra_pmc_enter_suspend_mode(enum 
> tegra_suspend_mode mode)
>   break;
>  
>   case TEGRA_SUSPEND_LP2:
> - rate = clk_get_rate(pmc->clk);
> + rate = pmc->rate;
>   break;
>  
>   default:
>   break;
>   }
>  
> - if (WARN_ON_ONCE(rate == 0))
> - rate = 1;
> -
> - if (rate != pmc->rate) {
> - u64 ticks;
> -
> - ticks = pmc->cpu_good_time * rate + USEC_PER_SEC - 1;
> - do_div(ticks, USEC_PER_SEC);
> - tegra_pmc_writel(pmc, ticks, PMC_CPUPWRGOOD_TIMER);
> + ticks = pmc->cpu_good_time * rate + USEC_PER_SEC - 1;
> + do_div(ticks, USEC_PER_SEC);
> + tegra_pmc_writel(pmc, ticks, PMC_CPUPWRGOOD_TIMER);
>  
> - ticks = pmc->cpu_off_time * rate + USEC_PER_SEC - 1;
> - do_div(ticks, USEC_PER_SEC);
> - tegra_pmc_writel(pmc, ticks, PMC_CPUPWROFF_TIMER);
> + ticks = pmc->cpu_off_time * rate + USEC_PER_SEC - 1;
> + do_div(ticks, USEC_PER_SEC);
> + tegra_pmc_writel(pmc, ticks, PMC_CPUPWROFF_TIMER);
>  
> - wmb();
> -
> - pmc->rate = rate;
> - }
> + wmb();
>  
>   value = tegra_pmc_readl(pmc, PMC_CNTRL);
>   value &= ~PMC_CNTRL_SIDE_EFFECT_LP0;
> @@ -2019,6 +2014,20 @@ static int tegra_pmc_irq_init(struct tegra_pmc *pmc)
>   return 0;
>  }
>  
> +static int tegra_pmc_clk_notify_cb(struct notifier_block *nb,
> +unsigned long action, void *ptr)
> +{
> + struct clk_notifier_data *data = ptr;
> + struct tegra_pmc *pmc;
> +
> + if (action == POST_RATE_CHANGE) {
> + pmc = container_of(nb, struct tegra_pmc, clk_nb);
> + pmc->rate = data->new_rate;
> + }
> +
> + return NOTIFY_OK;
> +}
> +
>  static int tegra_pmc_probe(struct platform_device *pdev)
>  {
>   void __iomem *base;
> @@ -2082,6 +2091,30 @@ static int tegra_pmc_probe(struct platform_device 
> *pdev)
>   pmc->clk = NULL;
>   }
>  
> + /*
> +  * PCLK clock rate can't be retrieved using CLK API because it
> +  * causes lockup if CPU enters LP2 idle state from some other
> +  * CLK notifier, hence we're caching the rate's value locally.
> +  */
> + if (pmc->clk) {
> + pmc->clk_nb.notifier_call = tegra_pmc_clk_notify_cb;
> + err = clk_notifier_register(pmc->clk, >clk_nb);
>

Re: [PATCH v2 1/2] clk: tegra: divider: Add missing check for enable-bit on rate's recalculation

2019-09-22 Thread Dmitry Osipenko

23.07.2019 5:52, Dmitry Osipenko пишет:
> Unset "enable" bit means that divider is in bypass mode, hence it doesn't
> have any effect in that case. Please note that there are no known bugs
> caused by the missing check.
> 
> Signed-off-by: Dmitry Osipenko 
> ---
> 
> Changelog:
> 
> v2: Changed the commit's description from 'Fix' to 'Add' in response to the
> Stephen's Boyd question about the need to backport the patch into stable
> kernels. The backporting is not really needed.
> 
>  drivers/clk/tegra/clk-divider.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/clk/tegra/clk-divider.c b/drivers/clk/tegra/clk-divider.c
> index e76731fb7d69..f33c19045386 100644
> --- a/drivers/clk/tegra/clk-divider.c
> +++ b/drivers/clk/tegra/clk-divider.c
> @@ -40,8 +40,13 @@ static unsigned long clk_frac_div_recalc_rate(struct 
> clk_hw *hw,
>   int div, mul;
>   u64 rate = parent_rate;
>  
> - reg = readl_relaxed(divider->reg) >> divider->shift;
> - div = reg & div_mask(divider);
> + reg = readl_relaxed(divider->reg);
> +
> + if ((divider->flags & TEGRA_DIVIDER_UART) &&
> + !(reg & PERIPH_CLK_UART_DIV_ENB))
> + return rate;
> +
> + div = (reg >> divider->shift) & div_mask(divider);
>  
>   mul = get_mul(divider);
>   div += mul;
> 

Hello Peter,

ACK?

Re: threads-max observe limits

2019-09-22 Thread Eric W. Biederman

Michal Hocko  writes:

> From 711000fdc243b6bc68a92f9ef0017ae495086d39 Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Sun, 22 Sep 2019 08:45:28 +0200
> Subject: [PATCH] kernel/sysctl.c: do not override max_threads provided by
>  userspace
>
> Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> because the patch is causing a regression to any workload which needs to
> override the auto-tuning of the limit provided by kernel.
>
> set_max_threads is implementing a boot time guesstimate to provide a
> sensible limit of the concurrently running threads so that runaways will
> not deplete all the memory. This is a good thing in general but there
> are workloads which might need to increase this limit for an application
> to run (reportedly WebSpher MQ is affected) and that is simply not
> possible after the mentioned change. It is also very dubious to override
> an admin decision by an estimation that doesn't have any direct relation
> to correctness of the kernel operation.
>
> Fix this by dropping set_max_threads from sysctl_max_threads so any
> value is accepted as long as it fits into MAX_THREADS which is important
> to check because allowing more threads could break internal robust futex
> restriction. While at it, do not use MIN_THREADS as the lower boundary
> because it is also only a heuristic for automatic estimation and admin
> might have a good reason to stop new threads to be created even when
> below this limit.
>
> Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> Cc: stable
> Signed-off-by: Michal Hocko 
Reviewed-by: "Eric W. Biederman" 

> ---
>  kernel/fork.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 2852d0e76ea3..ef865be37e98 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2929,7 +2929,7 @@ int sysctl_max_threads(struct ctl_table *table, int 
> write,
>   struct ctl_table t;
>   int ret;
>   int threads = max_threads;
> - int min = MIN_THREADS;
> + int min = 1;
>   int max = MAX_THREADS;
>  
>   t = *table;
> @@ -2941,7 +2941,7 @@ int sysctl_max_threads(struct ctl_table *table, int 
> write,
>   if (ret || !write)
>   return ret;
>  
> - set_max_threads(threads);
> + max_threads = threads;
>  
>   return 0;
>  }
> -- 
> 2.20.1

Re: threads-max observe limits

2019-09-22 Thread Eric W. Biederman

Heinrich Schuchardt  writes:

> Did this patch when applied to the customer's kernel solve any problem?
>
> WebSphere MQ is a messaging application. If it hits the current limits
> of threads-max, there is a bug in the software or in the way that it has
> been set up at the customer. Instead of messing around with the kernel
> the application should be fixed.

While it is true that almost every workload will be buggy if it exceeds
1/8 of memory with just the kernel data structures for threads.  It is
not necessary true of every application.  I can easily imagine cases
up around 1/2 of memory where things could work reasonably.

Further we can exhaust all of memory much more simply in a default
configuration by malloc'ing more memory that in physically present
and zeroing it all.

Henrich, you were the one messed with the kernel by breaking a
reasonable kernel tunable.  AKA you caused a regression.  That violates
the no regression rule.

As much as possible we fix regressions so software that used to work
continues to work.  Removing footguns is not a reason to introduce a
regression.

I do agree that Michal's customer's problem sounds like it is something
else but if the kernel did not have a regression we could focus on the
real problem instead of being side tracked by the regression.

> With this patch you allow administrators to set values that will crash
> their system. And they will not even have a way to find out the limits
> which he should adhere to. So expect a lot of systems to be downed
> this way.

Nope.  The system administrator just setting a higher value whon't crash
their system.  Only using that many resources would crash the system.

Nor is a sysctl like this for discovering the physical limits of a
machine.  Which the current value is vastly inappropriate for.  As
the physical limits of many machines are much higher than 1/8 of memory.

Eric

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Matt Cover

On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin  wrote:
>
> On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote:
> > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  wrote:
> > >
> > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:
> > > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
> > > > to fallback to tun_automq_select_queue() for tx queue selection.
> > > >
> > > > Compilation of this exact patch was tested.
> > > >
> > > > For functional testing 3 additional printk()s were added.
> > > >
> > > > Functional testing results (on 2 txq tap device):
> > > >
> > > >   [Fri Sep 20 18:33:27 2019] == tun no prog ==
> > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > > > '-1'
> > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> > > >   [Fri Sep 20 18:33:27 2019] == tun prog -1 ==
> > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > > > '-1'
> > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > > > '-1'
> > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> > > >   [Fri Sep 20 18:33:27 2019] == tun prog 0 ==
> > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > > > '0'
> > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > > > '0'
> > > >   [Fri Sep 20 18:33:27 2019] == tun prog 1 ==
> > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > > > '1'
> > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > > > '1'
> > > >   [Fri Sep 20 18:33:27 2019] == tun prog 2 ==
> > > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned 
> > > > '2'
> > > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned 
> > > > '0'
> > > >
> > > > Signed-off-by: Matthew Cover 
> > >
> > >
> > > Could you add a bit more motivation data here?
> >
> > Thank you for these questions Michael.
> >
> > I'll plan on adding the below information to the
> > commit message and submitting a v2 of this patch
> > when net-next reopens. In the meantime, it would
> > be very helpful to know if these answers address
> > some of your concerns.
> >
> > > 1. why is this a good idea
> >
> > This change allows TUNSETSTEERINGEBPF progs to
> > do any of the following.
> >  1. implement queue selection for a subset of
> > traffic (e.g. special queue selection logic
> > for ipv4, but return negative and use the
> > default automq logic for ipv6)
> >  2. determine there isn't sufficient information
> > to do proper queue selection; return
> > negative and use the default automq logic
> > for the unknown
> >  3. implement a noop prog (e.g. do
> > bpf_trace_printk() then return negative and
> > use the default automq logic for everything)
> >
> > > 2. how do we know existing userspace does not rely on existing behaviour
> >
> > Prior to this change a negative return from a
> > TUNSETSTEERINGEBPF prog would have been cast
> > into a u16 and traversed netdev_cap_txqueue().
> >
> > In most cases netdev_cap_txqueue() would have
> > found this value to exceed real_num_tx_queues
> > and queue_index would be updated to 0.
> >
> > It is possible that a TUNSETSTEERINGEBPF prog
> > return a negative value which when cast into a
> > u16 results in a positive queue_index less than
> > real_num_tx_queues. For example, on x86_64, a
> > return value of -65535 results in a queue_index
> > of 1; which is a valid queue for any multiqueue
> > device.
> >
> > It seems unlikely, however as stated above is
> > unfortunately possible, that existing
> > TUNSETSTEERINGEBPF programs would choose to
> > return a negative value rather than return the
> > positive value which holds the same meaning.
> >
> > It seems more likely that future
> > TUNSETSTEERINGEBPF programs would leverage a
> > negative return and potentially be loaded into
> > a kernel with the old behavior.
>
> OK if we are returning a special
> value, shouldn't we limit it? How about a special
> value with this meaning?
> If we are changing an ABI let's at least make it
> extensible.
>

A special value with this meaning sounds
good to me. I'll plan on adding a define
set to -1 to cause the fallback to automq.

The way I was initially viewing the old
behavior was that returning negative was
undefined; it happened to have the
outcomes I walked through, but not
necessarily by design.

In order to keep the new behavior
extensible, how should we state that a
negative return other than -1 is
undefined and therefore subject to
change. Is something like this
sufficient?

  Documentation/networking/tc-actions-env-rules.txt

Additionally, what should the new
behavior implement when a negative other
than -1 is returned? I would like to have
it do the same thing as -1 for now, but
with the

Re: [PATCH] serdev: Add ACPI devices by ResourceSource field

2019-09-22 Thread Maximilian Luz


Hi all,

On 9/20/19 5:00 PM, Hans de Goede wrote:

So as promised I've given this patch a try, unfortunately it breaks
existing users of ACPI serdev device instantation.

After adding this patch "ls /sys/bus/serial/devices" is empty,
where as before it gives:

[root@dhcp-45-50 ~]# ls -l /sys/bus/serial/devices/
total 0
lrwxrwxrwx. 1 root root 0 Sep 20 16:43 serial0 -> 
../../../devices/pci:00/8086228A:00/serial0
lrwxrwxrwx. 1 root root 0 Sep 20 16:43 serial0-0 -> 
../../../devices/pci:00/8086228A:00/serial0/serial0-0

And since the serdev is missing bluetooth does not work.


Thanks to some testing by Hans, it turns out that the reason for this is
that acpi_walk_resources fails with AE_AML_INVALID_RESOURCE_TYPE for a
specific device. If anyone is interested, the _CRS of the device in
question is

Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
{
Name (NAM, Buffer (0x14)
{
"\\_SB.PCI0.SDHB.BRC1"
})
Name (SPB, Buffer (0x0C)
{
/*  */  0x8E, 0x1D, 0x00, 0x01, 0x00, 0xC0, 0x02, 0x00,
/* 0008 */  0x00, 0x01, 0x00, 0x00
})
Name (END, ResourceTemplate ()
{
})
Concatenate (SPB, NAM, Local0)
Concatenate (Local0, END, Local1)
Return (Local1)
}

To solve this, I propose ignoring errors that occur when evaluating the
_CRS method. Note that with the previously discussed change for v2,
where we will only look at the first device in _CRS, we should be able
to handle errors from the actual serdev device allocation separately
(and only ignore AML evaluation errors).

Further, I think it might also make sense to move the status and
already-enumerated checks out of acpi_serdev_register_device to before
looking at _CRS. On one hand, this might save us from unnecessarily
checking the _CRS on devices that are not present, but on the other
hand, moving the status check may cause more AML code execution, as we'd
be checking the status of every device, even if it doesn't have a _CRS.
Maybe a better solution would be something like: Check if device has
already been enumerated, then check for _CRS presence, then for
status/device-presence, and finally look at _CRS contents and
potentially allocate serdev client?

Regards,

Maximilian

Re: [PATCH] net: dsa: b53: Use the correct style for SPDX License Identifier

2019-09-22 Thread Jakub Kicinski

On Sat, 21 Sep 2019 19:00:16 +0530, Nishad Kamdar wrote:
> This patch corrects the SPDX License Identifier style
> in header file for Broadcom BCM53xx managed switch driver.
> For C header files Documentation/process/license-rules.rst
> mandates C-like comments (opposed to C source files where
> C++ style should be used)
> 
> Changes made by using a script provided by Joe Perches here:
> https://lkml.org/lkml/2019/2/7/46.
> 
> Suggested-by: Joe Perches 
> Signed-off-by: Nishad Kamdar 

Applied, thank you!

Re: [PATCH] net: dsa: Use the correct style for SPDX License Identifier

2019-09-22 Thread Jakub Kicinski

On Sat, 21 Sep 2019 19:15:25 +0530, Nishad Kamdar wrote:
> This patch corrects the SPDX License Identifier style
> in header file for Distributed Switch Architecture drivers.
> For C header files Documentation/process/license-rules.rst
> mandates C-like comments (opposed to C source files where
> C++ style should be used)
> 
> Changes made by using a script provided by Joe Perches here:
> https://lkml.org/lkml/2019/2/7/46.
> 
> Suggested-by: Joe Perches 
> Signed-off-by: Nishad Kamdar 

Applied, thank you!

Re: [PATCH net] net: stmmac: selftests: Flow Control test can also run with ASYM Pause

2019-09-22 Thread Jakub Kicinski

On Thu, 19 Sep 2019 12:09:49 +0200, Jose Abreu wrote:
> The Flow Control selftest is also available with ASYM Pause. Lets add
> this check to the test and fix eventual false positive failures.
> 
> Fixes: 091810dbded9 ("net: stmmac: Introduce selftests support")
> Signed-off-by: Jose Abreu 

Hi Jose! 

Thanks for the patch it looks good, seems like you posted it from 
a slightly different email address than was used for signoff:

From: Jose Abreu 
vs
Signed-off-by: Jose Abreu 

Could you please fix and repost? Automation may get upset otherwise.

Re: [PATCH 0/5] net: ethernet: stmmac: some fixes and optimization

2019-09-22 Thread Jakub Kicinski

On Fri, 20 Sep 2019 07:38:12 +0200, Christophe Roullier wrote:
> Some improvements (manage syscfg as optional clock, update slew rate of
> ETH_MDIO pin, Enable gating of the MAC TX clock during TX low-power mode)
> Fix warning build message when W=1

There seems to be some new features/cleanups (or improvements as
you say) here. Could you explain the negative impact not applying 
these changes will have? Patches 1 and 3 in particular.

net-next is now closed [1], and will reopen some time after the merge
window is over. For now we are only expecting fixes for the net tree.

Could you (a) provide stronger motivation these changes are fixes; or
(b) separate the fixes from improvements?

Thank you!

[1] https://www.kernel.org/doc/html/latest/networking/netdev-FAQ.html

[PATCH v1] clk: tegra20/30: Optimize PLLX configuration restoring

2019-09-22 Thread Dmitry Osipenko

There is no need to re-configure PLLX if its configuration in unchanged
on return from suspend / cpuidle, this saves 300us if PLLX is already
enabled (common case for cpuidle).

Signed-off-by: Dmitry Osipenko 
---
 drivers/clk/tegra/clk-tegra20.c | 25 -
 drivers/clk/tegra/clk-tegra30.c | 25 -
 2 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/drivers/clk/tegra/clk-tegra20.c b/drivers/clk/tegra/clk-tegra20.c
index cceefbd67a3b..4d8222f5c638 100644
--- a/drivers/clk/tegra/clk-tegra20.c
+++ b/drivers/clk/tegra/clk-tegra20.c
@@ -955,6 +955,7 @@ static void tegra20_cpu_clock_suspend(void)
 static void tegra20_cpu_clock_resume(void)
 {
unsigned int reg, policy;
+   u32 misc, base;
 
/* Is CPU complex already running on PLLX? */
reg = readl(clk_base + CCLK_BURST_POLICY);
@@ -968,15 +969,21 @@ static void tegra20_cpu_clock_resume(void)
BUG();
 
if (reg != CCLK_BURST_POLICY_PLLX) {
-   /* restore PLLX settings if CPU is on different PLL */
-   writel(tegra20_cpu_clk_sctx.pllx_misc,
-   clk_base + PLLX_MISC);
-   writel(tegra20_cpu_clk_sctx.pllx_base,
-   clk_base + PLLX_BASE);
-
-   /* wait for PLL stabilization if PLLX was enabled */
-   if (tegra20_cpu_clk_sctx.pllx_base & (1 << 30))
-   udelay(300);
+   misc = readl_relaxed(clk_base + PLLX_MISC);
+   base = readl_relaxed(clk_base + PLLX_BASE);
+
+   if (misc != tegra20_cpu_clk_sctx.pllx_misc ||
+   base != tegra20_cpu_clk_sctx.pllx_base) {
+   /* restore PLLX settings if CPU is on different PLL */
+   writel(tegra20_cpu_clk_sctx.pllx_misc,
+   clk_base + PLLX_MISC);
+   writel(tegra20_cpu_clk_sctx.pllx_base,
+   clk_base + PLLX_BASE);
+
+   /* wait for PLL stabilization if PLLX was enabled */
+   if (tegra20_cpu_clk_sctx.pllx_base & (1 << 30))
+   udelay(300);
+   }
}
 
/*
diff --git a/drivers/clk/tegra/clk-tegra30.c b/drivers/clk/tegra/clk-tegra30.c
index a19840fac716..3b5bca44b7aa 100644
--- a/drivers/clk/tegra/clk-tegra30.c
+++ b/drivers/clk/tegra/clk-tegra30.c
@@ -1135,6 +1135,7 @@ static void tegra30_cpu_clock_suspend(void)
 static void tegra30_cpu_clock_resume(void)
 {
unsigned int reg, policy;
+   u32 misc, base;
 
/* Is CPU complex already running on PLLX? */
reg = readl(clk_base + CLK_RESET_CCLK_BURST);
@@ -1148,15 +1149,21 @@ static void tegra30_cpu_clock_resume(void)
BUG();
 
if (reg != CLK_RESET_CCLK_BURST_POLICY_PLLX) {
-   /* restore PLLX settings if CPU is on different PLL */
-   writel(tegra30_cpu_clk_sctx.pllx_misc,
-   clk_base + CLK_RESET_PLLX_MISC);
-   writel(tegra30_cpu_clk_sctx.pllx_base,
-   clk_base + CLK_RESET_PLLX_BASE);
-
-   /* wait for PLL stabilization if PLLX was enabled */
-   if (tegra30_cpu_clk_sctx.pllx_base & (1 << 30))
-   udelay(300);
+   misc = readl_relaxed(clk_base + CLK_RESET_PLLX_MISC);
+   base = readl_relaxed(clk_base + CLK_RESET_PLLX_BASE);
+
+   if (misc != tegra30_cpu_clk_sctx.pllx_misc ||
+   base != tegra30_cpu_clk_sctx.pllx_base) {
+   /* restore PLLX settings if CPU is on different PLL */
+   writel(tegra30_cpu_clk_sctx.pllx_misc,
+   clk_base + CLK_RESET_PLLX_MISC);
+   writel(tegra30_cpu_clk_sctx.pllx_base,
+   clk_base + CLK_RESET_PLLX_BASE);
+
+   /* wait for PLL stabilization if PLLX was enabled */
+   if (tegra30_cpu_clk_sctx.pllx_base & (1 << 30))
+   udelay(300);
+   }
}
 
/*
-- 
2.23.0

Re: [PATCH] kbuild: binrpm-pkg: Propagate O= to rpmbuild

2019-09-22 Thread Jeffrey Hugo


On 9/20/2019 11:30 PM, Masahiro Yamada wrote:

Hi Jeffrey,

On Sat, Sep 21, 2019 at 4:01 AM Jeffrey Hugo  wrote:


If the user specifies O= to indicate a specific output directory for the
build, rpmbuild does not honor this, and will use its default, which could
be the user's home directory.  In cases where the user has limited home
directory space, this could cause the build to outright fail.

In the case of the binrpm-pkg target, redefine the top directory for output
to be what the user specified in O=, thus the user will find a "rpmbuild"
subdirectory in that location with all of the RPM artifacts.

This does not apply to rpm-pkg, since we already cannot handle creating
the source tarball out of tree.

Signed-off-by: Jeffrey Hugo 



binrpm-pkg creates intermediate build artifacts in $(objtree)/,
but puts only the final .rpm into ${HOME}/rpmbuild/RPMS/${ARCH}/.


I disagree with this.  Ubuntu 16.04 with the 4.12 version of the rpm 
packaging utilities will create several directories under rpmbuild in 
the user's home directory -

BUILDROOT  RPMS  SOURCES  SPECS  SRPMS

SOURCES/SPECS/SRPMS are empty for binrpm-pkg.
RPMs contains the final RPMs as you've indicated
BUILDROOT appears to contain more intermediate files which are used to 
then generate the final rpms.


It seems like more than just the final rpms are dumped into the home 
directory.




It has worked like that since a long time before
probably in order to respect the default of rpmbuild.


It still seems inconsistent to me that there is an option defined 
(KBUILD_OUTPUT and O=) which is described to allow the user to specify 
the location of the build output, yet there are parts of the build 
system which do not respect this.


I also find it curious that there is a comment in the makefile this 
patch modifies that seems to suggest that it is intended for O= to 
impact the binrpm-pkg target, however that is currently not the case.





If you change this behavior, it should be consistent.
The 'rpmbuild' should be always located in the kernel tree
instead of the user's home directory.


I'm ok with this.  The current patch attempts to preserve current 
behavior in the default case of not specifying an output directory, but 
dumping everything in the kernel tree seems sane to me.




But, doing so might give impact to other users who
rely on having 'rpmbuild' in the home directory.
I have to hear opinions from others
if this change is desired.


Fair enough.



Meanwhile, if you are unhappy with that, one solution is to use RPMOPTS.
RPMOPTS exists to tweak the default behavior.


I don't see this documented anywhere.  I'm assuming that is supposed to 
be an environment variable.  I'll have to see how well that works with 
our automated build systems.  Command line options are generally preferred.





Thanks.



---
  scripts/Makefile.package | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/scripts/Makefile.package b/scripts/Makefile.package
index 56eadcc..aab0711 100644
--- a/scripts/Makefile.package
+++ b/scripts/Makefile.package
@@ -21,7 +21,7 @@ include $(srctree)/scripts/Kbuild.include
  # - Use /. to avoid tar packing just the symlink

  # Note that the rpm-pkg target cannot be used with KBUILD_OUTPUT,
-# but the binrpm-pkg target can; for some reason O= gets ignored.
+# but the binrpm-pkg target can

  # Remove hyphens since they have special meaning in RPM filenames
  KERNELPATH := kernel-$(subst -,_,$(KERNELRELEASE))
@@ -33,6 +33,12 @@ TAR_CONTENT := $(KBUILD_ALLDIRS) .config .scmversion 
Makefile \
 Kbuild Kconfig COPYING $(wildcard localversion*)
  MKSPEC := $(srctree)/scripts/package/mkspec

+RPM_OUTDIR :=
+ifneq ($(objtree),$(srctree))
+# Using absolute path as relative path will cause parts of rpmbuild to fail
+RPM_OUTDIR := --define "_topdir $(abs_objtree)/rpmbuild"
+endif
+
  quiet_cmd_src_tar = TAR $(2).tar.gz
cmd_src_tar = \
  if test "$(objtree)" != "$(srctree)"; then \
@@ -65,8 +71,8 @@ PHONY += binrpm-pkg
  binrpm-pkg:
 $(MAKE) -f $(srctree)/Makefile
 $(CONFIG_SHELL) $(MKSPEC) prebuilt > $(objtree)/binkernel.spec
-   +rpmbuild $(RPMOPTS) --define "_builddir $(objtree)" --target \
-   $(UTS_MACHINE) -bb $(objtree)/binkernel.spec
+   +rpmbuild $(RPMOPTS) --define "_builddir $(objtree)" $(RPM_OUTDIR) \
+   --target $(UTS_MACHINE) -bb $(objtree)/binkernel.spec

  PHONY += deb-pkg
  deb-pkg:
--
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.




--
Best Regards
Masahiro Yamada




--
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [PATCH 30/32] tools lib bpf: Renaming pr_warning to pr_warn

2019-09-22 Thread Andrii Nakryiko

On Fri, Sep 20, 2019 at 10:06 AM Kefeng Wang  wrote:
>
> For kernel logging macro, pr_warning is completely removed and
> replaced by pr_warn, using pr_warn in tools lib bpf for symmetry
> to kernel logging macro, then we could drop pr_warning in the
> whole linux code.
>
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 
> Cc: Martin KaFai Lau 
> Cc: Song Liu 
> Cc: Yonghong Song 
> Cc: b...@vger.kernel.org
> Signed-off-by: Kefeng Wang 
> ---
>  tools/lib/bpf/btf.c |  56 +--
>  tools/lib/bpf/btf_dump.c|  20 +-
>  tools/lib/bpf/libbpf.c  | 652 
>  tools/lib/bpf/libbpf_internal.h |   2 +-
>  tools/lib/bpf/xsk.c |   4 +-
>  5 files changed, 363 insertions(+), 371 deletions(-)
>

Thanks! This will allow to get rid of tons warnings from checkpatch.pl.

Alexei, Daniel, can we take this through bpf-next tree once it's open?

Acked-by: Andrii Nakryiko 

[...]

Re: [PATCH v2 net] net: ena: Select DIMLIB for ENA_ETHERNET

2019-09-22 Thread Jakub Kicinski

On Sun, 22 Sep 2019 13:38:08 +0800, Mao Wenan wrote:
> If CONFIG_ENA_ETHERNET=y and CONFIG_DIMLIB=n,
> below erros can be found:
> drivers/net/ethernet/amazon/ena/ena_netdev.o: In function `ena_dim_work':
> ena_netdev.c:(.text+0x21cc): undefined reference to 
> `net_dim_get_rx_moderation'
> ena_netdev.c:(.text+0x21cc): relocation truncated to
> fit: R_AARCH64_CALL26 against undefined symbol `net_dim_get_rx_moderation'
> drivers/net/ethernet/amazon/ena/ena_netdev.o: In function `ena_io_poll':
> ena_netdev.c:(.text+0x7bd4): undefined reference to `net_dim'
> ena_netdev.c:(.text+0x7bd4): relocation truncated to fit:
> R_AARCH64_CALL26 against undefined symbol `net_dim'
> 
> After commit 282faf61a053 ("net: ena: switch to dim algorithm for rx adaptive
> interrupt moderation"), it introduces dim algorithm, which configured by 
> CONFIG_DIMLIB.
> So, this patch is to select DIMLIB for ENA_ETHERNET.
> 
> Fixes: 282faf61a053 ("net: ena: switch to dim algorithm for rx adaptive 
> interrupt moderation")
> Signed-off-by: Mao Wenan 

Applied, thank you!

[GIT PULL] livepatching for 5.4

2019-09-22 Thread Jiri Kosina

Linus,

please pull from

  git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching.git 
for-linus

to receive livepatching subsystem update:

=
- error handling fix in livepatching module notifier, from Miroslav Benes
=

Thanks.


Miroslav Benes (1):
  livepatch: Nullify obj->mod in klp_module_coming()'s error path

 kernel/livepatch/core.c | 1 +
 1 file changed, 1 insertion(+)

-- 
Jiri Kosina
SUSE Labs

[GIT PULL] HID for 5.4

2019-09-22 Thread Jiri Kosina

Linus,

please pull from

  git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git for-linus

to receive merge window updates for HID subsystem:

=
- syzbot memory corruption fixes for hidraw, Prodikeys, Logitech and Sony 
  drivers from Alan Stern and Roderick Colenbrander
- stuck 'fn' key fix for hid-apple from Joao Moreno
- proper propagation of EPOLLOUT from hiddev and hidraw, from
  Fabian Henneke
- fixes for handling power management for intel-ish devices with NO_D3 
  flag set, from Zhang Lixu
- extension of supported usage range for customer page, as some Logitech 
  devices are actually making use of it. From Olivier Gay.
- hid-multitouch is no longer filtering mice node creation, from Benjamin 
  Tissoires
- MobileStudio Pro 13 support, from Ping Cheng
- a few other device ID additions and assorted smaller fixes
=

Thanks.


Aaron Armstrong Skomra (1):
  HID: wacom: support named keys on older devices

Alan Stern (3):
  HID: hidraw: Fix invalid read in hidraw_ioctl
  HID: logitech: Fix general protection fault caused by Logitech driver
  HID: prodikeys: Fix general protection fault during probe

Bastien Nocera (1):
  HID: sb0540: add support for Creative SB0540 IR receivers

Benjamin Tissoires (5):
  HID: wacom: do not call hid_set_drvdata(hdev, NULL)
  HID: do not call hid_set_drvdata(hdev, NULL) in drivers
  HID: multitouch: do not filter mice nodes
  HID: multitouch: add support for the Smart Tech panel
  HID: logitech-dj: add support of the G700(s) receiver

Fabian Henneke (2):
  hidraw: Return EPOLLOUT from hidraw_poll
  hiddev: Return EPOLLOUT from hiddev_poll

Filipe Laíns (1):
  hid-logitech-dj: add the new Lightspeed receiver

Hans de Goede (1):
  HID: logitech-dj: Fix crash when initial 
logi_dj_recv_query_paired_devices fails

HungNien Chen (1):
  HID: i2c-hid: modify quirks for weida's devices

Jason Gerecke (1):
  HID: wacom: Fix several minor compiler warnings

Joao Moreno (1):
  HID: apple: Fix stuck function keys when using FN

Joshua Clayton (3):
  HID: core: reformat and reduce hid_printk macros
  HID: core: Add printk_once variants to hid_warn() etc
  HID: core: fix dmesg flooding if report field larger than 32bit

Olivier Gay (1):
  HID: logitech-dj: extend consumer usages range

Ping Cheng (1):
  HID: wacom: add new MobileStudio Pro 13 support

Roderick Colenbrander (1):
  HID: sony: Fix memory corruption issue on cleanup.

Sebastian Parschauer (1):
  HID: Add quirk for HP X500 PIXART OEM mouse

Zhang Lixu (3):
  HID: intel-ish-hid: ipc: set NO_D3 flag only when needed
  HID: intel-ish-hid: ipc: make ish suspend paths clear
  HID: intel-ish-hid: ipc: check the NO_D3 flag to distinguish resume paths

 MAINTAINERS |   6 +
 drivers/hid/Kconfig |   9 ++
 drivers/hid/Makefile|   1 +
 drivers/hid/hid-apple.c |  49 +++---
 drivers/hid/hid-core.c  |   4 +-
 drivers/hid/hid-cougar.c|   6 +-
 drivers/hid/hid-creative-sb0540.c   | 268 
 drivers/hid/hid-gfrm.c  |   7 -
 drivers/hid/hid-ids.h   |   5 +-
 drivers/hid/hid-lenovo.c|   2 -
 drivers/hid/hid-lg.c|  10 +-
 drivers/hid/hid-lg4ff.c |   1 -
 drivers/hid/hid-logitech-dj.c   |  32 ++--
 drivers/hid/hid-multitouch.c|  37 -
 drivers/hid/hid-picolcd_core.c  |   7 +-
 drivers/hid/hid-prodikeys.c |  12 +-
 drivers/hid/hid-quirks.c|   1 +
 drivers/hid/hid-sensor-hub.c|   1 -
 drivers/hid/hid-sony.c  |   2 +-
 drivers/hid/hidraw.c|   4 +-
 drivers/hid/i2c-hid/i2c-hid-core.c  |   4 +-
 drivers/hid/intel-ish-hid/ipc/hw-ish.h  |   1 +
 drivers/hid/intel-ish-hid/ipc/ipc.c |   2 +-
 drivers/hid/intel-ish-hid/ipc/pci-ish.c |  95 ++-
 drivers/hid/usbhid/hiddev.c |   2 +-
 drivers/hid/wacom_sys.c |  25 ++-
 drivers/hid/wacom_wac.c |  76 -
 include/linux/hid.h |  43 ++---
 28 files changed, 561 insertions(+), 151 deletions(-)
 create mode 100644 drivers/hid/hid-creative-sb0540.c

-- 
Jiri Kosina
SUSE Labs

Re: [PATCH v2 2/2] rtc: wilco-ec: Fix license to GPL from GPLv2

2019-09-22 Thread Alexandre Belloni

On 22/09/2019 22:29:48+0200, Pavel Machek wrote:
> On Mon 2019-09-16 12:12:17, Nick Crews wrote:
> > Signed-off-by: Nick Crews 
> > ---
> >  drivers/rtc/rtc-wilco-ec.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/rtc/rtc-wilco-ec.c b/drivers/rtc/rtc-wilco-ec.c
> > index e84faa268caf..951268f5e690 100644
> > --- a/drivers/rtc/rtc-wilco-ec.c
> > +++ b/drivers/rtc/rtc-wilco-ec.c
> > @@ -184,5 +184,5 @@ module_platform_driver(wilco_ec_rtc_driver);
> >  
> >  MODULE_ALIAS("platform:rtc-wilco-ec");
> >  MODULE_AUTHOR("Nick Crews ");
> > -MODULE_LICENSE("GPL v2");
> > +MODULE_LICENSE("GPL");
> >  MODULE_DESCRIPTION("Wilco EC RTC driver");
> 
> File spdx header says GPL-2.0, this change would make it inconsistent with 
> that...
> 

Commit bf7fbeeae6db ("module: Cure the MODULE_LICENSE "GPL" vs. "GPL v2"
bogosity") doesn't agree with you (but I was surprised too).


-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return

2019-09-22 Thread Michael S. Tsirkin

On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote:
> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin  wrote:
> >
> > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:
> > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal
> > > to fallback to tun_automq_select_queue() for tx queue selection.
> > >
> > > Compilation of this exact patch was tested.
> > >
> > > For functional testing 3 additional printk()s were added.
> > >
> > > Functional testing results (on 2 txq tap device):
> > >
> > >   [Fri Sep 20 18:33:27 2019] == tun no prog ==
> > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
> > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> > >   [Fri Sep 20 18:33:27 2019] == tun prog -1 ==
> > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1'
> > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
> > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
> > >   [Fri Sep 20 18:33:27 2019] == tun prog 0 ==
> > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0'
> > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
> > >   [Fri Sep 20 18:33:27 2019] == tun prog 1 ==
> > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1'
> > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1'
> > >   [Fri Sep 20 18:33:27 2019] == tun prog 2 ==
> > >   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2'
> > >   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
> > >
> > > Signed-off-by: Matthew Cover 
> >
> >
> > Could you add a bit more motivation data here?
> 
> Thank you for these questions Michael.
> 
> I'll plan on adding the below information to the
> commit message and submitting a v2 of this patch
> when net-next reopens. In the meantime, it would
> be very helpful to know if these answers address
> some of your concerns.
> 
> > 1. why is this a good idea
> 
> This change allows TUNSETSTEERINGEBPF progs to
> do any of the following.
>  1. implement queue selection for a subset of
> traffic (e.g. special queue selection logic
> for ipv4, but return negative and use the
> default automq logic for ipv6)
>  2. determine there isn't sufficient information
> to do proper queue selection; return
> negative and use the default automq logic
> for the unknown
>  3. implement a noop prog (e.g. do
> bpf_trace_printk() then return negative and
> use the default automq logic for everything)
> 
> > 2. how do we know existing userspace does not rely on existing behaviour
> 
> Prior to this change a negative return from a
> TUNSETSTEERINGEBPF prog would have been cast
> into a u16 and traversed netdev_cap_txqueue().
> 
> In most cases netdev_cap_txqueue() would have
> found this value to exceed real_num_tx_queues
> and queue_index would be updated to 0.
> 
> It is possible that a TUNSETSTEERINGEBPF prog
> return a negative value which when cast into a
> u16 results in a positive queue_index less than
> real_num_tx_queues. For example, on x86_64, a
> return value of -65535 results in a queue_index
> of 1; which is a valid queue for any multiqueue
> device.
> 
> It seems unlikely, however as stated above is
> unfortunately possible, that existing
> TUNSETSTEERINGEBPF programs would choose to
> return a negative value rather than return the
> positive value which holds the same meaning.
> 
> It seems more likely that future
> TUNSETSTEERINGEBPF programs would leverage a
> negative return and potentially be loaded into
> a kernel with the old behavior.

OK if we are returning a special
value, shouldn't we limit it? How about a special
value with this meaning?
If we are changing an ABI let's at least make it
extensible.

> > 3. why doesn't userspace need a way to figure out whether it runs on a 
> > kernel with and
> >without this patch
> 
> There may be some value in exposing this fact
> to the ebpf prog loader. What is the standard
> practice here, a define?


We'll need something at runtime - people move binaries between kernels
without rebuilding then. An ioctl is one option.
A sysfs attribute is another, an ethtool flag yet another.
A combination of these is possible.

And if we are doing this anyway, maybe let userspace select
the new behaviour? This way we can stay compatible with old
userspace...

> >
> >
> > thanks,
> > MST
> >
> > > ---
> > >  drivers/net/tun.c | 20 +++-
> > >  1 file changed, 11 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > index aab0be4..173d159 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct 
> > > tun_struct *tun, struct sk_buff *skb)
> > >

Re: [PATCH v2 2/2] rtc: wilco-ec: Fix license to GPL from GPLv2

2019-09-22 Thread Pavel Machek

On Mon 2019-09-16 12:12:17, Nick Crews wrote:
> Signed-off-by: Nick Crews 
> ---
>  drivers/rtc/rtc-wilco-ec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/rtc/rtc-wilco-ec.c b/drivers/rtc/rtc-wilco-ec.c
> index e84faa268caf..951268f5e690 100644
> --- a/drivers/rtc/rtc-wilco-ec.c
> +++ b/drivers/rtc/rtc-wilco-ec.c
> @@ -184,5 +184,5 @@ module_platform_driver(wilco_ec_rtc_driver);
>  
>  MODULE_ALIAS("platform:rtc-wilco-ec");
>  MODULE_AUTHOR("Nick Crews ");
> -MODULE_LICENSE("GPL v2");
> +MODULE_LICENSE("GPL");
>  MODULE_DESCRIPTION("Wilco EC RTC driver");

File spdx header says GPL-2.0, this change would make it inconsistent with 
that...

Best regards,

Pavel

Re: [PATCH AUTOSEL 5.3 169/203] x86/platform/uv: Fix kmalloc() NULL check routine

2019-09-22 Thread Greg KH

On Sun, Sep 22, 2019 at 02:43:15PM -0400, Sasha Levin wrote:
> From: Austin Kim 
> 
> [ Upstream commit 864b23f0169d5bff677e8443a7a90dfd6b090afc ]
> 
> The result of kmalloc() should have been checked ahead of below statement:
> 
>   pqp = (struct bau_pq_entry *)vp;
> 
> Move BUG_ON(!vp) before above statement.
> 
> Signed-off-by: Austin Kim 
> Cc: Dimitri Sivanich 
> Cc: Hedi Berriche 
> Cc: Linus Torvalds 
> Cc: Mike Travis 
> Cc: Peter Zijlstra 
> Cc: Russ Anderson 
> Cc: Steve Wahl 
> Cc: Thomas Gleixner 
> Cc: alli...@lohutok.net
> Cc: a...@infradead.org
> Cc: arm...@tjaldur.nl
> Cc: b...@alien8.de
> Cc: dvh...@infradead.org
> Cc: gre...@linuxfoundation.org
> Cc: h...@zytor.com
> Cc: k...@umn.edu
> Cc: platform-driver-...@vger.kernel.org
> Link: https://lkml.kernel.org/r/20190905232951.GA28779@LGEARND20B15
> Signed-off-by: Ingo Molnar 
> Signed-off-by: Sasha Levin 
> ---
>  arch/x86/platform/uv/tlb_uv.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/platform/uv/tlb_uv.c b/arch/x86/platform/uv/tlb_uv.c
> index 20c389a91b803..5f0a96bf27a1f 100644
> --- a/arch/x86/platform/uv/tlb_uv.c
> +++ b/arch/x86/platform/uv/tlb_uv.c
> @@ -1804,9 +1804,9 @@ static void pq_init(int node, int pnode)
>  
>   plsize = (DEST_Q_SIZE + 1) * sizeof(struct bau_pq_entry);
>   vp = kmalloc_node(plsize, GFP_KERNEL, node);
> - pqp = (struct bau_pq_entry *)vp;
> - BUG_ON(!pqp);
> + BUG_ON(!vp);
>  
> + pqp = (struct bau_pq_entry *)vp;
>   cp = (char *)pqp + 31;
>   pqp = (struct bau_pq_entry *)(((unsigned long)cp >> 5) << 5);
>  

How did this even get merged in the first place?  I thought a number of
us complained about it.

This isn't any change in code, and the original is just fine, the author
didn't realize how C works :(

Please drop this.

thanks,

greg k-h

Re: [PATCH AUTOSEL 5.2 154/185] x86/platform/uv: Fix kmalloc() NULL check routine

2019-09-22 Thread Greg KH

On Sun, Sep 22, 2019 at 02:48:52PM -0400, Sasha Levin wrote:
> From: Austin Kim 
> 
> [ Upstream commit 864b23f0169d5bff677e8443a7a90dfd6b090afc ]
> 
> The result of kmalloc() should have been checked ahead of below statement:
> 
>   pqp = (struct bau_pq_entry *)vp;
> 
> Move BUG_ON(!vp) before above statement.
> 
> Signed-off-by: Austin Kim 
> Cc: Dimitri Sivanich 
> Cc: Hedi Berriche 
> Cc: Linus Torvalds 
> Cc: Mike Travis 
> Cc: Peter Zijlstra 
> Cc: Russ Anderson 
> Cc: Steve Wahl 
> Cc: Thomas Gleixner 
> Cc: alli...@lohutok.net
> Cc: a...@infradead.org
> Cc: arm...@tjaldur.nl
> Cc: b...@alien8.de
> Cc: dvh...@infradead.org
> Cc: gre...@linuxfoundation.org
> Cc: h...@zytor.com
> Cc: k...@umn.edu
> Cc: platform-driver-...@vger.kernel.org
> Link: https://lkml.kernel.org/r/20190905232951.GA28779@LGEARND20B15
> Signed-off-by: Ingo Molnar 
> Signed-off-by: Sasha Levin 
> ---
>  arch/x86/platform/uv/tlb_uv.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/platform/uv/tlb_uv.c b/arch/x86/platform/uv/tlb_uv.c
> index 0c7dfec4acac8..5a66d73620ce4 100644
> --- a/arch/x86/platform/uv/tlb_uv.c
> +++ b/arch/x86/platform/uv/tlb_uv.c
> @@ -1815,9 +1815,9 @@ static void pq_init(int node, int pnode)
>  
>   plsize = (DEST_Q_SIZE + 1) * sizeof(struct bau_pq_entry);
>   vp = kmalloc_node(plsize, GFP_KERNEL, node);
> - pqp = (struct bau_pq_entry *)vp;
> - BUG_ON(!pqp);
> + BUG_ON(!vp);
>  
> + pqp = (struct bau_pq_entry *)vp;
>   cp = (char *)pqp + 31;
>   pqp = (struct bau_pq_entry *)(((unsigned long)cp >> 5) << 5);
>  

Please drop from everywhere.

thanks,

greg k-h

[PATCH 4/4] MIPS: JZ4780: DTS: Add CPU nodes

2019-09-22 Thread Alexandre GRIVEAUX

The JZ4780 have 2 core, adding to DT.

Signed-off-by: Alexandre GRIVEAUX 
---
 arch/mips/boot/dts/ingenic/jz4780.dtsi | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/mips/boot/dts/ingenic/jz4780.dtsi 
b/arch/mips/boot/dts/ingenic/jz4780.dtsi
index f928329b034b..9c7346724f1f 100644
--- a/arch/mips/boot/dts/ingenic/jz4780.dtsi
+++ b/arch/mips/boot/dts/ingenic/jz4780.dtsi
@@ -7,6 +7,23 @@
#size-cells = <1>;
compatible = "ingenic,jz4780";
 
+   cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   cpu@0 {
+   compatible = "ingenic,jz4780";
+   device_type = "cpu";
+   reg = <0>;
+   };
+
+   cpu@1 {
+   compatible = "ingenic,jz4780";
+   device_type = "cpu";
+   reg = <1>;
+   };
+   };
+
cpuintc: interrupt-controller {
#address-cells = <0>;
#interrupt-cells = <1>;
-- 
2.20.1

[PATCH 3/4] MIPS: CI20: DTS: Add Leds

2019-09-22 Thread Alexandre GRIVEAUX

Adding leds and related triggers.

Signed-off-by: Alexandre GRIVEAUX 
---
 arch/mips/boot/dts/ingenic/ci20.dts | 28 
 1 file changed, 28 insertions(+)

diff --git a/arch/mips/boot/dts/ingenic/ci20.dts 
b/arch/mips/boot/dts/ingenic/ci20.dts
index c62c36ae94c2..37b93166bf22 100644
--- a/arch/mips/boot/dts/ingenic/ci20.dts
+++ b/arch/mips/boot/dts/ingenic/ci20.dts
@@ -25,6 +25,34 @@
   0x3000 0x3000>;
};
 
+   leds {
+   compatible = "gpio-leds";
+
+   led0 {
+   label = "ci20:red:led0";
+   gpios = < 3 GPIO_ACTIVE_HIGH>;
+   linux,default-trigger = "none";
+   };
+
+   led1 {
+   label = "ci20:red:led1";
+   gpios = < 2 GPIO_ACTIVE_HIGH>;
+   linux,default-trigger = "nand-disk";
+   };
+
+   led2 {
+   label = "ci20:red:led2";
+   gpios = < 1 GPIO_ACTIVE_HIGH>;
+   linux,default-trigger = "cpu1";
+   };
+
+   led3 {
+   label = "ci20:red:led3";
+   gpios = < 0 GPIO_ACTIVE_HIGH>;
+   linux,default-trigger = "cpu0";
+   };
+   };
+
eth0_power: fixedregulator@0 {
compatible = "regulator-fixed";
regulator-name = "eth0_power";
-- 
2.20.1

[PATCH 2/4] MIPS: CI20: DTS: Add IW8103 Wifi + bluetooth

2019-09-22 Thread Alexandre GRIVEAUX

Add IW8103 Wifi + bluetooth module to device tree and related power domain.

Signed-off-by: Alexandre GRIVEAUX 
---
 arch/mips/boot/dts/ingenic/ci20.dts | 39 +
 1 file changed, 39 insertions(+)

diff --git a/arch/mips/boot/dts/ingenic/ci20.dts 
b/arch/mips/boot/dts/ingenic/ci20.dts
index 4a77fa30a9cd..c62c36ae94c2 100644
--- a/arch/mips/boot/dts/ingenic/ci20.dts
+++ b/arch/mips/boot/dts/ingenic/ci20.dts
@@ -31,6 +31,13 @@
gpio = < 25 GPIO_ACTIVE_LOW>;
enable-active-high;
};
+
+   wlan0_power: fixedregulator@1 {
+   compatible = "regulator-fixed";
+   regulator-name = "wlan0_power";
+   gpio = < 19 GPIO_ACTIVE_LOW>;
+   enable-active-high;
+   };
 };
 
  {
@@ -54,9 +61,18 @@
 
bus-width = <4>;
max-frequency = <5000>;
+   non-removable;
 
pinctrl-names = "default";
pinctrl-0 = <_mmc1>;
+
+   brcmf: wifi@1 {
+/* reg = <4>;*/
+   compatible = "brcm,bcm4330-fmac";
+   vcc-supply = <_power>;
+   device-wakeup-gpios = < 9 GPIO_ACTIVE_HIGH>;
+   shutdown-gpios = < 7 GPIO_ACTIVE_LOW>;
+   };
 };
 
  {
@@ -73,6 +89,23 @@
pinctrl-0 = <_uart1>;
 };
 
+ {
+   status = "okay";
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_uart2>;
+   uart-has-rtscts;
+
+   bluetooth {
+   compatible = "brcm,bcm4330-bt";
+   reset-gpios = < 8 GPIO_ACTIVE_HIGH>;
+   vcc-supply = <_power>;
+   device-wakeup-gpios = < 5 GPIO_ACTIVE_HIGH>;
+   host-wakeup-gpios = < 6 GPIO_ACTIVE_HIGH>;
+   shutdown-gpios = < 4 GPIO_ACTIVE_LOW>;
+   };
+};
+
  {
status = "okay";
 
@@ -314,6 +347,12 @@
bias-disable;
};
 
+   pins_uart2: uart2 {
+   function = "uart2";
+   groups = "uart2-data", "uart2-hwflow";
+   bias-disable;
+   };
+
pins_uart3: uart3 {
function = "uart3";
groups = "uart3-data", "uart3-hwflow";
-- 
2.20.1

Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework

2019-09-22 Thread Dmitry Osipenko

08.07.2019 13:57, Viresh Kumar пишет:
> This registers the notifiers for min/max frequency constraints with the
> PM QoS framework. The constraints are also taken into consideration in
> cpufreq_set_policy().
> 
> This also relocates cpufreq_policy_put_kobj() as it is required to be
> called from cpufreq_policy_alloc() now.
> 
> refresh_frequency_limits() is updated to avoid calling
> cpufreq_set_policy() for inactive policies and handle_update() is
> updated to have proper locking in place.
> 
> No constraints are added until now though.
> 
> Reviewed-by: Matthias Kaehlcke 
> Reviewed-by: Ulf Hansson 
> Signed-off-by: Viresh Kumar 
> Signed-off-by: Rafael J. Wysocki 
> ---
> V6->V7:
> - All callers of refresh_frequency_limits(), except handle_update(),
>   take the policy->rwsem and result in deadlock as
>   refresh_frequency_limits() also takes the same lock again. Fix that
>   by taking the rwsem from handle_update() instead.
> 
> @Rafael: Sending it before Pavel has verified it as I would be offline
> later, in case you want to apply this today itself.
> 
>  drivers/cpufreq/cpufreq.c | 135 +-
>  include/linux/cpufreq.h   |   3 +
>  2 files changed, 108 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index ceb57af15ca0..b96ef6db1bfe 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -999,7 +1000,7 @@ static void add_cpu_dev_symlink(struct cpufreq_policy 
> *policy, unsigned int cpu)
>  {
>   struct device *dev = get_cpu_device(cpu);
>  
> - if (!dev)
> + if (unlikely(!dev))
>   return;
>  
>   if (cpumask_test_and_set_cpu(cpu, policy->real_cpus))
> @@ -1117,14 +1118,16 @@ static int cpufreq_add_policy_cpu(struct 
> cpufreq_policy *policy, unsigned int cp
>  
>  static void refresh_frequency_limits(struct cpufreq_policy *policy)
>  {
> - struct cpufreq_policy new_policy = *policy;
> -
> - pr_debug("updating policy for CPU %u\n", policy->cpu);
> + struct cpufreq_policy new_policy;
>  
> - new_policy.min = policy->user_policy.min;
> - new_policy.max = policy->user_policy.max;
> + if (!policy_is_inactive(policy)) {
> + new_policy = *policy;
> + pr_debug("updating policy for CPU %u\n", policy->cpu);
>  
> - cpufreq_set_policy(policy, _policy);
> + new_policy.min = policy->user_policy.min;
> + new_policy.max = policy->user_policy.max;
> + cpufreq_set_policy(policy, _policy);
> + }
>  }
>  
>  static void handle_update(struct work_struct *work)
> @@ -1133,14 +1136,60 @@ static void handle_update(struct work_struct *work)
>   container_of(work, struct cpufreq_policy, update);
>  
>   pr_debug("handle_update for cpu %u called\n", policy->cpu);
> + down_write(>rwsem);
>   refresh_frequency_limits(policy);
> + up_write(>rwsem);
> +}
> +
> +static int cpufreq_notifier_min(struct notifier_block *nb, unsigned long 
> freq,
> + void *data)
> +{
> + struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, 
> nb_min);
> +
> + schedule_work(>update);
> + return 0;
> +}
> +
> +static int cpufreq_notifier_max(struct notifier_block *nb, unsigned long 
> freq,
> + void *data)
> +{
> + struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, 
> nb_max);
> +
> + schedule_work(>update);
> + return 0;
> +}
> +
> +static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
> +{
> + struct kobject *kobj;
> + struct completion *cmp;
> +
> + down_write(>rwsem);
> + cpufreq_stats_free_table(policy);
> + kobj = >kobj;
> + cmp = >kobj_unregister;
> + up_write(>rwsem);
> + kobject_put(kobj);
> +
> + /*
> +  * We need to make sure that the underlying kobj is
> +  * actually not referenced anymore by anybody before we
> +  * proceed with unloading.
> +  */
> + pr_debug("waiting for dropping of refcount\n");
> + wait_for_completion(cmp);
> + pr_debug("wait complete\n");
>  }
>  
>  static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
>  {
>   struct cpufreq_policy *policy;
> + struct device *dev = get_cpu_device(cpu);
>   int ret;
>  
> + if (!dev)
> + return NULL;
> +
>   policy = kzalloc(sizeof(*policy), GFP_KERNEL);
>   if (!policy)
>   return NULL;
> @@ -1157,7 +1206,7 @@ static struct cpufreq_policy 
> *cpufreq_policy_alloc(unsigned int cpu)
>   ret = kobject_init_and_add(>kobj, _cpufreq,
>  cpufreq_global_kobject, "policy%u", cpu);
>   if (ret) {
> - pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
> + dev_err(dev, "%s: failed to init

[PATCH 1/4] MIPS: CI20: DTS: Add I2C nodes

2019-09-22 Thread Alexandre GRIVEAUX

Adding missing I2C nodes and some peripheral:
- PMU
- RTC

Signed-off-by: Alexandre GRIVEAUX 
---
 arch/mips/boot/dts/ingenic/ci20.dts | 147 
 1 file changed, 147 insertions(+)

diff --git a/arch/mips/boot/dts/ingenic/ci20.dts 
b/arch/mips/boot/dts/ingenic/ci20.dts
index 2e9952311ecd..4a77fa30a9cd 100644
--- a/arch/mips/boot/dts/ingenic/ci20.dts
+++ b/arch/mips/boot/dts/ingenic/ci20.dts
@@ -87,6 +87,123 @@
pinctrl-0 = <_uart4>;
 };
 
+ {
+   status = "okay";
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_i2c0>;
+
+   clock-frequency = <40>;
+
+   act8600: act8600@5a {
+   compatible = "active-semi,act8600";
+   reg = <0x5a>;
+   status = "okay";
+
+   regulators {
+   vddcore: SUDCDC1 {
+   regulator-name = "VDDCORE";
+   regulator-min-microvolt = <110>;
+   regulator-max-microvolt = <110>;
+   regulator-always-on;
+   };
+   vddmem: SUDCDC2 {
+   regulator-name = "VDDMEM";
+   regulator-min-microvolt = <150>;
+   regulator-max-microvolt = <150>;
+   regulator-always-on;
+   };
+   vcc_33: SUDCDC3 {
+   regulator-name = "VCC33";
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   regulator-always-on;
+   };
+   vcc_50: SUDCDC4 {
+   regulator-name = "VCC50";
+   regulator-min-microvolt = <500>;
+   regulator-max-microvolt = <500>;
+   regulator-always-on;
+   };
+   vcc_25: LDO_REG5 {
+   regulator-name = "VCC25";
+   regulator-min-microvolt = <250>;
+   regulator-max-microvolt = <250>;
+   regulator-always-on;
+   };
+   wifi_io: LDO_REG6 {
+   regulator-name = "WIFIIO";
+   regulator-min-microvolt = <250>;
+   regulator-max-microvolt = <250>;
+   regulator-always-on;
+   };
+   vcc_28: LDO_REG7 {
+   regulator-name = "VCC28";
+   regulator-min-microvolt = <280>;
+   regulator-max-microvolt = <280>;
+   regulator-always-on;
+   };
+   vcc_15: LDO_REG8 {
+   regulator-name = "VCC15";
+   regulator-min-microvolt = <150>;
+   regulator-max-microvolt = <150>;
+   regulator-always-on;
+   };
+   vcc_18: LDO_REG9 {
+   regulator-name = "VCC18";
+   regulator-min-microvolt = <180>;
+   regulator-max-microvolt = <180>;
+   regulator-always-on;
+   };
+   vcc_11: LDO_REG10 {
+   regulator-name = "VCC11";
+   regulator-min-microvolt = <110>;
+   regulator-max-microvolt = <110>;
+   regulator-always-on;
+   };
+   };
+   };
+};
+
+ {
+   status = "okay";
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_i2c1>;
+
+};
+
+ {
+   status = "okay";
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_i2c2>;
+
+};
+
+ {
+   status = "okay";
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_i2c3>;
+
+};
+
+ {
+   status = "okay";
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_i2c4>;
+
+   clock-frequency = <40>;
+
+   rtc@51 {
+   compatible = "nxp,pcf8563";
+   reg = <0x51>;
+   interrupts = <110>;
+   };
+};
+
  {
status = "okay";
 
@@ -209,6 +326,36 @@
bias-disable;
};
 
+   pins_i2c0: i2c0 {
+   function = "i2c0";
+   groups = "i2c0-data";
+   bias-disable;
+   };
+
+   pins_i2c1: i2c1 {
+   function = "i2c1";
+   groups = "i2c1-data";
+

[PATCH 0/4] MIPS: CI20: DTS: Add nodes to Creator CI20 board

2019-09-22 Thread Alexandre GRIVEAUX

Attemping to make my CI20 more usefull than a paperweight, I add nodes to 
Devicetree, at this time:

- The IW8103 need some work to stay alive because power seem to turn off.
- The leds patch lack of correct option in ci20_defconfig.
- The Cpu patch isn't usefull without SMP support of jz4780.

Alexandre GRIVEAUX (4):
  MIPS: CI20: DTS: Add I2C nodes
  MIPS: CI20: DTS: Add IW8103 Wifi + bluetooth
  MIPS: CI20: DTS: Add Leds
  MIPS: JZ4780: DTS: Add CPU nodes

 arch/mips/boot/dts/ingenic/ci20.dts| 214 +
 arch/mips/boot/dts/ingenic/jz4780.dtsi |  17 ++
 2 files changed, 231 insertions(+)


base-commit: 4c37310a2e605357cf47b3d357696309f1181b5c
-- 
2.20.1

Teasith18

2019-09-22 Thread 79994574909

开发瓢   v:ⓍⓍⓑⓑ①④③②


 



eople using this calendar will see your email address if you join. 
To join a shared calendar, you must have iCloud. Learn More

[PATCH AUTOSEL 5.3 030/203] rcu/tree: Call setschedule() gp ktread to SCHED_FIFO outside of atomic region

2019-09-22 Thread Sasha Levin

From: Juri Lelli 

[ Upstream commit 1a763fd7c6335e3122c1cc09576ef6c99ada4267 ]

sched_setscheduler() needs to acquire cpuset_rwsem, but it is currently
called from an invalid (atomic) context by rcu_spawn_gp_kthread().

Fix that by simply moving sched_setscheduler_nocheck() call outside of
the atomic region, as it doesn't actually require to be guarded by
rcu_node lock.

Suggested-by: Peter Zijlstra 
Tested-by: Dietmar Eggemann 
Signed-off-by: Juri Lelli 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Thomas Gleixner 
Cc: bris...@redhat.com
Cc: clau...@evidence.eu.com
Cc: lize...@huawei.com
Cc: long...@redhat.com
Cc: luca.ab...@santannapisa.it
Cc: mathieu.poir...@linaro.org
Cc: rost...@goodmis.org
Cc: t...@kernel.org
Cc: tommaso.cucino...@santannapisa.it
Link: https://lkml.kernel.org/r/2019071914.31694-8-juri.le...@redhat.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 kernel/rcu/tree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a14e5fbbea467..eb764c24bc4d4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3234,13 +3234,13 @@ static int __init rcu_spawn_gp_kthread(void)
t = kthread_create(rcu_gp_kthread, NULL, "%s", rcu_state.name);
if (WARN_ONCE(IS_ERR(t), "%s: Could not start grace-period kthread, OOM 
is now expected behavior\n", __func__))
return 0;
+   if (kthread_prio)
+   sched_setscheduler_nocheck(t, SCHED_FIFO, );
rnp = rcu_get_root();
raw_spin_lock_irqsave_rcu_node(rnp, flags);
rcu_state.gp_kthread = t;
-   if (kthread_prio) {
+   if (kthread_prio)
sp.sched_priority = kthread_prio;
-   sched_setscheduler_nocheck(t, SCHED_FIFO, );
-   }
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
wake_up_process(t);
rcu_spawn_nocb_kthreads();
-- 
2.20.1

[PATCH AUTOSEL 5.3 005/203] regulator: lm363x: Fix off-by-one n_voltages for lm3632 ldo_vpos/ldo_vneg

2019-09-22 Thread Sasha Levin

From: Axel Lin 

[ Upstream commit 1e2cc8c5e0745b545d4974788dc606d678b6e564 ]

According to the datasheet https://www.ti.com/lit/ds/symlink/lm3632a.pdf
Table 20. VPOS Bias Register Field Descriptions VPOS[5:0]
Sets the Positive Display Bias (LDO) Voltage (50 mV per step)
00: 4 V
01: 4.05 V
10: 4.1 V

011101: 5.45 V
00: 5.5 V (Default)
01: 5.55 V

100111: 5.95 V
101000: 6 V
Note: Codes 101001 to 11 map to 6 V

The LM3632_LDO_VSEL_MAX should be 0b101000 (0x28), so the maximum voltage
can match the datasheet.

Fixes: 3a8d1a73a037 ("regulator: add LM363X driver")
Signed-off-by: Axel Lin 
Link: https://lore.kernel.org/r/20190626132632.32629-1-axel@ingics.com
Signed-off-by: Mark Brown 
Signed-off-by: Sasha Levin 
---
 drivers/regulator/lm363x-regulator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/lm363x-regulator.c 
b/drivers/regulator/lm363x-regulator.c
index 5647e2f97ff8d..e4a27d63bf901 100644
--- a/drivers/regulator/lm363x-regulator.c
+++ b/drivers/regulator/lm363x-regulator.c
@@ -30,7 +30,7 @@
 
 /* LM3632 */
 #define LM3632_BOOST_VSEL_MAX  0x26
-#define LM3632_LDO_VSEL_MAX0x29
+#define LM3632_LDO_VSEL_MAX0x28
 #define LM3632_VBOOST_MIN  450
 #define LM3632_VLDO_MIN400
 
-- 
2.20.1

1 2 3 4 5 6 7 8 9 >

1 - 100 of 899 matches

Mail list logo