[PATCH -next] fuse: Make fuse_args_to_req static
Fix sparse warning:

fs/fuse/dev.c:468:6: warning: symbol 'fuse_args_to_req' was not declared. Should it be static?

Reported-by: Hulk Robot
Signed-off-by: YueHaibing
---
 fs/fuse/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 46d68d4..7844d35 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -465,7 +465,7 @@ static void fuse_force_creds(struct fuse_conn *fc, struct fuse_req *req)
 	req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
 }
 
-void fuse_args_to_req(struct fuse_req *req, struct fuse_args *args)
+static void fuse_args_to_req(struct fuse_req *req, struct fuse_args *args)
 {
 	req->in.h.opcode = args->opcode;
 	req->in.h.nodeid = args->nodeid;
-- 
2.7.4
Re: [PATCH] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory()
On 09/16/2019 11:17 AM, Anshuman Khandual wrote:
> In add_memory_resource() the memory range to be hot added first gets into
> the memblock via memblock_add() before arch_add_memory() is called on it.
> The reverse sequence should be followed during memory hot removal, which
> already is being followed in the add_memory_resource() error path. This now
> ensures the required re-order between memblock_[free|remove]() and
> arch_remove_memory() during memory hot-remove.
> 
> Cc: Andrew Morton
> Cc: Oscar Salvador
> Cc: Michal Hocko
> Cc: David Hildenbrand
> Cc: Pavel Tatashin
> Cc: Dan Williams
> Signed-off-by: Anshuman Khandual
> ---
> Original patch https://lkml.org/lkml/2019/9/3/327
> 
> Memory hot remove now works on arm64 without this because of a recent
> commit 60bb462fc7ad ("drivers/base/node.c: simplify
> unregister_memory_block_under_nodes()").
> 
> David mentioned that the re-ordering should still make sense for
> consistency purposes (removing stuff in the reverse order it was added).
> This patch is now detached from the arm64 hot-remove series.
> 
> https://lkml.org/lkml/2019/9/3/326
> 
>  mm/memory_hotplug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c73f09913165..355c466e0621 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1770,13 +1770,13 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>  
>  	/* remove memmap entry */
>  	firmware_map_remove(start, start + size, "System RAM");
> -	memblock_free(start, size);
> -	memblock_remove(start, size);
>  
>  	/* remove memory block devices before removing memory */
>  	remove_memory_block_devices(start, size);
>  
>  	arch_remove_memory(nid, start, size, NULL);
> +	memblock_free(start, size);
> +	memblock_remove(start, size);
>  	__release_memory_resource(start, size);
>  
>  	try_offline_node(nid);

Hello Andrew,

Any feedback on this, does it look okay?

- Anshuman
[RFC-v2 1/2] ext4: Add ext4_ilock & ext4_iunlock API
This adds the ext4_ilock/iunlock types of APIs. These are preparatory APIs
to make shared locking/unlocking & restarting with exclusive
locking/unlocking easier in the next patch.

Along with the above, this also addresses the AIM7 regression problem which
was only fixed for XFS in commit 942491c9e6d6 ("xfs: fix AIM7 regression").

Signed-off-by: Ritesh Harjani
---
 fs/ext4/ext4.h    | 33 +
 fs/ext4/extents.c | 16 ++--
 fs/ext4/file.c    | 63 +--
 fs/ext4/inode.c   |  4 +--
 fs/ext4/ioctl.c   | 16 ++--
 fs/ext4/super.c   | 12 -
 fs/ext4/xattr.c   | 16 ++--
 7 files changed, 98 insertions(+), 62 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2ab91815f52d..9ffafbe6bc3f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2945,6 +2945,39 @@ static inline int ext4_update_inode_size(struct inode *inode, loff_t newsize)
 	return changed;
 }
 
+#define EXT4_IOLOCK_EXCL	(1 << 0)
+#define EXT4_IOLOCK_SHARED	(1 << 1)
+
+static inline void ext4_ilock(struct inode *inode, unsigned int iolock)
+{
+	if (iolock == EXT4_IOLOCK_EXCL)
+		inode_lock(inode);
+	else
+		inode_lock_shared(inode);
+}
+
+static inline void ext4_iunlock(struct inode *inode, unsigned int iolock)
+{
+	if (iolock == EXT4_IOLOCK_EXCL)
+		inode_unlock(inode);
+	else
+		inode_unlock_shared(inode);
+}
+
+static inline int ext4_ilock_nowait(struct inode *inode, unsigned int iolock)
+{
+	if (iolock == EXT4_IOLOCK_EXCL)
+		return inode_trylock(inode);
+	else
+		return inode_trylock_shared(inode);
+}
+
+static inline void ext4_ilock_demote(struct inode *inode, unsigned int iolock)
+{
+	BUG_ON(iolock != EXT4_IOLOCK_EXCL);
+	downgrade_write(&inode->i_rwsem);
+}
+
 int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
 				      loff_t len);
 
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index a869e206bd81..ef37f4d4ee7e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4680,7 +4680,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	else
 		max_blocks -= lblk;
 
-	inode_lock(inode);
+	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
 
 	/*
 	 * Indirect files do not support unwritten extents
	 */
@@ -4790,7 +4790,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	ext4_journal_stop(handle);
 out_mutex:
-	inode_unlock(inode);
+	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
 	return ret;
 }
 
@@ -4856,7 +4856,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	if (mode & FALLOC_FL_KEEP_SIZE)
 		flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
 
-	inode_lock(inode);
+	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
 
 	/*
 	 * We only support preallocation for extent-based files only
@@ -4887,7 +4887,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 				EXT4_I(inode)->i_sync_tid);
 	}
 out:
-	inode_unlock(inode);
+	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
 	trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
 	return ret;
 }
@@ -5387,7 +5387,7 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
 		return ret;
 	}
 
-	inode_lock(inode);
+	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
 	/*
 	 * There is no need to overlap collapse range with EOF, in which case
 	 * it is effectively a truncate operation
@@ -5486,7 +5486,7 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
 out_mmap:
 	up_write(&EXT4_I(inode)->i_mmap_sem);
 out_mutex:
-	inode_unlock(inode);
+	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
 	return ret;
 }
 
@@ -5537,7 +5537,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
 		return ret;
 	}
 
-	inode_lock(inode);
+	ext4_ilock(inode, EXT4_IOLOCK_EXCL);
 	/* Currently just for extent based files */
 	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
 		ret = -EOPNOTSUPP;
@@ -5664,7 +5664,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
 out_mmap:
 	up_write(&EXT4_I(inode)->i_mmap_sem);
 out_mutex:
-	inode_unlock(inode);
+	ext4_iunlock(inode, EXT4_IOLOCK_EXCL);
 	return ret;
 }
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index d2ff383a8b9f..ce1cecbae932 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -57,14 +57,15 @@ static ssize_t
 ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	/*
 	 * Get exclusion from truncate and other inode operations.
 	 */
-	if (!inode_trylock_shared(inode)) {
-		if (iocb->ki_flags &
[RFC-v2 0/2] ext4: Improve ilock scalability to improve performance in DirectIO workloads
Hello All,

These are the ilock patches which help improve the current inode lock
scalability problem in the ext4 DIO mixed-read/write workload case. The
problem was first reported by Joseph [1]. These patches are based upon
upstream discussion with Jan Kara & Joseph [2].

Since these patches introduce the ext4_ilock/unlock APIs for inode
locking/unlocking in shared v/s exclusive mode to help achieve better
scalability in the DIO writes case, it was better to take care of
inode_trylock for the RWF_NOWAIT case in the same patch set. Discussion
about this can be found here [3]. Please let me know if otherwise.

These patches are based on top of the ext4 iomap for DIO patches by
Matthew [4].

Now, since this is a very interesting scalability problem w.r.t locking, I
thought of collecting a comprehensive set of performance numbers for the
DIO sync mixed random read/write workload w.r.t the number of threads
(ext4). This commit msg will present a performance benefit histogram using
this patch set v/s the vanilla kernel (v5.3-rc5) in the DIO mixed randrw
workload.

Git tree for this can be found at:
https://github.com/riteshharjani/linux/tree/ext4-ilock-RFC-v2

Background
==========
The dioread_nolock feature in ext4 is there to allow multiple DIO reads in
parallel. Also, in the overwrite DIO case, since there is no block
allocation needed, those I/Os too can run in parallel. This gives good
scalable I/O performance w.r.t multiple DIO read/write threads. In the case
of a buffered write with the dioread_nolock feature, ext4 will allocate an
uninitialized extent and, once the IO is completed, will convert it back to
initialized.

But recently DIO reads got changed to run under the shared inode rwsem, in
commit 16c54688592c ("ext4: Allow parallel DIO reads"). It did allow
parallel DIO reads to work fine, but it degraded the mixed DIO read/write
workload.
Problem description
===================
Currently in the DIO writes case, we start with an exclusive inode rwsem
and only once we know that it is an overwrite (with the dioread_nolock mnt
option enabled) do we demote it to a shared lock. Now with mixed DIO reads
& writes, this has a scalability problem. Which is, if there are any
ongoing DIO reads, then DIO writes (since they start with the exclusive
lock) have to wait until the shared lock is released (which only happens
when the DIO read is completed). The same can be easily observed with
perf-tools trace analysis.

Implementation
==============
To fix this, we start with a shared lock (unless an exclusive lock is
needed in certain cases, e.g. extending/unaligned writes) in the DIO write
case. Then if it is an overwrite we continue with the shared lock; if not,
we release the shared lock, switch to the exclusive lock and redo certain
checks again. This approach helps in achieving good scalability for mixed
DIO read/write workloads [5-6].

Performance results
===================
The performance numbers are collected using the fio tool with the
dioread_nolock mount option for the DIO mixed random read/write (randrw)
workload with 4K & 1M burst sizes for an increasing number of threads
(1, 2, 4, 8, 12, 16, 24). The performance histogram shown below is the
percentage change in performance by using this ilock patchset as compared
to the vanilla kernel (v5.3-rc5). To check the absolute score comparison
with some colorful graphs, refer to [5-6].

FIO command:
fio -name=DIO-mixed-randrw -filename=./testfile -direct=1 -iodepth=1 -thread \
    -rw=randrw -ioengine=psync -bs=$bs -size=10G -numjobs=$thread \
    -group_reporting=1 -runtime=120

With dioread_nolock mount option
================================
Below is the performance benefit histogram with this ilock patchset in (%)
w.r.t the vanilla kernel (v5.3-rc5) with the dioread_nolock mount option
for the mixed randrw workload (for 4K block size). If we notice, the
percentage benefit increases with an increasing number of threads. So this
patchset helps achieve good scalability in the mentioned workload.
Also this gives up to ~70% perf improvement in the 24-thread mixed randrw
workload with 4K burst size. The performance difference can be even higher
with high speed storage devices, since bw speeds without the patch seem to
flatten due to the lock contention problem in the case of multiple threads.
The absolute graph comparison can be seen here - [5]

Performance benefit (%) data, randrw (read)-4K:

[ASCII histogram elided: the percentage benefit rises with thread count,
reaching ~70% at 24 threads]
[RFC-v2 2/2] ext4: Improve DIO writes locking sequence
Earlier there was no shared lock in the DIO read path. But commit
16c54688592c ("ext4: Allow parallel DIO reads") simplified some of the
locking mechanism while still allowing for parallel DIO reads by adding a
shared lock in the inode DIO read path.

But this created a problem with mixed read/write workloads. It is due to
the fact that in the DIO write path, we first start with an exclusive lock
and only when we determine that it is an overwrite IO do we downgrade the
lock. This causes the problem, since with the above patch we have shared
locking in DIO reads.

So, this patch tries to fix this issue by starting with a shared lock and
then switching to an exclusive lock only when required, based on
ext4_dio_write_checks(). Other than that, it also simplifies the below
cases:

1. Simplified the ext4_unaligned_aio API to ext4_unaligned_io. The previous
   API was abused in the sense that it was not really checking for AIO
   anywhere; it also used to check for extending writes. So this API was
   renamed and simplified to ext4_unaligned_io(), which actually only checks
   if the IO is really unaligned. This is because in all other cases
   inode_dio_wait() will anyway become a no-op, so there is no need to
   over-complicate by checking for every condition here.

2. Added the ext4_extending_io API. This checks if the IO is extending the
   file. Now we only check for unaligned_io, extend, dioread_nolock &
   overwrite.

Signed-off-by: Ritesh Harjani
---
 fs/ext4/file.c | 213 +++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 159 insertions(+), 54 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index ce1cecbae932..faa8dcf9c08d 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -166,19 +166,25 @@ static int ext4_release_file(struct inode *inode, struct file *filp)
  * threads are at work on the same unwritten block, they must be synchronized
  * or one thread will zero the other's data, causing corruption.
  */
-static int
-ext4_unaligned_aio(struct inode *inode, struct iov_iter *from, loff_t pos)
+static bool
+ext4_unaligned_io(struct inode *inode, struct iov_iter *from, loff_t pos)
 {
 	struct super_block *sb = inode->i_sb;
-	int blockmask = sb->s_blocksize - 1;
-
-	if (pos >= ALIGN(i_size_read(inode), sb->s_blocksize))
-		return 0;
+	unsigned long blockmask = sb->s_blocksize - 1;
 
 	if ((pos | iov_iter_alignment(from)) & blockmask)
-		return 1;
+		return true;
 
-	return 0;
+	return false;
+}
+
+static bool
+ext4_extending_io(struct inode *inode, loff_t offset, size_t len)
+{
+	if (offset + len > i_size_read(inode) ||
+	    offset + len > EXT4_I(inode)->i_disksize)
+		return true;
+	return false;
 }
 
 /* Is IO overwriting allocated and initialized blocks? */
@@ -204,7 +210,9 @@ static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len)
 	return err == blklen && (map.m_flags & EXT4_MAP_MAPPED);
 }
 
-static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
+static ssize_t ext4_generic_write_checks(struct kiocb *iocb,
+					 struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 	ssize_t ret;
@@ -216,10 +224,6 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
 	if (ret <= 0)
 		return ret;
 
-	ret = file_modified(iocb->ki_filp);
-	if (ret)
-		return 0;
-
 	/*
 	 * If we have encountered a bitmap-format file, the size limit
 	 * is smaller than s_maxbytes, which is for extent-mapped files.
@@ -231,9 +235,26 @@ static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
 			return -EFBIG;
 		iov_iter_truncate(from, sbi->s_bitmap_maxbytes - iocb->ki_pos);
 	}
+
 	return iov_iter_count(from);
 }
 
+static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from)
+{
+	ssize_t ret;
+	ssize_t count;
+
+	count = ext4_generic_write_checks(iocb, from);
+	if (count <= 0)
+		return count;
+
+	ret = file_modified(iocb->ki_filp);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
 static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 					struct iov_iter *from)
 {
@@ -336,6 +357,83 @@ static int ext4_handle_failed_inode_extension(struct inode *inode, loff_t size)
 	return 0;
 }
 
+/*
+ * The intention here is to start with shared lock acquired
+ * (except in unaligned IO & extending writes case),
+ * then see if any condition requires an exclusive inode
+ * lock. If yes, then we restart the whole operation by
+ * releasing the shared lock and acquiring exclusive lock.
+ *
+ * - For unaligned_io we never take exclusive lock as it
+ *   may cause data corruption when two unaligned IO tries
+ *   to modify the same block.
+ *
+ * - For extending writes case we don't take the
[PATCH V8 0/2] arm64/mm: Enable memory hot remove
This series enables memory hot remove on arm64 after fixing a possible
arm64 platform specific kernel page table race condition. This series is
based on linux-next (next-20190920).

Concurrent vmalloc() and hot-remove conflict:

As pointed out earlier on the v5 thread [2] there can be a potential
conflict between a concurrent vmalloc() and a memory hot-remove operation.
The problem here is caused by inadequate locking in vmalloc(), which
protects installation of a page table page but not the walk or the leaf
entry modification.

Now free_empty_tables() and its child functions take into account a maximum
possible range on which they operate as a floor-ceiling boundary. This,
along with the changed free_pxx_table(), makes sure that no page table page
is freed unless it is fully within the maximum possible range as decided by
the caller.

Testing:

Memory hot remove has been tested on arm64 for 4K, 16K, 64K page config
options with all possible CONFIG_ARM64_VA_BITS and CONFIG_PGTABLE_LEVELS
combinations.

Changes in V8:

- Dropped the first patch (memblock_[free|remove] reorder) from the series,
  which is no longer needed for arm64 hot-remove enablement and was posted
  separately (https://patchwork.kernel.org/patch/11146361/)
- Dropped vmalloc-vmemmap detection and subsequent skipping of
  free_empty_tables()
- Changed free_empty_[pxx]_tables() functions, which now accept a possible
  maximum floor-ceiling address range on which they operate. Also changed
  free_pxx_table() functions to check against required alignment as well as
  the maximum floor-ceiling range as another prerequisite before freeing
  the page table page.
- Dropped remove_pagetable(); instead call its constituent functions
  directly

Changes in V7: (https://lkml.org/lkml/2019/9/3/326)

- vmalloc_vmemmap_overlap gets evaluated early during boot for a given
  config
- free_empty_tables() gets conditionally called based on
  vmalloc_vmemmap_overlap

Changes in V6: (https://lkml.org/lkml/2019/7/15/36)

- Implemented most of the suggestions from Mark Rutland
- Added in ptdump
- remove_pagetable() now has two distinct passes over the kernel page table
- First pass unmap_hotplug_range() removes leaf level entries at all levels
- Second pass free_empty_tables() removes empty page table pages
- Kernel page table lock has been dropped completely
- vmemmap_free() does not call free_empty_tables() to avoid conflict with
  vmalloc()
- All address range scanning is converted to do {} while() loops
- Added 'unsigned long end' in __remove_pgd_mapping()
- Callers need not provide starting pointer argument to
  free_[pte|pmd|pud]_table()
- Drop the starting pointer argument from free_[pte|pmd|pud]_table()
  functions
- Fetching pxxp[i] in free_[pte|pmd|pud]_table() is wrapped around in
  READ_ONCE()
- free_[pte|pmd|pud]_table() now computes the starting pointer inside the
  function
- Fixed TLB handling while freeing huge page section mappings at PMD or
  PUD level
- Added WARN_ON(!page) in free_hotplug_page_range()
- Added WARN_ON(![pm|pud]_table(pud|pmd)) when there is no section mapping
- [PATCH 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in
  try_remove_memory()
- Requested earlier for separate merge
  (https://patchwork.kernel.org/patch/10986599/)
- s/__remove_memory/try_remove_memory in the subject line
- s/arch_remove_memory/memblock_[free|remove] in the subject line
- A small change in the commit message as the re-order now happens for the
  memblock remove functions, not for arch_remove_memory()

Changes in V5: (https://lkml.org/lkml/2019/5/29/218)

- Have some agreement [1] over using memory_hotplug_lock for arm64 ptdump
- Change 7ba36eccb3f8 ("arm64/mm: Inhibit huge-vmap with ptdump") already
  merged
- Dropped the above patch from this series
- Fixed indentation problem in arch_[add|remove]_memory() as per David
- Collected all new Acked-by tags

Changes in V4: (https://lkml.org/lkml/2019/5/20/19)

- Implemented most of the suggestions from Mark Rutland
- Interchanged patch [PATCH 2/4] <---> [PATCH 3/4] and updated commit
  message
- Moved CONFIG_PGTABLE_LEVELS inside free_[pud|pmd]_table()
- Used READ_ONCE() in missing instances while accessing page table entries
- s/p???_present()/p???_none() for checking valid kernel page table entries
- WARN_ON() when an entry is !p???_none() and !p???_present() at the same
  time
- Updated memory hot-remove commit message with additional details as
  suggested
- Rebased the series on 5.2-rc1 with hotplug changes from David and Michal
  Hocko
- Collected all new Acked-by tags

Changes in V3: (https://lkml.org/lkml/2019/5/14/197)

- Implemented most of the suggestions from Mark Rutland for
  remove_pagetable()
- Fixed applicable PGTABLE_LEVEL wrappers around pgtable page freeing
  functions
- Replaced 'direct' with 'sparse_vmap' in remove_pagetable() with inverted
  polarity
- Changed pointer names ('p' at end) and removed tmp from iterations
- Perform intermediate TLB invalidation while clearing pgtable entries
- Dropped flush_tlb_kernel_range()
[PATCH V8 2/2] arm64/mm: Enable memory hot remove
The arch code for hot-remove must tear down portions of the linear map and
vmemmap corresponding to the memory being removed. In both cases the page
tables mapping these regions must be freed, and when sparse vmemmap is in
use the memory backing the vmemmap must also be freed.

This patch adds the unmap_hotplug_range() and free_empty_tables() helpers
which can be used to tear down either region, and calls them from
vmemmap_free() and ___remove_pgd_mapping(). The sparse_vmap argument
determines whether the backing memory will be freed.

It makes two distinct passes over the kernel page table. In the first pass
with unmap_hotplug_range() it unmaps, invalidates the applicable TLB cache
and frees backing memory if required (vmemmap) for each mapped leaf entry.
In the second pass with free_empty_tables() it looks for empty page table
sections whose page table pages can be unmapped, TLB invalidated and freed.

While freeing intermediate level page table pages, bail out if any of their
entries are still valid. This can happen for a partially filled kernel page
table, either from a previously attempted failed memory hot add or while
removing an address range which does not span the entire page table page
range.

The vmemmap region may share levels of table with the vmalloc region. There
can be conflicts between hot remove freeing page table pages and a
concurrent vmalloc() walking the kernel page table. This conflict cannot
just be solved by taking the init_mm ptl because of the existing locking
scheme in vmalloc(). So free_empty_tables() implements a floor and ceiling
method, which is borrowed from the user page table tear down method in
free_pgd_range(): it skips freeing a page table page if the intermediate
address range is not aligned or the maximum floor-ceiling might not own the
entire page table page.

While here, update arch_add_memory() to handle __add_pages() failures by
just unmapping the recently added kernel linear mapping.
Now enable memory hot remove on arm64 platforms by default with
ARCH_ENABLE_MEMORY_HOTREMOVE.

This implementation is overall inspired from the kernel page table tear
down procedure on the X86 architecture and the user page table tear down
method.

Acked-by: Steve Capper
Acked-by: David Hildenbrand
Signed-off-by: Anshuman Khandual
---
 arch/arm64/Kconfig              |   3 +
 arch/arm64/include/asm/memory.h |   1 +
 arch/arm64/mm/mmu.c             | 305 ++++++++++++++++++++++++++++++++--
 3 files changed, 300 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 41a9b42..511249f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -274,6 +274,9 @@ config ZONE_DMA32
 config ARCH_ENABLE_MEMORY_HOTPLUG
 	def_bool y
 
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+	def_bool y
+
 config SMP
 	def_bool y
 
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index b61b50b..615dcd0 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -54,6 +54,7 @@
 #define MODULES_VADDR	(BPF_JIT_REGION_END)
 #define MODULES_VSIZE	(SZ_128M)
 #define VMEMMAP_START	(-VMEMMAP_SIZE - SZ_2M)
+#define VMEMMAP_END	(VMEMMAP_START + VMEMMAP_SIZE)
 #define PCI_IO_END	(VMEMMAP_START - SZ_2M)
 #define PCI_IO_START	(PCI_IO_END - PCI_IO_SIZE)
 #define FIXADDR_TOP	(PCI_IO_START - SZ_2M)
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 60c929f..6178837 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -725,6 +725,277 @@ int kern_addr_valid(unsigned long addr)
 	return pfn_valid(pte_pfn(pte));
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+static void free_hotplug_page_range(struct page *page, size_t size)
+{
+	WARN_ON(!page || PageReserved(page));
+	free_pages((unsigned long)page_address(page), get_order(size));
+}
+
+static void free_hotplug_pgtable_page(struct page *page)
+{
+	free_hotplug_page_range(page, PAGE_SIZE);
+}
+
+static bool pgtable_range_aligned(unsigned long start, unsigned long end,
+				  unsigned long floor, unsigned long ceiling,
+				  unsigned long mask)
+{
+	start &= mask;
+	if (start < floor)
+		return false;
+
+	if (ceiling) {
+		ceiling &= mask;
+		if (!ceiling)
+			return false;
+	}
+
+	if (end - 1 > ceiling - 1)
+		return false;
+	return true;
+}
+
+static void free_pte_table(pmd_t *pmdp, unsigned long addr, unsigned long end,
+			   unsigned long floor, unsigned long ceiling)
+{
+	struct page *page;
+	pte_t *ptep;
+	int i;
+
+	if (!pgtable_range_aligned(addr, end, floor, ceiling, PMD_MASK))
+		return;
+
+	ptep = pte_offset_kernel(pmdp, 0UL);
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		if (!pte_none(READ_ONCE(ptep[i])))
+			return;
+	}
+
+	page =
[PATCH V8 1/2] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump
The arm64 page table dump code can race with concurrent modification of the
kernel page tables. When leaf entries are modified concurrently, the dump
code may log stale or inconsistent information for a VA range, but this is
otherwise not harmful.

When intermediate levels of table are freed, the dump code will continue to
use memory which has been freed and potentially reallocated for another
purpose. In such cases, the dump code may dereference bogus addresses,
leading to a number of potential problems.

Intermediate levels of table may be freed during memory hot-remove, which
will be enabled by a subsequent patch. To avoid racing with this, take the
memory hotplug lock when walking the kernel page table.

Acked-by: David Hildenbrand
Acked-by: Mark Rutland
Signed-off-by: Anshuman Khandual
---
 arch/arm64/mm/ptdump_debugfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 064163f..b5eebc8 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include
+#include
 #include
 #include
@@ -7,7 +8,10 @@
 static int ptdump_show(struct seq_file *m, void *v)
 {
 	struct ptdump_info *info = m->private;
+
+	get_online_mems();
 	ptdump_walk_pgd(m, info);
+	put_online_mems();
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ptdump);
-- 
2.7.4
[PATCH -next] scsi: hisi_sas: Make three functions static
Fix sparse warnings:

drivers/scsi/hisi_sas/hisi_sas_main.c:3686:6: warning: symbol 'hisi_sas_debugfs_release' was not declared. Should it be static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3708:5: warning: symbol 'hisi_sas_debugfs_alloc' was not declared. Should it be static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3799:6: warning: symbol 'hisi_sas_debugfs_bist_init' was not declared. Should it be static?

Reported-by: Hulk Robot
Signed-off-by: YueHaibing
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index d1513fd..0847e68 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -3683,7 +3683,7 @@ void hisi_sas_debugfs_work_handler(struct work_struct *work)
 }
 EXPORT_SYMBOL_GPL(hisi_sas_debugfs_work_handler);
 
-void hisi_sas_debugfs_release(struct hisi_hba *hisi_hba)
+static void hisi_sas_debugfs_release(struct hisi_hba *hisi_hba)
 {
 	struct device *dev = hisi_hba->dev;
 	int i;
@@ -3705,7 +3705,7 @@ void hisi_sas_debugfs_release(struct hisi_hba *hisi_hba)
 		devm_kfree(dev, hisi_hba->debugfs_port_reg[i]);
 }
 
-int hisi_sas_debugfs_alloc(struct hisi_hba *hisi_hba)
+static int hisi_sas_debugfs_alloc(struct hisi_hba *hisi_hba)
 {
 	const struct hisi_sas_hw *hw = hisi_hba->hw;
 	struct device *dev = hisi_hba->dev;
@@ -3796,7 +3796,7 @@ int hisi_sas_debugfs_alloc(struct hisi_hba *hisi_hba)
 	return -ENOMEM;
 }
 
-void hisi_sas_debugfs_bist_init(struct hisi_hba *hisi_hba)
+static void hisi_sas_debugfs_bist_init(struct hisi_hba *hisi_hba)
 {
 	hisi_hba->debugfs_bist_dentry =
 			debugfs_create_dir("bist", hisi_hba->debugfs_dir);
-- 
2.7.4
Re: [PATCH v2 2/2] PCI: pciehp: Prevent deadlock on disconnect
On Mon, Aug 12, 2019 at 05:31:33PM +0300, Mika Westerberg wrote:
> If there are more than one PCIe switch with hotplug downstream ports
> hot-removing them leads to a following deadlock:

For the record, I think my comments on v1 of this patch still apply:

https://patchwork.ozlabs.org/patch/1117870/#2230798
Re: [PATCH] smack: fix an compile error in smack_post_notification
Hi zhong,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3 next-20190920]
[if your patch is applied to the wrong git tree, please drop us a note to
help improve the system. BTW, we also suggest to use '--base' option to
specify the base tree in git format-patch, please see
https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/zhong-jiang/smack-fix-an-compile-error-in-smack_post_notification/20190923-112403
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot

All errors (new ones prefixed by >>):

   In file included from security/smack/smack_lsm.c:45:0:
>> security/smack/smack.h:24:10: fatal error: linux/watch_queue.h: No such file or directory
    #include <linux/watch_queue.h>
             ^
   compilation terminated.

vim +24 security/smack/smack.h

    11
    12	#include
    13	#include
    14	#include
    15	#include
    16	#if IS_ENABLED(CONFIG_IPV6)
    17	#include
    18	#endif /* CONFIG_IPV6 */
    19	#include
    20	#include
    21	#include
    22	#include
    23	#include
  > 24	#include <linux/watch_queue.h>
    25

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On 2019/9/23 11:18 AM, Matt Cover wrote:
On Sun, Sep 22, 2019 at 7:34 PM Jason Wang wrote:
On 2019/9/23 9:15 AM, Matt Cover wrote:
On Sun, Sep 22, 2019 at 5:51 PM Jason Wang wrote:
On 2019/9/23 6:30 AM, Matt Cover wrote:
On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin wrote:
On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote:
On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin wrote:
On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:

Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal to
fall back to tun_automq_select_queue() for tx queue selection.

Compilation of this exact patch was tested. For functional testing 3
additional printk()s were added.

Functional testing results (on a 2 txq tap device):

[Fri Sep 20 18:33:27 2019] == tun no prog ==
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
[Fri Sep 20 18:33:27 2019] == tun prog -1 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
[Fri Sep 20 18:33:27 2019] == tun prog 0 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
[Fri Sep 20 18:33:27 2019] == tun prog 1 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1'
[Fri Sep 20 18:33:27 2019] == tun prog 2 ==
[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2'
[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'

Signed-off-by: Matthew Cover

Could you add a bit more motivation data here?

Thank you for these questions Michael. I'll plan on adding the below
information to the commit message and submitting a v2 of this patch when
net-next reopens.
In the meantime, it would be very helpful to know if these answers address some of your concerns. 1. why is this a good idea This change allows TUNSETSTEERINGEBPF progs to do any of the following. 1. implement queue selection for a subset of traffic (e.g. special queue selection logic for ipv4, but return negative and use the default automq logic for ipv6) 2. determine there isn't sufficient information to do proper queue selection; return negative and use the default automq logic for the unknown 3. implement a noop prog (e.g. do bpf_trace_printk() then return negative and use the default automq logic for everything) 2. how do we know existing userspace does not rely on existing behaviour Prior to this change a negative return from a TUNSETSTEERINGEBPF prog would have been cast into a u16 and traversed netdev_cap_txqueue(). In most cases netdev_cap_txqueue() would have found this value to exceed real_num_tx_queues and queue_index would be updated to 0. It is possible that a TUNSETSTEERINGEBPF prog returns a negative value which, when cast into a u16, results in a positive queue_index less than real_num_tx_queues. For example, on x86_64, a return value of -65535 results in a queue_index of 1, which is a valid queue for any multiqueue device. It seems unlikely, though as stated above unfortunately possible, that existing TUNSETSTEERINGEBPF programs would choose to return a negative value rather than return the positive value which holds the same meaning. It seems more likely that future TUNSETSTEERINGEBPF programs would leverage a negative return and potentially be loaded into a kernel with the old behavior. OK if we are returning a special value, shouldn't we limit it? How about a special value with this meaning? If we are changing an ABI let's at least make it extensible. A special value with this meaning sounds good to me. I'll plan on adding a define set to -1 to cause the fallback to automq. Can it really return -1?
I see: static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, struct sk_buff *skb) ... The way I was initially viewing the old behavior was that returning negative was undefined; it happened to have the outcomes I walked through, but not necessarily by design. Having such a fallback may bring extra trouble; it requires the eBPF program to know about a behavior which is not actually part of the kernel ABI. And then some eBPF programs may start to rely on that, which is pretty dangerous. Note, one important consideration is to support macvtap, which does not have anything like automq. Thanks How about we call this TUN_SSE_ABORT instead of TUN_SSE_DO_AUTOMQ? TUN_SSE_ABORT could be documented as falling back
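The truncation behaviour discussed in this thread (a negative prog return cast to u16, then clamped the way netdev_cap_txqueue() does for out-of-range indices) can be sketched in plain C. The function names below are illustrative stand-ins for the kernel logic, not the actual kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an index the way netdev_cap_txqueue()-style logic does:
 * anything at or beyond real_num_tx_queues falls back to queue 0. */
static uint16_t cap_txqueue(uint16_t queue_index, uint16_t real_num_tx_queues)
{
    return queue_index < real_num_tx_queues ? queue_index : 0;
}

/* Model of the pre-patch behaviour: the steering prog's int return is
 * implicitly truncated to u16 before being capped. */
static uint16_t steer(int prog_ret, uint16_t real_num_tx_queues)
{
    uint16_t queue_index = (uint16_t)prog_ret; /* modulo-65536 truncation */

    return cap_txqueue(queue_index, real_num_tx_queues);
}
```

For a 2-queue tap device this reproduces the numbers quoted above: -1 truncates to 65535 and is capped to queue 0, while -65535 truncates to 1, a valid queue index on any multiqueue device.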
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On 2019/9/23 上午11:00, Matt Cover wrote: On Sun, Sep 22, 2019 at 7:32 PM Jason Wang wrote: On 2019/9/23 上午9:20, Matt Cover wrote: On Sun, Sep 22, 2019 at 5:46 PM Jason Wang wrote: On 2019/9/23 上午1:43, Matt Cover wrote: On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin wrote: On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal to fallback to tun_automq_select_queue() for tx queue selection. Compilation of this exact patch was tested. For functional testing 3 additional printk()s were added. Functional testing results (on 2 txq tap device): [Fri Sep 20 18:33:27 2019] == tun no prog == [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran [Fri Sep 20 18:33:27 2019] == tun prog -1 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran [Fri Sep 20 18:33:27 2019] == tun prog 0 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' [Fri Sep 20 18:33:27 2019] == tun prog 1 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' [Fri Sep 20 18:33:27 2019] == tun prog 2 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' Signed-off-by: Matthew Cover Could you add a bit more motivation data here? Thank you for these questions Michael. I'll plan on adding the below information to the commit message and submitting a v2 of this patch when net-next reopens. In the meantime, it would be very helpful to know if these answers address some of your concerns. 1. 
why is this a good idea This change allows TUNSETSTEERINGEBPF progs to do any of the following. 1. implement queue selection for a subset of traffic (e.g. special queue selection logic for ipv4, but return negative and use the default automq logic for ipv6) Well, using ebpf means it needs to take care of all the cases. E.g. you can easily implement the fallback through eBPF as well. I really think there is value in being able to implement a scoped special case while leaving the rest of the packets in the kernel's hands. This only works when some function could not be done by eBPF itself, and then we can provide the function through eBPF helpers. But this is not the case here. Having to reimplement automq makes this hookpoint less accessible to beginners and experienced alike. Note that automq itself is kind of complicated; it's best effort and hard to document accurately. It has several limitations (e.g. flow caches etc.) that may not work well in some conditions. It's not hard to implement a user programmable steering policy through maps which could have much more deterministic behavior than automq. The goal of steering ebpf is to get rid of automq completely, not partially rely on it. And I don't see how relying on automq can simplify anything. Thanks I'm not suggesting that we document automq. I'm suggesting that we add a return value which is documented as signaling to the kernel to implement whatever queue selection method is used when there is no ebpf prog attached. Again, this only works when there's something that could not be done through eBPF. And then we can provide an eBPF helper there. That behavior today is automq. Automq is not good, e.g. tun_ebpf_select_queue() has already provided a fallback; is there anything that automq can do better than that? There is nothing about this return value which would make it harder to change the default queue selection later. The default already exists today when there is no program loaded.
The patch depends on incorrect behavior of tuntap (updating flow caches when a steering prog is set). I think it's wrong to update flow caches even when a steering program is set, which leads to extra overhead. Will probably submit a patch to disable that behavior. Thanks 2. determine there isn't sufficient information to do proper queue selection; return negative and use the default automq logic for the unknown Same as above. 3. implement a noop prog (e.g. do bpf_trace_printk() then return negative and use the default automq logic for everything) ditto. 2. how do we know existing userspace does not rely on existing behaviour Prior to this change a negative return from a TUNSETSTEERINGEBPF prog would have been cast into a u16 and traversed netdev_cap_txqueue(). In
RE: [RFC PATCH v3 4/6] psci: Add hvc call service for ptp_kvm.
Hi Paolo, Marc, > -Original Message- > From: Paolo Bonzini > Sent: Thursday, September 19, 2019 7:07 PM > To: Jianyong Wu (Arm Technology China) ; > net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org; > t...@linutronix.de; sean.j.christopher...@intel.com; m...@kernel.org; > richardcoch...@gmail.com; Mark Rutland ; Will > Deacon ; Suzuki Poulose > > Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org; Steve Capper > ; Kaly Xin (Arm Technology China) > ; Justin He (Arm Technology China) > ; nd ; linux-arm- > ker...@lists.infradead.org > Subject: Re: [RFC PATCH v3 4/6] psci: Add hvc call service for ptp_kvm. > > On 19/09/19 11:46, Jianyong Wu (Arm Technology China) wrote: > >> On 18/09/19 11:57, Jianyong Wu (Arm Technology China) wrote: > >>> Paolo Bonzini wrote: > This is not Y2038-safe. Please use ktime_get_real_ts64 instead, > and split the 64-bit seconds value between val[0] and val[1]. > > > > Val[] should be long not u32 I think, so on arm64 I can avoid the > > Y2038 issue, but it also needs a rewrite for arm32. > > I don't think there's anything inherently wrong with u32 val[], and as you > notice it lets you reuse code between arm and arm64. It's up to you and > Marc to decide. > To stay compatible with 32-bit, integrate the seconds value and the nanosecond value into a single nanosecond value, then split it into val[0] and val[1], and split the cycle value into val[2] and val[3]. This way, time will overflow at Y2262. WDYT? Thanks Jianyong Wu
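The proposed encoding (fold the ktime_get_real_ts64() result into one 64-bit nanosecond count, then split it and the cycle counter across u32 val[] slots) can be sketched as below. The helper names are illustrative, not part of the series; the last assertion in the usage checks the Y2262 claim, since a 64-bit signed nanosecond count covers roughly 292 years from the 1970 epoch:

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit value into two u32 halves, e.g. for val[0]/val[1]
 * (nanoseconds) and val[2]/val[3] (cycle counter). */
static void split_u64(uint64_t v, uint32_t *hi, uint32_t *lo)
{
    *hi = (uint32_t)(v >> 32);
    *lo = (uint32_t)(v & 0xffffffffu);
}

/* Recombine the two halves on the guest side. */
static uint64_t join_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Fold a seconds + nanoseconds pair into a single nanosecond count. */
static uint64_t ts_to_ns(int64_t sec, long nsec)
{
    return (uint64_t)sec * 1000000000ull + (uint64_t)nsec;
}
```

A signed 64-bit nanosecond count saturates at INT64_MAX ns, which is about 292 average years after 1970, i.e. the year 2262 mentioned above.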
Re: Kernel Concurrency Sanitizer (KCSAN)
On Fri, Sep 20, 2019 at 04:54:21PM +0100, Will Deacon wrote: > Hi Marco, > > On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote: > > We would like to share a new data-race detector for the Linux kernel: > > Kernel Concurrency Sanitizer (KCSAN) -- > > https://github.com/google/ktsan/wiki/KCSAN (Details: > > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst) > > > > To those of you who we mentioned at LPC that we're working on a > > watchpoint-based KTSAN inspired by DataCollider [1], this is it (we > > renamed it to KCSAN to avoid confusion with KTSAN). > > [1] http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf > > Oh, spiffy! > > > In the coming weeks we're planning to: > > * Set up a syzkaller instance. > > * Share the dashboard so that you can see the races that are found. > > * Attempt to send fixes for some races upstream (if you find that the > > kcsan-with-fixes branch contains an important fix, please feel free to > > point it out and we'll prioritize that). > > Curious: do you take into account things like alignment and/or access size > when looking at READ_ONCE/WRITE_ONCE? Perhaps you could initially prune > naturally aligned accesses for which __native_word() is true? > > > There are a few open questions: > > * The big one: most of the reported races are due to unmarked > > accesses; prioritization or pruning of races to focus initial efforts > > to fix races might be required. Comments on how best to proceed are > > welcome. We're aware that these are issues that have recently received > > attention in the context of the LKMM > > (https://lwn.net/Articles/793253/). > > This one is tricky. What I think we need to avoid is an onslaught of > patches adding READ_ONCE/WRITE_ONCE without a concrete analysis of the > code being modified. 
My worry is that Joe Developer is eager to get their > first patch into the kernel, so runs this tool and starts spamming > maintainers with these things to the point that they start ignoring KCSAN > reports altogether because of the time they take up. > > I suppose one thing we could do is to require each new READ_ONCE/WRITE_ONCE > to have a comment describing the racy access, a bit like we do for memory > barriers. Another possibility would be to use atomic_t more widely if > there is genuine concurrency involved. > Instead of commenting READ_ONCE/WRITE_ONCE()s, how about adding annotations for data fields/variables that might be accessed without holding a lock? Because if all accesses to a variable are protected by proper locks, we mostly don't need to worry about data races caused by not using READ_ONCE/WRITE_ONCE(). Bad things happen when we write to a variable using locks but read it outside a lock critical section for better performance, for example, rcu_node::qsmask. I'm thinking that maybe we can introduce a new annotation similar to __rcu, maybe call it __lockfree ;-) as follows: struct rcu_node { ... unsigned long __lockfree qsmask; ... } , and __lockfree indicates that by design the maintainer of this data structure or variable believes there will be accesses outside lock critical sections. Note that not all accesses to a __lockfree field need to be READ_ONCE/WRITE_ONCE(); if the developer manages to build a complex but working wake/wait state machine so that it could not be accessed at the same time, READ_ONCE()/WRITE_ONCE() is not needed. If we have such an annotation, I think it won't be hard to configure KCSAN to only examine accesses to variables with this annotation. Also this annotation could help other checkers in the future. If KCSAN (at least the upstream version) only checks accesses with such an annotation, "spamming with KCSAN warnings/fixes" will be the choice of each maintainer ;-) Thoughts? Regards, Boqun > > * How/when to upstream KCSAN?
> > Start by posting the patches :) > > Will
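A minimal sketch of how such an annotation could be wired up for sparse, modeled on the way __rcu is defined for the checker in include/linux/compiler_types.h. The name and the address-space number are purely illustrative; nothing here is an existing kernel API:

```c
#include <assert.h>

/* Hypothetical __lockfree annotation: visible only to the sparse
 * checker (__CHECKER__), a no-op for real compilation. The address
 * space number 5 is an arbitrary illustrative choice. */
#ifdef __CHECKER__
# define __lockfree __attribute__((noderef, address_space(5)))
#else
# define __lockfree
#endif

/* Example field that is written under a lock but may be read outside
 * any lock critical section, in the spirit of rcu_node::qsmask. */
struct example_node {
    unsigned long __lockfree qsmask;
};
```

Since the annotation expands to nothing outside the checker, it changes neither layout nor codegen, which is the same design choice __rcu and __user make.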
[PATCH v4 09/10] mm/gup: Allow to react to fatal signals
The existing gup code does not react to fatal signals in many code paths. For example, in one retry path of gup we're still using down_read() rather than down_read_killable(). Also, when doing page faults we don't pass in FAULT_FLAG_KILLABLE either, which means that within the faulting process we'll wait in a non-killable way as well. These were spotted by Linus during the code review of some other patches. Let's allow the gup code to react to fatal signals to improve the responsiveness of threads during gup when they are being killed. Signed-off-by: Peter Xu --- mm/gup.c | 12 +--- mm/hugetlb.c | 3 ++- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index d2811bb15a25..4c638473db83 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -640,7 +640,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, if (*flags & FOLL_REMOTE) fault_flags |= FAULT_FLAG_REMOTE; if (locked) - fault_flags |= FAULT_FLAG_ALLOW_RETRY; + fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; if (*flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; if (*flags & FOLL_TRIED) { @@ -973,7 +973,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, vm_fault_t ret, major = 0; if (unlocked) - fault_flags |= FAULT_FLAG_ALLOW_RETRY; + fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; retry: vma = find_extend_vma(mm, address); @@ -1086,7 +1086,13 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, break; *locked = 1; - down_read(&mm->mmap_sem); + ret = down_read_killable(&mm->mmap_sem); + if (ret) { + BUG_ON(ret > 0); + if (!pages_done) + pages_done = ret; + break; + } ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED, pages, NULL, locked); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d0c98cff5b0f..84034154d50e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4342,7 +4342,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, if
(flags & FOLL_WRITE) fault_flags |= FAULT_FLAG_WRITE; if (locked) - fault_flags |= FAULT_FLAG_ALLOW_RETRY; + fault_flags |= FAULT_FLAG_ALLOW_RETRY | + FAULT_FLAG_KILLABLE; if (flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; -- 2.21.0
[PATCH v4 07/10] mm: Allow VM_FAULT_RETRY for multiple times
The idea comes from a discussion between Linus and Andrea [1]. Before this patch we only allow a page fault to retry once. We achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing handle_mm_fault() the second time. This was mainly used to avoid unexpected starvation of the system by looping forever to handle the page fault on a single page. However that should hardly happen, and after all for each code path to return a VM_FAULT_RETRY we'll first wait for a condition (during which time we should possibly yield the cpu) to happen before VM_FAULT_RETRY is really returned. This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY. It means that the page fault handler can now retry the page fault multiple times if necessary without the need to generate another page fault event. Meanwhile we still keep the FAULT_FLAG_TRIED flag so the page fault handler can still identify whether a page fault is the first attempt or not. Then we'll have these combinations of fault flags (only considering ALLOW_RETRY flag and TRIED flag): - ALLOW_RETRY and !TRIED: this means the page fault allows retry, and this is the first try - ALLOW_RETRY and TRIED: this means the page fault allows retry, and this is not the first try - !ALLOW_RETRY and !TRIED: this means the page fault does not allow retry at all - !ALLOW_RETRY and TRIED: this is forbidden and should never be used In existing code we have multiple places that have taken special care of the first condition above by checking against (fault_flags & FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect the first retry of a page fault by checking against both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flags & FAULT_FLAG_TRIED), because now even the 2nd try will have ALLOW_RETRY set, and then uses that helper in all existing special paths.
One example is in __lock_page_or_retry(), now we'll drop the mmap_sem only in the first attempt of page fault and we'll keep it in follow up retries, so old locking behavior will be retained. This will be a nice enhancement for current code [2] at the same time a supporting material for the future userfaultfd-writeprotect work, since in that work there will always be an explicit userfault writeprotect retry for protected pages, and if that cannot resolve the page fault (e.g., when userfaultfd-writeprotect is used in conjunction with swapped pages) then we'll possibly need a 3rd retry of the page fault. It might also benefit other potential users who will have similar requirement like userfault write-protection. GUP code is not touched yet and will be covered in follow up patch. Please read the thread below for more information. [1] https://lore.kernel.org/lkml/20171102193644.gb22...@redhat.com/ [2] https://lore.kernel.org/lkml/20181230154648.gb9...@redhat.com/ Suggested-by: Linus Torvalds Suggested-by: Andrea Arcangeli Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- arch/alpha/mm/fault.c | 2 +- arch/arc/mm/fault.c | 1 - arch/arm/mm/fault.c | 3 --- arch/arm64/mm/fault.c | 5 - arch/hexagon/mm/vm_fault.c | 1 - arch/ia64/mm/fault.c| 1 - arch/m68k/mm/fault.c| 3 --- arch/microblaze/mm/fault.c | 1 - arch/mips/mm/fault.c| 1 - arch/nds32/mm/fault.c | 1 - arch/nios2/mm/fault.c | 3 --- arch/openrisc/mm/fault.c| 1 - arch/parisc/mm/fault.c | 4 +--- arch/powerpc/mm/fault.c | 6 -- arch/riscv/mm/fault.c | 5 - arch/s390/mm/fault.c| 5 + arch/sh/mm/fault.c | 1 - arch/sparc/mm/fault_32.c| 1 - arch/sparc/mm/fault_64.c| 1 - arch/um/kernel/trap.c | 1 - arch/unicore32/mm/fault.c | 4 +--- arch/x86/mm/fault.c | 2 -- arch/xtensa/mm/fault.c | 1 - drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 --- include/linux/mm.h | 37 + mm/filemap.c| 2 +- mm/shmem.c | 2 +- 27 files changed, 52 insertions(+), 55 deletions(-) diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c index 
ab1d4212d658..e032d2d03012 100644 --- a/arch/alpha/mm/fault.c +++ b/arch/alpha/mm/fault.c @@ -170,7 +170,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr, else current->min_flt++; if (fault & VM_FAULT_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; + flags |= FAULT_FLAG_TRIED; /* No need to up_read(&mm->mmap_sem) as we would * have already released it in __lock_page_or_retry diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c index
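The four flag combinations above boil down to one predicate for "is this the first attempt". A sketch in the spirit of the helper this patch adds; the flag values here are illustrative placeholders, not the real include/linux/mm.h definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative placeholder values; the real flags live in
 * include/linux/mm.h. */
#define FAULT_FLAG_ALLOW_RETRY 0x04
#define FAULT_FLAG_TRIED       0x20

/* First attempt of a page fault: retry is allowed and we have not
 * tried yet. All later retries keep ALLOW_RETRY but also set TRIED. */
static bool fault_flag_allow_retry_first(unsigned int flags)
{
    return (flags & FAULT_FLAG_ALLOW_RETRY) && !(flags & FAULT_FLAG_TRIED);
}
```

With this, code that must run only on the first attempt (e.g. the mmap_sem drop in __lock_page_or_retry()) checks the predicate instead of ALLOW_RETRY alone, preserving the old locking behavior across the new multi-retry scheme.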
[PATCH v4 08/10] mm/gup: Allow VM_FAULT_RETRY for multiple times
This is the gup counterpart of the change that allows VM_FAULT_RETRY to happen more than once. One thing to mention is that we must check the fatal signal here before retrying, because GUP can be interrupted by one; otherwise we can loop forever. Signed-off-by: Peter Xu --- mm/gup.c | 27 +-- mm/hugetlb.c | 6 -- 2 files changed, 25 insertions(+), 8 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index e60d32f1674d..d2811bb15a25 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -644,7 +644,10 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, if (*flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; if (*flags & FOLL_TRIED) { - VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY); + /* +* Note: FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_TRIED +* can co-exist +*/ fault_flags |= FAULT_FLAG_TRIED; } @@ -994,7 +997,6 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, down_read(&mm->mmap_sem); if (!(fault_flags & FAULT_FLAG_TRIED)) { *unlocked = true; - fault_flags &= ~FAULT_FLAG_ALLOW_RETRY; fault_flags |= FAULT_FLAG_TRIED; goto retry; } @@ -1069,17 +1071,30 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk, if (likely(pages)) pages += ret; start += ret << PAGE_SHIFT; + lock_dropped = true; +retry: /* * Repeat on the address that fired VM_FAULT_RETRY -* without FAULT_FLAG_ALLOW_RETRY but with -* FAULT_FLAG_TRIED. +* with both FAULT_FLAG_ALLOW_RETRY and +* FAULT_FLAG_TRIED. Note that GUP can be interrupted +* by fatal signals, so we need to check it before we +* start trying again otherwise it can loop forever.
*/ + + if (fatal_signal_pending(current)) + break; + *locked = 1; - lock_dropped = true; down_read(&mm->mmap_sem); + ret = __get_user_pages(tsk, mm, start, 1, flags | FOLL_TRIED, - pages, NULL, NULL); + pages, NULL, locked); + if (!*locked) { + /* Continue to retry until we succeeded */ + BUG_ON(ret != 0); + goto retry; + } if (ret != 1) { BUG_ON(ret > 1); if (!pages_done) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 31c2a6275023..d0c98cff5b0f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4347,8 +4347,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; if (flags & FOLL_TRIED) { - VM_WARN_ON_ONCE(fault_flags & - FAULT_FLAG_ALLOW_RETRY); + /* +* Note: FAULT_FLAG_ALLOW_RETRY and +* FAULT_FLAG_TRIED can co-exist +*/ fault_flags |= FAULT_FLAG_TRIED; } ret = hugetlb_fault(mm, vma, vaddr, fault_flags); -- 2.21.0
[PATCH v4 10/10] mm/userfaultfd: Honor FAULT_FLAG_KILLABLE in fault path
The userfaultfd fault path was by default killable even if the caller did not pass FAULT_FLAG_KILLABLE. That made sense before, because gup did not set FAULT_FLAG_KILLABLE properly. Now that the previous patch applies FAULT_FLAG_KILLABLE even in the gup code, it makes sense to let userfaultfd honor FAULT_FLAG_KILLABLE too. Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code right now, this patch should have no functional change. It also cleans up the code a little by introducing some helpers. Signed-off-by: Peter Xu --- fs/userfaultfd.c | 36 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 2b3b48e94ae4..8c5863ccbf0e 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -334,6 +334,30 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx, return ret; } +/* Should pair with userfaultfd_signal_pending() */ +static inline long userfaultfd_get_blocking_state(unsigned int flags) +{ + if (flags & FAULT_FLAG_INTERRUPTIBLE) + return TASK_INTERRUPTIBLE; + + if (flags & FAULT_FLAG_KILLABLE) + return TASK_KILLABLE; + + return TASK_UNINTERRUPTIBLE; +} + +/* Should pair with userfaultfd_get_blocking_state() */ +static inline bool userfaultfd_signal_pending(unsigned int flags) +{ + if (flags & FAULT_FLAG_INTERRUPTIBLE) + return signal_pending(current); + + if (flags & FAULT_FLAG_KILLABLE) + return fatal_signal_pending(current); + + return false; +} + /* * The locking rules involved in returning VM_FAULT_RETRY depending on * FAULT_FLAG_ALLOW_RETRY, FAULT_FLAG_RETRY_NOWAIT and @@ -355,7 +379,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) struct userfaultfd_ctx *ctx; struct userfaultfd_wait_queue uwq; vm_fault_t ret = VM_FAULT_SIGBUS; - bool must_wait, return_to_userland; + bool must_wait; long blocking_state; /* @@ -462,9 +486,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) uwq.ctx = ctx;
uwq.waken = false; - return_to_userland = vmf->flags & FAULT_FLAG_INTERRUPTIBLE; - blocking_state = return_to_userland ? TASK_INTERRUPTIBLE : -TASK_KILLABLE; + blocking_state = userfaultfd_get_blocking_state(vmf->flags); spin_lock_irq(&ctx->fault_pending_wqh.lock); /* @@ -490,8 +512,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) up_read(&mm->mmap_sem); if (likely(must_wait && !READ_ONCE(ctx->released) && - (return_to_userland ? !signal_pending(current) : - !fatal_signal_pending(current)))) { + userfaultfd_signal_pending(vmf->flags))) { wake_up_poll(&ctx->fd_wqh, EPOLLIN); schedule(); ret |= VM_FAULT_MAJOR; @@ -513,8 +534,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) set_current_state(blocking_state); if (READ_ONCE(uwq.waken) || READ_ONCE(ctx->released) || - (return_to_userland ? signal_pending(current) : -fatal_signal_pending(current))) + userfaultfd_signal_pending(vmf->flags)) break; schedule(); } -- 2.21.0
[PATCH v4 05/10] mm: Return faster for non-fatal signals in user mode faults
The idea comes from the upstream discussion between Linus and Andrea: https://lore.kernel.org/lkml/20171102193644.gb22...@redhat.com/ A summary to the issue: there was a special path in handle_userfault() in the past that we'll return a VM_FAULT_NOPAGE when we detected non-fatal signals when waiting for userfault handling. We did that by reacquiring the mmap_sem before returning. However that brings a risk in that the vmas might have changed when we retake the mmap_sem and even we could be holding an invalid vma structure. This patch is a preparation of removing that special path by allowing the page fault to return even faster if we were interrupted by a non-fatal signal during a user-mode page fault handling routine. Suggested-by: Linus Torvalds Suggested-by: Andrea Arcangeli Signed-off-by: Peter Xu --- arch/alpha/mm/fault.c| 3 ++- arch/arc/mm/fault.c | 5 + arch/arm/mm/fault.c | 9 + arch/arm64/mm/fault.c| 9 + arch/hexagon/mm/vm_fault.c | 3 ++- arch/ia64/mm/fault.c | 3 ++- arch/m68k/mm/fault.c | 5 +++-- arch/microblaze/mm/fault.c | 3 ++- arch/mips/mm/fault.c | 3 ++- arch/nds32/mm/fault.c| 9 + arch/nios2/mm/fault.c| 3 ++- arch/openrisc/mm/fault.c | 3 ++- arch/parisc/mm/fault.c | 3 ++- arch/powerpc/mm/fault.c | 2 ++ arch/riscv/mm/fault.c| 5 +++-- arch/s390/mm/fault.c | 4 ++-- arch/sh/mm/fault.c | 4 arch/sparc/mm/fault_32.c | 2 +- arch/sparc/mm/fault_64.c | 3 ++- arch/um/kernel/trap.c| 4 +++- arch/unicore32/mm/fault.c| 5 +++-- arch/x86/mm/fault.c | 2 ++ arch/xtensa/mm/fault.c | 3 ++- include/linux/sched/signal.h | 12 24 files changed, 75 insertions(+), 32 deletions(-) diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c index de4cc6936391..ab1d4212d658 100644 --- a/arch/alpha/mm/fault.c +++ b/arch/alpha/mm/fault.c @@ -150,7 +150,8 @@ do_page_fault(unsigned long address, unsigned long mmcsr, the fault. 
*/ fault = handle_mm_fault(vma, address, flags); - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) + if ((fault & VM_FAULT_RETRY) && + fault_should_check_signal(user_mode(regs))) return; if (unlikely(fault & VM_FAULT_ERROR)) { diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c index 61919e4e4eec..27adf4e608e4 100644 --- a/arch/arc/mm/fault.c +++ b/arch/arc/mm/fault.c @@ -142,6 +142,11 @@ void do_page_fault(unsigned long address, struct pt_regs *regs) goto no_context; return; } + + /* Allow user to handle non-fatal signals first */ + if (signal_pending(current) && user_mode(regs)) + return; + /* * retry state machine */ diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index 2ae28ffec622..f00fb4eafe54 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -291,14 +291,15 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) fault = __do_page_fault(mm, addr, fsr, flags, tsk); - /* If we need to retry but a fatal signal is pending, handle the + /* If we need to retry but a signal is pending, try to handle the * signal first. We do not need to release the mmap_sem because * it would already be released in __lock_page_or_retry in * mm/filemap.c. */ - if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) { - if (!user_mode(regs)) + if (unlikely(fault & VM_FAULT_RETRY && signal_pending(current))) { + if (fatal_signal_pending(current) && !user_mode(regs)) goto no_context; - return 0; + if (user_mode(regs)) + return 0; } /* diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 613e7434c208..0d3fe0ea6a70 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -479,15 +479,16 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr, if (fault & VM_FAULT_RETRY) { /* -* If we need to retry but a fatal signal is pending, +* If we need to retry but a signal is pending, try to * handle the signal first. 
We do not need to release * the mmap_sem because it would already be released * in __lock_page_or_retry in mm/filemap.c. */ - if (fatal_signal_pending(current)) { - if (!user_mode(regs)) + if (signal_pending(current)) { + if (fatal_signal_pending(current) && !user_mode(regs)) goto no_context; -
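The per-arch decision this patch installs after a VM_FAULT_RETRY can be summarized as a small table: bail out to no_context on a fatal signal in kernel mode, return to userspace on any pending signal in user mode, and keep retrying otherwise. A hedged sketch of that decision logic, with all names invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative outcomes of the post-VM_FAULT_RETRY check in the arch
 * fault handlers (see the arm/arm64 hunks above). */
enum fault_action {
    FAULT_RETRY,          /* no relevant signal: retry the fault */
    FAULT_RETURN_TO_USER, /* let userspace handle the signal first */
    FAULT_NO_CONTEXT      /* fatal signal in kernel mode: bail out */
};

static enum fault_action on_vm_fault_retry(bool signal_pending,
                                           bool fatal_signal,
                                           bool user_mode)
{
    if (!signal_pending)
        return FAULT_RETRY;
    if (fatal_signal && !user_mode)
        return FAULT_NO_CONTEXT;
    if (user_mode)
        return FAULT_RETURN_TO_USER;
    /* Non-fatal signal in kernel mode: keep retrying. */
    return FAULT_RETRY;
}
```

The behavioral change of the patch is the FAULT_RETURN_TO_USER row: previously only a *fatal* signal cut the retry short, so a user-mode fault blocked on a non-fatal signal (e.g. SIGSTOP under gdb) kept retrying instead of returning promptly.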
[PATCH v4 06/10] userfaultfd: Don't retake mmap_sem to emulate NOPAGE
This patch removes the risky path in handle_userfault(); then we can be sure that the callers of handle_mm_fault() will know that the VMAs might have changed. Meanwhile, with the previous patch we don't lose responsiveness either, since the core mm code can now handle non-fatal userspace signals even if we return VM_FAULT_RETRY. Suggested-by: Andrea Arcangeli Suggested-by: Linus Torvalds Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- fs/userfaultfd.c | 24 1 file changed, 24 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 3c55ee64bcb1..2b3b48e94ae4 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -522,30 +522,6 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) __set_current_state(TASK_RUNNING); - if (return_to_userland) { - if (signal_pending(current) && - !fatal_signal_pending(current)) { - /* -* If we got a SIGSTOP or SIGCONT and this is -* a normal userland page fault, just let -* userland return so the signal will be -* handled and gdb debugging works. The page -* fault code immediately after we return from -* this function is going to release the -* mmap_sem and it's not depending on it -* (unlike gup would if we were not to return -* VM_FAULT_RETRY). -* -* If a fatal signal is pending we still take -* the streamlined VM_FAULT_RETRY failure path -* and there's no need to retake the mmap_sem -* in such case. -*/ - down_read(&mm->mmap_sem); - ret = VM_FAULT_NOPAGE; - } - } - /* * Here we race with the list_del; list_add in * userfaultfd_ctx_read(), however because we don't ever run -- 2.21.0
[PATCH v4 01/10] mm/gup: Rename "nonblocking" to "locked" where proper
There are plenty of places around __get_user_pages() that have a parameter "nonblocking" which does not really mean "it won't block" (because it can really block), but instead shows whether the mmap_sem is released by up_read() during the page fault handling, mostly when VM_FAULT_RETRY is returned. We have the correct naming in e.g. get_user_pages_locked() or get_user_pages_remote() as "locked", however there are still many places using "nonblocking" as the name. Rename those places to "locked" where proper, to better suit the functionality of the variable. While at it, fix up some of the comments accordingly. Reviewed-by: Mike Rapoport Reviewed-by: Jerome Glisse Reviewed-by: David Hildenbrand Signed-off-by: Peter Xu --- mm/gup.c | 44 +--- mm/hugetlb.c | 8 2 files changed, 25 insertions(+), 27 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 98f13ab37bac..eddbb95dcb8f 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -622,12 +622,12 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address, } /* - * mmap_sem must be held on entry. If @nonblocking != NULL and - * *@flags does not include FOLL_NOWAIT, the mmap_sem may be released. - * If it is, *@nonblocking will be set to 0 and -EBUSY returned. + * mmap_sem must be held on entry. If @locked != NULL and *@flags + * does not include FOLL_NOWAIT, the mmap_sem may be released. If it + * is, *@locked will be set to 0 and -EBUSY returned.
*/ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, - unsigned long address, unsigned int *flags, int *nonblocking) + unsigned long address, unsigned int *flags, int *locked) { unsigned int fault_flags = 0; vm_fault_t ret; @@ -639,7 +639,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, fault_flags |= FAULT_FLAG_WRITE; if (*flags & FOLL_REMOTE) fault_flags |= FAULT_FLAG_REMOTE; - if (nonblocking) + if (locked) fault_flags |= FAULT_FLAG_ALLOW_RETRY; if (*flags & FOLL_NOWAIT) fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT; @@ -665,8 +665,8 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, } if (ret & VM_FAULT_RETRY) { - if (nonblocking && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT)) - *nonblocking = 0; + if (locked && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT)) + *locked = 0; return -EBUSY; } @@ -743,7 +743,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) * only intends to ensure the pages are faulted in. * @vmas: array of pointers to vmas corresponding to each page. * Or NULL if the caller does not require them. - * @nonblocking: whether waiting for disk IO or mmap_sem contention + * @locked: whether we're still with the mmap_sem held * * Returns number of pages pinned. This may be fewer than the number * requested. If nr_pages is 0 or negative, returns 0. If no pages @@ -772,13 +772,11 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) * appropriate) must be called after the page is finished with, and * before put_page is called. * - * If @nonblocking != NULL, __get_user_pages will not wait for disk IO - * or mmap_sem contention, and if waiting is needed to pin all pages, - * *@nonblocking will be set to 0. Further, if @gup_flags does not - * include FOLL_NOWAIT, the mmap_sem will be released via up_read() in - * this case. 
+ * If @locked != NULL, *@locked will be set to 0 when mmap_sem is + * released by an up_read(). That can happen if @gup_flags does not + * have FOLL_NOWAIT. * - * A caller using such a combination of @nonblocking and @gup_flags + * A caller using such a combination of @locked and @gup_flags * must therefore hold the mmap_sem for reading only, and recognize * when it's been released. Otherwise, it must be held for either * reading or writing and will not be released. @@ -790,7 +788,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, - struct vm_area_struct **vmas, int *nonblocking) + struct vm_area_struct **vmas, int *locked) { long ret = 0, i = 0; struct vm_area_struct *vma = NULL; @@ -834,7 +832,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, if (is_vm_hugetlb_page(vma)) { i = follow_hugetlb_page(mm, vma, pages, vmas, &start,
[PATCH v4 02/10] mm/gup: Fix __get_user_pages() on fault retry of hugetlb
When follow_hugetlb_page() returns with *locked==0, it means we've got a VM_FAULT_RETRY within the faulting process and we've released the mmap_sem. When that happens, we should stop and bail out. Signed-off-by: Peter Xu --- mm/gup.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/mm/gup.c b/mm/gup.c index eddbb95dcb8f..e60d32f1674d 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -833,6 +833,16 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, i = follow_hugetlb_page(mm, vma, pages, vmas, &start, &nr_pages, i, gup_flags, locked); + if (locked && *locked == 0) { + /* +* We've got a VM_FAULT_RETRY +* and we've lost mmap_sem. +* We must stop here. +*/ + BUG_ON(gup_flags & FOLL_NOWAIT); + BUG_ON(ret != 0); + goto out; + } continue; } } -- 2.21.0
[PATCH v4 00/10] mm: Page fault enhancements
This is the 4th version of the PF enhancement series on signal handling and fault retries. This new version does not change the existing patches in v3 but adds two more patches to address the current gup issue of not responding to SIGKILL. A 3rd new patch is also added to let handle_userfault() respect FAULT_FLAG_KILLABLE, though it should have no functional change given the two new patches above. I would really appreciate any review comments for the series, especially for the first two patches, which IMHO are not even specific to this patchset and stand on their own as cleanups or fixes. v4: - use lore.kernel.org for all the links in commit messages [Kirill] - one more patch ("mm/gup: Fix __get_user_pages() on fault retry of hugetlb") to fix hugetlb path on fault retry - one more patch ("mm/gup: Allow to react to fatal signals") to: - use down_read_killable() properly [Linus] - pass in FAULT_FLAG_KILLABLE for all GUP [Linus] - one more patch ("mm/userfaultfd: Honor FAULT_FLAG_KILLABLE in fault path") to let handle_userfault() respect FAULT_FLAG_KILLABLE. Should have no functional change after the previous two new patches. v3: - check fatal signals in __get_user_page_locked() [Linus] - add r-bs v2: - resent previous version, rebase only === v1 cover letter === This series is split out of the userfaultfd-wp series to only cover the general page fault changes, since it seems to make sense by itself. Basically it does two things: (a) Allows the page fault handlers to be more interactive on not only SIGKILL, but also the rest of the userspace signals (especially for user-mode faults), and, (b) Allows the page fault retry (VM_FAULT_RETRY) to happen more than once. I'm keeping the CC list as in uffd-wp v5, hopefully I'm not sending too much spam... 
And, instead of writing the cover letter again, I'm just copy-pasting my previous link here, which has more details on why we do this: https://patchwork.kernel.org/cover/10691991/ The major change from that latest version should be that we introduced a new page fault flag FAULT_FLAG_INTERRUPTIBLE as suggested by Linus [1] to represent that we would like the fault handler to respond to non-fatal signals. Also, we're more careful now about when to do the immediate return of the page fault for such signals. For example, now we'll only check against signal_pending() for user-mode page faults and we keep the kernel-mode page fault path untouched. More information can be found in the separate patches. The patchset is only lightly tested on x86. All comments are greatly welcomed. Thanks, [1] https://lkml.org/lkml/2019/6/25/1382 Peter Xu (10): mm/gup: Rename "nonblocking" to "locked" where proper mm/gup: Fix __get_user_pages() on fault retry of hugetlb mm: Introduce FAULT_FLAG_DEFAULT mm: Introduce FAULT_FLAG_INTERRUPTIBLE mm: Return faster for non-fatal signals in user mode faults userfaultfd: Don't retake mmap_sem to emulate NOPAGE mm: Allow VM_FAULT_RETRY for multiple times mm/gup: Allow VM_FAULT_RETRY for multiple times mm/gup: Allow to react to fatal signals mm/userfaultfd: Honor FAULT_FLAG_KILLABLE in fault path arch/alpha/mm/fault.c | 7 +-- arch/arc/mm/fault.c | 8 ++- arch/arm/mm/fault.c | 14 +++-- arch/arm64/mm/fault.c | 16 +++--- arch/hexagon/mm/vm_fault.c | 6 +-- arch/ia64/mm/fault.c| 6 +-- arch/m68k/mm/fault.c| 10 ++-- arch/microblaze/mm/fault.c | 6 +-- arch/mips/mm/fault.c| 6 +-- arch/nds32/mm/fault.c | 12 ++--- arch/nios2/mm/fault.c | 8 ++- arch/openrisc/mm/fault.c| 6 +-- arch/parisc/mm/fault.c | 9 ++-- arch/powerpc/mm/fault.c | 10 ++-- arch/riscv/mm/fault.c | 12 ++--- arch/s390/mm/fault.c| 11 ++-- arch/sh/mm/fault.c | 7 ++- arch/sparc/mm/fault_32.c| 5 +- arch/sparc/mm/fault_64.c| 6 +-- arch/um/kernel/trap.c | 7 +-- arch/unicore32/mm/fault.c | 11 ++-- 
arch/x86/mm/fault.c | 6 +-- arch/xtensa/mm/fault.c | 6 +-- drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 +++-- fs/userfaultfd.c| 62 ++ include/linux/mm.h | 81 include/linux/sched/signal.h| 12 + mm/filemap.c| 2 +- mm/gup.c| 93 + mm/hugetlb.c| 17 +++--- mm/shmem.c | 2 +- 31 files changed, 283 insertions(+), 193 deletions(-) -- 2.21.0
[PATCH v4 04/10] mm: Introduce FAULT_FLAG_INTERRUPTIBLE
handle_userfault() is currently the only place in the kernel page fault procedures that can respond to non-fatal userspace signals. It was trying to detect such an allowance by checking against the USER & KILLABLE flags, which was unofficial. In this patch, we introduce a new flag (FAULT_FLAG_INTERRUPTIBLE) to show that the fault handler allows the fault procedure to respond even to non-fatal signals. Meanwhile, add this new flag to the default fault flags so that all the page fault handlers can benefit from it. With that, replace the userfault check with the new flag. Since the line is getting even longer, clean up the fault flags a bit too to ease TTY users. Although we've got a new flag and applied it, there should be no functional change with this patch so far. Suggested-by: Linus Torvalds Reviewed-by: David Hildenbrand Signed-off-by: Peter Xu --- fs/userfaultfd.c | 4 +--- include/linux/mm.h | 39 --- 2 files changed, 29 insertions(+), 14 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index fe6d804a38dc..3c55ee64bcb1 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -462,9 +462,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) uwq.ctx = ctx; uwq.waken = false; - return_to_userland = - (vmf->flags & (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE)) == - (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE); + return_to_userland = vmf->flags & FAULT_FLAG_INTERRUPTIBLE; blocking_state = return_to_userland ? 
TASK_INTERRUPTIBLE : TASK_KILLABLE; diff --git a/include/linux/mm.h b/include/linux/mm.h index 57fb5c535f8e..53ec7abb8472 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -383,22 +383,38 @@ extern unsigned int kobjsize(const void *objp); */ extern pgprot_t protection_map[16]; -#define FAULT_FLAG_WRITE 0x01 /* Fault was a write access */ -#define FAULT_FLAG_MKWRITE 0x02 /* Fault was mkwrite of existing pte */ -#define FAULT_FLAG_ALLOW_RETRY 0x04 /* Retry fault if blocking */ -#define FAULT_FLAG_RETRY_NOWAIT 0x08 /* Don't drop mmap_sem and wait when retrying */ -#define FAULT_FLAG_KILLABLE 0x10 /* The fault task is in SIGKILL killable region */ -#define FAULT_FLAG_TRIED 0x20 /* Second try */ -#define FAULT_FLAG_USER 0x40 /* The fault originated in userspace */ -#define FAULT_FLAG_REMOTE 0x80 /* faulting for non current tsk/mm */ -#define FAULT_FLAG_INSTRUCTION 0x100 /* The fault was during an instruction fetch */ +/** + * Fault flag definitions. + * + * @FAULT_FLAG_WRITE: Fault was a write fault. + * @FAULT_FLAG_MKWRITE: Fault was mkwrite of existing PTE. + * @FAULT_FLAG_ALLOW_RETRY: Allow to retry the fault if blocked. + * @FAULT_FLAG_RETRY_NOWAIT: Don't drop mmap_sem and wait when retrying. + * @FAULT_FLAG_KILLABLE: The fault task is in SIGKILL killable region. + * @FAULT_FLAG_TRIED: The fault has been tried once. + * @FAULT_FLAG_USER: The fault originated in userspace. + * @FAULT_FLAG_REMOTE: The fault is not for current task/mm. + * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch. + * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals. + */ +#define FAULT_FLAG_WRITE 0x01 +#define FAULT_FLAG_MKWRITE 0x02 +#define FAULT_FLAG_ALLOW_RETRY 0x04 +#define FAULT_FLAG_RETRY_NOWAIT 0x08 +#define FAULT_FLAG_KILLABLE 0x10 +#define FAULT_FLAG_TRIED 0x20 +#define FAULT_FLAG_USER 0x40 +#define FAULT_FLAG_REMOTE 0x80 +#define FAULT_FLAG_INSTRUCTION 0x100 +#define FAULT_FLAG_INTERRUPTIBLE 0x200 /* * The default fault flags that should be used by most of the * arch-specific page fault handlers. */ #define FAULT_FLAG_DEFAULT (FAULT_FLAG_ALLOW_RETRY | \ -FAULT_FLAG_KILLABLE) +FAULT_FLAG_KILLABLE | \ +FAULT_FLAG_INTERRUPTIBLE) #define FAULT_FLAG_TRACE \ { FAULT_FLAG_WRITE, "WRITE" }, \ @@ -409,7 +425,8 @@ extern pgprot_t protection_map[16]; { FAULT_FLAG_TRIED, "TRIED" }, \ { FAULT_FLAG_USER, "USER" }, \ { FAULT_FLAG_REMOTE, "REMOTE" }, \ - { FAULT_FLAG_INSTRUCTION, "INSTRUCTION" } + { FAULT_FLAG_INSTRUCTION, "INSTRUCTION" }, \ + { FAULT_FLAG_INTERRUPTIBLE, "INTERRUPTIBLE" } /* * vm_fault is filled by the the pagefault handler and passed to the vma's -- 2.21.0
[PATCH v4 03/10] mm: Introduce FAULT_FLAG_DEFAULT
Although there are tons of arch-specific page fault handlers, most of them are still sharing the same initial value of the page fault flags. Say, nearly all of the page fault handlers allow the fault to be retried, and they also allow the fault to respond to SIGKILL. Let's define a default value for the fault flags to replace those initial page fault flags that were copied over. With this, it'll be far easier to introduce a new fault flag that can be used by all the architectures instead of touching all the archs. Reviewed-by: David Hildenbrand Signed-off-by: Peter Xu --- arch/alpha/mm/fault.c | 2 +- arch/arc/mm/fault.c| 2 +- arch/arm/mm/fault.c| 2 +- arch/arm64/mm/fault.c | 2 +- arch/hexagon/mm/vm_fault.c | 2 +- arch/ia64/mm/fault.c | 2 +- arch/m68k/mm/fault.c | 2 +- arch/microblaze/mm/fault.c | 2 +- arch/mips/mm/fault.c | 2 +- arch/nds32/mm/fault.c | 2 +- arch/nios2/mm/fault.c | 2 +- arch/openrisc/mm/fault.c | 2 +- arch/parisc/mm/fault.c | 2 +- arch/powerpc/mm/fault.c| 2 +- arch/riscv/mm/fault.c | 2 +- arch/s390/mm/fault.c | 2 +- arch/sh/mm/fault.c | 2 +- arch/sparc/mm/fault_32.c | 2 +- arch/sparc/mm/fault_64.c | 2 +- arch/um/kernel/trap.c | 2 +- arch/unicore32/mm/fault.c | 2 +- arch/x86/mm/fault.c| 2 +- arch/xtensa/mm/fault.c | 2 +- include/linux/mm.h | 7 +++ 24 files changed, 30 insertions(+), 23 deletions(-) diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c index 741e61ef9d3f..de4cc6936391 100644 --- a/arch/alpha/mm/fault.c +++ b/arch/alpha/mm/fault.c @@ -89,7 +89,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr, const struct exception_table_entry *fixup; int si_code = SEGV_MAPERR; vm_fault_t fault; - unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + unsigned int flags = FAULT_FLAG_DEFAULT; /* As of EV6, a load into $31/$f31 is a prefetch, and never faults (or is suppressed by the PALcode). 
Support that for older CPUs diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c index 3861543b66a0..61919e4e4eec 100644 --- a/arch/arc/mm/fault.c +++ b/arch/arc/mm/fault.c @@ -94,7 +94,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs) (regs->ecr_cause == ECR_C_PROTV_INST_FETCH)) exec = 1; - flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + flags = FAULT_FLAG_DEFAULT; if (user_mode(regs)) flags |= FAULT_FLAG_USER; if (write) diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c index 890eeaac3cbb..2ae28ffec622 100644 --- a/arch/arm/mm/fault.c +++ b/arch/arm/mm/fault.c @@ -241,7 +241,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs) struct mm_struct *mm; int sig, code; vm_fault_t fault; - unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + unsigned int flags = FAULT_FLAG_DEFAULT; if (kprobe_page_fault(regs, fsr)) return 0; diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index cfd65b63f36f..613e7434c208 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -410,7 +410,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr, struct mm_struct *mm = current->mm; vm_fault_t fault, major = 0; unsigned long vm_flags = VM_READ | VM_WRITE; - unsigned int mm_flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + unsigned int mm_flags = FAULT_FLAG_DEFAULT; if (kprobe_page_fault(regs, esr)) return 0; diff --git a/arch/hexagon/mm/vm_fault.c b/arch/hexagon/mm/vm_fault.c index b3bc71680ae4..223787e01bdd 100644 --- a/arch/hexagon/mm/vm_fault.c +++ b/arch/hexagon/mm/vm_fault.c @@ -41,7 +41,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs) int si_code = SEGV_MAPERR; vm_fault_t fault; const struct exception_table_entry *fixup; - unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + unsigned int flags = FAULT_FLAG_DEFAULT; /* * If we're in an interrupt or have no user context, diff --git a/arch/ia64/mm/fault.c 
b/arch/ia64/mm/fault.c index c2f299fe9e04..d039b846f671 100644 --- a/arch/ia64/mm/fault.c +++ b/arch/ia64/mm/fault.c @@ -65,7 +65,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re struct mm_struct *mm = current->mm; unsigned long mask; vm_fault_t fault; - unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + unsigned int flags = FAULT_FLAG_DEFAULT; mask = ((((isr >> IA64_ISR_X_BIT) & 1UL) << VM_EXEC_BIT) | (((isr >> IA64_ISR_W_BIT) & 1UL) << VM_WRITE_BIT)); diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c index e9b1d7585b43..8e734309ace9 100644
Re: [PATCH] tty:vt: Add check the return value of kzalloc to avoid oops
On Sat, 21 Sep 2019, Xiaoming Ni wrote: > @ Nicolas Pitre > Can I make a v2 patch based on your advice ? > Or you will submit a patch for "GFP_WONTFAIL" yourself ? Here's a patch implementing what I had in mind. This is compile tested only. - >8 Subject: [PATCH] mm: add __GFP_WONTFAIL and GFP_ONBOOT Some memory allocations are very unlikely to fail during system boot. Because of that, the code often doesn't bother to check for allocation failure, but this gets reported anyway. As an alternative to adding code to check for NULL that has almost no chance of ever being exercised, let's use a GFP flag to identify those cases and panic the kernel if allocation failure ever occurs. Conversion of one such instance is also included. Signed-off-by: Nicolas Pitre diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 34aa39d1ae..bd0a0e4807 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -3356,7 +3356,7 @@ static int __init con_init(void) } for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) { - vc_cons[currcons].d = vc = kzalloc(sizeof(struct vc_data), GFP_NOWAIT); + vc_cons[currcons].d = vc = kzalloc(sizeof(struct vc_data), GFP_ONBOOT); INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK); tty_port_init(&vc->port); visual_init(vc, currcons, 1); diff --git a/include/linux/gfp.h b/include/linux/gfp.h index f33881688f..6f33575cd6 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -39,8 +39,9 @@ struct vm_area_struct; #define ___GFP_HARDWALL 0x100000u #define ___GFP_THISNODE 0x200000u #define ___GFP_ACCOUNT 0x400000u +#define ___GFP_WONTFAIL 0x800000u #ifdef CONFIG_LOCKDEP -#define ___GFP_NOLOCKDEP 0x800000u +#define ___GFP_NOLOCKDEP 0x1000000u #else #define ___GFP_NOLOCKDEP 0 #endif @@ -187,6 +188,12 @@ struct vm_area_struct; * definitely preferable to use the flag rather than opencode endless * loop around allocator. * Using this flag for costly allocations is _highly_ discouraged. + * + * %__GFP_WONTFAIL: No allocation error is expected whatsoever. 
The + * caller presumes allocation will always succeed (e.g. the system is still + * booting, the allocation size is relatively small and memory should be + * plentiful) so testing for failure is skipped. If allocation ever fails + * then the kernel will simply panic. */ #define __GFP_IO ((__force gfp_t)___GFP_IO) #define __GFP_FS ((__force gfp_t)___GFP_FS) @@ -196,6 +203,7 @@ struct vm_area_struct; #define __GFP_RETRY_MAYFAIL((__force gfp_t)___GFP_RETRY_MAYFAIL) #define __GFP_NOFAIL ((__force gfp_t)___GFP_NOFAIL) #define __GFP_NORETRY ((__force gfp_t)___GFP_NORETRY) +#define __GFP_WONTFAIL ((__force gfp_t)___GFP_WONTFAIL) /** * DOC: Action modifiers @@ -217,7 +225,7 @@ struct vm_area_struct; #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP) /* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP)) +#define __GFP_BITS_SHIFT (24 + IS_ENABLED(CONFIG_LOCKDEP)) #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) /** @@ -285,6 +293,9 @@ struct vm_area_struct; * available and will not wake kswapd/kcompactd on failure. The _LIGHT * version does not attempt reclaim/compaction at all and is by default used * in page fault path, while the non-light is used by khugepaged. + * + * %GFP_ONBOOT is for relatively small allocations that are not expected + * to fail while the system is booting. 
*/ #define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) @@ -300,6 +311,7 @@ struct vm_area_struct; #define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \ __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM) #define GFP_TRANSHUGE (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM) +#define GFP_ONBOOT (GFP_NOWAIT | __GFP_WONTFAIL) /* Convert GFP flags to their corresponding migrate type */ #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ff5484fdbd..36dee09f7f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4625,6 +4625,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, fail: warn_alloc(gfp_mask, ac->nodemask, "page allocation failure: order:%u", order); + if (gfp_mask & __GFP_WONTFAIL) { + /* +* The assumption was wrong. This is never supposed to happen. +* Caller most likely won't check for a returned NULL either. +* So the only reasonable thing to do is to panic. +*/ + panic("Failed to allocate memory despite GFP_WONTFAIL\n"); + } got_pg: return page; }
Re: [PATCH v7 08/21] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
On Wed, Sep 4, 2019 at 9:44 PM Anup Patel wrote: > > For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access > VCPU config and registers from user-space. > > We have three types of VCPU registers: > 1. CONFIG - these are VCPU config and capabilities > 2. CORE - these are VCPU general purpose registers > 3. CSR - these are VCPU control and status registers > > The CONFIG registers available to user-space are ISA and TIMEBASE. Out > of these, TIMEBASE is a read-only register which informs user-space about > the VCPU timer base frequency. The ISA register is a read and write register > where user-space can only write the desired VCPU ISA capabilities before > running the VCPU. > > The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7, > T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except > PC and MODE. The PC register represents the program counter whereas the MODE > register represents the VCPU privilege mode (i.e. S/U-mode). > > The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC, > SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers. > > In future, more VCPU register types will be added (such as FP) for the > KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls. 
> > Signed-off-by: Anup Patel > Acked-by: Paolo Bonzini > Reviewed-by: Paolo Bonzini > --- > arch/riscv/include/uapi/asm/kvm.h | 46 +- > arch/riscv/kvm/vcpu.c | 235 +- > 2 files changed, 278 insertions(+), 3 deletions(-) > > diff --git a/arch/riscv/include/uapi/asm/kvm.h > b/arch/riscv/include/uapi/asm/kvm.h > index 6dbc056d58ba..08c4515ad71b 100644 > --- a/arch/riscv/include/uapi/asm/kvm.h > +++ b/arch/riscv/include/uapi/asm/kvm.h > @@ -23,8 +23,15 @@ > > /* for KVM_GET_REGS and KVM_SET_REGS */ > struct kvm_regs { > + /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ > + struct user_regs_struct regs; > + unsigned long mode; > }; As discussed in LPC 2019 with Alex Graf, I will add separate struct for CORE registers instead of re-using "struct kvm_regs". > > +/* Possible privilege modes for kvm_regs */ > +#define KVM_RISCV_MODE_S 1 > +#define KVM_RISCV_MODE_U 0 > + > /* for KVM_GET_FPU and KVM_SET_FPU */ > struct kvm_fpu { > }; > @@ -41,10 +48,47 @@ struct kvm_guest_debug_arch { > struct kvm_sync_regs { > }; > > -/* dummy definition */ > +/* for KVM_GET_SREGS and KVM_SET_SREGS */ > struct kvm_sregs { > + unsigned long sstatus; > + unsigned long sie; > + unsigned long stvec; > + unsigned long sscratch; > + unsigned long sepc; > + unsigned long scause; > + unsigned long stval; > + unsigned long sip; > + unsigned long satp; > +}; Same as above, I will add separate struct for CSR registers instead of re-using "struct kvm_sregs". 
> + > +/* for KVM_GET_ONE_REG and KVM_SET_ONE_REG */ > +struct kvm_riscv_config { > + unsigned long isa; > + unsigned long tbfreq; }; > > +#define KVM_REG_SIZE(id) \ > + (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT)) > + > +/* If you need to interpret the index values, here is the key: */ > +#define KVM_REG_RISCV_TYPE_MASK 0xFF000000 > +#define KVM_REG_RISCV_TYPE_SHIFT 24 > + > +/* Config registers are mapped as type 1 */ > +#define KVM_REG_RISCV_CONFIG (0x01 << KVM_REG_RISCV_TYPE_SHIFT) > +#define KVM_REG_RISCV_CONFIG_REG(name) \ > + (offsetof(struct kvm_riscv_config, name) / sizeof(unsigned long)) > + > +/* Core registers are mapped as type 2 */ > +#define KVM_REG_RISCV_CORE (0x02 << KVM_REG_RISCV_TYPE_SHIFT) > +#define KVM_REG_RISCV_CORE_REG(name) \ > + (offsetof(struct kvm_regs, name) / sizeof(unsigned long)) > + > +/* Control and status registers are mapped as type 3 */ > +#define KVM_REG_RISCV_CSR (0x03 << KVM_REG_RISCV_TYPE_SHIFT) > +#define KVM_REG_RISCV_CSR_REG(name) \ > + (offsetof(struct kvm_sregs, name) / sizeof(unsigned long)) > + > #endif > > #endif /* __LINUX_KVM_RISCV_H */ > diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c > index 3223f723f79e..b95dfc959009 100644 > --- a/arch/riscv/kvm/vcpu.c > +++ b/arch/riscv/kvm/vcpu.c > @@ -165,6 +165,215 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, > struct vm_fault *vmf) > return VM_FAULT_SIGBUS; > } > > +static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu, > +const struct kvm_one_reg *reg) > +{ > + unsigned long __user *uaddr = > + (unsigned long __user *)(unsigned long)reg->addr; > + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | > + KVM_REG_SIZE_MASK | > + KVM_REG_RISCV_CONFIG); > + unsigned long reg_val; > + > + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) > + return
Re: [PATCH v7 02/21] RISC-V: Add bitmap representing ISA features common across CPUs
On Sat, Sep 21, 2019 at 3:31 PM Paul Walmsley wrote: > > Hi Anup, > > Thanks for changing this to use a bitmap. A few comments below - > > On Wed, 4 Sep 2019, Anup Patel wrote: > > > This patch adds a riscv_isa bitmap which represents Host ISA features > > common across all Host CPUs. The riscv_isa is not the same as elf_hwcap > > because elf_hwcap will only have ISA features relevant for user-space > > apps whereas riscv_isa will have ISA features relevant to both kernel > > and user-space apps. > > > > One of the use-cases for the riscv_isa bitmap is in the KVM hypervisor where > > we will use it to do the following operations: > > > > 1. Check whether hypervisor extension is available > > 2. Find ISA features that need to be virtualized (e.g. floating > >point support, vector extension, etc.) > > > > Signed-off-by: Anup Patel > > Signed-off-by: Atish Patra > > Reviewed-by: Alexander Graf > > --- > > arch/riscv/include/asm/hwcap.h | 26 +++ > > arch/riscv/kernel/cpufeature.c | 79 -- > > 2 files changed, 102 insertions(+), 3 deletions(-) > > > > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h > > index 7ecb7c6a57b1..9b657375aa51 100644 > > --- a/arch/riscv/include/asm/hwcap.h > > +++ b/arch/riscv/include/asm/hwcap.h > > @@ -8,6 +8,7 @@ > > #ifndef __ASM_HWCAP_H > > #define __ASM_HWCAP_H > > > > +#include > > #include > > > > #ifndef __ASSEMBLY__ > > @@ -22,5 +23,30 @@ enum { > > }; > > > > extern unsigned long elf_hwcap; > > + > > +#define RISCV_ISA_EXT_a ('a' - 'a') > > +#define RISCV_ISA_EXT_c ('c' - 'a') > > +#define RISCV_ISA_EXT_d ('d' - 'a') > > +#define RISCV_ISA_EXT_f ('f' - 'a') > > +#define RISCV_ISA_EXT_h ('h' - 'a') > > +#define RISCV_ISA_EXT_i ('i' - 'a') > > +#define RISCV_ISA_EXT_m ('m' - 'a') > > +#define RISCV_ISA_EXT_s ('s' - 'a') > > +#define RISCV_ISA_EXT_u ('u' - 'a') > > +#define RISCV_ISA_EXT_zicsr (('z' - 'a') + 1) > > +#define RISCV_ISA_EXT_zifencei (('z' - 'a') + 2) > > +#define RISCV_ISA_EXT_zam (('z' - 'a') + 3) > > +#define 
RISCV_ISA_EXT_ztso (('z' - 'a') + 4) > > If we add the Z extensions here, it's probably best if we drop Zam from > this list. The rationale is, as maintainers, we're planning to hold off > on merging any support for extensions or modules that aren't in the > "frozen" or "ratified" states, and according to the RISC-V specs, Zicsr, > Zifencei, and Ztso are all either frozen or ratified. However, see > below - No problem, I will remove the Zam define. Please add documentation under Documentation/riscv regarding the policy of not merging support for RISC-V extensions which aren't in the "frozen" or "ratified" state. > > > + > > +#define RISCV_ISA_EXT_MAX 256 > > + > > +unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap); > > + > > +#define riscv_isa_extension_mask(ext) BIT_MASK(RISCV_ISA_EXT_##ext) > > + > > +bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int > > bit); > > +#define riscv_isa_extension_available(isa_bitmap, ext) \ > > + __riscv_isa_extension_available(isa_bitmap, RISCV_ISA_EXT_##ext) > > + > > #endif > > #endif > > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c > > index b1ade9a49347..4ce71ce5e290 100644 > > --- a/arch/riscv/kernel/cpufeature.c > > +++ b/arch/riscv/kernel/cpufeature.c > > @@ -6,21 +6,64 @@ > > * Copyright (C) 2017 SiFive > > */ > > > > +#include > > #include > > #include > > #include > > #include > > > > unsigned long elf_hwcap __read_mostly; > > + > > +/* Host ISA bitmap */ > > +static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly; > > + > > #ifdef CONFIG_FPU > > bool has_fpu __read_mostly; > > #endif > > > > +/** > > + * riscv_isa_extension_base - Get base extension word > > + * > > + * @isa_bitmap ISA bitmap to use > > + * @returns base extension word as unsigned long value > > + * > > + * NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used. 
> > + */ > > Am happy to see comments that can be automatically parsed, but could you > reformat them into kernel-doc format? > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/doc-guide/kernel-doc.rst Sure, I will update comments as-per kernel-doc.rst. > > > +unsigned long riscv_isa_extension_base(const unsigned long *isa_bitmap) > > +{ > > + if (!isa_bitmap) > > + return riscv_isa[0]; > > + return isa_bitmap[0]; > > +} > > +EXPORT_SYMBOL_GPL(riscv_isa_extension_base); > > + > > +/** > > + * __riscv_isa_extension_available - Check whether given extension > > + * is available or not > > + * > > + * @isa_bitmap ISA bitmap to use > > + * @bit bit position of the desired extension > > + * @returns true or false > > + * > > + * NOTE: If isa_bitmap is NULL then Host ISA bitmap will be used. > > + */ > > Same
Re: [PATCH v1 0/2] pinctrl: Add new pinctrl/GPIO driver
Hi Mika, On 13/9/2019 4:18 PM, Mika Westerberg wrote: > On Thu, Sep 12, 2019 at 11:11:32AM +0100, Linus Walleij wrote: >> Hi Rahul, >> >> thanks for your patches! >> >> On Thu, Sep 12, 2019 at 8:59 AM Rahul Tanwar >> wrote: >> >>> This series is to add pinctrl & GPIO controller driver for a new SoC. >>> Patch 1 adds pinmux & GPIO controller driver. >>> Patch 2 adds the dt bindings document & include file. >>> >>> Patches are against Linux 5.3-rc5 at below Git tree: >>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git >> OK nice, I think you need to include Mika Westerberg on this review >> as well, because I think he likes to stay on top of all things intel >> in pin control. (Also included two other Intel folks in Finland who usually >> take an interest in these things.) > Thanks Linus for looping me in. > > Even if this is not directly based on the stuff we have under > drivers/pinctrl/intel/*, I have a couple of comments. I don't have this > patch series in my inbox so I'm commenting here. > > Since the driver name is equilibrium I suggest you to name > intel_pinctrl_driver and the like (probe, remove) to follow that > convention to avoid confusing this with the Intel pinctrl drivers under > drivers/pinctrl/intel/*. > > Maybe use eqbr prefix so then intel_pinctrl_driver becomes > eqbr_pinctrl_driver and so on. Also all the structures like > intel_pinctrl_drv_data should be changed accordingly. > > Ditto for: > > MODULE_DESCRIPTION("Intel Pinctrl Driver for LGM SoC"); > > I think better would be: > > MODULE_DESCRIPTION("Pinctrl Driver for LGM SoC (Equilibrium)"); > > Anyway you get the idea :) Yes, I understand your point. Will update in v2. Thanks. Regards, Rahul
[RFC v3] zswap: Add CONFIG_ZSWAP_IO_SWITCH to handle swap IO issue
This is the third version of this patch. The first and second versions are in [1] and [2]. This version is updated according to the comments from Randy Dunlap in [3].

Currently, I use a VM that has 2 CPUs, 4G of memory and a 4G swap file. I found that swap affects IO performance while it is running, so I enabled zswap to handle it, because zswap only consumes CPU cycles, not disk IO. That works, but zswap turned out to be slower than normal swap in this VM: zswap reaches about 300M/s while normal swap reaches about 500M/s. (The reason is that the disk inside the VM is backed by fscache on the host machine.) So enabling zswap makes the memory shrinker slower but is good for IO performance in this VM.

Therefore I want zswap to do work only when the disk holding the swap file is under high IO load, and this commit implements that idea. It adds two parameters, read_in_flight_limit and write_in_flight_limit, to zswap. When zswap is enabled, zswap_frontswap_store() stores pages to zswap only when the number of IOs in flight on the swap device is bigger than zswap_read_in_flight_limit or zswap_write_in_flight_limit. That way zswap only kicks in when the IO in-flight count of the swap device is high.
[1] https://lkml.org/lkml/2019/9/11/935 [2] https://lkml.org/lkml/2019/9/20/90 [3] https://lkml.org/lkml/2019/9/20/1076 Signed-off-by: Hui Zhu --- include/linux/swap.h | 3 +++ mm/Kconfig | 18 mm/page_io.c | 16 +++ mm/zswap.c | 58 4 files changed, 95 insertions(+) diff --git a/include/linux/swap.h b/include/linux/swap.h index de2c67a..82b621f 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -389,6 +389,9 @@ extern void end_swap_bio_write(struct bio *bio); extern int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func); extern int swap_set_page_dirty(struct page *page); +#ifdef CONFIG_ZSWAP_IO_SWITCH +extern void swap_io_in_flight(struct page *page, unsigned int inflight[2]); +#endif int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); diff --git a/mm/Kconfig b/mm/Kconfig index 56cec63..387c3b5 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -546,6 +546,24 @@ config ZSWAP they have not be fully explored on the large set of potential configurations and workloads that exist. +config ZSWAP_IO_SWITCH + bool "Compressed cache for swap pages according to the IO status" + depends on ZSWAP + help + This function helps the system that normal swap speed is higher + than zswap speed to handle the swap IO issue. + For example, a VM where the disk device is not set cache config or + set cache=writeback. + + This function makes zswap just work when the disk of the swap file + is under high IO load. + It add two parameters (read_in_flight_limit and + write_in_flight_limit) to zswap. When zswap is enabled, pages will + be stored to zswap only when the IO in flight number of swap device + is bigger than zswap_read_in_flight_limit or + zswap_write_in_flight_limit. + If unsure, say "n". 
+ config ZPOOL tristate "Common API for compressed memory storage" help diff --git a/mm/page_io.c b/mm/page_io.c index 24ee600..e66b050 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -434,3 +434,19 @@ int swap_set_page_dirty(struct page *page) return __set_page_dirty_no_writeback(page); } } + +#ifdef CONFIG_ZSWAP_IO_SWITCH +void swap_io_in_flight(struct page *page, unsigned int inflight[2]) +{ + struct swap_info_struct *sis = page_swap_info(page); + + if (!sis->bdev) { + inflight[0] = 0; + inflight[1] = 0; + return; + } + + part_in_flight_rw(bdev_get_queue(sis->bdev), sis->bdev->bd_part, + inflight); +} +#endif diff --git a/mm/zswap.c b/mm/zswap.c index 0e22744..0190b2d 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -62,6 +62,14 @@ static u64 zswap_reject_compress_poor; static u64 zswap_reject_alloc_fail; /* Store failed because the entry metadata could not be allocated (rare) */ static u64 zswap_reject_kmemcache_fail; +#ifdef CONFIG_ZSWAP_IO_SWITCH +/* + * Store failed because zswap_read_in_flight_limit or + * zswap_write_in_flight_limit is bigger than IO in flight number of + * swap device + */ +static u64 zswap_reject_io; +#endif /* Duplicate store was encountered (rare) */ static u64 zswap_duplicate_entry; @@ -114,6 +122,24 @@ static bool zswap_same_filled_pages_enabled = true; module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled, bool, 0644); +#ifdef CONFIG_ZSWAP_IO_SWITCH +/* + * zswap will not try to store the page if zswap_read_in_flight_limit is + * bigger than IO read in flight number of swap device + */ +static unsigned int
Re: [PATCH v9 07/11] dt-bindings: pwm: pwm-mediatek: add a property "num-pwms"
On Sat, 2019-09-21 at 02:21 +0200, Thierry Reding wrote:
> On Fri, Sep 20, 2019 at 06:49:07AM +0800, Sam Shih wrote:
> > From: Ryder Lee
> >
> > This adds a property "num-pwms" in example so that we could
> > specify the number of PWM channels via device tree.
> >
> > Signed-off-by: Ryder Lee
> > Signed-off-by: Sam Shih
> > Reviewed-by: Matthias Brugger
> > Acked-by: Uwe Kleine-König
> > ---
> > Changes since v6:
> > Follow reviewers's comments:
> > - The subject should indicate this is for Mediatek
> >
> > Changes since v5:
> > - Add an Acked-by tag
> > - This file is original v4 patch 5/10
> >   (https://patchwork.kernel.org/patch/11102577/)
> >
> > ---
> >  Documentation/devicetree/bindings/pwm/pwm-mediatek.txt | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
>
> You failed to address Rob's questions repeatedly and I agree with him
> that you can just as easily derive the number of PWMs from the specific
> compatible string. I won't be applying this and none of the patches that
> depend on it.

Hi,

Thanks for getting back to me.

The new pwm driver (patch 04/11: "pwm: mediatek: allocate the clks array dynamically") can support different variants with different numbers of PWMs via the new property. For example:
1. Use "num-pwms" = <2> and assign clocks pwm1, pwm2 for mt7622.
2. Use "num-pwms" = <6> and assign clocks pwm1, pwm2, pwm3, pwm4, pwm5, pwm6 for mt7622.

If we just derive the number of PWMs from the specific compatible string in this document:
- "pwm1-6": the six per PWM clocks for mt7622
then it looks like "pwm1" through "pwm6" are all required properties in DT, which doesn't make sense. So we removed those descriptions and added:
- "pwm1-N": the PWM clocks for each channel

But the maximum number of clocks for each compatible string is still important information that should be provided in this document. What do you think of this?
- "pwm1-N": per PWM clocks for mt2712, the max number of PWM channels is 8
- "pwm1-N": per PWM clocks for mt7622, the max number of PWM channels is 6
- "pwm1-N": per PWM clocks for mt7623, the max number of PWM channels is 5
where N runs from 1 to the maximum number of PWM channels
- num-pwms: the number of PWM channels.

Thanks
Best Regards
Sam
[PATCH] smack: fix a compile error in smack_post_notification
I hit the following error when compiling the kernel.

security/smack/smack_lsm.c: In function smack_post_notification:
security/smack/smack_lsm.c:4383:7: error: dereferencing pointer to incomplete type struct watch_notification
  if (n->type == WATCH_TYPE_META)
      ^~
security/smack/smack_lsm.c:4383:17: error: WATCH_TYPE_META undeclared (first use in this function); did you mean TCA_PIE_BETA?
  if (n->type == WATCH_TYPE_META)
                 ^~~
                 TCA_PIE_BETA
security/smack/smack_lsm.c:4383:17: note: each undeclared identifier is reported only once for each function it appears in

Signed-off-by: zhong jiang
---
 security/smack/smack.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/smack/smack.h b/security/smack/smack.h
index 62529f3..02b05a2 100644
--- a/security/smack/smack.h
+++ b/security/smack/smack.h
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include

 /*
  * Use IPv6 port labeling if IPv6 is enabled and secmarks
--
1.7.12.4
RE: [RFC PATCH v3 4/6] psci: Add hvc call service for ptp_kvm.
Hi Marc, Paolo,

> -----Original Message-----
> From: Paolo Bonzini
> Sent: Thursday, September 19, 2019 8:13 PM
> To: Marc Zyngier ; Jianyong Wu (Arm Technology China) ;
> net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org;
> t...@linutronix.de; sean.j.christopher...@intel.com;
> richardcoch...@gmail.com; Mark Rutland ; Will Deacon ; Suzuki Poulose
> Cc: linux-kernel@vger.kernel.org; k...@vger.kernel.org; Steve Capper ;
> Kaly Xin (Arm Technology China) ; Justin He (Arm Technology China) ;
> nd ; linux-arm-ker...@lists.infradead.org
> Subject: Re: [RFC PATCH v3 4/6] psci: Add hvc call service for ptp_kvm.
>
> On 19/09/19 13:39, Marc Zyngier wrote:
> >> I don't think it's ugly but more important, using tk->tkr_mono.clock
> >> is incorrect. See how the x86 code hardcodes _clock, it's the
> >> same for ARM.
> > Not really. The guest kernel is free to use any clocksource it wishes.
>
> Understood, in fact it's the same on x86.
>
> However, for PTP to work, the cycles value returned by the clocksource must
> match the one returned by the hypercall. So for ARM
> get_device_system_crosststamp must receive the arch timer clocksource, so
> that it will return -ENODEV if the active clocksource is anything else.

After a day's reflection on this, I'm a little clearer about this issue; let me clarify it.

I think there are three principles for this issue:
(1): the guest and host can use different clocksources as their current clocksource at the same time, and it is not easy to probe whether they use the same one.
(2): the cycle value and the clocksource passed to get_device_system_crosststamp must match.
(3): ptp_kvm aims to increase time sync precision, so we should choose a high-rate clocksource as the ptp_kvm clocksource.

Based on (1) and (2) we can deduce that the ptp_kvm clocksource had better be a fixed one, so that we can test whether the cycle value and clocksource match.
Based on (3), the arch_sys_timer should be chosen, as it is by far the best clocksource for arm.

Thanks,
Jianyong Wu

> Paolo
>
> > In some cases, it is actually desirable (like these broken systems
> > that cannot use an in-kernel irqchip...). Maybe it is that on x86 the
> > guest only uses the kvm_clock, but that's a much harder sell on ARM.
> > The fact that ptp_kvm assumes that the clocksource is fixed doesn't
> > seem correct in that case.
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On Sun, Sep 22, 2019 at 7:34 PM Jason Wang wrote: > > > On 2019/9/23 上午9:15, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 5:51 PM Jason Wang wrote: > >> > >> On 2019/9/23 上午6:30, Matt Cover wrote: > >>> On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin > >>> wrote: > On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin > > wrote: > >> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > >>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > >>> to fallback to tun_automq_select_queue() for tx queue selection. > >>> > >>> Compilation of this exact patch was tested. > >>> > >>> For functional testing 3 additional printk()s were added. > >>> > >>> Functional testing results (on 2 txq tap device): > >>> > >>> [Fri Sep 20 18:33:27 2019] == tun no prog == > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() > >>> returned '-1' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>> [Fri Sep 20 18:33:27 2019] == tun prog -1 == > >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() > >>> returned '-1' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() > >>> returned '-1' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>> [Fri Sep 20 18:33:27 2019] == tun prog 0 == > >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() > >>> returned '0' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() > >>> returned '0' > >>> [Fri Sep 20 18:33:27 2019] == tun prog 1 == > >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() > >>> returned '1' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() > >>> returned '1' > >>> [Fri Sep 20 18:33:27 2019] == tun prog 2 == > >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() > >>> returned '2' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() > >>> returned '0' > >>> > >>> Signed-off-by: Matthew Cover 
> >> Could you add a bit more motivation data here? > > Thank you for these questions Michael. > > > > I'll plan on adding the below information to the > > commit message and submitting a v2 of this patch > > when net-next reopens. In the meantime, it would > > be very helpful to know if these answers address > > some of your concerns. > > > >> 1. why is this a good idea > > This change allows TUNSETSTEERINGEBPF progs to > > do any of the following. > >1. implement queue selection for a subset of > > traffic (e.g. special queue selection logic > > for ipv4, but return negative and use the > > default automq logic for ipv6) > >2. determine there isn't sufficient information > > to do proper queue selection; return > > negative and use the default automq logic > > for the unknown > >3. implement a noop prog (e.g. do > > bpf_trace_printk() then return negative and > > use the default automq logic for everything) > > > >> 2. how do we know existing userspace does not rely on existing > >> behaviour > > Prior to this change a negative return from a > > TUNSETSTEERINGEBPF prog would have been cast > > into a u16 and traversed netdev_cap_txqueue(). > > > > In most cases netdev_cap_txqueue() would have > > found this value to exceed real_num_tx_queues > > and queue_index would be updated to 0. > > > > It is possible that a TUNSETSTEERINGEBPF prog > > return a negative value which when cast into a > > u16 results in a positive queue_index less than > > real_num_tx_queues. For example, on x86_64, a > > return value of -65535 results in a queue_index > > of 1; which is a valid queue for any multiqueue > > device. > > > > It seems unlikely, however as stated above is > > unfortunately possible, that existing > > TUNSETSTEERINGEBPF programs would choose to > > return a negative value rather than return the > > positive value which holds the same meaning. 
> > > > It seems more likely that future > > TUNSETSTEERINGEBPF programs would leverage a > > negative return and potentially be loaded into > > a kernel with the old behavior. > OK if we are returning a special > value, shouldn't we limit it? How about a special > value with this meaning? > If we are changing an ABI let's at least make it > extensible. > > >>> A special value with this meaning sounds > >>> good to me. I'll plan on adding a define > >>> set to -1 to cause the fallback to automq. >
[PATCH] futex: robust futex may never be woken in a rare situation
Using a model checker, I found an issue with robust and PI futexes. In the situation below, the exiting owner misses the new waiter's entry in its pi_state_list, while the requester blocks and is never woken.

CPU0                                    CPU1
futex_lock_pi
/*some cs code*/
                                        futex_lock_pi
                                          futex_lock_pi_atomic
                                            ...
                                            newval = uval | FUTEX_WAITERS;
                                            ret = lock_pi_update_atomic(uaddr, uval, newval);
                                            ...
                                            attach_to_pi_owner
                                              p = find_get_task_by_vpid(pid);
                                              if (!p)
                                                return handle_exit_race(uaddr, uval, NULL);
                                              raw_spin_lock_irq(&p->pi_lock);
                                              pi_state = alloc_pi_state();
do_exit->mm_release
  if (unlikely(tsk->robust_list)) {
    exit_robust_list(tsk);
    tsk->robust_list = NULL;
  }
  if (unlikely(!list_empty(&tsk->pi_state_list)))
    exit_pi_state_list(tsk);  /*WILL MISS*/
                                              list_add(&pi_state->list, &p->pi_state_list);
                                              WILL BLOCK, NEVER WAKE UP!

Signed-off-by: Yunfeng Cui
Reviewed-by: Bo Wang
Reviewed-by: Yi Wang
---
 kernel/fork.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 53e780748fe3..58b90f21dac4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1277,15 +1277,16 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 	if (unlikely(tsk->robust_list)) {
 		exit_robust_list(tsk);
 		tsk->robust_list = NULL;
+		/* Check the task's pi_state_list with pi_lock acquired */
+		exit_pi_state_list(tsk);
 	}
 #ifdef CONFIG_COMPAT
 	if (unlikely(tsk->compat_robust_list)) {
 		compat_exit_robust_list(tsk);
 		tsk->compat_robust_list = NULL;
+		exit_pi_state_list(tsk);
 	}
 #endif
-	if (unlikely(!list_empty(&tsk->pi_state_list)))
-		exit_pi_state_list(tsk);
 #endif
 	uprobe_free_utask(tsk);
--
2.15.2
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On Sun, Sep 22, 2019 at 7:32 PM Jason Wang wrote: > > > On 2019/9/23 上午9:20, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 5:46 PM Jason Wang wrote: > >> > >> On 2019/9/23 上午1:43, Matt Cover wrote: > >>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin > >>> wrote: > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > > to fallback to tun_automq_select_queue() for tx queue selection. > > > > Compilation of this exact patch was tested. > > > > For functional testing 3 additional printk()s were added. > > > > Functional testing results (on 2 txq tap device): > > > > [Fri Sep 20 18:33:27 2019] == tun no prog == > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > '-1' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > [Fri Sep 20 18:33:27 2019] == tun prog -1 == > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > '-1' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > '-1' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > [Fri Sep 20 18:33:27 2019] == tun prog 0 == > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > '0' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > '0' > > [Fri Sep 20 18:33:27 2019] == tun prog 1 == > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > '1' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > '1' > > [Fri Sep 20 18:33:27 2019] == tun prog 2 == > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > '2' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > '0' > > > > Signed-off-by: Matthew Cover > Could you add a bit more motivation data here? > >>> Thank you for these questions Michael. 
> >>> > >>> I'll plan on adding the below information to the > >>> commit message and submitting a v2 of this patch > >>> when net-next reopens. In the meantime, it would > >>> be very helpful to know if these answers address > >>> some of your concerns. > >>> > 1. why is this a good idea > >>> This change allows TUNSETSTEERINGEBPF progs to > >>> do any of the following. > >>>1. implement queue selection for a subset of > >>> traffic (e.g. special queue selection logic > >>> for ipv4, but return negative and use the > >>> default automq logic for ipv6) > >> > >> Well, using ebpf means it need to take care of all the cases. E.g you > >> can easily implement the fallback through eBPF as well. > >> > > I really think there is value in being > > able to implement a scoped special > > case while leaving the rest of the > > packets in the kernel's hands. > > > This is only work when some fucntion could not be done by eBPF itself > and then we can provide the function through eBPF helpers. But this is > not the case here. > > > > > > Having to reimplement automq makes > > this hookpoint less accessible to > > beginners and experienced alike. > > > Note that automq itself is kind of complicated, it's best effort that is > hard to be documented accurately. It has several limitations (e.g flow > caches etc.) that may not work well in some conditions. > > It's not hard to implement a user programmable steering policy through > maps which could have much deterministic behavior than automq. The goal > of steering ebpf is to get rid of automq completely not partially rely > on it. > > And I don't see how relying on automq can simplify anything. > > Thanks > I'm not suggesting that we document automq. I'm suggesting that we add a return value which is documented as signaling to the kernel to implement whatever queue selection method is used when there is no ebpf prog attached. That behavior today is automq. 
There is nothing about this return value which would make it harder to change the default queue selection later. The default already exists today when there is no program loaded.

> >
> >>> 2. determine there isn't sufficient information
> >>> to do proper queue selection; return
> >>> negative and use the default automq logic
> >>> for the unknown
> >>
> >> Same as above.
> >>
> >>
> >>> 3. implement a noop prog (e.g. do
> >>> bpf_trace_printk() then return negative and
> >>> use the default automq logic for everything)
> >>
> >> ditto.
> >>
> >> > 2. how do we know existing userspace does not rely on existing behaviour
> >>> Prior to this change a negative return from a
> >>> TUNSETSTEERINGEBPF prog would have been cast
> >>> into a u16 and traversed netdev_cap_txqueue().
> >>>
> >>> In most
RE: [PATCH V4 2/5] input: keyboard: imx_sc: Add i.MX system controller key support
Hi, Pavel

> Subject: Re: [PATCH V4 2/5] input: keyboard: imx_sc: Add i.MX system
> controller key support
>
> Hi!
>
> > > > + ret = imx_scu_call_rpc(priv->key_ipc_handle, &msg, true);
> > > > + if (ret) {
> > > > + dev_err(>dev, "read imx sc key failed, ret %d\n", ret);
> > > > + return;
> > > > + }
> > > > +
> > > > + state = (bool)msg.state;
> > > > +
> > > > + if (!state && !priv->keystate)
> > > > + state = true;
> > >
> > > This needs an explanation please.
> >
> > This is to handle the quick press of button, e.g., when button is
> > pressed and released very quickly, when the delay work is scheduled,
> > the button state read from SCU FW is actually a release state (0), the
> > press state is (1), so the quick press/release will be ignored.
> >
> > However, after double check and test, I think this should be handled
> > by debounce time, if the button is pressed/release very quickly, the
> > event should be ignored, I will remove it and reduce the debounce time
> > to 30mS, previous 100mS is too long, using 30mS as debounce time, I did
> > NOT see similar issue no matter how quick I press/release the button.
>
> Are you sure this is expected behaviour?
>
> AFAIK microswitches can bounce when the button is pressed and released,
> but will not generate glitches when the button was not pressed, so even
> short presses are real and should be propagated...

The latest version of the patch handles a quick press as below:
- When the button is pressed, the delayed work is scheduled;
- In the delayed work, the button state is read; once the state differs from the previous state, it is propagated;
- If the state is a press state, another delayed work is scheduled to handle the release event.
So there could be a small window of DEBOUNCE_TIME (30ms) between the button press and the delayed work scheduled for reading the button state; if the button is released in less than 30ms, the state read could be a release state, which is the same as the previous state, so the press will be ignored.

Do you think that in this case we should still propagate the press event? If yes, then what about the release event? Although from what I tested, no matter how quickly I press the button, the driver always receives both the press and release events, as 30ms is very short.

Thanks,
Anson
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On 2019/9/23 上午9:15, Matt Cover wrote: On Sun, Sep 22, 2019 at 5:51 PM Jason Wang wrote: On 2019/9/23 上午6:30, Matt Cover wrote: On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin wrote: On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin wrote: On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal to fallback to tun_automq_select_queue() for tx queue selection. Compilation of this exact patch was tested. For functional testing 3 additional printk()s were added. Functional testing results (on 2 txq tap device): [Fri Sep 20 18:33:27 2019] == tun no prog == [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran [Fri Sep 20 18:33:27 2019] == tun prog -1 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran [Fri Sep 20 18:33:27 2019] == tun prog 0 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' [Fri Sep 20 18:33:27 2019] == tun prog 1 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' [Fri Sep 20 18:33:27 2019] == tun prog 2 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' Signed-off-by: Matthew Cover Could you add a bit more motivation data here? Thank you for these questions Michael. I'll plan on adding the below information to the commit message and submitting a v2 of this patch when net-next reopens. In the meantime, it would be very helpful to know if these answers address some of your concerns. 1. 
why is this a good idea This change allows TUNSETSTEERINGEBPF progs to do any of the following. 1. implement queue selection for a subset of traffic (e.g. special queue selection logic for ipv4, but return negative and use the default automq logic for ipv6) 2. determine there isn't sufficient information to do proper queue selection; return negative and use the default automq logic for the unknown 3. implement a noop prog (e.g. do bpf_trace_printk() then return negative and use the default automq logic for everything) 2. how do we know existing userspace does not rely on existing behaviour Prior to this change a negative return from a TUNSETSTEERINGEBPF prog would have been cast into a u16 and traversed netdev_cap_txqueue(). In most cases netdev_cap_txqueue() would have found this value to exceed real_num_tx_queues and queue_index would be updated to 0. It is possible that a TUNSETSTEERINGEBPF prog return a negative value which when cast into a u16 results in a positive queue_index less than real_num_tx_queues. For example, on x86_64, a return value of -65535 results in a queue_index of 1; which is a valid queue for any multiqueue device. It seems unlikely, however as stated above is unfortunately possible, that existing TUNSETSTEERINGEBPF programs would choose to return a negative value rather than return the positive value which holds the same meaning. It seems more likely that future TUNSETSTEERINGEBPF programs would leverage a negative return and potentially be loaded into a kernel with the old behavior. OK if we are returning a special value, shouldn't we limit it? How about a special value with this meaning? If we are changing an ABI let's at least make it extensible. A special value with this meaning sounds good to me. I'll plan on adding a define set to -1 to cause the fallback to automq. Can it really return -1? I see: static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, struct sk_buff *skb) ... 
The way I was initially viewing the old behavior was that returning negative was undefined; it happened to have the outcomes I walked through, but not necessarily by design. Having such fallback may bring extra troubles, it requires the eBPF program know the existence of the behavior which is not a part of kernel ABI actually. And then some eBPF program may start to rely on that which is pretty dangerous. Note, one important consideration is to have macvtap support where does not have any stuffs like automq. Thanks How about we call this TUN_SSE_ABORT instead of TUN_SSE_DO_AUTOMQ? TUN_SSE_ABORT could be documented as falling back to the default queue selection method in either space (presumably macvtap has some queue selection method when there is
RE: [PATCH V2 1/5] dt-bindings: fsl: scu: add scu power key binding
Hi, Pavel

> Subject: Re: [PATCH V2 1/5] dt-bindings: fsl: scu: add scu power key binding
>
> On Tue 2019-09-03 10:03:40, Anson Huang wrote:
> > NXP i.MX8QXP is an ARMv8 SoC with a Cortex-M4 core inside as system
> > controller, the system controller is in charge of system power, clock
> > and power key event etc. management, Linux kernel has to communicate
> > with system controller via MU (message unit) IPC to get power key
> > event, add binding doc for i.MX system controller power key driver.
> >
> > Signed-off-by: Anson Huang
> > ---
> > Changes since V1:
> > - remove "wakeup-source" property, as it is NOT needed for SCU interrupt;
> > - remove "status" in example.
> > ---
> >  .../devicetree/bindings/arm/freescale/fsl,scu.txt | 14 ++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git
> > a/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > b/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > index c149fad..f93e2e4 100644
> > --- a/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > +++ b/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > @@ -157,6 +157,15 @@ Required properties:
> >  Optional properties:
> >  - timeout-sec: contains the watchdog timeout in seconds.
> >
> > +Power key bindings based on SCU Message Protocol
> > +
> > +
> > +Required properties:
> > +- compatible: should be:
> > +    "fsl,imx8qxp-sc-pwrkey"
> > +  followed by "fsl,imx-sc-pwrkey";
> > +- linux,keycodes: See Documentation/devicetree/bindings/input/keys.txt
>
> Note you have keycode_s_ here, but keycode in the example and in the dts
> patch.

I do NOT quite understand your point; could you please provide more details?

Thanks,
Anson
RE: [PATCH] ARM: Add support for Realtek SOC
> Subject: Re: [PATCH] ARM: Add support for Realtek SOC
>
> On Wed, Sep 11, 2019 at 5:17 PM Arnd Bergmann wrote:
> >
> > On Wed, Sep 11, 2019 at 9:46 AM James Tai[戴志峰] wrote:
> > > > Subject: Re: [PATCH] ARM: Add support for Realtek SOC
> > > >
> > > > > @@ -148,6 +148,7 @@ endif
> > > > >  textofs-$(CONFIG_ARCH_MSM8X60) := 0x00208000
> > > > >  textofs-$(CONFIG_ARCH_MSM8960) := 0x00208000
> > > > >  textofs-$(CONFIG_ARCH_MESON) := 0x00208000
> > > > > +textofs-$(CONFIG_ARCH_REALTEK) := 0x00208000
> > > > >  textofs-$(CONFIG_ARCH_AXXIA) := 0x00308000
> > > >
> > > > Can you explain why this is needed for your platform?
> > > >
> > > We need to reserve memory (0x ~ 0x001B) for rom and boot code.
> >
> > Ok.
>
> I do not like this much.
>
> This platform is ARCH_MULTI_V7.
>
> ARM_PATCH_PHYS_VIRT allows you to place the kernel image anywhere in
> memory as long as the base is aligned at 16MB.
>
> The minimum 'textofs-y := 0x0008000' + extra 16MB offset will create a space
> (0x ~ 0x01008000).
>
> This is more than needed, but it is not a big deal to waste some megabytes of
> memory.

OK. I understand.

> --
> Best Regards
> Masahiro Yamada
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On 2019/9/23 9:20 AM, Matt Cover wrote:
> On Sun, Sep 22, 2019 at 5:46 PM Jason Wang wrote:
>> On 2019/9/23 1:43 AM, Matt Cover wrote:
>>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin wrote:
>>>> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote:
>>>>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a
>>>>> signal to fallback to tun_automq_select_queue() for tx queue
>>>>> selection.
>>>>>
>>>>> Compilation of this exact patch was tested.
>>>>>
>>>>> For functional testing 3 additional printk()s were added.
>>>>>
>>>>> Functional testing results (on 2 txq tap device):
>>>>>
>>>>>   [Fri Sep 20 18:33:27 2019] == tun no prog ==
>>>>>   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
>>>>>   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
>>>>>   [Fri Sep 20 18:33:27 2019] == tun prog -1 ==
>>>>>   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1'
>>>>>   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1'
>>>>>   [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran
>>>>>   [Fri Sep 20 18:33:27 2019] == tun prog 0 ==
>>>>>   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0'
>>>>>   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
>>>>>   [Fri Sep 20 18:33:27 2019] == tun prog 1 ==
>>>>>   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1'
>>>>>   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1'
>>>>>   [Fri Sep 20 18:33:27 2019] == tun prog 2 ==
>>>>>   [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2'
>>>>>   [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0'
>>>>>
>>>>> Signed-off-by: Matthew Cover
>>>>
>>>> Could you add a bit more motivation data here?
>>>
>>> Thank you for these questions Michael. I'll plan on adding the below
>>> information to the commit message and submitting a v2 of this patch
>>> when net-next reopens. In the meantime, it would be very helpful to
>>> know if these answers address some of your concerns.
>>>
>>>> 1. why is this a good idea
>>>
>>> This change allows TUNSETSTEERINGEBPF progs to do any of the
>>> following.
>>> 1. implement queue selection for a subset of traffic (e.g. special
>>>    queue selection logic for ipv4, but return negative and use the
>>>    default automq logic for ipv6)
>>
>> Well, using ebpf means it need to take care of all the cases. E.g you
>> can easily implement the fallback through eBPF as well.
>
> I really think there is value in being able to implement a scoped
> special case while leaving the rest of the packets in the kernel's
> hands.

This only works when some function could not be done by eBPF itself and
then we can provide the function through eBPF helpers. But this is not
the case here.

> Having to reimplement automq makes this hookpoint less accessible to
> beginners and experienced alike.

Note that automq itself is kind of complicated; it's best effort that is
hard to document accurately. It has several limitations (e.g. flow
caches etc.) that may not work well in some conditions. It's not hard to
implement a user programmable steering policy through maps which could
have much more deterministic behavior than automq. The goal of steering
ebpf is to get rid of automq completely, not partially rely on it. And I
don't see how relying on automq can simplify anything.

Thanks

>>> 2. determine there isn't sufficient information to do proper queue
>>>    selection; return negative and use the default automq logic for
>>>    the unknown
>>
>> Same as above.
>
>>> 3. implement a noop prog (e.g. do bpf_trace_printk() then return
>>>    negative and use the default automq logic for everything)
>>
>> ditto.
>
>>>> 2. how do we know existing userspace does not rely on existing
>>>> behaviour
>>>
>>> Prior to this change a negative return from a TUNSETSTEERINGEBPF prog
>>> would have been cast into a u16 and traversed netdev_cap_txqueue().
>>>
>>> In most cases netdev_cap_txqueue() would have found this value to
>>> exceed real_num_tx_queues and queue_index would be updated to 0.
>>>
>>> It is possible that a TUNSETSTEERINGEBPF prog return a negative value
>>> which when cast into a u16 results in a positive queue_index less
>>> than real_num_tx_queues. For example, on x86_64, a return value of
>>> -65535 results in a queue_index of 1; which is a valid queue for any
>>> multiqueue device.
>>>
>>> It seems unlikely, however as stated above is unfortunately possible,
>>> that existing TUNSETSTEERINGEBPF programs would choose to return a
>>> negative value rather than return the positive value which holds the
>>> same meaning.
>>>
>>> It seems more likely that future TUNSETSTEERINGEBPF programs would
>>> leverage a negative return and potentially be loaded into a kernel
>>> with the old behavior.
>>
>> Yes, eBPF can return probably wrong value, but what kernel did is just
>> to make sure it doesn't harm anything.
>>
>> I would rather just drop the packet in this case.
>
> In addition to TUN_SSE_ABORT, we can add TUN_SSE_DROP. That could be
> made the default for any undefined negative
RE: [PATCH V2 1/5] dt-bindings: fsl: scu: add scu power key binding
Hi, Pavel

> Subject: Re: [PATCH V2 1/5] dt-bindings: fsl: scu: add scu power key binding
>
> Hi!
>
> > NXP i.MX8QXP is an ARMv8 SoC with a Cortex-M4 core inside as system
> > controller, the system controller is in charge of system power, clock
> > and power key event etc. management, Linux kernel has to communicate
> > with system controller via MU (message unit) IPC to get power key
> > event, add binding doc for i.MX system controller power key driver.
> >
> > Signed-off-by: Anson Huang
>
> > +++ b/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
> > @@ -157,6 +157,15 @@ Required properties:
> >  Optional properties:
> >  - timeout-sec: contains the watchdog timeout in seconds.
> >
> > +Power key bindings based on SCU Message Protocol
> > +
> > +Required properties:
> > +- compatible: should be:
> > +    "fsl,imx8qxp-sc-pwrkey"
> > +  followed by "fsl,imx-sc-pwrkey";
> > +- linux,keycodes: See Documentation/devicetree/bindings/input/keys.txt
>
> Actually there's no reason for having "linux,keycodes" property when it
> is always a power button.

The latest version of the patch already changed the compatible string to
*-sc-key, which is more generic for a key driver, so "linux,keycodes" is
now needed for the driver to define the key function.

Thanks,
Anson
RE: [PATCH] ARM: Add support for Realtek SOC
> wrote: > > > Subject: Re: [PATCH] ARM: Add support for Realtek SOC > > > > > @@ -148,6 +148,7 @@ endif > > > > textofs-$(CONFIG_ARCH_MSM8X60) := 0x00208000 > > > > textofs-$(CONFIG_ARCH_MSM8960) := 0x00208000 > > > > textofs-$(CONFIG_ARCH_MESON) := 0x00208000 > > > > +textofs-$(CONFIG_ARCH_REALTEK) := 0x00208000 > > > > textofs-$(CONFIG_ARCH_AXXIA) := 0x00308000 > > > > > > Can you explain why this is needed for your platform? > > > > > We need to reserve memory (0x ~ 0x001B) for rom and boot > code. > > Ok. > > > > > +config ARCH_RTD16XX > > > > + bool "Enable support for RTD1619" > > > > + depends on ARCH_REALTEK > > > > + select ARM_GIC_V3 > > > > + select ARM_PSCI > > > > > > As I understand, this chip uses a Cortex-A55. What is the reason for > > > adding support only to the 32-bit ARM architecture rather than 64-bit? > > > > The RTD16XX platform also support the 64-bit ARM architecture. > > I will add the 64-bit ARM architecture in new version patch. > > > > > Most 64-bit SoCs are only supported with arch/arm64, but generally > > > speaking that is not a requirement. My rule of thumb is that on > > > systems with 1GB of RAM or more, one would want to run a 64-bit > > > kernel, while systems with less than that are better off with a > > > 32-bit one, but that is clearly not the only reason for picking one over > > > the > other. > > > > > Support 32-bit ARM architecture is for application compatibility. > > Generally speaking, a 64-bit kernel should work better on 64-bit hardware > even when you are running mostly 32-bit applications. However, you may have > device drivers that do not correctly set compat_ioctl handlers. > > As I said, it's no problem to allow both, just explain this in the changelog > text > for the driver, along with the need for the textofs setting. > OK. > > > It's very unusual to see custom smp operations on an ARMv8 system, > > > as we normally use PSCI here. Can you explain what is going on here? 
> > > Are you able to use a boot wrapper that implements these in psci instead? > > > > > The smp operations is porting form other Realtek platform. > > > > Currently, The RTD16XX platform can use the PSCI method. > > I will add PSCI method in new version patch. > > Ok, perfect! > > > > > + timer_probe(); > > > > + tick_setup_hrtimer_broadcast(); } > > > > > > What do you need tick_setup_hrtimer_broadcast() for? I don't see any > > > other platform calling this. > > > > > I want to initialize the HR timer. > > I'm still unsure about this one. My feeling is that it should not be in the > platform code, but I don't quite understand which hardware needs it. I see > that > Lorenzo Pieralisi added the same code to arm64 in commit 9358d755bd5c > ("arm64: kernel: initialize broadcast hrtimer based clock event device"), but > nothing ever calls it on 32-bit arch/arm even though the code does get built > in > to the kernel. I will add the 'hrtimer' initialization flow in related devices drivers. > My feeling is that either you don't really need it, or this is something that > other > platforms should really do as well, and it should be called from the generic > time_init() function in arch/arm/kernel/time.c instead. > OK. I understand. > Can you try to find out more of the background here, and then move the call to > time_init() assuming it is indeed useful? I agree with you. It is not necessary to call 'time_init()' function in platform code. >Arnd > > --Please consider the environment before printing this e-mail.
Re: [PATCH 1/4] MIPS: CI20: DTS: Add I2C nodes
Hi Alexandre, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [cannot apply to v5.3 next-20190920] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Alexandre-GRIVEAUX/MIPS-CI20-DTS-Add-nodes-to-Creator-CI20-board/20190923-041656 config: mips-allmodconfig (attached as .config) compiler: mips-linux-gcc (GCC) 7.4.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.4.0 make.cross ARCH=mips If you fix the issue, kindly add following tag Reported-by: kbuild test robot All errors (new ones prefixed by >>): >> Error: arch/mips/boot/dts/ingenic/ci20.dts:90.1-6 Label or path i2c0 not >> found >> Error: arch/mips/boot/dts/ingenic/ci20.dts:168.1-6 Label or path i2c1 not >> found >> Error: arch/mips/boot/dts/ingenic/ci20.dts:176.1-6 Label or path i2c2 not >> found >> Error: arch/mips/boot/dts/ingenic/ci20.dts:184.1-6 Label or path i2c3 not >> found >> Error: arch/mips/boot/dts/ingenic/ci20.dts:192.1-6 Label or path i2c4 not >> found FATAL ERROR: Syntax error parsing input tree --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
[PATCH V2 4/7] USB: serial: f81232: Add F81534A support
The Fintek F81532A/534A/535/536 is USB-to-2/4/8/12 serial ports device and the serial port is default disabled when plugin computer. The IC is contains devices as following: 1. HUB (all devices is connected with this hub) 2. GPIO/Control device. (enable serial port and control GPIOs) 3. serial port 1 to x (2/4/8/12) It's most same with F81232, the UART device is difference as follow: 1. TX/RX bulk size is 128/512bytes 2. RX bulk layout change: F81232: [LSR(1Byte)+DATA(1Byte)][LSR(1Byte)+DATA(1Byte)]... F81534A:[LEN][Data.][LSR] Signed-off-by: Ji-Ze Hong (Peter Hong) --- drivers/usb/serial/f81232.c | 131 ++-- 1 file changed, 127 insertions(+), 4 deletions(-) diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c index e4db0aec9af0..36a17aedc2ae 100644 --- a/drivers/usb/serial/f81232.c +++ b/drivers/usb/serial/f81232.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 /* * Fintek F81232 USB to serial adaptor driver + * Fintek F81532A/534A/535/536 USB to 2/4/8/12 serial adaptor driver * * Copyright (C) 2012 Greg Kroah-Hartman (gre...@linuxfoundation.org) * Copyright (C) 2012 Linux Foundation @@ -21,11 +22,36 @@ #include #include +#define F81232_ID \ + { USB_DEVICE(0x1934, 0x0706) } /* 1 port UART device */ + +#define F81534A_SERIES_ID \ + { USB_DEVICE(0x2c42, 0x1602) }, /* In-Box 2 port UART device */ \ + { USB_DEVICE(0x2c42, 0x1604) }, /* In-Box 4 port UART device */ \ + { USB_DEVICE(0x2c42, 0x1605) }, /* In-Box 8 port UART device */ \ + { USB_DEVICE(0x2c42, 0x1606) }, /* In-Box 12 port UART device */ \ + { USB_DEVICE(0x2c42, 0x1608) }, /* Non-Flash type */ \ + { USB_DEVICE(0x2c42, 0x1632) }, /* 2 port UART device */ \ + { USB_DEVICE(0x2c42, 0x1634) }, /* 4 port UART device */ \ + { USB_DEVICE(0x2c42, 0x1635) }, /* 8 port UART device */ \ + { USB_DEVICE(0x2c42, 0x1636) } /* 12 port UART device */ + static const struct usb_device_id id_table[] = { - { USB_DEVICE(0x1934, 0x0706) }, + F81232_ID, + { } /* Terminating entry */ +}; + +static const struct 
usb_device_id f81534a_id_table[] = { + F81534A_SERIES_ID, + { } /* Terminating entry */ +}; + +static const struct usb_device_id all_serial_id_table[] = { + F81232_ID, + F81534A_SERIES_ID, { } /* Terminating entry */ }; -MODULE_DEVICE_TABLE(usb, id_table); +MODULE_DEVICE_TABLE(usb, all_serial_id_table); /* Maximum baudrate for F81232 */ #define F81232_MAX_BAUDRATE150 @@ -35,6 +61,10 @@ MODULE_DEVICE_TABLE(usb, id_table); #define F81232_REGISTER_REQUEST0xa0 #define F81232_GET_REGISTER0xc0 #define F81232_SET_REGISTER0x40 +#define F81534A_REGISTER_REQUEST F81232_REGISTER_REQUEST +#define F81534A_GET_REGISTER F81232_GET_REGISTER +#define F81534A_SET_REGISTER F81232_SET_REGISTER +#define F81534A_ACCESS_REG_RETRY 2 #define SERIAL_BASE_ADDRESS0x0120 #define RECEIVE_BUFFER_REGISTER(0x00 + SERIAL_BASE_ADDRESS) @@ -61,6 +91,11 @@ MODULE_DEVICE_TABLE(usb, id_table); #define F81232_CLK_14_77_MHZ (BIT(1) | BIT(0)) #define F81232_CLK_MASKGENMASK(1, 0) +#define F81534A_MODE_CONF_REG 0x107 +#define F81534A_TRIGGER_MASK GENMASK(3, 2) +#define F81534A_TRIGGER_MULTPILE_4XBIT(3) +#define F81534A_FIFO_128BYTE (BIT(1) | BIT(0)) + struct f81232_private { struct mutex lock; u8 modem_control; @@ -383,6 +418,46 @@ static void f81232_process_read_urb(struct urb *urb) tty_flip_buffer_push(>port); } +static void f81534a_process_read_urb(struct urb *urb) +{ + struct usb_serial_port *port = urb->context; + unsigned char *data = urb->transfer_buffer; + char tty_flag; + unsigned int i; + u8 lsr; + u8 len; + + if (urb->actual_length < 3) { + dev_err(>dev, "error actual_length: %d\n", + urb->actual_length); + return; + } + + len = data[0]; + if (len != urb->actual_length) { + dev_err(>dev, "len(%d) != urb->actual_length(%d)\n", len, + urb->actual_length); + return; + } + + /* bulk-in data: [LEN][Data.][LSR] */ + lsr = data[len - 1]; + tty_flag = f81232_handle_lsr(port, lsr); + + if (port->port.console && port->sysrq) { + for (i = 1; i < urb->actual_length - 1; ++i) + if 
(!usb_serial_handle_sysrq_char(port, data[i])) + tty_insert_flip_char(>port, data[i], + tty_flag); + }
[PATCH V2 7/7] USB: serial: f81232: Add gpiolib to GPIO device
The Fintek F81534A series contains 3 GPIOs per UART and The max GPIOs is 12x3 = 36 GPIOs and this patch will implements GPIO device as a gpiochip to control all GPIO pins even transforms to transceiver pins. Signed-off-by: Ji-Ze Hong (Peter Hong) --- drivers/usb/serial/f81232.c | 249 1 file changed, 249 insertions(+) diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c index 82cc1e6cff62..dc9b28738b80 100644 --- a/drivers/usb/serial/f81232.c +++ b/drivers/usb/serial/f81232.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -104,6 +105,8 @@ MODULE_DEVICE_TABLE(usb, all_serial_id_table); #define F81534A_TRIGGER_MULTPILE_4XBIT(3) #define F81534A_FIFO_128BYTE (BIT(1) | BIT(0)) +#define F81534A_MAX_PORT 12 + /* Serial port self GPIO control, 2bytes [control data][input data] */ #define F81534A_GPIO_REG 0x10e #define F81534A_GPIO_MODE2_DIR BIT(6) /* 1: input, 0: output */ @@ -115,6 +118,13 @@ MODULE_DEVICE_TABLE(usb, all_serial_id_table); #define F81534A_CMD_ENABLE_PORT0x116 +/* + * Control device global GPIO control, + * 2bytes [control data][input data] + */ +#define F81534A_CTRL_GPIO_REG 0x1601 +#define F81534A_CTRL_GPIO_MAX_PIN 3 + struct f81232_private { struct mutex lock; u8 modem_control; @@ -126,6 +136,12 @@ struct f81232_private { struct usb_serial_port *port; }; +struct f81534a_ctrl_private { + struct usb_interface *intf; + struct gpio_chip chip; + struct mutex lock; +}; + static u32 const baudrate_table[] = { 115200, 921600, 1152000, 150 }; static u8 const clock_table[] = { F81232_CLK_1_846_MHZ, F81232_CLK_14_77_MHZ, F81232_CLK_18_46_MHZ, F81232_CLK_24_MHZ }; @@ -863,6 +879,50 @@ static void f81232_lsr_worker(struct work_struct *work) dev_warn(>dev, "read LSR failed: %d\n", status); } +static int f81534a_ctrl_get_register(struct usb_device *dev, u16 reg, u16 size, + void *val) +{ + int retry = F81534A_ACCESS_REG_RETRY; + int status; + u8 *tmp; + + tmp = kmalloc(size, GFP_KERNEL); + if (!tmp) + return 
-ENOMEM; + + while (retry--) { + status = usb_control_msg(dev, + usb_rcvctrlpipe(dev, 0), + F81534A_REGISTER_REQUEST, + F81534A_GET_REGISTER, + reg, + 0, + tmp, + size, + USB_CTRL_GET_TIMEOUT); + if (status != size) { + status = usb_translate_errors(status); + if (status == -EIO) + continue; + + status = -EIO; + } else { + status = 0; + memcpy(val, tmp, size); + } + + break; + } + + if (status) { + dev_err(>dev, "get reg: %x, failed status: %d\n", reg, + status); + } + + kfree(tmp); + return status; +} + static int f81534a_ctrl_set_register(struct usb_device *dev, u16 reg, u16 size, void *val) { @@ -908,6 +968,182 @@ static int f81534a_ctrl_set_register(struct usb_device *dev, u16 reg, u16 size, return status; } +#ifdef CONFIG_GPIOLIB +static int f81534a_ctrl_set_mask_register(struct usb_device *dev, u16 reg, + u8 mask, u8 val) +{ + int status; + u8 tmp; + + status = f81534a_ctrl_get_register(dev, reg, 1, ); + if (status) + return status; + + + tmp = (tmp & ~mask) | (val & mask); + + status = f81534a_ctrl_set_register(dev, reg, 1, ); + if (status) + return status; + + return 0; +} + +static int f81534a_gpio_get(struct gpio_chip *chip, unsigned int gpio_num) +{ + struct f81534a_ctrl_private *priv = gpiochip_get_data(chip); + struct usb_interface *intf = priv->intf; + struct usb_device *dev = interface_to_usbdev(intf); + int status; + u8 tmp[2]; + int set; + int idx; + int reg; + + set = gpio_num / F81534A_CTRL_GPIO_MAX_PIN; + idx = gpio_num % F81534A_CTRL_GPIO_MAX_PIN; + reg = F81534A_CTRL_GPIO_REG + set; + + mutex_lock(>lock); + + status = f81534a_ctrl_get_register(dev, reg, sizeof(tmp), tmp); + if (status) { + mutex_unlock(>lock); + return status; + } + + mutex_unlock(>lock); + + return !!(tmp[1] & BIT(idx)); +} + +static int f81534a_gpio_direction_in(struct gpio_chip
[PATCH V2 3/7] USB: serial: f81232: Use devm_kzalloc
Use devm_kzalloc() to replace kzalloc().

Signed-off-by: Ji-Ze Hong (Peter Hong)
---
 drivers/usb/serial/f81232.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index b42b3738a768..e4db0aec9af0 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -756,7 +756,7 @@ static int f81232_port_probe(struct usb_serial_port *port)
 {
 	struct f81232_private *priv;
 
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	priv = devm_kzalloc(&port->dev, sizeof(*priv), GFP_KERNEL);
 	if (!priv)
 		return -ENOMEM;
 
@@ -772,16 +772,6 @@ static int f81232_port_probe(struct usb_serial_port *port)
 	return 0;
 }
 
-static int f81232_port_remove(struct usb_serial_port *port)
-{
-	struct f81232_private *priv;
-
-	priv = usb_get_serial_port_data(port);
-	kfree(priv);
-
-	return 0;
-}
-
 static int f81232_suspend(struct usb_serial *serial, pm_message_t message)
 {
 	struct usb_serial_port *port = serial->port[0];
@@ -841,7 +831,6 @@ static struct usb_serial_driver f81232_device = {
 	.process_read_urb =	f81232_process_read_urb,
 	.read_int_callback =	f81232_read_int_callback,
 	.port_probe =		f81232_port_probe,
-	.port_remove =		f81232_port_remove,
 	.suspend =		f81232_suspend,
 	.resume =		f81232_resume,
 };
-- 
2.17.1
[PATCH V2 1/7] USB: serial: f81232: Extract LSR handler
Extract LSR handler to function. Signed-off-by: Ji-Ze Hong (Peter Hong) --- drivers/usb/serial/f81232.c | 53 + 1 file changed, 30 insertions(+), 23 deletions(-) diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c index 43fa1f0716b7..c07d376c743d 100644 --- a/drivers/usb/serial/f81232.c +++ b/drivers/usb/serial/f81232.c @@ -322,10 +322,38 @@ static void f81232_read_int_callback(struct urb *urb) __func__, retval); } +static char f81232_handle_lsr(struct usb_serial_port *port, u8 lsr) +{ + struct f81232_private *priv = usb_get_serial_port_data(port); + char tty_flag = TTY_NORMAL; + + if (!(lsr & UART_LSR_BRK_ERROR_BITS)) + return tty_flag; + + if (lsr & UART_LSR_BI) { + tty_flag = TTY_BREAK; + port->icount.brk++; + usb_serial_handle_break(port); + } else if (lsr & UART_LSR_PE) { + tty_flag = TTY_PARITY; + port->icount.parity++; + } else if (lsr & UART_LSR_FE) { + tty_flag = TTY_FRAME; + port->icount.frame++; + } + + if (lsr & UART_LSR_OE) { + port->icount.overrun++; + schedule_work(>lsr_work); + tty_insert_flip_char(>port, 0, TTY_OVERRUN); + } + + return tty_flag; +} + static void f81232_process_read_urb(struct urb *urb) { struct usb_serial_port *port = urb->context; - struct f81232_private *priv = usb_get_serial_port_data(port); unsigned char *data = urb->transfer_buffer; char tty_flag; unsigned int i; @@ -341,29 +369,8 @@ static void f81232_process_read_urb(struct urb *urb) /* bulk-in data: [LSR(1Byte)+DATA(1Byte)][LSR(1Byte)+DATA(1Byte)]... 
*/ for (i = 0; i < urb->actual_length; i += 2) { - tty_flag = TTY_NORMAL; lsr = data[i]; - - if (lsr & UART_LSR_BRK_ERROR_BITS) { - if (lsr & UART_LSR_BI) { - tty_flag = TTY_BREAK; - port->icount.brk++; - usb_serial_handle_break(port); - } else if (lsr & UART_LSR_PE) { - tty_flag = TTY_PARITY; - port->icount.parity++; - } else if (lsr & UART_LSR_FE) { - tty_flag = TTY_FRAME; - port->icount.frame++; - } - - if (lsr & UART_LSR_OE) { - port->icount.overrun++; - schedule_work(>lsr_work); - tty_insert_flip_char(>port, 0, - TTY_OVERRUN); - } - } + tty_flag = f81232_handle_lsr(port, lsr); if (port->port.console && port->sysrq) { if (usb_serial_handle_sysrq_char(port, data[i + 1])) -- 2.17.1
[PATCH V2 0/7] Add Fintek F81534A series usb-to-serial driver
The Fintek F81532A/534A/535/536 is a USB-to-2/4/8/12 serial ports device
and its serial ports are disabled by default when plugged into a
computer. The part number is similar to the F81532/534, but the F81534A
series UART core is enhanced from the F81232, not the F81532/534.

The IC contains the following devices:
1. HUB (all devices are connected through this hub)
2. GPIO/control device (enables the serial ports and controls all GPIOs)
3. serial port 1 to x (2/4/8/12)

It is mostly the same as the F81232; the UART device differs as follows:
1. TX/RX bulk size is 128/512 bytes
2. RX bulk layout change:
   F81232:  [LSR(1Byte)+DATA(1Byte)][LSR(1Byte)+DATA(1Byte)]...
   F81534A: [LEN][Data.][LSR]

We'll do some code refactoring before adding the F81534A series.

Ji-Ze Hong (Peter Hong) (7):
  USB: serial: f81232: Extract LSR handler
  USB: serial: f81232: Add tx_empty function
  USB: serial: f81232: Use devm_kzalloc
  USB: serial: f81232: Add F81534A support
  USB: serial: f81232: Set F81534A serial port with RS232 mode
  USB: serial: f81232: Add generator for F81534A
  USB: serial: f81232: Add gpiolib to GPIO device

 drivers/usb/serial/f81232.c | 604 ++--
 1 file changed, 570 insertions(+), 34 deletions(-)

-- 
2.17.1
[PATCH V2 6/7] USB: serial: f81232: Add generator for F81534A
The Fintek F81534A series is contains 1 HUB / 1 GPIO device / n UARTs, but the UART is default disable and need enabled by GPIO device(2c42/16F8). When F81534A plug to host, we can only see 1 HUB & 1 GPIO device and we need write 0x8fff to GPIO device register F81534A_CMD_ENABLE_PORT(116h) to enable all available serial ports. Signed-off-by: Ji-Ze Hong (Peter Hong) --- drivers/usb/serial/f81232.c | 135 +++- 1 file changed, 134 insertions(+), 1 deletion(-) diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c index 01cb5a5ea1d2..82cc1e6cff62 100644 --- a/drivers/usb/serial/f81232.c +++ b/drivers/usb/serial/f81232.c @@ -36,6 +36,9 @@ { USB_DEVICE(0x2c42, 0x1635) }, /* 8 port UART device */ \ { USB_DEVICE(0x2c42, 0x1636) } /* 12 port UART device */ +#define F81534A_CTRL_ID\ + { USB_DEVICE(0x2c42, 0x16f8) } /* Global control device */ + static const struct usb_device_id id_table[] = { F81232_ID, { } /* Terminating entry */ @@ -46,6 +49,11 @@ static const struct usb_device_id f81534a_id_table[] = { { } /* Terminating entry */ }; +static const struct usb_device_id f81534a_ctrl_id_table[] = { + F81534A_CTRL_ID, + { } /* Terminating entry */ +}; + static const struct usb_device_id all_serial_id_table[] = { F81232_ID, F81534A_SERIES_ID, @@ -105,6 +113,8 @@ MODULE_DEVICE_TABLE(usb, all_serial_id_table); #define F81534A_GPIO_MODE1_OUTPUT BIT(1) #define F81534A_GPIO_MODE0_OUTPUT BIT(0) +#define F81534A_CMD_ENABLE_PORT0x116 + struct f81232_private { struct mutex lock; u8 modem_control; @@ -853,6 +863,95 @@ static void f81232_lsr_worker(struct work_struct *work) dev_warn(>dev, "read LSR failed: %d\n", status); } +static int f81534a_ctrl_set_register(struct usb_device *dev, u16 reg, u16 size, + void *val) +{ + int retry = F81534A_ACCESS_REG_RETRY; + int status; + u8 *tmp; + + tmp = kmalloc(size, GFP_KERNEL); + if (!tmp) + return -ENOMEM; + + memcpy(tmp, val, size); + + while (retry--) { + status = usb_control_msg(dev, + usb_sndctrlpipe(dev, 0), + 
F81534A_REGISTER_REQUEST, + F81534A_SET_REGISTER, + reg, + 0, + tmp, + size, + USB_CTRL_SET_TIMEOUT); + if (status != size) { + status = usb_translate_errors(status); + if (status == -EIO) + continue; + + status = -EIO; + } else { + status = 0; + } + + break; + } + + if (status) { + dev_err(>dev, "set ctrl reg: %x, failed status: %d\n", reg, + status); + } + + kfree(tmp); + return status; +} + +static int f81534a_ctrl_enable_all_ports(struct usb_interface *intf) +{ + struct usb_device *dev = interface_to_usbdev(intf); + unsigned char enable[2]; + int status; + + /* enable all available serial ports */ + enable[0] = 0xff; + enable[1] = 0x8f; + + status = f81534a_ctrl_set_register(dev, F81534A_CMD_ENABLE_PORT, + sizeof(enable), enable); + if (status) + dev_warn(>dev, "set CMD_ENABLE_PORT failed: %d\n", status); + + return status; +} + +static int f81534a_ctrl_probe(struct usb_interface *intf, + const struct usb_device_id *id) +{ + struct usb_device *dev = interface_to_usbdev(intf); + int status; + + status = f81534a_ctrl_enable_all_ports(intf); + if (status) + return status; + + dev = usb_get_dev(dev); + return 0; +} + +static void f81534a_ctrl_disconnect(struct usb_interface *intf) +{ + struct usb_device *dev = interface_to_usbdev(intf); + + usb_put_dev(dev); +} + +static int f81534a_ctrl_resume(struct usb_interface *intf) +{ + return f81534a_ctrl_enable_all_ports(intf); +} + static int f81232_port_probe(struct usb_serial_port *port) { struct f81232_private *priv; @@ -980,7 +1079,41 @@ static struct usb_serial_driver * const serial_drivers[] = { NULL, }; -module_usb_serial_driver(serial_drivers, all_serial_id_table); +static struct usb_driver f81534a_ctrl_driver = { + .name = "f81534a_ctrl", + .id_table = f81534a_ctrl_id_table, + .probe =f81534a_ctrl_probe, +
[PATCH V2 5/7] USB: serial: f81232: Set F81534A serial port with RS232 mode
The Fintek F81532A/534A/535/536 is a USB-to-2/4/8/12 serial ports device
and its serial ports are disabled by default. Each port has up to 3 GPIO
pins, which default to input mode with pull-high. When a serial port has
been activated (its probe() has run), the 3 pins are switched from the
public GPIO function to private transceiver control. We default them to
0/0/1 to set the transceiver to RS232 mode.

Otherwise, if the serial port is not active, the 3 pins stay in GPIO
mode and are controlled by the global GPIO device with VID/PID
2c42/16f8.

Signed-off-by: Ji-Ze Hong (Peter Hong)
---
 drivers/usb/serial/f81232.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index 36a17aedc2ae..01cb5a5ea1d2 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -96,6 +96,15 @@ MODULE_DEVICE_TABLE(usb, all_serial_id_table);
 #define F81534A_TRIGGER_MULTPILE_4X	BIT(3)
 #define F81534A_FIFO_128BYTE		(BIT(1) | BIT(0))
 
+/* Serial port self GPIO control, 2bytes [control data][input data] */
+#define F81534A_GPIO_REG		0x10e
+#define F81534A_GPIO_MODE2_DIR		BIT(6) /* 1: input, 0: output */
+#define F81534A_GPIO_MODE1_DIR		BIT(5)
+#define F81534A_GPIO_MODE0_DIR		BIT(4)
+#define F81534A_GPIO_MODE2_OUTPUT	BIT(2)
+#define F81534A_GPIO_MODE1_OUTPUT	BIT(1)
+#define F81534A_GPIO_MODE0_OUTPUT	BIT(0)
+
 struct f81232_private {
 	struct mutex lock;
 	u8 modem_control;
@@ -866,6 +875,14 @@ static int f81232_port_probe(struct usb_serial_port *port)
 
 static int f81534a_port_probe(struct usb_serial_port *port)
 {
+	int status;
+
+	/* tri-state with pull-high, default RS232 Mode */
+	status = f81232_set_register(port, F81534A_GPIO_REG,
+					F81534A_GPIO_MODE2_DIR);
+	if (status)
+		return status;
+
 	return f81232_port_probe(port);
 }
-- 
2.17.1
[PATCH V2 2/7] USB: serial: f81232: Add tx_empty function
Add tx_empty() function for F81232.

Signed-off-by: Ji-Ze Hong (Peter Hong)
---
 drivers/usb/serial/f81232.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
index c07d376c743d..b42b3738a768 100644
--- a/drivers/usb/serial/f81232.c
+++ b/drivers/usb/serial/f81232.c
@@ -685,6 +685,23 @@ static void f81232_dtr_rts(struct usb_serial_port *port, int on)
 		f81232_set_mctrl(port, 0, TIOCM_DTR | TIOCM_RTS);
 }
 
+static bool f81232_tx_empty(struct usb_serial_port *port)
+{
+	int status;
+	u8 tmp;
+
+	status = f81232_get_register(port, LINE_STATUS_REGISTER, &tmp);
+	if (status) {
+		dev_err(&port->dev, "get LSR status failed: %d\n", status);
+		return false;
+	}
+
+	if ((tmp & UART_LSR_TEMT) != UART_LSR_TEMT)
+		return false;
+
+	return true;
+}
+
 static int f81232_carrier_raised(struct usb_serial_port *port)
 {
 	u8 msr;
@@ -820,6 +837,7 @@ static struct usb_serial_driver f81232_device = {
 	.tiocmget =		f81232_tiocmget,
 	.tiocmset =		f81232_tiocmset,
 	.tiocmiwait =		usb_serial_generic_tiocmiwait,
+	.tx_empty =		f81232_tx_empty,
 	.process_read_urb =	f81232_process_read_urb,
 	.read_int_callback =	f81232_read_int_callback,
 	.port_probe =		f81232_port_probe,
-- 
2.17.1
[PATCH] iio: Fix an undefined reference error in noa1305_probe
I hit the following error when compiling the kernel.

drivers/iio/light/noa1305.o: In function `noa1305_probe':
noa1305.c:(.text+0x65): undefined reference to `__devm_regmap_init_i2c'
make: *** [vmlinux] Error 1

Signed-off-by: zhong jiang
---
 drivers/iio/light/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iio/light/Kconfig b/drivers/iio/light/Kconfig
index 08d7e1e..4a1a883 100644
--- a/drivers/iio/light/Kconfig
+++ b/drivers/iio/light/Kconfig
@@ -314,6 +314,7 @@ config MAX44009
 config NOA1305
 	tristate "ON Semiconductor NOA1305 ambient light sensor"
 	depends on I2C
+	select REGMAP_I2C
 	help
 	  Say Y here if you want to build support for the ON Semiconductor
 	  NOA1305 ambient light sensor.
-- 
1.7.12.4
Re: [PATCH 4.19 32/79] fpga: altera-ps-spi: Fix getting of optional confd gpio
On 22/09/2019 04:46, Pavel Machek wrote:
> Hi!
>
>> Currently the driver does not handle EPROBE_DEFER for the confd gpio.
>> Use devm_gpiod_get_optional() instead of devm_gpiod_get() and return
>> error codes from altera_ps_probe().
>>
>> @@ -265,10 +265,13 @@ static int altera_ps_probe(struct spi_device *spi)
>>  		return PTR_ERR(conf->status);
>>  	}
>>
>> -	conf->confd = devm_gpiod_get(&spi->dev, "confd", GPIOD_IN);
>> +	conf->confd = devm_gpiod_get_optional(&spi->dev, "confd", GPIOD_IN);
>>  	if (IS_ERR(conf->confd)) {
>> -		dev_warn(&spi->dev, "Not using confd gpio: %ld\n",
>> -			 PTR_ERR(conf->confd));
>> +		dev_err(&spi->dev, "Failed to get confd gpio: %ld\n",
>> +			PTR_ERR(conf->confd));
>> +		return PTR_ERR(conf->confd);
>
> Will this result in repeated errors in dmesg in case of EPROBE_DEFER?
> We often avoid printing such messages in EPROBE_DEFER case.

Yes it will. I can submit a patch for that if required.

>> +	} else if (!conf->confd) {
>> +		dev_warn(&spi->dev, "Not using confd gpio");
>> +	}
>>
>>  	/* Register manager with unique name */
>
> Best regards,
> Pavel

-- 
Regards
Phil Reid
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On Sun, Sep 22, 2019 at 5:46 PM Jason Wang wrote: > > > On 2019/9/23 上午1:43, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin wrote: > >> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > >>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > >>> to fallback to tun_automq_select_queue() for tx queue selection. > >>> > >>> Compilation of this exact patch was tested. > >>> > >>> For functional testing 3 additional printk()s were added. > >>> > >>> Functional testing results (on 2 txq tap device): > >>> > >>>[Fri Sep 20 18:33:27 2019] == tun no prog == > >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > >>> '-1' > >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>>[Fri Sep 20 18:33:27 2019] == tun prog -1 == > >>>[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > >>> '-1' > >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > >>> '-1' > >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>>[Fri Sep 20 18:33:27 2019] == tun prog 0 == > >>>[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>>[Fri Sep 20 18:33:27 2019] == tun prog 1 == > >>>[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > >>>[Fri Sep 20 18:33:27 2019] == tun prog 2 == > >>>[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > >>>[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>> > >>> Signed-off-by: Matthew Cover > >> > >> Could you add a bit more motivation data here? > > Thank you for these questions Michael. > > > > I'll plan on adding the below information to the > > commit message and submitting a v2 of this patch > > when net-next reopens. 
In the meantime, it would > > be very helpful to know if these answers address > > some of your concerns. > > > >> 1. why is this a good idea > > This change allows TUNSETSTEERINGEBPF progs to > > do any of the following. > > 1. implement queue selection for a subset of > > traffic (e.g. special queue selection logic > > for ipv4, but return negative and use the > > default automq logic for ipv6) > > > Well, using ebpf means it need to take care of all the cases. E.g you > can easily implement the fallback through eBPF as well. > I really think there is value in being able to implement a scoped special case while leaving the rest of the packets in the kernel's hands. Having to reimplement automq makes this hookpoint less accessible to beginners and experienced alike. > > > 2. determine there isn't sufficient information > > to do proper queue selection; return > > negative and use the default automq logic > > for the unknown > > > Same as above. > > > > 3. implement a noop prog (e.g. do > > bpf_trace_printk() then return negative and > > use the default automq logic for everything) > > > ditto. > > > > > >> 2. how do we know existing userspace does not rely on existing behaviour > > Prior to this change a negative return from a > > TUNSETSTEERINGEBPF prog would have been cast > > into a u16 and traversed netdev_cap_txqueue(). > > > > In most cases netdev_cap_txqueue() would have > > found this value to exceed real_num_tx_queues > > and queue_index would be updated to 0. > > > > It is possible that a TUNSETSTEERINGEBPF prog > > return a negative value which when cast into a > > u16 results in a positive queue_index less than > > real_num_tx_queues. For example, on x86_64, a > > return value of -65535 results in a queue_index > > of 1; which is a valid queue for any multiqueue > > device. 
> > > > It seems unlikely, however as stated above is > > unfortunately possible, that existing > > TUNSETSTEERINGEBPF programs would choose to > > return a negative value rather than return the > > positive value which holds the same meaning. > > > > It seems more likely that future > > TUNSETSTEERINGEBPF programs would leverage a > > negative return and potentially be loaded into > > a kernel with the old behavior. > > > Yes, eBPF can return probably wrong value, but what kernel did is just > to make sure it doesn't harm anything. > > I would rather just drop the packet in this case. > In addition to TUN_SSE_ABORT, we can add TUN_SSE_DROP. That could be made the default for any undefined negative return as well. > Thanks > > > > > >> 3. why doesn't userspace need a way to figure out whether it runs on a > >> kernel with and > >> without this patch > > There may be some value in exposing this fact > > to the ebpf prog loader. What is the standard > > practice here, a define? > > > >> > >> thanks, > >> MST > >> >
RE: [PATCH 2/2] nvmem: imx: scu: support write
Hi Srinivas, > >> Subject: [PATCH 2/2] nvmem: imx: scu: support write > > > > Ping.. > > > Thanks for your patience! > I normally do not take patches after rc5 for nvmem. > These will be applied after rc1 is released! Sorry to ping again. Will you pick up since merge window is open? Thanks, Peng. > > Thanks, > srini > > Thanks, > > Peng. > > > >> > >> From: Peng Fan > >> > >> The fuse programming from non-secure world is blocked, so we could > >> only use Arm Trusted Firmware SIP call to let ATF program fuse. > >> > >> Because there is ECC region that could only be programmed once, so > >> add a heler in_ecc to check the ecc region. > >> > >> Signed-off-by: Peng Fan > >> --- > >> > >> The ATF patch will soon be posted to ATF community. > >> > >> drivers/nvmem/imx-ocotp-scu.c | 73 > >> ++- > >> 1 file changed, 72 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/nvmem/imx-ocotp-scu.c > >> b/drivers/nvmem/imx-ocotp-scu.c index 2f339d7432e6..0f064f2e74a8 > >> 100644 > >> --- a/drivers/nvmem/imx-ocotp-scu.c > >> +++ b/drivers/nvmem/imx-ocotp-scu.c > >> @@ -7,6 +7,7 @@ > >>* Peng Fan > >>*/ > >> > >> +#include > >> #include > >> #include > >> #include > >> @@ -14,6 +15,9 @@ > >> #include > >> #include > >> > >> +#define IMX_SIP_OTP 0xC20A > >> +#define IMX_SIP_OTP_WRITE 0x2 > >> + > >> enum ocotp_devtype { > >>IMX8QXP, > >> }; > >> @@ -45,6 +49,8 @@ struct imx_sc_msg_misc_fuse_read { > >>u32 word; > >> } __packed; > >> > >> +static DEFINE_MUTEX(scu_ocotp_mutex); > >> + > >> static struct ocotp_devtype_data imx8qxp_data = { > >>.devtype = IMX8QXP, > >>.nregs = 800, > >> @@ -73,6 +79,23 @@ static bool in_hole(void *context, u32 index) > >>return false; > >> } > >> > >> +static bool in_ecc(void *context, u32 index) { > >> + struct ocotp_priv *priv = context; > >> + const struct ocotp_devtype_data *data = priv->data; > >> + int i; > >> + > >> + for (i = 0; i < data->num_region; i++) { > >> + if (data->region[i].flag & ECC_REGION) { > >> + if ((index >= 
data->region[i].start) && > >> + (index <= data->region[i].end)) > >> + return true; > >> + } > >> + } > >> + > >> + return false; > >> +} > >> + > >> static int imx_sc_misc_otp_fuse_read(struct imx_sc_ipc *ipc, u32 > word, > >> u32 *val) > >> { > >> @@ -116,6 +139,8 @@ static int imx_scu_ocotp_read(void *context, > >> unsigned int offset, > >>if (!p) > >>return -ENOMEM; > >> > >> + mutex_lock(_ocotp_mutex); > >> + > >>buf = p; > >> > >>for (i = index; i < (index + count); i++) { @@ -126,6 +151,7 @@ > >> static int imx_scu_ocotp_read(void *context, unsigned int offset, > >> > >>ret = imx_sc_misc_otp_fuse_read(priv->nvmem_ipc, i, buf); > >>if (ret) { > >> + mutex_unlock(_ocotp_mutex); > >>kfree(p); > >>return ret; > >>} > >> @@ -134,18 +160,63 @@ static int imx_scu_ocotp_read(void *context, > >> unsigned int offset, > >> > >>memcpy(val, (u8 *)p + offset % 4, bytes); > >> > >> + mutex_unlock(_ocotp_mutex); > >> + > >>kfree(p); > >> > >>return 0; > >> } > >> > >> +static int imx_scu_ocotp_write(void *context, unsigned int offset, > >> + void *val, size_t bytes) > >> +{ > >> + struct ocotp_priv *priv = context; > >> + struct arm_smccc_res res; > >> + u32 *buf = val; > >> + u32 tmp; > >> + u32 index; > >> + int ret; > >> + > >> + /* allow only writing one complete OTP word at a time */ > >> + if ((bytes != 4) || (offset % 4)) > >> + return -EINVAL; > >> + > >> + index = offset >> 2; > >> + > >> + if (in_hole(context, index)) > >> + return -EINVAL; > >> + > >> + if (in_ecc(context, index)) { > >> + pr_warn("ECC region, only program once\n"); > >> + mutex_lock(_ocotp_mutex); > >> + ret = imx_sc_misc_otp_fuse_read(priv->nvmem_ipc, index, ); > >> + mutex_unlock(_ocotp_mutex); > >> + if (ret) > >> + return ret; > >> + if (tmp) { > >> + pr_warn("ECC region, already has value: %x\n", tmp); > >> + return -EIO; > >> + } > >> + } > >> + > >> + mutex_lock(_ocotp_mutex); > >> + > >> + arm_smccc_smc(IMX_SIP_OTP, IMX_SIP_OTP_WRITE, index, *buf, > >> +0, 0, 0, 0, ); > >> + > >> + 
mutex_unlock(_ocotp_mutex); > >> + > >> + return res.a0; > >> +} > >> + > >> static struct nvmem_config imx_scu_ocotp_nvmem_config = { > >>.name = "imx-scu-ocotp", > >> - .read_only = true, > >> + .read_only = false, > >>.word_size = 4, > >>.stride = 1, > >>.owner = THIS_MODULE, > >>.reg_read = imx_scu_ocotp_read, > >> + .reg_write = imx_scu_ocotp_write, > >> }; > >> > >> static const struct
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On Sun, Sep 22, 2019 at 5:51 PM Jason Wang wrote: > > > On 2019/9/23 上午6:30, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin wrote: > >> On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > >>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin > >>> wrote: > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > > to fallback to tun_automq_select_queue() for tx queue selection. > > > > Compilation of this exact patch was tested. > > > > For functional testing 3 additional printk()s were added. > > > > Functional testing results (on 2 txq tap device): > > > >[Fri Sep 20 18:33:27 2019] == tun no prog == > >[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > '-1' > >[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >[Fri Sep 20 18:33:27 2019] == tun prog -1 == > >[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > '-1' > >[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > '-1' > >[Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >[Fri Sep 20 18:33:27 2019] == tun prog 0 == > >[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > '0' > >[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > '0' > >[Fri Sep 20 18:33:27 2019] == tun prog 1 == > >[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > '1' > >[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > '1' > >[Fri Sep 20 18:33:27 2019] == tun prog 2 == > >[Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > '2' > >[Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > '0' > > > > Signed-off-by: Matthew Cover > > Could you add a bit more motivation data here? > >>> Thank you for these questions Michael. 
> >>> > >>> I'll plan on adding the below information to the > >>> commit message and submitting a v2 of this patch > >>> when net-next reopens. In the meantime, it would > >>> be very helpful to know if these answers address > >>> some of your concerns. > >>> > 1. why is this a good idea > >>> This change allows TUNSETSTEERINGEBPF progs to > >>> do any of the following. > >>> 1. implement queue selection for a subset of > >>> traffic (e.g. special queue selection logic > >>> for ipv4, but return negative and use the > >>> default automq logic for ipv6) > >>> 2. determine there isn't sufficient information > >>> to do proper queue selection; return > >>> negative and use the default automq logic > >>> for the unknown > >>> 3. implement a noop prog (e.g. do > >>> bpf_trace_printk() then return negative and > >>> use the default automq logic for everything) > >>> > 2. how do we know existing userspace does not rely on existing behaviour > >>> Prior to this change a negative return from a > >>> TUNSETSTEERINGEBPF prog would have been cast > >>> into a u16 and traversed netdev_cap_txqueue(). > >>> > >>> In most cases netdev_cap_txqueue() would have > >>> found this value to exceed real_num_tx_queues > >>> and queue_index would be updated to 0. > >>> > >>> It is possible that a TUNSETSTEERINGEBPF prog > >>> return a negative value which when cast into a > >>> u16 results in a positive queue_index less than > >>> real_num_tx_queues. For example, on x86_64, a > >>> return value of -65535 results in a queue_index > >>> of 1; which is a valid queue for any multiqueue > >>> device. > >>> > >>> It seems unlikely, however as stated above is > >>> unfortunately possible, that existing > >>> TUNSETSTEERINGEBPF programs would choose to > >>> return a negative value rather than return the > >>> positive value which holds the same meaning. 
> >>> > >>> It seems more likely that future > >>> TUNSETSTEERINGEBPF programs would leverage a > >>> negative return and potentially be loaded into > >>> a kernel with the old behavior. > >> OK if we are returning a special > >> value, shouldn't we limit it? How about a special > >> value with this meaning? > >> If we are changing an ABI let's at least make it > >> extensible. > >> > > A special value with this meaning sounds > > good to me. I'll plan on adding a define > > set to -1 to cause the fallback to automq. > > > Can it really return -1? > > I see: > > static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, > struct sk_buff *skb) > ... > > > > > > The way I was initially viewing the old > > behavior was that returning negative was > > undefined; it happened to have the > > outcomes I walked
[PATCH v2] soc: ti: move 2 driver config options into the TI SOC drivers menu
From: Randy Dunlap

Move the AM654 and J721E SOC config options inside the "TI SOC drivers"
menu with the other TI SOC drivers.

Fixes: a869b7b30dac ("soc: ti: Add Support for AM654 SoC config option")
Fixes: cff377f7897a ("soc: ti: Add Support for J721E SoC config option")
Signed-off-by: Randy Dunlap
Cc: Santosh Shilimkar
Cc: Olof Johansson
Cc: Nishanth Menon
#Cc: Benjamin Fair
Cc: Tony Lindgren
Cc: Tero Kristo
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
---
v2: add Santosh (maintainer) for merging
    drop Benjamin Fair (email address bounces)

 drivers/soc/ti/Kconfig | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

--- lnx-53.orig/drivers/soc/ti/Kconfig
+++ lnx-53/drivers/soc/ti/Kconfig
@@ -1,4 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0-only
+
+# TI SOC drivers
+#
+menuconfig SOC_TI
+	bool "TI SOC drivers support"
+
+if SOC_TI
+
 # 64-bit ARM SoCs from TI
 if ARM64
@@ -14,17 +22,9 @@ config ARCH_K3_J721E_SOC
 	help
 	  Enable support for TI's J721E SoC Family.
 
-endif
+endif # ARCH_K3
 
-endif
-
-#
-# TI SOC drivers
-#
-menuconfig SOC_TI
-	bool "TI SOC drivers support"
-
-if SOC_TI
+endif # ARM64
 
 config KEYSTONE_NAVIGATOR_QMSS
 	tristate "Keystone Queue Manager Subsystem"
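For reference, this is roughly how drivers/soc/ti/Kconfig would read after the patch (a sketch reconstructed from the diff; the ARCH_K3 option bodies and the remaining Keystone options are elided):

```kconfig
# SPDX-License-Identifier: GPL-2.0-only

# TI SOC drivers
#
menuconfig SOC_TI
	bool "TI SOC drivers support"

if SOC_TI

# 64-bit ARM SoCs from TI
if ARM64

# ... ARCH_K3 / ARCH_K3_AM6_SOC / ARCH_K3_J721E_SOC options here ...

endif # ARCH_K3

endif # ARM64

config KEYSTONE_NAVIGATOR_QMSS
	tristate "Keystone Queue Manager Subsystem"

# ... remaining Keystone options ...

endif # SOC_TI
```

With this layout, everything in the file sits under the single SOC_TI menu, which is the consistency the patch is after.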
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On 2019/9/23 上午6:30, Matt Cover wrote: On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin wrote: On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin wrote: On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal to fallback to tun_automq_select_queue() for tx queue selection. Compilation of this exact patch was tested. For functional testing 3 additional printk()s were added. Functional testing results (on 2 txq tap device): [Fri Sep 20 18:33:27 2019] == tun no prog == [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran [Fri Sep 20 18:33:27 2019] == tun prog -1 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran [Fri Sep 20 18:33:27 2019] == tun prog 0 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' [Fri Sep 20 18:33:27 2019] == tun prog 1 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' [Fri Sep 20 18:33:27 2019] == tun prog 2 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' Signed-off-by: Matthew Cover Could you add a bit more motivation data here? Thank you for these questions Michael. I'll plan on adding the below information to the commit message and submitting a v2 of this patch when net-next reopens. In the meantime, it would be very helpful to know if these answers address some of your concerns. 1. 
why is this a good idea This change allows TUNSETSTEERINGEBPF progs to do any of the following. 1. implement queue selection for a subset of traffic (e.g. special queue selection logic for ipv4, but return negative and use the default automq logic for ipv6) 2. determine there isn't sufficient information to do proper queue selection; return negative and use the default automq logic for the unknown 3. implement a noop prog (e.g. do bpf_trace_printk() then return negative and use the default automq logic for everything) 2. how do we know existing userspace does not rely on existing behaviour Prior to this change a negative return from a TUNSETSTEERINGEBPF prog would have been cast into a u16 and traversed netdev_cap_txqueue(). In most cases netdev_cap_txqueue() would have found this value to exceed real_num_tx_queues and queue_index would be updated to 0. It is possible that a TUNSETSTEERINGEBPF prog return a negative value which when cast into a u16 results in a positive queue_index less than real_num_tx_queues. For example, on x86_64, a return value of -65535 results in a queue_index of 1; which is a valid queue for any multiqueue device. It seems unlikely, however as stated above is unfortunately possible, that existing TUNSETSTEERINGEBPF programs would choose to return a negative value rather than return the positive value which holds the same meaning. It seems more likely that future TUNSETSTEERINGEBPF programs would leverage a negative return and potentially be loaded into a kernel with the old behavior. OK if we are returning a special value, shouldn't we limit it? How about a special value with this meaning? If we are changing an ABI let's at least make it extensible. A special value with this meaning sounds good to me. I'll plan on adding a define set to -1 to cause the fallback to automq. Can it really return -1? I see: static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, struct sk_buff *skb) ... 
The way I was initially viewing the old behavior was that returning negative was undefined; it happened to have the outcomes I walked through, but not necessarily by design. Having such fallback may bring extra troubles, it requires the eBPF program know the existence of the behavior which is not a part of kernel ABI actually. And then some eBPF program may start to rely on that which is pretty dangerous. Note, one important consideration is to have macvtap support where does not have any stuffs like automq. Thanks In order to keep the new behavior extensible, how should we state that a negative return other than -1 is undefined and therefore subject to change. Is something like this sufficient? Documentation/networking/tc-actions-env-rules.txt Additionally, what should the new behavior implement when a negative other than -1 is returned? I would like
[PATCH 1/4] riscv: avoid kernel hangs when trapped in BUG()
When CONFIG_GENERIC_BUG is disabled by disabling CONFIG_BUG, if a kernel
thread is trapped by BUG(), the whole system loops forever handling the
ebreak exception instead of entering the die function. To fix this
problem, do_trap_break() will always call die() to deal with the break
exception when the type of the break is BUG_TRAP_TYPE_BUG.

Signed-off-by: Vincent Chen
---
 arch/riscv/kernel/traps.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 424eb72d56b1..055a937aca70 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -124,23 +124,23 @@ static inline unsigned long get_break_insn_length(unsigned long pc)
 
 asmlinkage void do_trap_break(struct pt_regs *regs)
 {
-#ifdef CONFIG_GENERIC_BUG
 	if (!user_mode(regs)) {
 		enum bug_trap_type type;
 
 		type = report_bug(regs->sepc, regs);
 		switch (type) {
+#ifdef CONFIG_GENERIC_BUG
 		case BUG_TRAP_TYPE_NONE:
 			break;
 		case BUG_TRAP_TYPE_WARN:
 			regs->sepc += get_break_insn_length(regs->sepc);
 			break;
 		case BUG_TRAP_TYPE_BUG:
+#endif /* CONFIG_GENERIC_BUG */
+		default:
 			die(regs, "Kernel BUG");
 		}
 	}
-#endif /* CONFIG_GENERIC_BUG */
-
 	force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)(regs->sepc));
 }
-- 
2.7.4
[PATCH 0/4] riscv: correct the do_trap_break()
The following three situations may occur in the current implementation
of do_trap_break().

1. When CONFIG_GENERIC_BUG is disabled, if a kernel thread is trapped
   by BUG(), the whole system loops forever handling the break
   exception instead of entering the die function.
2. When the kernel runs code on behalf of a user thread, and the kernel
   executes a WARN() or WARN_ON(), the user thread is sent a bogus
   SIGTRAP.
3. Unexpected ebreak instructions are currently handled by sending a
   SIGTRAP to the trapped thread. However, if the kernel itself executes
   an unexpected ebreak, the kernel thread may get stuck on the ebreak
   instruction.

This patch set solves the above problems by adjusting the
implementation of do_trap_break().

Vincent Chen (4):
  riscv: avoid kernel hangs when trapped in BUG()
  riscv: avoid sending a SIGTRAP to a user thread trapped in WARN()
  riscv: Correct the handling of unexpected ebreak in do_trap_break()
  riscv: remove the switch statement in do_trap_break()

 arch/riscv/kernel/traps.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

-- 
2.7.4
[PATCH 2/4] riscv: avoid sending a SIGTRAP to a user thread trapped in WARN()
On RISC-V, when the kernel runs code on behalf of a user thread, and the
kernel executes a WARN() or WARN_ON(), the user thread will be sent a
bogus SIGTRAP. Fix the RISC-V kernel code to not send a SIGTRAP when a
WARN()/WARN_ON() is executed.

Signed-off-by: Vincent Chen
---
 arch/riscv/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 055a937aca70..82f42a55451e 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -134,7 +134,7 @@ asmlinkage void do_trap_break(struct pt_regs *regs)
 			break;
 		case BUG_TRAP_TYPE_WARN:
 			regs->sepc += get_break_insn_length(regs->sepc);
-			break;
+			return;
 		case BUG_TRAP_TYPE_BUG:
 #endif /* CONFIG_GENERIC_BUG */
 		default:
-- 
2.7.4
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On 2019/9/23 上午1:43, Matt Cover wrote: On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin wrote: On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal to fallback to tun_automq_select_queue() for tx queue selection. Compilation of this exact patch was tested. For functional testing 3 additional printk()s were added. Functional testing results (on 2 txq tap device): [Fri Sep 20 18:33:27 2019] == tun no prog == [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran [Fri Sep 20 18:33:27 2019] == tun prog -1 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran [Fri Sep 20 18:33:27 2019] == tun prog 0 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' [Fri Sep 20 18:33:27 2019] == tun prog 1 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' [Fri Sep 20 18:33:27 2019] == tun prog 2 == [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' Signed-off-by: Matthew Cover Could you add a bit more motivation data here? Thank you for these questions Michael. I'll plan on adding the below information to the commit message and submitting a v2 of this patch when net-next reopens. In the meantime, it would be very helpful to know if these answers address some of your concerns. 1. why is this a good idea This change allows TUNSETSTEERINGEBPF progs to do any of the following. 1. implement queue selection for a subset of traffic (e.g. 
special queue selection logic for ipv4, but return negative and use the default automq logic for ipv6) Well, using ebpf means it need to take care of all the cases. E.g you can easily implement the fallback through eBPF as well. 2. determine there isn't sufficient information to do proper queue selection; return negative and use the default automq logic for the unknown Same as above. 3. implement a noop prog (e.g. do bpf_trace_printk() then return negative and use the default automq logic for everything) ditto. 2. how do we know existing userspace does not rely on existing behaviour Prior to this change a negative return from a TUNSETSTEERINGEBPF prog would have been cast into a u16 and traversed netdev_cap_txqueue(). In most cases netdev_cap_txqueue() would have found this value to exceed real_num_tx_queues and queue_index would be updated to 0. It is possible that a TUNSETSTEERINGEBPF prog return a negative value which when cast into a u16 results in a positive queue_index less than real_num_tx_queues. For example, on x86_64, a return value of -65535 results in a queue_index of 1; which is a valid queue for any multiqueue device. It seems unlikely, however as stated above is unfortunately possible, that existing TUNSETSTEERINGEBPF programs would choose to return a negative value rather than return the positive value which holds the same meaning. It seems more likely that future TUNSETSTEERINGEBPF programs would leverage a negative return and potentially be loaded into a kernel with the old behavior. Yes, eBPF can return probably wrong value, but what kernel did is just to make sure it doesn't harm anything. I would rather just drop the packet in this case. Thanks 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and without this patch There may be some value in exposing this fact to the ebpf prog loader. What is the standard practice here, a define? 
thanks, MST --- drivers/net/tun.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index aab0be4..173d159 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) return txq; } -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) { struct tun_prog *prog; u32 numqueues; - u16 ret = 0; + int ret = -1; numqueues = READ_ONCE(tun->numqueues); if (!numqueues) return 0; + rcu_read_lock(); prog = rcu_dereference(tun->steering_prog); if (prog) ret = bpf_prog_run_clear_cb(prog->prog, skb); +
[PATCH 4/4] riscv: remove the switch statement in do_trap_break()
To make the code more straightforward, replace the switch statement with
an if statement.

Suggested-by: Paul Walmsley
Signed-off-by: Vincent Chen
---
 arch/riscv/kernel/traps.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index dd13bc90aeb6..1ac75f7d0bff 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -124,23 +124,24 @@ static inline unsigned long get_break_insn_length(unsigned long pc)
 
 asmlinkage void do_trap_break(struct pt_regs *regs)
 {
-	if (!user_mode(regs)) {
+	if (user_mode(regs)) {
+		force_sig_fault(SIGTRAP, TRAP_BRKPT,
+				(void __user *)(regs->sepc));
+		return;
+	}
+#ifdef CONFIG_GENERIC_BUG
+	{
 		enum bug_trap_type type;
 
 		type = report_bug(regs->sepc, regs);
-		switch (type) {
-#ifdef CONFIG_GENERIC_BUG
-		case BUG_TRAP_TYPE_WARN:
+		if (type == BUG_TRAP_TYPE_WARN) {
 			regs->sepc += get_break_insn_length(regs->sepc);
 			return;
-		case BUG_TRAP_TYPE_BUG:
-#endif /* CONFIG_GENERIC_BUG */
-		default:
-			die(regs, "Kernel BUG");
 		}
-	} else
-		force_sig_fault(SIGTRAP, TRAP_BRKPT,
-				(void __user *)(regs->sepc));
+	}
+#endif /* CONFIG_GENERIC_BUG */
+
+	die(regs, "Kernel BUG");
 }
-- 
2.7.4
[PATCH 3/4] riscv: Correct the handling of unexpected ebreak in do_trap_break()
For the kernel space, all ebreak instructions are determined at compile
time because the kernel space debugging module is currently unsupported.
Hence, it should be treated as a bug if an ebreak instruction which does
not belong to BUG_TRAP_TYPE_WARN or BUG_TRAP_TYPE_BUG is executed in
kernel space. For userspace, a debugger or the user program itself may
intentionally insert an ebreak instruction to trigger a SIGTRAP signal.
To handle these two situations, do_trap_break() now directs a
BUG_TRAP_TYPE_NONE ebreak exception issued in kernel space to die(), and
sends a SIGTRAP to the trapped process only when the ebreak occurred in
userspace.

Signed-off-by: Vincent Chen
---
 arch/riscv/kernel/traps.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 82f42a55451e..dd13bc90aeb6 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -130,8 +130,6 @@ asmlinkage void do_trap_break(struct pt_regs *regs)
 		type = report_bug(regs->sepc, regs);
 		switch (type) {
 #ifdef CONFIG_GENERIC_BUG
-		case BUG_TRAP_TYPE_NONE:
-			break;
 		case BUG_TRAP_TYPE_WARN:
 			regs->sepc += get_break_insn_length(regs->sepc);
 			return;
@@ -140,8 +138,9 @@ asmlinkage void do_trap_break(struct pt_regs *regs)
 		default:
 			die(regs, "Kernel BUG");
 		}
-	}
-	force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)(regs->sepc));
+	} else
+		force_sig_fault(SIGTRAP, TRAP_BRKPT,
+				(void __user *)(regs->sepc));
 }
-- 
2.7.4
[mm, slab_common] c5e1edaa1a: BUG:unable_to_handle_page_fault_for_address
FYI, we noticed the following commit (built with gcc-7): commit: c5e1edaa1a52581b350315e4b163cdcc0fd7605d ("[PATCH v5 7/7] mm, slab_common: Modify kmalloc_caches[type][idx] to kmalloc_caches[idx][type]") url: https://github.com/0day-ci/linux/commits/Pengfei-Li/mm-slab-Make-kmalloc_info-contain-all-types-of-names/20190916-224827 in testcase: boot on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 4G caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): +-+++ | | 4e0d9effb5 | c5e1edaa1a | +-+++ | boot_successes | 4 | 0 | | boot_failures | 0 | 4 | | BUG:unable_to_handle_page_fault_for_address | 0 | 4 | | Oops:#[##] | 0 | 4 | | RIP:calculate_slab_order| 0 | 4 | | Kernel_panic-not_syncing:Fatal_exception| 0 | 4 | +-+++ If you fix the issue, kindly add following tag Reported-by: kernel test robot [0.417634] BUG: unable to handle page fault for address: 00301171 [0.418931] #PF: supervisor read access in kernel mode [0.419879] #PF: error_code(0x) - not-present page [0.420827] PGD 0 P4D 0 [0.421296] Oops: [#1] DEBUG_PAGEALLOC [0.422073] CPU: 0 PID: 0 Comm: swapper Tainted: GT 5.3.0-7-gc5e1edaa1a525 #2 [0.423725] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [0.425265] RIP: 0010:calculate_slab_order+0x81/0xd9 [0.426194] Code: ba e4 11 41 89 5f 24 41 89 4f 28 73 35 eb 61 31 f6 89 c7 89 4c 24 04 48 89 54 24 08 e8 95 8a fd ff 48 85 c0 8b 4c 24 04 74 38 <83> 78 20 00 78 32 41 8b 77 14 48 8b 54 24 08 d1 ee 39 70 14 76 be [0.429675] RSP: :82603e50 EFLAGS: 00010002 [0.430648] RAX: 00301151 RBX: 0001 RCX: [0.431966] RDX: RSI: RDI: 0001 [0.433282] RBP: 1000 R08: R09: [0.434617] R10: 2000 R11: 0005 R12: 80002800 [0.435936] R13: 8000 R14: R15: 82853bc0 [0.437259] FS: () GS:8264e000() knlGS: [0.438765] CS: 0010 DS: ES: CR0: 80050033 [0.439840] CR2: 00301171 CR3: 02622000 CR4: 000406b0 [0.441157] Call Trace: [0.441614] set_off_slab_cache+0x39/0x59 [0.442357] __kmem_cache_create+0x11a/0x26e [0.443171] 
create_boot_cache+0x46/0x69 [0.443900] kmem_cache_init+0x72/0x115 [0.444612] start_kernel+0x204/0x454 [0.445301] secondary_startup_64+0xa4/0xb0 [0.446078] Modules linked in: [0.446696] CR2: 00301171 [0.447323] random: get_random_bytes called from init_oops_id+0x1d/0x2c with crng_init=0 [0.448820] ---[ end trace b2352b5ced6eae1c ]--- To reproduce: # build kernel cd linux cp config-5.3.0-7-gc5e1edaa1a525 .config make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp qemu -k job-script # job-script is attached in this email Thanks, lkp # # Automatically generated file; DO NOT EDIT. # Linux/x86_64 5.3.0 Kernel Configuration # # # Compiler: gcc-7 (Debian 7.4.0-13) 7.4.0 # CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=70400 CONFIG_CLANG_VERSION=0 CONFIG_CC_CAN_LINK=y CONFIG_CC_HAS_ASM_GOTO=y CONFIG_CC_HAS_WARN_MAYBE_UNINITIALIZED=y CONFIG_CC_DISABLE_WARN_MAYBE_UNINITIALIZED=y CONFIG_CONSTRUCTORS=y CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set # CONFIG_HEADER_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_BUILD_SALT="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set CONFIG_KERNEL_XZ=y # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y # CONFIG_SYSVIPC is not set # CONFIG_POSIX_MQUEUE is not set CONFIG_CROSS_MEMORY_ATTACH=y # CONFIG_USELIB is not set # CONFIG_AUDIT is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_SIM=y
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On Sun, Sep 22, 2019 at 3:46 PM Matt Cover wrote: > > On Sun, Sep 22, 2019 at 3:30 PM Matt Cover wrote: > > > > On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin wrote: > > > > > > On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > > > > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin > > > > wrote: > > > > > > > > > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > > > > > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a > > > > > > signal > > > > > > to fallback to tun_automq_select_queue() for tx queue selection. > > > > > > > > > > > > Compilation of this exact patch was tested. > > > > > > > > > > > > For functional testing 3 additional printk()s were added. > > > > > > > > > > > > Functional testing results (on 2 txq tap device): > > > > > > > > > > > > [Fri Sep 20 18:33:27 2019] == tun no prog == > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() > > > > > > returned '-1' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > > > [Fri Sep 20 18:33:27 2019] == tun prog -1 == > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() > > > > > > returned '-1' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() > > > > > > returned '-1' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > > > [Fri Sep 20 18:33:27 2019] == tun prog 0 == > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() > > > > > > returned '0' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() > > > > > > returned '0' > > > > > > [Fri Sep 20 18:33:27 2019] == tun prog 1 == > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() > > > > > > returned '1' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() > > > > > > returned '1' > > > > > > [Fri Sep 20 18:33:27 2019] == tun prog 2 == > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() > > > > > 
> returned '2' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() > > > > > > returned '0' > > > > > > > > > > > > Signed-off-by: Matthew Cover > > > > > > > > > > > > > > > Could you add a bit more motivation data here? > > > > > > > > Thank you for these questions Michael. > > > > > > > > I'll plan on adding the below information to the > > > > commit message and submitting a v2 of this patch > > > > when net-next reopens. In the meantime, it would > > > > be very helpful to know if these answers address > > > > some of your concerns. > > > > > > > > > 1. why is this a good idea > > > > > > > > This change allows TUNSETSTEERINGEBPF progs to > > > > do any of the following. > > > > 1. implement queue selection for a subset of > > > > traffic (e.g. special queue selection logic > > > > for ipv4, but return negative and use the > > > > default automq logic for ipv6) > > > > 2. determine there isn't sufficient information > > > > to do proper queue selection; return > > > > negative and use the default automq logic > > > > for the unknown > > > > 3. implement a noop prog (e.g. do > > > > bpf_trace_printk() then return negative and > > > > use the default automq logic for everything) > > > > > > > > > 2. how do we know existing userspace does not rely on existing > > > > > behaviour > > > > > > > > Prior to this change a negative return from a > > > > TUNSETSTEERINGEBPF prog would have been cast > > > > into a u16 and traversed netdev_cap_txqueue(). > > > > > > > > In most cases netdev_cap_txqueue() would have > > > > found this value to exceed real_num_tx_queues > > > > and queue_index would be updated to 0. > > > > > > > > It is possible that a TUNSETSTEERINGEBPF prog > > > > return a negative value which when cast into a > > > > u16 results in a positive queue_index less than > > > > real_num_tx_queues. 
For example, on x86_64, a > > > > return value of -65535 results in a queue_index > > > > of 1; which is a valid queue for any multiqueue > > > > device. > > > > > > > > It seems unlikely, however as stated above is > > > > unfortunately possible, that existing > > > > TUNSETSTEERINGEBPF programs would choose to > > > > return a negative value rather than return the > > > > positive value which holds the same meaning. > > > > > > > > It seems more likely that future > > > > TUNSETSTEERINGEBPF programs would leverage a > > > > negative return and potentially be loaded into > > > > a kernel with the old behavior. > > > > > > OK if we are returning a special > > > value, shouldn't we limit it? How about a special > > > value with this meaning? > > > If we are changing an ABI let's at least make it > > > extensible. > > > > > > > A special value with this meaning sounds > > good to me. I'll plan on adding a define > > set to -1 to cause the fallback to automq. > > > >
Re: [PATCH 2/2] soc: ti: move 2 driver config options into the TI SOC drivers menu
On 9/21/19 1:46 PM, Randy Dunlap wrote:
> Hi Santosh,
>
> Would you also pick up patch 2/2, which I didn't Cc: you on? :(
> Do I need to resend it?

Yes please. I don't have your 2/2.
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On Sun, Sep 22, 2019 at 3:30 PM Matt Cover wrote: > > On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin wrote: > > > > On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > > > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin > > > wrote: > > > > > > > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > > > > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > > > > > to fallback to tun_automq_select_queue() for tx queue selection. > > > > > > > > > > Compilation of this exact patch was tested. > > > > > > > > > > For functional testing 3 additional printk()s were added. > > > > > > > > > > Functional testing results (on 2 txq tap device): > > > > > > > > > > [Fri Sep 20 18:33:27 2019] == tun no prog == > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > > > > '-1' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > > [Fri Sep 20 18:33:27 2019] == tun prog -1 == > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > > > > '-1' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > > > > '-1' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > > [Fri Sep 20 18:33:27 2019] == tun prog 0 == > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > > > > '0' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > > > > '0' > > > > > [Fri Sep 20 18:33:27 2019] == tun prog 1 == > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > > > > '1' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > > > > '1' > > > > > [Fri Sep 20 18:33:27 2019] == tun prog 2 == > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > > > > '2' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > > > > '0' > > > > > > > > > > Signed-off-by: 
Matthew Cover > > > > > > > > > > > > Could you add a bit more motivation data here? > > > > > > Thank you for these questions Michael. > > > > > > I'll plan on adding the below information to the > > > commit message and submitting a v2 of this patch > > > when net-next reopens. In the meantime, it would > > > be very helpful to know if these answers address > > > some of your concerns. > > > > > > > 1. why is this a good idea > > > > > > This change allows TUNSETSTEERINGEBPF progs to > > > do any of the following. > > > 1. implement queue selection for a subset of > > > traffic (e.g. special queue selection logic > > > for ipv4, but return negative and use the > > > default automq logic for ipv6) > > > 2. determine there isn't sufficient information > > > to do proper queue selection; return > > > negative and use the default automq logic > > > for the unknown > > > 3. implement a noop prog (e.g. do > > > bpf_trace_printk() then return negative and > > > use the default automq logic for everything) > > > > > > > 2. how do we know existing userspace does not rely on existing behaviour > > > > > > Prior to this change a negative return from a > > > TUNSETSTEERINGEBPF prog would have been cast > > > into a u16 and traversed netdev_cap_txqueue(). > > > > > > In most cases netdev_cap_txqueue() would have > > > found this value to exceed real_num_tx_queues > > > and queue_index would be updated to 0. > > > > > > It is possible that a TUNSETSTEERINGEBPF prog > > > return a negative value which when cast into a > > > u16 results in a positive queue_index less than > > > real_num_tx_queues. For example, on x86_64, a > > > return value of -65535 results in a queue_index > > > of 1; which is a valid queue for any multiqueue > > > device. 
> > > > > > It seems unlikely, however as stated above is > > > unfortunately possible, that existing > > > TUNSETSTEERINGEBPF programs would choose to > > > return a negative value rather than return the > > > positive value which holds the same meaning. > > > > > > It seems more likely that future > > > TUNSETSTEERINGEBPF programs would leverage a > > > negative return and potentially be loaded into > > > a kernel with the old behavior. > > > > OK if we are returning a special > > value, shouldn't we limit it? How about a special > > value with this meaning? > > If we are changing an ABI let's at least make it > > extensible. > > > > A special value with this meaning sounds > good to me. I'll plan on adding a define > set to -1 to cause the fallback to automq. > > The way I was initially viewing the old > behavior was that returning negative was > undefined; it happened to have the > outcomes I walked through, but not > necessarily by design. > > In order to keep the new behavior > extensible, how should we state that a > negative return other than -1 is >
Re: [PATCH v4 1/2] soc/tegra: pmc: Query PCLK clock rate at probe time
04.08.2019 23:29, Dmitry Osipenko пишет: > It is possible to get a lockup if kernel decides to enter LP2 cpuidle > from some clk-notifier, in that case CCF's "prepare" mutex is kept locked > and thus clk_get_rate(pclk) blocks on the same mutex with interrupts being > disabled. > > Signed-off-by: Dmitry Osipenko > --- > > Changelog: > > v4: Added clk-notifier to track PCLK rate-changes, which may become useful > in the future. That's done in response to v3 review comment from Peter > De Schrijver. > > Now properly handling case where clk pointer is intentionally NULL on > the driver's probe. > > v3: Changed commit's message because I actually recalled what was the > initial reason for the patch, since the problem reoccurred once again. > > v2: Addressed review comments that were made by Jon Hunter to v1 by > not moving the memory barrier, replacing one missed clk_get_rate() > with pmc->rate, handling possible clk_get_rate() error on probe and > slightly adjusting the commits message. > > drivers/soc/tegra/pmc.c | 71 ++--- > 1 file changed, 53 insertions(+), 18 deletions(-) > > diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c > index 9f9c1c677cf4..4e44943d0b26 100644 > --- a/drivers/soc/tegra/pmc.c > +++ b/drivers/soc/tegra/pmc.c > @@ -309,6 +309,7 @@ static const char * const tegra210_reset_sources[] = { > * @pctl_dev: pin controller exposed by the PMC > * @domain: IRQ domain provided by the PMC > * @irq: chip implementation for the IRQ domain > + * @clk_nb: pclk clock changes handler > */ > struct tegra_pmc { > struct device *dev; > @@ -344,6 +345,8 @@ struct tegra_pmc { > > struct irq_domain *domain; > struct irq_chip irq; > + > + struct notifier_block clk_nb; > }; > > static struct tegra_pmc *pmc = &(struct tegra_pmc) { > @@ -1192,7 +1195,7 @@ static int tegra_io_pad_prepare(struct tegra_pmc *pmc, > enum tegra_io_pad id, > return err; > > if (pmc->clk) { > - rate = clk_get_rate(pmc->clk); > + rate = pmc->rate; > if (!rate) { > dev_err(pmc->dev, 
"failed to get clock rate\n"); > return -ENODEV; > @@ -1433,6 +1436,7 @@ void tegra_pmc_set_suspend_mode(enum tegra_suspend_mode > mode) > void tegra_pmc_enter_suspend_mode(enum tegra_suspend_mode mode) > { > unsigned long long rate = 0; > + u64 ticks; > u32 value; > > switch (mode) { > @@ -1441,31 +1445,22 @@ void tegra_pmc_enter_suspend_mode(enum > tegra_suspend_mode mode) > break; > > case TEGRA_SUSPEND_LP2: > - rate = clk_get_rate(pmc->clk); > + rate = pmc->rate; > break; > > default: > break; > } > > - if (WARN_ON_ONCE(rate == 0)) > - rate = 1; > - > - if (rate != pmc->rate) { > - u64 ticks; > - > - ticks = pmc->cpu_good_time * rate + USEC_PER_SEC - 1; > - do_div(ticks, USEC_PER_SEC); > - tegra_pmc_writel(pmc, ticks, PMC_CPUPWRGOOD_TIMER); > + ticks = pmc->cpu_good_time * rate + USEC_PER_SEC - 1; > + do_div(ticks, USEC_PER_SEC); > + tegra_pmc_writel(pmc, ticks, PMC_CPUPWRGOOD_TIMER); > > - ticks = pmc->cpu_off_time * rate + USEC_PER_SEC - 1; > - do_div(ticks, USEC_PER_SEC); > - tegra_pmc_writel(pmc, ticks, PMC_CPUPWROFF_TIMER); > + ticks = pmc->cpu_off_time * rate + USEC_PER_SEC - 1; > + do_div(ticks, USEC_PER_SEC); > + tegra_pmc_writel(pmc, ticks, PMC_CPUPWROFF_TIMER); > > - wmb(); > - > - pmc->rate = rate; > - } > + wmb(); > > value = tegra_pmc_readl(pmc, PMC_CNTRL); > value &= ~PMC_CNTRL_SIDE_EFFECT_LP0; > @@ -2019,6 +2014,20 @@ static int tegra_pmc_irq_init(struct tegra_pmc *pmc) > return 0; > } > > +static int tegra_pmc_clk_notify_cb(struct notifier_block *nb, > +unsigned long action, void *ptr) > +{ > + struct clk_notifier_data *data = ptr; > + struct tegra_pmc *pmc; > + > + if (action == POST_RATE_CHANGE) { > + pmc = container_of(nb, struct tegra_pmc, clk_nb); > + pmc->rate = data->new_rate; > + } > + > + return NOTIFY_OK; > +} > + > static int tegra_pmc_probe(struct platform_device *pdev) > { > void __iomem *base; > @@ -2082,6 +2091,30 @@ static int tegra_pmc_probe(struct platform_device > *pdev) > pmc->clk = NULL; > } > > + /* > + * PCLK clock rate 
can't be retrieved using CLK API because it
> +	 * causes lockup if CPU enters LP2 idle state from some other
> +	 * CLK notifier, hence we're caching the rate's value locally.
> +	 */
> +	if (pmc->clk) {
> +		pmc->clk_nb.notifier_call = tegra_pmc_clk_notify_cb;
> +		err = clk_notifier_register(pmc->clk, &pmc->clk_nb);
>
Re: [PATCH v2 1/2] clk: tegra: divider: Add missing check for enable-bit on rate's recalculation
23.07.2019 5:52, Dmitry Osipenko пишет: > Unset "enable" bit means that divider is in bypass mode, hence it doesn't > have any effect in that case. Please note that there are no known bugs > caused by the missing check. > > Signed-off-by: Dmitry Osipenko > --- > > Changelog: > > v2: Changed the commit's description from 'Fix' to 'Add' in response to the > Stephen's Boyd question about the need to backport the patch into stable > kernels. The backporting is not really needed. > > drivers/clk/tegra/clk-divider.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/clk/tegra/clk-divider.c b/drivers/clk/tegra/clk-divider.c > index e76731fb7d69..f33c19045386 100644 > --- a/drivers/clk/tegra/clk-divider.c > +++ b/drivers/clk/tegra/clk-divider.c > @@ -40,8 +40,13 @@ static unsigned long clk_frac_div_recalc_rate(struct > clk_hw *hw, > int div, mul; > u64 rate = parent_rate; > > - reg = readl_relaxed(divider->reg) >> divider->shift; > - div = reg & div_mask(divider); > + reg = readl_relaxed(divider->reg); > + > + if ((divider->flags & TEGRA_DIVIDER_UART) && > + !(reg & PERIPH_CLK_UART_DIV_ENB)) > + return rate; > + > + div = (reg >> divider->shift) & div_mask(divider); > > mul = get_mul(divider); > div += mul; > Hello Peter, ACK?
Re: threads-max observe limits
Michal Hocko writes:

> From 711000fdc243b6bc68a92f9ef0017ae495086d39 Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Sun, 22 Sep 2019 08:45:28 +0200
> Subject: [PATCH] kernel/sysctl.c: do not override max_threads provided by
>  userspace
>
> Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> because the patch is causing a regression to any workload which needs to
> override the auto-tuning of the limit provided by kernel.
>
> set_max_threads is implementing a boot time guesstimate to provide a
> sensible limit of the concurrently running threads so that runaways will
> not deplete all the memory. This is a good thing in general but there
> are workloads which might need to increase this limit for an application
> to run (reportedly WebSphere MQ is affected) and that is simply not
> possible after the mentioned change. It is also very dubious to override
> an admin decision by an estimation that doesn't have any direct relation
> to correctness of the kernel operation.
>
> Fix this by dropping set_max_threads from sysctl_max_threads so any
> value is accepted as long as it fits into MAX_THREADS, which is important
> to check because allowing more threads could break the internal robust
> futex restriction. While at it, do not use MIN_THREADS as the lower
> boundary because it is also only a heuristic for automatic estimation and
> the admin might have a good reason to stop new threads from being created
> even when below this limit.
>
> Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
> Cc: stable
> Signed-off-by: Michal Hocko

Reviewed-by: "Eric W. Biederman"

> ---
>  kernel/fork.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 2852d0e76ea3..ef865be37e98 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2929,7 +2929,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
>  	struct ctl_table t;
>  	int ret;
>  	int threads = max_threads;
> -	int min = MIN_THREADS;
> +	int min = 1;
>  	int max = MAX_THREADS;
>
>  	t = *table;
> @@ -2941,7 +2941,7 @@ int sysctl_max_threads(struct ctl_table *table, int write,
>  	if (ret || !write)
>  		return ret;
>
> -	set_max_threads(threads);
> +	max_threads = threads;
>
>  	return 0;
>  }
> --
> 2.20.1
Re: threads-max observe limits
Heinrich Schuchardt writes:
> Did this patch when applied to the customer's kernel solve any problem?
>
> WebSphere MQ is a messaging application. If it hits the current limits
> of threads-max, there is a bug in the software or in the way that it has
> been set up at the customer. Instead of messing around with the kernel
> the application should be fixed.

While it is true that almost every workload will be buggy if it exceeds
1/8 of memory with just the kernel data structures for threads, it is not
necessarily true of every application. I can easily imagine cases up around
1/2 of memory where things could work reasonably. Further, we can exhaust
all of memory much more simply in a default configuration by malloc'ing
more memory than is physically present and zeroing it all.

Heinrich, you were the one who messed with the kernel by breaking a
reasonable kernel tunable. AKA you caused a regression. That violates the
no-regression rule. As much as possible we fix regressions so software
that used to work continues to work. Removing footguns is not a reason to
introduce a regression.

I do agree that Michal's customer's problem sounds like it is something
else, but if the kernel did not have a regression we could focus on the
real problem instead of being sidetracked by the regression.

> With this patch you allow administrators to set values that will crash
> their system. And they will not even have a way to find out the limits
> which he should adhere to. So expect a lot of systems to be downed
> this way.

Nope. The system administrator just setting a higher value won't crash
their system. Only using that many resources would crash the system. Nor
is a sysctl like this for discovering the physical limits of a machine,
which the current value is vastly inappropriate for, as the physical
limits of many machines are much higher than 1/8 of memory.

Eric
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin wrote: > > On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin wrote: > > > > > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > > > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > > > > to fallback to tun_automq_select_queue() for tx queue selection. > > > > > > > > Compilation of this exact patch was tested. > > > > > > > > For functional testing 3 additional printk()s were added. > > > > > > > > Functional testing results (on 2 txq tap device): > > > > > > > > [Fri Sep 20 18:33:27 2019] == tun no prog == > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > > > '-1' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > [Fri Sep 20 18:33:27 2019] == tun prog -1 == > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > > > '-1' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > > > '-1' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > [Fri Sep 20 18:33:27 2019] == tun prog 0 == > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > > > '0' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > > > '0' > > > > [Fri Sep 20 18:33:27 2019] == tun prog 1 == > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > > > '1' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > > > '1' > > > > [Fri Sep 20 18:33:27 2019] == tun prog 2 == > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned > > > > '2' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned > > > > '0' > > > > > > > > Signed-off-by: Matthew Cover > > > > > > > > > Could you add a bit more motivation data here? > > > > Thank you for these questions Michael. 
> > > > I'll plan on adding the below information to the > > commit message and submitting a v2 of this patch > > when net-next reopens. In the meantime, it would > > be very helpful to know if these answers address > > some of your concerns. > > > > > 1. why is this a good idea > > > > This change allows TUNSETSTEERINGEBPF progs to > > do any of the following. > > 1. implement queue selection for a subset of > > traffic (e.g. special queue selection logic > > for ipv4, but return negative and use the > > default automq logic for ipv6) > > 2. determine there isn't sufficient information > > to do proper queue selection; return > > negative and use the default automq logic > > for the unknown > > 3. implement a noop prog (e.g. do > > bpf_trace_printk() then return negative and > > use the default automq logic for everything) > > > > > 2. how do we know existing userspace does not rely on existing behaviour > > > > Prior to this change a negative return from a > > TUNSETSTEERINGEBPF prog would have been cast > > into a u16 and traversed netdev_cap_txqueue(). > > > > In most cases netdev_cap_txqueue() would have > > found this value to exceed real_num_tx_queues > > and queue_index would be updated to 0. > > > > It is possible that a TUNSETSTEERINGEBPF prog > > return a negative value which when cast into a > > u16 results in a positive queue_index less than > > real_num_tx_queues. For example, on x86_64, a > > return value of -65535 results in a queue_index > > of 1; which is a valid queue for any multiqueue > > device. > > > > It seems unlikely, however as stated above is > > unfortunately possible, that existing > > TUNSETSTEERINGEBPF programs would choose to > > return a negative value rather than return the > > positive value which holds the same meaning. > > > > It seems more likely that future > > TUNSETSTEERINGEBPF programs would leverage a > > negative return and potentially be loaded into > > a kernel with the old behavior. 
> > OK if we are returning a special > value, shouldn't we limit it? How about a special > value with this meaning? > If we are changing an ABI let's at least make it > extensible. > A special value with this meaning sounds good to me. I'll plan on adding a define set to -1 to cause the fallback to automq. The way I was initially viewing the old behavior was that returning negative was undefined; it happened to have the outcomes I walked through, but not necessarily by design. In order to keep the new behavior extensible, how should we state that a negative return other than -1 is undefined and therefore subject to change. Is something like this sufficient? Documentation/networking/tc-actions-env-rules.txt Additionally, what should the new behavior implement when a negative other than -1 is returned? I would like to have it do the same thing as -1 for now, but with the
Re: [PATCH] serdev: Add ACPI devices by ResourceSource field
Hi all,

On 9/20/19 5:00 PM, Hans de Goede wrote:
> So as promised I've given this patch a try, unfortunately it breaks
> existing users of ACPI serdev device instantiation. After adding this
> patch "ls /sys/bus/serial/devices" is empty, whereas before it gives:
>
> [root@dhcp-45-50 ~]# ls -l /sys/bus/serial/devices/
> total 0
> lrwxrwxrwx. 1 root root 0 Sep 20 16:43 serial0 -> ../../../devices/pci:00/8086228A:00/serial0
> lrwxrwxrwx. 1 root root 0 Sep 20 16:43 serial0-0 -> ../../../devices/pci:00/8086228A:00/serial0/serial0-0
>
> And since the serdev is missing bluetooth does not work.

Thanks to some testing by Hans, it turns out that the reason for this is
that acpi_walk_resources fails with AE_AML_INVALID_RESOURCE_TYPE for a
specific device. If anyone is interested, the _CRS of the device in
question is:

    Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
    {
        Name (NAM, Buffer (0x14)
        {
            "\\_SB.PCI0.SDHB.BRC1"
        })
        Name (SPB, Buffer (0x0C)
        {
            /* */ 0x8E, 0x1D, 0x00, 0x01, 0x00, 0xC0, 0x02, 0x00,
            /* 0008 */ 0x00, 0x01, 0x00, 0x00
        })
        Name (END, ResourceTemplate () {})

        Concatenate (SPB, NAM, Local0)
        Concatenate (Local0, END, Local1)
        Return (Local1)
    }

To solve this, I propose ignoring errors that occur when evaluating the
_CRS method. Note that with the previously discussed change for v2, where
we will only look at the first device in _CRS, we should be able to
handle errors from the actual serdev device allocation separately (and
only ignore AML evaluation errors).

Further, I think it might also make sense to move the status and
already-enumerated checks out of acpi_serdev_register_device to before
looking at _CRS. On one hand, this might save us from unnecessarily
checking the _CRS on devices that are not present, but on the other hand,
moving the status check may cause more AML code execution, as we'd be
checking the status of every device, even if it doesn't have a _CRS.

Maybe a better solution would be something like: check if the device has
already been enumerated, then check for _CRS presence, then for
status/device-presence, and finally look at the _CRS contents and
potentially allocate the serdev client?

Regards,
Maximilian
Re: [PATCH] net: dsa: b53: Use the correct style for SPDX License Identifier
On Sat, 21 Sep 2019 19:00:16 +0530, Nishad Kamdar wrote: > This patch corrects the SPDX License Identifier style > in header file for Broadcom BCM53xx managed switch driver. > For C header files Documentation/process/license-rules.rst > mandates C-like comments (opposed to C source files where > C++ style should be used) > > Changes made by using a script provided by Joe Perches here: > https://lkml.org/lkml/2019/2/7/46. > > Suggested-by: Joe Perches > Signed-off-by: Nishad Kamdar Applied, thank you!
Re: [PATCH] net: dsa: Use the correct style for SPDX License Identifier
On Sat, 21 Sep 2019 19:15:25 +0530, Nishad Kamdar wrote: > This patch corrects the SPDX License Identifier style > in header file for Distributed Switch Architecture drivers. > For C header files Documentation/process/license-rules.rst > mandates C-like comments (opposed to C source files where > C++ style should be used) > > Changes made by using a script provided by Joe Perches here: > https://lkml.org/lkml/2019/2/7/46. > > Suggested-by: Joe Perches > Signed-off-by: Nishad Kamdar Applied, thank you!
Re: [PATCH net] net: stmmac: selftests: Flow Control test can also run with ASYM Pause
On Thu, 19 Sep 2019 12:09:49 +0200, Jose Abreu wrote: > The Flow Control selftest is also available with ASYM Pause. Lets add > this check to the test and fix eventual false positive failures. > > Fixes: 091810dbded9 ("net: stmmac: Introduce selftests support") > Signed-off-by: Jose Abreu Hi Jose! Thanks for the patch it looks good, seems like you posted it from a slightly different email address than was used for signoff: From: Jose Abreu vs Signed-off-by: Jose Abreu Could you please fix and repost? Automation may get upset otherwise.
Re: [PATCH 0/5] net: ethernet: stmmac: some fixes and optimization
On Fri, 20 Sep 2019 07:38:12 +0200, Christophe Roullier wrote: > Some improvements (manage syscfg as optional clock, update slew rate of > ETH_MDIO pin, Enable gating of the MAC TX clock during TX low-power mode) > Fix warning build message when W=1 There seems to be some new features/cleanups (or improvements as you say) here. Could you explain the negative impact not applying these changes will have? Patches 1 and 3 in particular. net-next is now closed [1], and will reopen some time after the merge window is over. For now we are only expecting fixes for the net tree. Could you (a) provide stronger motivation these changes are fixes; or (b) separate the fixes from improvements? Thank you! [1] https://www.kernel.org/doc/html/latest/networking/netdev-FAQ.html
[PATCH v1] clk: tegra20/30: Optimize PLLX configuration restoring
There is no need to re-configure PLLX if its configuration in unchanged on return from suspend / cpuidle, this saves 300us if PLLX is already enabled (common case for cpuidle). Signed-off-by: Dmitry Osipenko --- drivers/clk/tegra/clk-tegra20.c | 25 - drivers/clk/tegra/clk-tegra30.c | 25 - 2 files changed, 32 insertions(+), 18 deletions(-) diff --git a/drivers/clk/tegra/clk-tegra20.c b/drivers/clk/tegra/clk-tegra20.c index cceefbd67a3b..4d8222f5c638 100644 --- a/drivers/clk/tegra/clk-tegra20.c +++ b/drivers/clk/tegra/clk-tegra20.c @@ -955,6 +955,7 @@ static void tegra20_cpu_clock_suspend(void) static void tegra20_cpu_clock_resume(void) { unsigned int reg, policy; + u32 misc, base; /* Is CPU complex already running on PLLX? */ reg = readl(clk_base + CCLK_BURST_POLICY); @@ -968,15 +969,21 @@ static void tegra20_cpu_clock_resume(void) BUG(); if (reg != CCLK_BURST_POLICY_PLLX) { - /* restore PLLX settings if CPU is on different PLL */ - writel(tegra20_cpu_clk_sctx.pllx_misc, - clk_base + PLLX_MISC); - writel(tegra20_cpu_clk_sctx.pllx_base, - clk_base + PLLX_BASE); - - /* wait for PLL stabilization if PLLX was enabled */ - if (tegra20_cpu_clk_sctx.pllx_base & (1 << 30)) - udelay(300); + misc = readl_relaxed(clk_base + PLLX_MISC); + base = readl_relaxed(clk_base + PLLX_BASE); + + if (misc != tegra20_cpu_clk_sctx.pllx_misc || + base != tegra20_cpu_clk_sctx.pllx_base) { + /* restore PLLX settings if CPU is on different PLL */ + writel(tegra20_cpu_clk_sctx.pllx_misc, + clk_base + PLLX_MISC); + writel(tegra20_cpu_clk_sctx.pllx_base, + clk_base + PLLX_BASE); + + /* wait for PLL stabilization if PLLX was enabled */ + if (tegra20_cpu_clk_sctx.pllx_base & (1 << 30)) + udelay(300); + } } /* diff --git a/drivers/clk/tegra/clk-tegra30.c b/drivers/clk/tegra/clk-tegra30.c index a19840fac716..3b5bca44b7aa 100644 --- a/drivers/clk/tegra/clk-tegra30.c +++ b/drivers/clk/tegra/clk-tegra30.c @@ -1135,6 +1135,7 @@ static void tegra30_cpu_clock_suspend(void) static void 
tegra30_cpu_clock_resume(void) { unsigned int reg, policy; + u32 misc, base; /* Is CPU complex already running on PLLX? */ reg = readl(clk_base + CLK_RESET_CCLK_BURST); @@ -1148,15 +1149,21 @@ static void tegra30_cpu_clock_resume(void) BUG(); if (reg != CLK_RESET_CCLK_BURST_POLICY_PLLX) { - /* restore PLLX settings if CPU is on different PLL */ - writel(tegra30_cpu_clk_sctx.pllx_misc, - clk_base + CLK_RESET_PLLX_MISC); - writel(tegra30_cpu_clk_sctx.pllx_base, - clk_base + CLK_RESET_PLLX_BASE); - - /* wait for PLL stabilization if PLLX was enabled */ - if (tegra30_cpu_clk_sctx.pllx_base & (1 << 30)) - udelay(300); + misc = readl_relaxed(clk_base + CLK_RESET_PLLX_MISC); + base = readl_relaxed(clk_base + CLK_RESET_PLLX_BASE); + + if (misc != tegra30_cpu_clk_sctx.pllx_misc || + base != tegra30_cpu_clk_sctx.pllx_base) { + /* restore PLLX settings if CPU is on different PLL */ + writel(tegra30_cpu_clk_sctx.pllx_misc, + clk_base + CLK_RESET_PLLX_MISC); + writel(tegra30_cpu_clk_sctx.pllx_base, + clk_base + CLK_RESET_PLLX_BASE); + + /* wait for PLL stabilization if PLLX was enabled */ + if (tegra30_cpu_clk_sctx.pllx_base & (1 << 30)) + udelay(300); + } } /* -- 2.23.0
Re: [PATCH] kbuild: binrpm-pkg: Propagate O= to rpmbuild
On 9/20/2019 11:30 PM, Masahiro Yamada wrote: Hi Jeffrey, On Sat, Sep 21, 2019 at 4:01 AM Jeffrey Hugo wrote: If the user specifies O= to indicate a specific output directory for the build, rpmbuild does not honor this, and will use its default, which could be the user's home directory. In cases where the user has limited home directory space, this could cause the build to outright fail. In the case of the binrpm-pkg target, redefine the top directory for output to be what the user specified in O=, thus the user will find a "rpmbuild" subdirectory in that location with all of the RPM artifacts. This does not apply to rpm-pkg, since we already cannot handle creating the source tarball out of tree. Signed-off-by: Jeffrey Hugo binrpm-pkg creates intermediate build artifacts in $(objtree)/, but puts only the final .rpm into ${HOME}/rpmbuild/RPMS/${ARCH}/. I disagree with this. Ubuntu 16.04 with the 4.12 version of the rpm packaging utilities will create several directories under rpmbuild in the user's home directory - BUILDROOT RPMS SOURCES SPECS SRPMS SOURCES/SPECS/SRPMS are empty for binrpm-pkg. RPMs contains the final RPMs as you've indicated BUILDROOT appears to contain more intermediate files which are used to then generate the final rpms. It seems like more than just the final rpms are dumped into the home directory. It has worked like that since a long time before probably in order to respect the default of rpmbuild. It still seems inconsistent to me that there is an option defined (KBUILD_OUTPUT and O=) which is described to allow the user to specify the location of the build output, yet there are parts of the build system which do not respect this. I also find it curious that there is a comment in the makefile this patch modifies that seems to suggest that it is intended for O= to impact the binrpm-pkg target, however that is currently not the case. If you change this behavior, it should be consistent. 
The 'rpmbuild' should be always located in the kernel tree instead of the user's home directory. I'm ok with this. The current patch attempts to preserve current behavior in the default case of not specifying an output directory, but dumping everything in the kernel tree seems sane to me. But, doing so might give impact to other users who rely on having 'rpmbuild' in the home directory. I have to hear opinions from others if this change is desired. Fair enough. Meanwhile, if you are unhappy with that, one solution is to use RPMOPTS. RPMOPTS exists to tweak the default behavior. I don't see this documented anywhere. I'm assuming that is supposed to be an environment variable. I'll have to see how well that works with our automated build systems. Command line options are generally preferred. Thanks. --- scripts/Makefile.package | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/scripts/Makefile.package b/scripts/Makefile.package index 56eadcc..aab0711 100644 --- a/scripts/Makefile.package +++ b/scripts/Makefile.package @@ -21,7 +21,7 @@ include $(srctree)/scripts/Kbuild.include # - Use /. to avoid tar packing just the symlink # Note that the rpm-pkg target cannot be used with KBUILD_OUTPUT, -# but the binrpm-pkg target can; for some reason O= gets ignored. 
+# but the binrpm-pkg target can # Remove hyphens since they have special meaning in RPM filenames KERNELPATH := kernel-$(subst -,_,$(KERNELRELEASE)) @@ -33,6 +33,12 @@ TAR_CONTENT := $(KBUILD_ALLDIRS) .config .scmversion Makefile \ Kbuild Kconfig COPYING $(wildcard localversion*) MKSPEC := $(srctree)/scripts/package/mkspec +RPM_OUTDIR := +ifneq ($(objtree),$(srctree)) +# Using absolute path as relative path will cause parts of rpmbuild to fail +RPM_OUTDIR := --define "_topdir $(abs_objtree)/rpmbuild" +endif + quiet_cmd_src_tar = TAR $(2).tar.gz cmd_src_tar = \ if test "$(objtree)" != "$(srctree)"; then \ @@ -65,8 +71,8 @@ PHONY += binrpm-pkg binrpm-pkg: $(MAKE) -f $(srctree)/Makefile $(CONFIG_SHELL) $(MKSPEC) prebuilt > $(objtree)/binkernel.spec - +rpmbuild $(RPMOPTS) --define "_builddir $(objtree)" --target \ - $(UTS_MACHINE) -bb $(objtree)/binkernel.spec + +rpmbuild $(RPMOPTS) --define "_builddir $(objtree)" $(RPM_OUTDIR) \ + --target $(UTS_MACHINE) -bb $(objtree)/binkernel.spec PHONY += deb-pkg deb-pkg: -- Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. -- Best Regards Masahiro Yamada -- Jeffrey Hugo Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH 30/32] tools lib bpf: Renaming pr_warning to pr_warn
On Fri, Sep 20, 2019 at 10:06 AM Kefeng Wang wrote: > > For kernel logging macro, pr_warning is completely removed and > replaced by pr_warn, using pr_warn in tools lib bpf for symmetry > to kernel logging macro, then we could drop pr_warning in the > whole linux code. > > Cc: Alexei Starovoitov > Cc: Daniel Borkmann > Cc: Martin KaFai Lau > Cc: Song Liu > Cc: Yonghong Song > Cc: b...@vger.kernel.org > Signed-off-by: Kefeng Wang > --- > tools/lib/bpf/btf.c | 56 +-- > tools/lib/bpf/btf_dump.c| 20 +- > tools/lib/bpf/libbpf.c | 652 > tools/lib/bpf/libbpf_internal.h | 2 +- > tools/lib/bpf/xsk.c | 4 +- > 5 files changed, 363 insertions(+), 371 deletions(-) > Thanks! This will allow to get rid of tons warnings from checkpatch.pl. Alexei, Daniel, can we take this through bpf-next tree once it's open? Acked-by: Andrii Nakryiko [...]
Re: [PATCH v2 net] net: ena: Select DIMLIB for ENA_ETHERNET
On Sun, 22 Sep 2019 13:38:08 +0800, Mao Wenan wrote: > If CONFIG_ENA_ETHERNET=y and CONFIG_DIMLIB=n, > the below errors can be found: > drivers/net/ethernet/amazon/ena/ena_netdev.o: In function `ena_dim_work': > ena_netdev.c:(.text+0x21cc): undefined reference to > `net_dim_get_rx_moderation' > ena_netdev.c:(.text+0x21cc): relocation truncated to > fit: R_AARCH64_CALL26 against undefined symbol `net_dim_get_rx_moderation' > drivers/net/ethernet/amazon/ena/ena_netdev.o: In function `ena_io_poll': > ena_netdev.c:(.text+0x7bd4): undefined reference to `net_dim' > ena_netdev.c:(.text+0x7bd4): relocation truncated to fit: > R_AARCH64_CALL26 against undefined symbol `net_dim' > > Commit 282faf61a053 ("net: ena: switch to dim algorithm for rx adaptive > interrupt moderation") introduced the dim algorithm, which is configured by > CONFIG_DIMLIB. > So, this patch is to select DIMLIB for ENA_ETHERNET. > > Fixes: 282faf61a053 ("net: ena: switch to dim algorithm for rx adaptive > interrupt moderation") > Signed-off-by: Mao Wenan Applied, thank you!
[GIT PULL] livepatching for 5.4
Linus,

please pull from

  git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching.git for-linus

to receive livepatching subsystem update:

=====
- error handling fix in livepatching module notifier, from Miroslav Benes
=====

Thanks.

Miroslav Benes (1):
      livepatch: Nullify obj->mod in klp_module_coming()'s error path

 kernel/livepatch/core.c | 1 +
 1 file changed, 1 insertion(+)

-- 
Jiri Kosina
SUSE Labs
[GIT PULL] HID for 5.4
Linus, please pull from git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git for-linus to receive merge window updates for HID subsystem: = - syzbot memory corruption fixes for hidraw, Prodikeys, Logitech and Sony drivers from Alan Stern and Roderick Colenbrander - stuck 'fn' key fix for hid-apple from Joao Moreno - proper propagation of EPOLLOUT from hiddev and hidraw, from Fabian Henneke - fixes for handling power management for intel-ish devices with NO_D3 flag set, from Zhang Lixu - extension of supported usage range for customer page, as some Logitech devices are actually making use of it. From Olivier Gay. - hid-multitouch is no longer filtering mice node creation, from Benjamin Tissoires - MobileStudio Pro 13 support, from Ping Cheng - a few other device ID additions and assorted smaller fixes = Thanks. Aaron Armstrong Skomra (1): HID: wacom: support named keys on older devices Alan Stern (3): HID: hidraw: Fix invalid read in hidraw_ioctl HID: logitech: Fix general protection fault caused by Logitech driver HID: prodikeys: Fix general protection fault during probe Bastien Nocera (1): HID: sb0540: add support for Creative SB0540 IR receivers Benjamin Tissoires (5): HID: wacom: do not call hid_set_drvdata(hdev, NULL) HID: do not call hid_set_drvdata(hdev, NULL) in drivers HID: multitouch: do not filter mice nodes HID: multitouch: add support for the Smart Tech panel HID: logitech-dj: add support of the G700(s) receiver Fabian Henneke (2): hidraw: Return EPOLLOUT from hidraw_poll hiddev: Return EPOLLOUT from hiddev_poll Filipe Laíns (1): hid-logitech-dj: add the new Lightspeed receiver Hans de Goede (1): HID: logitech-dj: Fix crash when initial logi_dj_recv_query_paired_devices fails HungNien Chen (1): HID: i2c-hid: modify quirks for weida's devices Jason Gerecke (1): HID: wacom: Fix several minor compiler warnings Joao Moreno (1): HID: apple: Fix stuck function keys when using FN Joshua Clayton (3): HID: core: reformat and reduce hid_printk macros HID: 
core: Add printk_once variants to hid_warn() etc HID: core: fix dmesg flooding if report field larger than 32bit Olivier Gay (1): HID: logitech-dj: extend consumer usages range Ping Cheng (1): HID: wacom: add new MobileStudio Pro 13 support Roderick Colenbrander (1): HID: sony: Fix memory corruption issue on cleanup. Sebastian Parschauer (1): HID: Add quirk for HP X500 PIXART OEM mouse Zhang Lixu (3): HID: intel-ish-hid: ipc: set NO_D3 flag only when needed HID: intel-ish-hid: ipc: make ish suspend paths clear HID: intel-ish-hid: ipc: check the NO_D3 flag to distinguish resume paths MAINTAINERS | 6 + drivers/hid/Kconfig | 9 ++ drivers/hid/Makefile| 1 + drivers/hid/hid-apple.c | 49 +++--- drivers/hid/hid-core.c | 4 +- drivers/hid/hid-cougar.c| 6 +- drivers/hid/hid-creative-sb0540.c | 268 drivers/hid/hid-gfrm.c | 7 - drivers/hid/hid-ids.h | 5 +- drivers/hid/hid-lenovo.c| 2 - drivers/hid/hid-lg.c| 10 +- drivers/hid/hid-lg4ff.c | 1 - drivers/hid/hid-logitech-dj.c | 32 ++-- drivers/hid/hid-multitouch.c| 37 - drivers/hid/hid-picolcd_core.c | 7 +- drivers/hid/hid-prodikeys.c | 12 +- drivers/hid/hid-quirks.c| 1 + drivers/hid/hid-sensor-hub.c| 1 - drivers/hid/hid-sony.c | 2 +- drivers/hid/hidraw.c| 4 +- drivers/hid/i2c-hid/i2c-hid-core.c | 4 +- drivers/hid/intel-ish-hid/ipc/hw-ish.h | 1 + drivers/hid/intel-ish-hid/ipc/ipc.c | 2 +- drivers/hid/intel-ish-hid/ipc/pci-ish.c | 95 ++- drivers/hid/usbhid/hiddev.c | 2 +- drivers/hid/wacom_sys.c | 25 ++- drivers/hid/wacom_wac.c | 76 - include/linux/hid.h | 43 ++--- 28 files changed, 561 insertions(+), 151 deletions(-) create mode 100644 drivers/hid/hid-creative-sb0540.c -- Jiri Kosina SUSE Labs
Re: [PATCH v2 2/2] rtc: wilco-ec: Fix license to GPL from GPLv2
On 22/09/2019 22:29:48+0200, Pavel Machek wrote: > On Mon 2019-09-16 12:12:17, Nick Crews wrote: > > Signed-off-by: Nick Crews > > --- > > drivers/rtc/rtc-wilco-ec.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/rtc/rtc-wilco-ec.c b/drivers/rtc/rtc-wilco-ec.c > > index e84faa268caf..951268f5e690 100644 > > --- a/drivers/rtc/rtc-wilco-ec.c > > +++ b/drivers/rtc/rtc-wilco-ec.c > > @@ -184,5 +184,5 @@ module_platform_driver(wilco_ec_rtc_driver); > > > > MODULE_ALIAS("platform:rtc-wilco-ec"); > > MODULE_AUTHOR("Nick Crews "); > > -MODULE_LICENSE("GPL v2"); > > +MODULE_LICENSE("GPL"); > > MODULE_DESCRIPTION("Wilco EC RTC driver"); > > File spdx header says GPL-2.0, this change would make it inconsistent with > that... > Commit bf7fbeeae6db ("module: Cure the MODULE_LICENSE "GPL" vs. "GPL v2" bogosity") doesn't agree with you (but I was surprised too). -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin wrote: > > > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > > > to fallback to tun_automq_select_queue() for tx queue selection. > > > > > > Compilation of this exact patch was tested. > > > > > > For functional testing 3 additional printk()s were added. > > > > > > Functional testing results (on 2 txq tap device): > > > > > > [Fri Sep 20 18:33:27 2019] == tun no prog == > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > [Fri Sep 20 18:33:27 2019] == tun prog -1 == > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > [Fri Sep 20 18:33:27 2019] == tun prog 0 == > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > > [Fri Sep 20 18:33:27 2019] == tun prog 1 == > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > > > [Fri Sep 20 18:33:27 2019] == tun prog 2 == > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > > > > > Signed-off-by: Matthew Cover > > > > > > Could you add a bit more motivation data here? > > Thank you for these questions Michael. > > I'll plan on adding the below information to the > commit message and submitting a v2 of this patch > when net-next reopens. 
In the meantime, it would > be very helpful to know if these answers address > some of your concerns. > > > 1. why is this a good idea > > This change allows TUNSETSTEERINGEBPF progs to > do any of the following. > 1. implement queue selection for a subset of > traffic (e.g. special queue selection logic > for ipv4, but return negative and use the > default automq logic for ipv6) > 2. determine there isn't sufficient information > to do proper queue selection; return > negative and use the default automq logic > for the unknown > 3. implement a noop prog (e.g. do > bpf_trace_printk() then return negative and > use the default automq logic for everything) > > > 2. how do we know existing userspace does not rely on existing behaviour > > Prior to this change a negative return from a > TUNSETSTEERINGEBPF prog would have been cast > into a u16 and traversed netdev_cap_txqueue(). > > In most cases netdev_cap_txqueue() would have > found this value to exceed real_num_tx_queues > and queue_index would be updated to 0. > > It is possible that a TUNSETSTEERINGEBPF prog > return a negative value which when cast into a > u16 results in a positive queue_index less than > real_num_tx_queues. For example, on x86_64, a > return value of -65535 results in a queue_index > of 1; which is a valid queue for any multiqueue > device. > > It seems unlikely, however as stated above is > unfortunately possible, that existing > TUNSETSTEERINGEBPF programs would choose to > return a negative value rather than return the > positive value which holds the same meaning. > > It seems more likely that future > TUNSETSTEERINGEBPF programs would leverage a > negative return and potentially be loaded into > a kernel with the old behavior. OK if we are returning a special value, shouldn't we limit it? How about a special value with this meaning? If we are changing an ABI let's at least make it extensible. > > 3. 
why doesn't userspace need a way to figure out whether it runs on a > > kernel with and > >without this patch > > There may be some value in exposing this fact > to the ebpf prog loader. What is the standard > practice here, a define? We'll need something at runtime - people move binaries between kernels without rebuilding then. An ioctl is one option. A sysfs attribute is another, an ethtool flag yet another. A combination of these is possible. And if we are doing this anyway, maybe let userspace select the new behaviour? This way we can stay compatible with old userspace... > > > > > > thanks, > > MST > > > > > --- > > > drivers/net/tun.c | 20 +++- > > > 1 file changed, 11 insertions(+), 9 deletions(-) > > > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > > index aab0be4..173d159 100644 > > > --- a/drivers/net/tun.c > > > +++ b/drivers/net/tun.c > > > @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct > > > tun_struct *tun, struct sk_buff *skb) > > >
Re: [PATCH v2 2/2] rtc: wilco-ec: Fix license to GPL from GPLv2
On Mon 2019-09-16 12:12:17, Nick Crews wrote: > Signed-off-by: Nick Crews > --- > drivers/rtc/rtc-wilco-ec.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/rtc/rtc-wilco-ec.c b/drivers/rtc/rtc-wilco-ec.c > index e84faa268caf..951268f5e690 100644 > --- a/drivers/rtc/rtc-wilco-ec.c > +++ b/drivers/rtc/rtc-wilco-ec.c > @@ -184,5 +184,5 @@ module_platform_driver(wilco_ec_rtc_driver); > > MODULE_ALIAS("platform:rtc-wilco-ec"); > MODULE_AUTHOR("Nick Crews "); > -MODULE_LICENSE("GPL v2"); > +MODULE_LICENSE("GPL"); > MODULE_DESCRIPTION("Wilco EC RTC driver"); File spdx header says GPL-2.0, this change would make it inconsistent with that... Best regards, Pavel
Re: [PATCH AUTOSEL 5.3 169/203] x86/platform/uv: Fix kmalloc() NULL check routine
On Sun, Sep 22, 2019 at 02:43:15PM -0400, Sasha Levin wrote: > From: Austin Kim > > [ Upstream commit 864b23f0169d5bff677e8443a7a90dfd6b090afc ] > > The result of kmalloc() should have been checked ahead of below statement: > > pqp = (struct bau_pq_entry *)vp; > > Move BUG_ON(!vp) before above statement. > > Signed-off-by: Austin Kim > Cc: Dimitri Sivanich > Cc: Hedi Berriche > Cc: Linus Torvalds > Cc: Mike Travis > Cc: Peter Zijlstra > Cc: Russ Anderson > Cc: Steve Wahl > Cc: Thomas Gleixner > Cc: alli...@lohutok.net > Cc: a...@infradead.org > Cc: arm...@tjaldur.nl > Cc: b...@alien8.de > Cc: dvh...@infradead.org > Cc: gre...@linuxfoundation.org > Cc: h...@zytor.com > Cc: k...@umn.edu > Cc: platform-driver-...@vger.kernel.org > Link: https://lkml.kernel.org/r/20190905232951.GA28779@LGEARND20B15 > Signed-off-by: Ingo Molnar > Signed-off-by: Sasha Levin > --- > arch/x86/platform/uv/tlb_uv.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/platform/uv/tlb_uv.c b/arch/x86/platform/uv/tlb_uv.c > index 20c389a91b803..5f0a96bf27a1f 100644 > --- a/arch/x86/platform/uv/tlb_uv.c > +++ b/arch/x86/platform/uv/tlb_uv.c > @@ -1804,9 +1804,9 @@ static void pq_init(int node, int pnode) > > plsize = (DEST_Q_SIZE + 1) * sizeof(struct bau_pq_entry); > vp = kmalloc_node(plsize, GFP_KERNEL, node); > - pqp = (struct bau_pq_entry *)vp; > - BUG_ON(!pqp); > + BUG_ON(!vp); > > + pqp = (struct bau_pq_entry *)vp; > cp = (char *)pqp + 31; > pqp = (struct bau_pq_entry *)(((unsigned long)cp >> 5) << 5); > How did this even get merged in the first place? I thought a number of us complained about it. This isn't any change in code, and the original is just fine, the author didn't realize how C works :( Please drop this. thanks, greg k-h
Re: [PATCH AUTOSEL 5.2 154/185] x86/platform/uv: Fix kmalloc() NULL check routine
On Sun, Sep 22, 2019 at 02:48:52PM -0400, Sasha Levin wrote: > From: Austin Kim > > [ Upstream commit 864b23f0169d5bff677e8443a7a90dfd6b090afc ] > > The result of kmalloc() should have been checked ahead of below statement: > > pqp = (struct bau_pq_entry *)vp; > > Move BUG_ON(!vp) before above statement. > > Signed-off-by: Austin Kim > Cc: Dimitri Sivanich > Cc: Hedi Berriche > Cc: Linus Torvalds > Cc: Mike Travis > Cc: Peter Zijlstra > Cc: Russ Anderson > Cc: Steve Wahl > Cc: Thomas Gleixner > Cc: alli...@lohutok.net > Cc: a...@infradead.org > Cc: arm...@tjaldur.nl > Cc: b...@alien8.de > Cc: dvh...@infradead.org > Cc: gre...@linuxfoundation.org > Cc: h...@zytor.com > Cc: k...@umn.edu > Cc: platform-driver-...@vger.kernel.org > Link: https://lkml.kernel.org/r/20190905232951.GA28779@LGEARND20B15 > Signed-off-by: Ingo Molnar > Signed-off-by: Sasha Levin > --- > arch/x86/platform/uv/tlb_uv.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/platform/uv/tlb_uv.c b/arch/x86/platform/uv/tlb_uv.c > index 0c7dfec4acac8..5a66d73620ce4 100644 > --- a/arch/x86/platform/uv/tlb_uv.c > +++ b/arch/x86/platform/uv/tlb_uv.c > @@ -1815,9 +1815,9 @@ static void pq_init(int node, int pnode) > > plsize = (DEST_Q_SIZE + 1) * sizeof(struct bau_pq_entry); > vp = kmalloc_node(plsize, GFP_KERNEL, node); > - pqp = (struct bau_pq_entry *)vp; > - BUG_ON(!pqp); > + BUG_ON(!vp); > > + pqp = (struct bau_pq_entry *)vp; > cp = (char *)pqp + 31; > pqp = (struct bau_pq_entry *)(((unsigned long)cp >> 5) << 5); > Please drop from everywhere. thanks, greg k-h
[PATCH 4/4] MIPS: JZ4780: DTS: Add CPU nodes
The JZ4780 has 2 cores; add them to the DT. Signed-off-by: Alexandre GRIVEAUX --- arch/mips/boot/dts/ingenic/jz4780.dtsi | 17 + 1 file changed, 17 insertions(+) diff --git a/arch/mips/boot/dts/ingenic/jz4780.dtsi b/arch/mips/boot/dts/ingenic/jz4780.dtsi index f928329b034b..9c7346724f1f 100644 --- a/arch/mips/boot/dts/ingenic/jz4780.dtsi +++ b/arch/mips/boot/dts/ingenic/jz4780.dtsi @@ -7,6 +7,23 @@ #size-cells = <1>; compatible = "ingenic,jz4780"; + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "ingenic,jz4780"; + device_type = "cpu"; + reg = <0>; + }; + + cpu@1 { + compatible = "ingenic,jz4780"; + device_type = "cpu"; + reg = <1>; + }; + }; + cpuintc: interrupt-controller { #address-cells = <0>; #interrupt-cells = <1>; -- 2.20.1
[PATCH 3/4] MIPS: CI20: DTS: Add Leds
Adding leds and related triggers. Signed-off-by: Alexandre GRIVEAUX --- arch/mips/boot/dts/ingenic/ci20.dts | 28 1 file changed, 28 insertions(+) diff --git a/arch/mips/boot/dts/ingenic/ci20.dts b/arch/mips/boot/dts/ingenic/ci20.dts index c62c36ae94c2..37b93166bf22 100644 --- a/arch/mips/boot/dts/ingenic/ci20.dts +++ b/arch/mips/boot/dts/ingenic/ci20.dts @@ -25,6 +25,34 @@ 0x3000 0x3000>; }; + leds { + compatible = "gpio-leds"; + + led0 { + label = "ci20:red:led0"; + gpios = < 3 GPIO_ACTIVE_HIGH>; + linux,default-trigger = "none"; + }; + + led1 { + label = "ci20:red:led1"; + gpios = < 2 GPIO_ACTIVE_HIGH>; + linux,default-trigger = "nand-disk"; + }; + + led2 { + label = "ci20:red:led2"; + gpios = < 1 GPIO_ACTIVE_HIGH>; + linux,default-trigger = "cpu1"; + }; + + led3 { + label = "ci20:red:led3"; + gpios = < 0 GPIO_ACTIVE_HIGH>; + linux,default-trigger = "cpu0"; + }; + }; + eth0_power: fixedregulator@0 { compatible = "regulator-fixed"; regulator-name = "eth0_power"; -- 2.20.1
[PATCH 2/4] MIPS: CI20: DTS: Add IW8103 Wifi + bluetooth
Add IW8103 Wifi + bluetooth module to device tree and related power domain. Signed-off-by: Alexandre GRIVEAUX --- arch/mips/boot/dts/ingenic/ci20.dts | 39 + 1 file changed, 39 insertions(+) diff --git a/arch/mips/boot/dts/ingenic/ci20.dts b/arch/mips/boot/dts/ingenic/ci20.dts index 4a77fa30a9cd..c62c36ae94c2 100644 --- a/arch/mips/boot/dts/ingenic/ci20.dts +++ b/arch/mips/boot/dts/ingenic/ci20.dts @@ -31,6 +31,13 @@ gpio = < 25 GPIO_ACTIVE_LOW>; enable-active-high; }; + + wlan0_power: fixedregulator@1 { + compatible = "regulator-fixed"; + regulator-name = "wlan0_power"; + gpio = < 19 GPIO_ACTIVE_LOW>; + enable-active-high; + }; }; { @@ -54,9 +61,18 @@ bus-width = <4>; max-frequency = <5000>; + non-removable; pinctrl-names = "default"; pinctrl-0 = <_mmc1>; + + brcmf: wifi@1 { +/* reg = <4>;*/ + compatible = "brcm,bcm4330-fmac"; + vcc-supply = <_power>; + device-wakeup-gpios = < 9 GPIO_ACTIVE_HIGH>; + shutdown-gpios = < 7 GPIO_ACTIVE_LOW>; + }; }; { @@ -73,6 +89,23 @@ pinctrl-0 = <_uart1>; }; + { + status = "okay"; + + pinctrl-names = "default"; + pinctrl-0 = <_uart2>; + uart-has-rtscts; + + bluetooth { + compatible = "brcm,bcm4330-bt"; + reset-gpios = < 8 GPIO_ACTIVE_HIGH>; + vcc-supply = <_power>; + device-wakeup-gpios = < 5 GPIO_ACTIVE_HIGH>; + host-wakeup-gpios = < 6 GPIO_ACTIVE_HIGH>; + shutdown-gpios = < 4 GPIO_ACTIVE_LOW>; + }; +}; + { status = "okay"; @@ -314,6 +347,12 @@ bias-disable; }; + pins_uart2: uart2 { + function = "uart2"; + groups = "uart2-data", "uart2-hwflow"; + bias-disable; + }; + pins_uart3: uart3 { function = "uart3"; groups = "uart3-data", "uart3-hwflow"; -- 2.20.1
Re: [PATCH V7 5/7] cpufreq: Register notifiers with the PM QoS framework
08.07.2019 13:57, Viresh Kumar пишет: > This registers the notifiers for min/max frequency constraints with the > PM QoS framework. The constraints are also taken into consideration in > cpufreq_set_policy(). > > This also relocates cpufreq_policy_put_kobj() as it is required to be > called from cpufreq_policy_alloc() now. > > refresh_frequency_limits() is updated to avoid calling > cpufreq_set_policy() for inactive policies and handle_update() is > updated to have proper locking in place. > > No constraints are added until now though. > > Reviewed-by: Matthias Kaehlcke > Reviewed-by: Ulf Hansson > Signed-off-by: Viresh Kumar > Signed-off-by: Rafael J. Wysocki > --- > V6->V7: > - All callers of refresh_frequency_limits(), except handle_update(), > take the policy->rwsem and result in deadlock as > refresh_frequency_limits() also takes the same lock again. Fix that > by taking the rwsem from handle_update() instead. > > @Rafael: Sending it before Pavel has verified it as I would be offline > later, in case you want to apply this today itself. 
> > drivers/cpufreq/cpufreq.c | 135 +- > include/linux/cpufreq.h | 3 + > 2 files changed, 108 insertions(+), 30 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index ceb57af15ca0..b96ef6db1bfe 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -26,6 +26,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -999,7 +1000,7 @@ static void add_cpu_dev_symlink(struct cpufreq_policy > *policy, unsigned int cpu) > { > struct device *dev = get_cpu_device(cpu); > > - if (!dev) > + if (unlikely(!dev)) > return; > > if (cpumask_test_and_set_cpu(cpu, policy->real_cpus)) > @@ -1117,14 +1118,16 @@ static int cpufreq_add_policy_cpu(struct > cpufreq_policy *policy, unsigned int cp > > static void refresh_frequency_limits(struct cpufreq_policy *policy) > { > - struct cpufreq_policy new_policy = *policy; > - > - pr_debug("updating policy for CPU %u\n", policy->cpu); > + struct cpufreq_policy new_policy; > > - new_policy.min = policy->user_policy.min; > - new_policy.max = policy->user_policy.max; > + if (!policy_is_inactive(policy)) { > + new_policy = *policy; > + pr_debug("updating policy for CPU %u\n", policy->cpu); > > - cpufreq_set_policy(policy, _policy); > + new_policy.min = policy->user_policy.min; > + new_policy.max = policy->user_policy.max; > + cpufreq_set_policy(policy, _policy); > + } > } > > static void handle_update(struct work_struct *work) > @@ -1133,14 +1136,60 @@ static void handle_update(struct work_struct *work) > container_of(work, struct cpufreq_policy, update); > > pr_debug("handle_update for cpu %u called\n", policy->cpu); > + down_write(>rwsem); > refresh_frequency_limits(policy); > + up_write(>rwsem); > +} > + > +static int cpufreq_notifier_min(struct notifier_block *nb, unsigned long > freq, > + void *data) > +{ > + struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, > nb_min); > + > + schedule_work(>update); > + return 0; > +} > + > 
+static int cpufreq_notifier_max(struct notifier_block *nb, unsigned long > freq, > + void *data) > +{ > + struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, > nb_max); > + > + schedule_work(>update); > + return 0; > +} > + > +static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy) > +{ > + struct kobject *kobj; > + struct completion *cmp; > + > + down_write(>rwsem); > + cpufreq_stats_free_table(policy); > + kobj = >kobj; > + cmp = >kobj_unregister; > + up_write(>rwsem); > + kobject_put(kobj); > + > + /* > + * We need to make sure that the underlying kobj is > + * actually not referenced anymore by anybody before we > + * proceed with unloading. > + */ > + pr_debug("waiting for dropping of refcount\n"); > + wait_for_completion(cmp); > + pr_debug("wait complete\n"); > } > > static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu) > { > struct cpufreq_policy *policy; > + struct device *dev = get_cpu_device(cpu); > int ret; > > + if (!dev) > + return NULL; > + > policy = kzalloc(sizeof(*policy), GFP_KERNEL); > if (!policy) > return NULL; > @@ -1157,7 +1206,7 @@ static struct cpufreq_policy > *cpufreq_policy_alloc(unsigned int cpu) > ret = kobject_init_and_add(>kobj, _cpufreq, > cpufreq_global_kobject, "policy%u", cpu); > if (ret) { > - pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret); > + dev_err(dev, "%s: failed to init
[PATCH 1/4] MIPS: CI20: DTS: Add I2C nodes
Adding missing I2C nodes and some peripheral: - PMU - RTC Signed-off-by: Alexandre GRIVEAUX --- arch/mips/boot/dts/ingenic/ci20.dts | 147 1 file changed, 147 insertions(+) diff --git a/arch/mips/boot/dts/ingenic/ci20.dts b/arch/mips/boot/dts/ingenic/ci20.dts index 2e9952311ecd..4a77fa30a9cd 100644 --- a/arch/mips/boot/dts/ingenic/ci20.dts +++ b/arch/mips/boot/dts/ingenic/ci20.dts @@ -87,6 +87,123 @@ pinctrl-0 = <_uart4>; }; + { + status = "okay"; + + pinctrl-names = "default"; + pinctrl-0 = <_i2c0>; + + clock-frequency = <40>; + + act8600: act8600@5a { + compatible = "active-semi,act8600"; + reg = <0x5a>; + status = "okay"; + + regulators { + vddcore: SUDCDC1 { + regulator-name = "VDDCORE"; + regulator-min-microvolt = <110>; + regulator-max-microvolt = <110>; + regulator-always-on; + }; + vddmem: SUDCDC2 { + regulator-name = "VDDMEM"; + regulator-min-microvolt = <150>; + regulator-max-microvolt = <150>; + regulator-always-on; + }; + vcc_33: SUDCDC3 { + regulator-name = "VCC33"; + regulator-min-microvolt = <330>; + regulator-max-microvolt = <330>; + regulator-always-on; + }; + vcc_50: SUDCDC4 { + regulator-name = "VCC50"; + regulator-min-microvolt = <500>; + regulator-max-microvolt = <500>; + regulator-always-on; + }; + vcc_25: LDO_REG5 { + regulator-name = "VCC25"; + regulator-min-microvolt = <250>; + regulator-max-microvolt = <250>; + regulator-always-on; + }; + wifi_io: LDO_REG6 { + regulator-name = "WIFIIO"; + regulator-min-microvolt = <250>; + regulator-max-microvolt = <250>; + regulator-always-on; + }; + vcc_28: LDO_REG7 { + regulator-name = "VCC28"; + regulator-min-microvolt = <280>; + regulator-max-microvolt = <280>; + regulator-always-on; + }; + vcc_15: LDO_REG8 { + regulator-name = "VCC15"; + regulator-min-microvolt = <150>; + regulator-max-microvolt = <150>; + regulator-always-on; + }; + vcc_18: LDO_REG9 { + regulator-name = "VCC18"; + regulator-min-microvolt = <180>; + regulator-max-microvolt = <180>; + regulator-always-on; + }; + vcc_11: LDO_REG10 { + 
regulator-name = "VCC11"; + regulator-min-microvolt = <110>; + regulator-max-microvolt = <110>; + regulator-always-on; + }; + }; + }; +}; + + { + status = "okay"; + + pinctrl-names = "default"; + pinctrl-0 = <_i2c1>; + +}; + + { + status = "okay"; + + pinctrl-names = "default"; + pinctrl-0 = <_i2c2>; + +}; + + { + status = "okay"; + + pinctrl-names = "default"; + pinctrl-0 = <_i2c3>; + +}; + + { + status = "okay"; + + pinctrl-names = "default"; + pinctrl-0 = <_i2c4>; + + clock-frequency = <40>; + + rtc@51 { + compatible = "nxp,pcf8563"; + reg = <0x51>; + interrupts = <110>; + }; +}; + { status = "okay"; @@ -209,6 +326,36 @@ bias-disable; }; + pins_i2c0: i2c0 { + function = "i2c0"; + groups = "i2c0-data"; + bias-disable; + }; + + pins_i2c1: i2c1 { + function = "i2c1"; + groups = "i2c1-data"; +
[PATCH 0/4] MIPS: CI20: DTS: Add nodes to Creator CI20 board
Attempting to make my CI20 more useful than a paperweight, I add nodes to the Devicetree, at this time: - The IW8103 needs some work to stay alive because power seems to turn off. - The leds patch lacks the correct option in ci20_defconfig. - The CPU patch isn't useful without SMP support for the jz4780. Alexandre GRIVEAUX (4): MIPS: CI20: DTS: Add I2C nodes MIPS: CI20: DTS: Add IW8103 Wifi + bluetooth MIPS: CI20: DTS: Add Leds MIPS: JZ4780: DTS: Add CPU nodes arch/mips/boot/dts/ingenic/ci20.dts| 214 + arch/mips/boot/dts/ingenic/jz4780.dtsi | 17 ++ 2 files changed, 231 insertions(+) base-commit: 4c37310a2e605357cf47b3d357696309f1181b5c -- 2.20.1
[PATCH AUTOSEL 5.3 030/203] rcu/tree: Call setschedule() gp ktread to SCHED_FIFO outside of atomic region
From: Juri Lelli

[ Upstream commit 1a763fd7c6335e3122c1cc09576ef6c99ada4267 ]

sched_setscheduler() needs to acquire cpuset_rwsem, but it is currently
called from an invalid (atomic) context by rcu_spawn_gp_kthread().

Fix that by simply moving the sched_setscheduler_nocheck() call outside
of the atomic region, as it doesn't actually require to be guarded by
the rcu_node lock.

Suggested-by: Peter Zijlstra
Tested-by: Dietmar Eggemann
Signed-off-by: Juri Lelli
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Thomas Gleixner
Cc: bris...@redhat.com
Cc: clau...@evidence.eu.com
Cc: lize...@huawei.com
Cc: long...@redhat.com
Cc: luca.ab...@santannapisa.it
Cc: mathieu.poir...@linaro.org
Cc: rost...@goodmis.org
Cc: t...@kernel.org
Cc: tommaso.cucino...@santannapisa.it
Link: https://lkml.kernel.org/r/2019071914.31694-8-juri.le...@redhat.com
Signed-off-by: Ingo Molnar
Signed-off-by: Sasha Levin
---
 kernel/rcu/tree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a14e5fbbea467..eb764c24bc4d4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3234,13 +3234,13 @@ static int __init rcu_spawn_gp_kthread(void)
 	t = kthread_create(rcu_gp_kthread, NULL, "%s", rcu_state.name);
 	if (WARN_ONCE(IS_ERR(t), "%s: Could not start grace-period kthread, OOM is now expected behavior\n",
 		      __func__))
 		return 0;
+	if (kthread_prio)
+		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
 	rnp = rcu_get_root();
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	rcu_state.gp_kthread = t;
-	if (kthread_prio) {
+	if (kthread_prio)
 		sp.sched_priority = kthread_prio;
-		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
-	}
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	wake_up_process(t);
 	rcu_spawn_nocb_kthreads();
-- 
2.20.1
[PATCH AUTOSEL 5.3 005/203] regulator: lm363x: Fix off-by-one n_voltages for lm3632 ldo_vpos/ldo_vneg
From: Axel Lin

[ Upstream commit 1e2cc8c5e0745b545d4974788dc606d678b6e564 ]

According to the datasheet
https://www.ti.com/lit/ds/symlink/lm3632a.pdf
Table 20. VPOS Bias Register Field Descriptions
VPOS[5:0] Sets the Positive Display Bias (LDO) Voltage (50 mV per step)
000000: 4 V
000001: 4.05 V
000010: 4.1 V
...
011101: 5.45 V
011110: 5.5 V (Default)
011111: 5.55 V
...
100111: 5.95 V
101000: 6 V
Note: Codes 101001 to 111111 map to 6 V

The LM3632_LDO_VSEL_MAX should be 0b101000 (0x28), so the maximum
voltage can match the datasheet.

Fixes: 3a8d1a73a037 ("regulator: add LM363X driver")
Signed-off-by: Axel Lin
Link: https://lore.kernel.org/r/20190626132632.32629-1-axel.lin@ingics.com
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin
---
 drivers/regulator/lm363x-regulator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/regulator/lm363x-regulator.c b/drivers/regulator/lm363x-regulator.c
index 5647e2f97ff8d..e4a27d63bf901 100644
--- a/drivers/regulator/lm363x-regulator.c
+++ b/drivers/regulator/lm363x-regulator.c
@@ -30,7 +30,7 @@
 
 /* LM3632 */
 #define LM3632_BOOST_VSEL_MAX	0x26
-#define LM3632_LDO_VSEL_MAX	0x29
+#define LM3632_LDO_VSEL_MAX	0x28
 #define LM3632_VBOOST_MIN	4500000
 #define LM3632_VLDO_MIN		4000000
-- 
2.20.1