Re: [PATCH v2 2/3] perf annotate: Introduce the new source code view
Hi Peter,

On Wed, Mar 01, 2017 at 04:07:46PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 01, 2017 at 11:56:39PM +0900, Namhyung Kim wrote:
> >
> > It's a kind of user experience issue. We provide the asm-only and
> > asm+source annotation, and I think it'd be nice to add source-only
> > option. And I remember that it was requested some time ago..
>
> Thing is, an optimizing compiler -- that same beast that ensures your
> objdump -S output is such a garbled mess -- can generate code that
> becomes very hard to relate to the original source code.

I understand that. It may not be 100% accurate, but it still carries
valuable information, and I think a source-only view can use that
information to produce more readable output. Also, I guess many
developers are already aware of the effects of optimizing compilers.

> I'm really sceptical the source line only view is very useful; maybe if
> you build with -O0, but then, if you do that you're not bothered with
> performance.

Even with an optimizing compiler, it can help to get an overview of which
parts of the code are bottlenecks, IMHO. After that, one can look at the
asm to dig into the problem, if needed.

Thanks,
Namhyung
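A source-only view essentially folds the per-instruction sample counts that the asm view shows onto the source lines the debug info attributes them to, so an optimized loop whose instructions are scattered across a function still adds up under its own line. A minimal sketch of that folding step, assuming a hypothetical `struct insn_sample` record (perf's real annotate code uses its own data structures):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instruction record: what the asm view shows, plus the
 * source line that debug info attributes the instruction to. */
struct insn_sample {
	uint64_t addr;
	unsigned int line;	/* 0 if there is no line info */
	unsigned long samples;
};

/* Fold per-instruction sample counts into per-line totals.
 * line_hits must provide nlines + 1 zeroed slots, indexed by line number.
 * Returns the number of samples that could not be attributed to a line. */
unsigned long fold_by_line(const struct insn_sample *ins, size_t n,
			   unsigned long *line_hits, unsigned int nlines)
{
	unsigned long unattributed = 0;

	for (size_t i = 0; i < n; i++) {
		if (ins[i].line == 0 || ins[i].line > nlines)
			unattributed += ins[i].samples;
		else
			line_hits[ins[i].line] += ins[i].samples;
	}
	return unattributed;
}
```

Instructions from the same line are summed no matter where the compiler placed them, which is the overview Namhyung describes; the asm view remains the place to see why a line is hot.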
RE: [PATCH v2] staging: mkspec: added aarch64 ifarch case.
Hi Will,

This patch (http://lkml.kernel.org/r/20161122213434.14788-1-mma...@suse.com)
looks better. It has what Linus calls "good taste". ;-) I didn't see it in
mmarek's kbuild branches (for-next, rc-fixes), however. Is it still making
its way there?

But it doesn't quite fix the native 'make rpm' build completely. While it
gets beyond the point at which 'make rpm' fails without my patch, it exposes
another issue, which I am debugging right now:

ld -EL -r -T ./scripts/module-common.lds --build-id -o net/unix/unix.ko net/unix/unix.o net/unix/unix.mod.o ; true
make -f ./scripts/Makefile.fwinst obj=firmware __fw_modbuild
error: Bad exit status from /var/tmp/rpm-tmp.YcfiLf (%build)
Bad exit status from /var/tmp/rpm-tmp.YcfiLf (%build)
RPM build errors:
make[1]: *** [rpm] Error 1
make: *** [rpm] Error 2

If I succeed in root-causing the problem, I'll submit a patch for it (if
someone else doesn't beat me to it). Assuming that patch is accepted for
having Linusian "good taste", it and
http://lkml.kernel.org/r/20161122213434.14788-1-mma...@suse.com together
will make my currently submitted patch unnecessary.

Thanks,
James

-----Original Message-----
From: Will Deacon [mailto:will.dea...@arm.com]
Sent: Wednesday, March 1, 2017 11:06 PM
To: James Tau
Cc: linux-kernel@vger.kernel.org; linux-kbu...@vger.kernel.org; mma...@suse.com; catalin.mari...@arm.com; Chris Metcalf
Subject: Re: [PATCH v2] staging: mkspec: added aarch64 ifarch case.

On Wed, Mar 01, 2017 at 09:24:14AM -0800, James Tau wrote:
> Patch attempting to fix native 'make rpm' build on ARM64 machines by
> adding an "ifarch aarch64" case. Without it, build fails because the
> 'cp ...' in the default case can't find the built image.
>
> Signed-off-by: James Tau
> ---
>  scripts/package/mkspec | 4
>  1 file changed, 4 insertions(+)

Is this the same issue that was fixed by:

http://lkml.kernel.org/r/20161122213434.14788-1-mma...@suse.com

? I was assuming that Michael was going to queue those, but I could be wrong.

Will
[RESEND PATCH v5 04/11 (Missed 04/11 in PATCH v5 series)] Documentation: perf: hisi: Documentation for HiP05/06/07 PMU event counting.
Documentation for perf usage and Hisilicon SoC PMU uncore events.
The Hisilicon SoC has event counters for hardware modules like the L3 cache,
Miscellaneous Node, etc. These events are all uncore.

Signed-off-by: Anurup M
Signed-off-by: Shaokun Zhang
---
 Documentation/perf/hisi-pmu.txt | 76 +
 1 file changed, 76 insertions(+)
 create mode 100644 Documentation/perf/hisi-pmu.txt

diff --git a/Documentation/perf/hisi-pmu.txt b/Documentation/perf/hisi-pmu.txt
new file mode 100644
index 000..e3ac562
--- /dev/null
+++ b/Documentation/perf/hisi-pmu.txt
@@ -0,0 +1,76 @@
+Hisilicon SoC PMU (Performance Monitoring Unit)
+
+The Hisilicon SoC HiP05/06/07 chips consist of various independent system
+device PMUs such as L3 cache (L3C) and Miscellaneous Nodes (MN).
+These PMU devices are independent and have hardware logic to gather
+statistics and performance information.
+
+HiP0x chips are encapsulated by multiple CPU and IO dies. The CPU die is
+called a Super CPU Cluster (SCCL) and includes 16 CPU cores. Every SCCL
+is further divided into CPU clusters (CCL) of 4 CPU cores each.
+Each SCCL has 1 L3 cache and 1 MN unit.
+
+The L3 cache is shared by all CPU cores in a CPU die. The L3C has four banks
+(or instances). Each bank or instance of L3C has eight 32-bit counter
+registers and also event control registers. The HiP05/06 chip L3 cache has
+22 statistics events. The HiP07 chip has 66 statistics events. These events
+are very useful for debugging.
+
+The MN module is also shared by all CPU cores in a CPU die. It receives
+barriers and DVM (Distributed Virtual Memory) messages from CPU or SMMU,
+performs the required actions and returns response messages. These events are
+very useful for debugging. The MN has a total of 9 statistics events and supports
+four 32-bit counter registers in HiP05/06/07 chips.
+
+There is no memory mapping for L3 cache and MN registers. They can be accessed
+by using the Hisilicon djtag interface. The Djtag in an SCCL is an independent
+module which connects with some modules in the SoC by the Debug Bus.
+
+Hisilicon SoC (HiP05/06/07) PMU driver
+--
+The HiP0x PMU driver shall register perf PMU drivers like L3 cache, MN, etc.
+The available events and configuration options shall be described in sysfs.
+"perf list" shall list the available events from sysfs.
+
+The L3 cache in an SCCL is divided into 4 banks. Each L3 cache bank has separate
+PMU registers for event counting and control. The L3 cache banks also do not
+have any CPU affinity. So each L3 cache bank is registered with perf as a
+separate PMU.
+The PMU name will appear in event listing as hisi_l3c<bank-id>_<scl-id>,
+where "bank-id" is the bank index (0 to 3) and "scl-id" is the SCCL identifier,
+e.g. hisi_l3c0_2/read_hit/ is the READ_HIT event of L3 cache bank #0, SCCL ID #2.
+
+The MN in an SCCL is registered as a separate PMU with perf.
+The PMU name will appear in event listing as hisi_mn_<scl-id>,
+e.g. hisi_mn_2/read_req/ is the READ_REQUEST event of the MN of Super CPU cluster #2.
+
+The event code is represented by 12 bits.
+	i) event 0-11
+		The event code will be represented using the LSB 12 bits.
+
+The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
+ID used to count the uncore PMU event.
+
+Example usage of perf:
+$# perf list
+hisi_l3c0_2/read_hit/ [kernel PMU event]
+--
+hisi_l3c1_2/write_hit/ [kernel PMU event]
+--
+hisi_l3c0_1/read_hit/ [kernel PMU event]
+--
+hisi_l3c0_1/write_hit/ [kernel PMU event]
+--
+hisi_mn_2/read_req/ [kernel PMU event]
+hisi_mn_2/write_req/ [kernel PMU event]
+--
+
+$# perf stat -a -e "hisi_l3c0_2/read_allocate/" sleep 5
+
+$# perf stat -A -C 0 -e "hisi_l3c0_2/read_allocate/" sleep 5
+
+The current driver does not support sampling, so "perf record" is unsupported.
+Attaching to a task is also unsupported, as the events are all uncore.
+
+Note: Please contact the maintainer for a complete list of events supported for
+the PMU devices in the SoC and related information, if needed.
--
2.1.4
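The document fixes a few rules for user space: each L3C bank and each MN is its own PMU, the event code sits in config bits 0-11, sampling is not supported, and events cannot be attached to a task. A minimal sketch of how a counting-only tool might open such an event through perf_event_open(2), assuming the dynamic PMU type is read from the standard `/sys/bus/event_source/devices/<pmu>/type` file; the `hisi_*` helper names are hypothetical, and no real event code is shown because the document does not list them:

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define HISI_EVENT_MASK 0xfffULL	/* event code occupies config bits 0-11 */

/* Build an L3C PMU name, e.g. bank 0 of SCCL 2 -> "hisi_l3c0_2". */
int hisi_l3c_pmu_name(char *buf, size_t len, int bank, int sccl)
{
	int n = snprintf(buf, len, "hisi_l3c%d_%d", bank, sccl);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the dynamic PMU type number from sysfs; -1 if unavailable. */
int hisi_pmu_type(const char *pmu)
{
	char path[128];
	int type = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/bus/event_source/devices/%s/type", pmu);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}

/* Counting-only attr: no sample_period, since sampling is unsupported. */
void hisi_uncore_attr(struct perf_event_attr *attr, uint32_t pmu_type,
		      uint64_t event_code)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = pmu_type;
	attr->config = event_code & HISI_EVENT_MASK;
	attr->disabled = 1;
}

/* Uncore events are per-CPU, not per-task: pid = -1, and cpu should be the
 * CPU listed in the PMU's "cpumask" sysfs attribute. */
int hisi_uncore_open(struct perf_event_attr *attr, int cpu)
{
	return (int)syscall(SYS_perf_event_open, attr, -1, cpu, -1, 0);
}
```

The pid = -1 / cpu = N combination is what "attach to a task is unsupported" translates to at the syscall level; `perf stat -a` and `-C 0` in the examples open events the same way.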
Re: [PATCH v17 2/3] usb: USB Type-C connector class
On 03/02/2017 07:35 PM, Peter Chen wrote:
On Tue, Feb 21, 2017 at 05:24:04PM +0300, Heikki Krogerus wrote:
+/* --- */
+/* Driver callbacks to report role updates */
+
+/**
+ * typec_set_data_role - Report data role change
+ * @port: The USB Type-C Port where the role was changed
+ * @role: The new data role
+ *
+ * This routine is used by the port drivers to report data role changes.
+ */
+void typec_set_data_role(struct typec_port *port, enum typec_data_role role)
+{
+	if (port->data_role == role)
+		return;
+
+	port->data_role = role;
+	sysfs_notify(&port->dev.kobj, NULL, "data_role");
+	kobject_uevent(&port->dev.kobj, KOBJ_CHANGE);
+}
+EXPORT_SYMBOL_GPL(typec_set_data_role);
+

Hi Keikki,

Have you tested this interface with real dual-role controller/board?

If it helps, my primary test system is a HP Chromebook 13 G1.

What interface you use when you receive this event to handle dual-role
switch? I am wonder if a common dual-role class is needed, then we can
have a common user utility.

I don't really understand "What interface you use when you receive this
event". Can you explain ?

Eg, if "data_role" has changed, the udev can echo "data_role" to
/sys/class/usb-dual-role/role

That sounds like a kernel event delivered to user space via udev or sysfs
notification and returned back into the kernel through a sysfs attribute.
Do I understand that correctly ?

Thanks,
Guenter

Maybe we can enhance Roger's drd framework [1] to fulfill that.

[1] https://lwn.net/Articles/682531/
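typec_set_data_role() signals a change in two ways: sysfs_notify() on the `data_role` attribute and a KOBJ_CHANGE uevent. A user-space helper of the kind Peter describes could consume the sysfs side with poll(2), which sysfs reports as POLLPRI/POLLERR. A minimal sketch, assuming an attribute path such as `/sys/class/typec/port0/data_role` (an assumption; the thread does not give one) and hypothetical helper names:

```c
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Re-read a sysfs attribute from offset 0 and strip the trailing newline.
 * Returns the string length, or -1 on error. */
ssize_t read_attr(int fd, char *buf, size_t len)
{
	ssize_t n;

	if (len == 0 || lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, len - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	if (n > 0 && buf[n - 1] == '\n')
		buf[--n] = '\0';
	return n;
}

/* Sleep until sysfs_notify() fires for the attribute behind fd.
 * Returns 0 on a change notification, -1 on error or timeout. */
int wait_attr_change(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };

	if (poll(&pfd, 1, timeout_ms) <= 0)
		return -1;
	return (pfd.revents & (POLLPRI | POLLERR)) ? 0 : -1;
}

/* Print the attribute now and again after every notification.
 * Only returns on error. */
int watch_attr(const char *path)
{
	char buf[64];
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	/* Read first, then poll; re-read from offset 0 after every wakeup. */
	while (read_attr(fd, buf, sizeof(buf)) >= 0) {
		printf("%s: %s\n", path, buf);
		if (wait_attr_change(fd, -1) < 0)
			break;
	}
	close(fd);
	return -1;
}
```

Whatever such a helper then writes back into the kernel (Peter's `/sys/class/usb-dual-role/role` example) is exactly the round trip Guenter is asking about.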
[git pull] vfs.git statx
Rebased, with fixup from -next folded in. A branch matching what was
sitting in -next is #merge-2, and

; git cat-file commit rebased-statx|grep tree
tree 0e87b93d5902009d46d5faf25c3039ef8f668490
; git cat-file commit merge-2|grep tree
tree 0e87b93d5902009d46d5faf25c3039ef8f668490

IOW, the trees are identical, so we don't lose any testing done in -next
and rebased branch is obviously saner.

The following changes since commit bbe08c0a43e2c5ee3a00de68c0e867a08a9aa990:

  Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs (2017-03-02 16:03:00 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git rebased-statx

for you to fetch changes up to a528d35e8bfcc521d7cb70aaf03e1bd296c8493f:

  statx: Add a system call to make enhanced file info available (2017-03-02 20:51:15 -0500)

David Howells (1):
      statx: Add a system call to make enhanced file info available

 Documentation/filesystems/Locking | 3 +-
 Documentation/filesystems/vfs.txt | 3 +-
 arch/x86/entry/syscalls/syscall_32.tbl | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl | 1 +
 drivers/base/devtmpfs.c | 3 +-
 drivers/block/loop.c | 3 +-
 drivers/mtd/ubi/build.c | 2 +-
 drivers/mtd/ubi/kapi.c | 2 +-
 drivers/staging/lustre/lustre/llite/file.c | 9 +-
 .../staging/lustre/lustre/llite/llite_internal.h | 3 +-
 fs/9p/vfs_inode.c | 10 +-
 fs/9p/vfs_inode_dotl.c | 5 +-
 fs/afs/inode.c | 8 +-
 fs/afs/internal.h | 2 +-
 fs/bad_inode.c | 4 +-
 fs/btrfs/inode.c | 6 +-
 fs/ceph/inode.c | 6 +-
 fs/ceph/super.h | 4 +-
 fs/cifs/cifsfs.h | 2 +-
 fs/cifs/inode.c | 5 +-
 fs/coda/coda_linux.h | 2 +-
 fs/coda/inode.c | 7 +-
 fs/ecryptfs/inode.c | 13 +-
 fs/exportfs/expfs.c | 3 +-
 fs/ext4/ext4.h | 3 +-
 fs/ext4/inode.c | 6 +-
 fs/f2fs/f2fs.h | 4 +-
 fs/f2fs/file.c | 6 +-
 fs/fat/fat.h | 4 +-
 fs/fat/file.c | 5 +-
 fs/fuse/dir.c | 6 +-
 fs/gfs2/inode.c | 11 +-
 fs/kernfs/inode.c | 8 +-
 fs/kernfs/kernfs-internal.h | 4 +-
 fs/libfs.c | 12 +-
 fs/minix/inode.c | 11 +-
 fs/minix/minix.h | 2 +-
 fs/nfs/inode.c | 13 +-
 fs/nfs/namespace.c | 9 +-
 fs/nfsd/nfs4xdr.c | 4 +-
 fs/nfsd/vfs.h | 3 +-
 fs/ocfs2/file.c | 11 +-
 fs/ocfs2/file.h | 4 +-
 fs/orangefs/inode.c | 13 +-
 fs/orangefs/orangefs-kernel.h | 5 +-
 fs/overlayfs/copy_up.c | 6 +-
 fs/overlayfs/dir.c | 10 +-
 fs/overlayfs/inode.c | 7 +-
 fs/proc/base.c | 12 +-
 fs/proc/generic.c | 6 +-
 fs/proc/internal.h | 2 +-
 fs/proc/proc_net.c | 6 +-
 fs/proc/proc_sysctl.c | 5 +-
 fs/proc/root.c | 6 +-
 fs/stat.c | 214 ++---
 fs/sysv/itree.c | 7 +-
 fs/sysv/sysv.h | 2 +-
 fs/ubifs/dir.c | 6 +-
 fs/ubifs/ubifs.h | 4 +-
 fs/udf/symlink.c | 5 +-
 fs/xfs/xfs_iops.c | 9 +-
 include/linux/fs.h | 35 ++-
 include/linux/nfs_fs.h | 2 +-
 include/linux/stat.h | 24 +-
 include/linux/syscalls.h
Re: [RFC] arm64: support HAVE_ARCH_RARE_WRITE
On Thu, Mar 2, 2017 at 7:00 AM, Hoeun Ryu wrote:
> This RFC is a quick and dirty arm64 implementation for Kees Cook's RFC for
> rare_write infrastructure [1].

Awesome! :)

> This implementation is based on Mark Rutland's suggestions, which is that
> a special userspace mm that maps only __start/end_rodata as RW permission
> is prepared during early boot time (paging_init) and __arch_rare_write_map()
> switches to the mm [2].
>
> Due to the limit of implementation (the mm having RW mapping is userspace
> mm), we need a new arch-specific __arch_rare_write_ptr() to convert RO
> address to RW address (CONFIG_HAVE_RARE_WRITE_PTR is added), which is
> general for all architectures (__rare_write_ptr()) in Kees's RFC . So all
> writes should be instrumented by __rare_write().

Cool, yeah, I'll get all this fixed up in my next version.

> One caveat for arm64 is CONFIG_ARM64_SW_TTBR0_PAN.
> Because __arch_rare_write_map() installes a special user mm to ttbr0,
> usercopy inside __arch_rare_write_map/unmap() pair will break rare_write.
> (uaccess_enable() replaces the special mm and RW alias is no longer valid.)

That's a totally fine constraint: this case should never happen, for many
reasons. :)

> A similar problem could rise in general usercopy inside
> __arch_rare_write_map/unmap(). __arch_rare_write_map() replaces current->mm,
> so we loose the address space of the `current` process.
>
> It passes LKDTM's rare write test.
>
> [1] : http://www.openwall.com/lists/kernel-hardening/2017/02/27/5
> [2] : https://lkml.org/lkml/2017/2/22/254
>
> Signed-off-by: Hoeun Ryu

-Kees

--
Kees Cook
Pixel Security
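The core idea in the RFC is that the data stays mapped read-only and every write goes through a separate RW alias, found by translating the RO address (the role of __arch_rare_write_ptr()). The arm64 code gets the alias by switching to a special mm; the sketch below is only a rough user-space analogy of the address translation, not the kernel implementation and not Kees's API. It maps the same pages twice, once PROT_READ and once PROT_READ|PROT_WRITE, and all names (`rare_region`, `rare_write_ptr`, `rare_write_u32`) are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* One set of pages, two views. */
struct rare_region {
	uint8_t *ro;	/* read-only view: what ordinary code dereferences */
	uint8_t *rw;	/* writable alias: only reached via rare_write_ptr() */
	size_t len;
};

int rare_region_init(struct rare_region *r, size_t len)
{
	char tmpl[] = "/tmp/rareXXXXXX";
	int fd = mkstemp(tmpl);

	if (fd < 0)
		return -1;
	unlink(tmpl);	/* the file lives on only through fd and the mappings */
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}
	r->ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	r->rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mappings keep the pages alive */
	if (r->ro == MAP_FAILED || r->rw == MAP_FAILED) {
		if (r->ro != MAP_FAILED)
			munmap(r->ro, len);
		if (r->rw != MAP_FAILED)
			munmap(r->rw, len);
		return -1;
	}
	r->len = len;
	return 0;
}

/* Translate an address in the RO view to the same offset in the RW view;
 * NULL if the address is outside the region. */
void *rare_write_ptr(const struct rare_region *r, const void *ro_addr)
{
	uintptr_t off = (uintptr_t)ro_addr - (uintptr_t)r->ro;

	return off < r->len ? r->rw + off : NULL;
}

/* Every write goes through the alias; the RO view never becomes writable. */
int rare_write_u32(const struct rare_region *r, const uint32_t *ro_addr,
		   uint32_t val)
{
	uint32_t *p = rare_write_ptr(r, ro_addr);

	if (!p)
		return -1;
	*p = val;
	return 0;
}
```

A stray store through the RO pointer still faults, which is the property rare_write is after; the kernel version additionally limits when the alias is reachable at all, hence the usercopy caveat above.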
[PATCH] can: m_can: support transmit frame in CAN FD format
Add support to transmit the frame in the CAN FD format and with
the bit rate switching.

Tested on SAMA5D2 Xplained board.

Signed-off-by: Wenyou Yang
---
The testing is based on
[RESEND PATCH 1/1] can: m_can: fix bitrate setup on latest silicon
http://lkml.iu.edu/hypermail/linux/kernel/1702.1/05347.html

 drivers/net/can/m_can/m_can.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 195f15edb32e..9ef9b337d25b 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -266,8 +266,12 @@ enum m_can_mram_cfg {
 
 /* Tx Buffer Element */
 /* R0 */
+#define TX_BUF_ESI		BIT(31)
 #define TX_BUF_XTD		BIT(30)
 #define TX_BUF_RTR		BIT(29)
+#define TX_BUF_EFC		BIT(23)
+#define TX_BUF_EDL		BIT(21)
+#define TX_BUF_BRS		BIT(20)
 
 /* address offset and element number for each FIFO/Buffer in the Message RAM */
 struct mram_cfg {
@@ -884,7 +888,7 @@ static void m_can_chip_config(struct net_device *dev)
 	}
 
 	if (priv->can.ctrlmode & CAN_CTRLMODE_FD)
-		cccr |= CCCR_CME_CANFD_BRS << CCCR_CME_SHIFT;
+		cccr |= (CCCR_CME_CANFD_BRS | CCCR_CME_CANFD) << CCCR_CME_SHIFT;
 
 	m_can_write(priv, M_CAN_CCCR, cccr);
 	m_can_write(priv, M_CAN_TEST, test);
@@ -1047,6 +1051,7 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
 	struct canfd_frame *cf = (struct canfd_frame *)skb->data;
 	u32 id, cccr;
 	int i;
+	u32 dlc;
 
 	if (can_dropped_invalid_skb(dev, skb))
 		return NETDEV_TX_OK;
@@ -1065,7 +1070,6 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
 
 	/* message ram configuration */
 	m_can_fifo_write(priv, 0, M_CAN_FIFO_ID, id);
-	m_can_fifo_write(priv, 0, M_CAN_FIFO_DLC, can_len2dlc(cf->len) << 16);
 
 	for (i = 0; i < cf->len; i += 4)
 		m_can_fifo_write(priv, 0, M_CAN_FIFO_DATA(i / 4),
@@ -1073,20 +1077,29 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
 
 	can_put_echo_skb(skb, dev, 0);
 
+	dlc = can_len2dlc(cf->len) << 16;
+
 	if (priv->can.ctrlmode & CAN_CTRLMODE_FD) {
 		cccr = m_can_read(priv, M_CAN_CCCR);
 		cccr &= ~(CCCR_CMR_MASK << CCCR_CMR_SHIFT);
 		if (can_is_canfd_skb(skb)) {
-			if (cf->flags & CANFD_BRS)
+			dlc |= TX_BUF_EDL;
+			if (cf->flags & CANFD_ESI)
+				dlc |= TX_BUF_ESI;
+			if (cf->flags & CANFD_BRS) {
+				dlc |= TX_BUF_BRS;
 				cccr |= CCCR_CMR_CANFD_BRS << CCCR_CMR_SHIFT;
-			else
+			} else {
 				cccr |= CCCR_CMR_CANFD << CCCR_CMR_SHIFT;
+			}
 		} else {
 			cccr |= CCCR_CMR_CAN << CCCR_CMR_SHIFT;
 		}
 		m_can_write(priv, M_CAN_CCCR, cccr);
 	}
 
+	m_can_fifo_write(priv, 0, M_CAN_FIFO_DLC, dlc);
+
 	/* enable first TX buffer to start transfer */
 	m_can_write(priv, M_CAN_TXBTIE, 0x1);
 	m_can_write(priv, M_CAN_TXBAR, 0x1);
-- 
2.11.0
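The patch composes the two header words of an M_CAN Tx buffer element: the ID word (written to M_CAN_FIFO_ID) and the DLC word (written to M_CAN_FIFO_DLC, DLC in bits 19:16). A minimal user-space sketch of that composition, assuming the bit values from the patch and `struct canfd_frame` from `<linux/can.h>`; `len2dlc()` is a stand-in for the kernel's can_len2dlc(). One hedged observation: the patch lists TX_BUF_ESI under the `/* R0 */` comment together with XTD/RTR, but ORs it into the DLC word; this sketch follows the header comment and places ESI in the ID word, which is worth double-checking against the M_CAN user manual:

```c
#include <linux/can.h>	/* struct canfd_frame, CAN_*_FLAG/MASK, CANFD_BRS/ESI */
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Tx buffer element bits, values as defined in the patch. */
#define TX_BUF_ESI BIT(31)	/* listed under R0, the ID word */
#define TX_BUF_XTD BIT(30)
#define TX_BUF_RTR BIT(29)
#define TX_BUF_EDL BIT(21)
#define TX_BUF_BRS BIT(20)

/* Stand-in for can_len2dlc(): round a payload length up to the next valid
 * CAN FD length and return its 4-bit DLC code. */
static uint32_t len2dlc(uint8_t len)
{
	if (len <= 8)
		return len;
	if (len <= 12)
		return 9;
	if (len <= 16)
		return 10;
	if (len <= 20)
		return 11;
	if (len <= 24)
		return 12;
	if (len <= 32)
		return 13;
	if (len <= 48)
		return 14;
	return 15;
}

struct mcan_tx_hdr {
	uint32_t r0;	/* ID word: ESI/XTD/RTR + identifier */
	uint32_t r1;	/* DLC word: EDL/BRS + DLC in bits 19:16 */
};

/* Build both words for one frame; fd_frame plays the role of
 * can_is_canfd_skb(). */
struct mcan_tx_hdr mcan_tx_header(const struct canfd_frame *cf, int fd_frame)
{
	struct mcan_tx_hdr h;

	if (cf->can_id & CAN_EFF_FLAG)
		h.r0 = (cf->can_id & CAN_EFF_MASK) | TX_BUF_XTD;
	else
		h.r0 = (cf->can_id & CAN_SFF_MASK) << 18;	/* 11-bit ID in bits 28:18 */
	if (cf->can_id & CAN_RTR_FLAG)
		h.r0 |= TX_BUF_RTR;

	h.r1 = len2dlc(cf->len) << 16;
	if (fd_frame) {
		h.r1 |= TX_BUF_EDL;
		if (cf->flags & CANFD_BRS)
			h.r1 |= TX_BUF_BRS;
		if (cf->flags & CANFD_ESI)
			h.r0 |= TX_BUF_ESI;
	}
	return h;
}
```

Computing both words before any FIFO write, as the patch does for the DLC word, keeps the "which flag goes in which word" decision in one place.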
[PATCH v7 kernel 4/5] virtio-balloon: define flags and head for host request vq
From: Liang Li

Define the flags and head struct for a new host request virtual queue.
The guest can get requests from the host and then respond to them on
this new virtual queue.
The host can make use of this virtqueue to request the guest to do some
operations, e.g. drop page cache, synchronize file system, etc.
The hypervisor can also get some of the guest's runtime information
through this virtual queue, e.g. the guest's unused page information,
which can be used for live migration optimization.

Signed-off-by: Liang Li
Signed-off-by: Wei Wang
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Michael S. Tsirkin
Cc: Paolo Bonzini
Cc: Cornelia Huck
Cc: Amit Shah
Cc: Dave Hansen
Cc: Andrea Arcangeli
Cc: David Hildenbrand
Cc: Liang Li
Cc: Wei Wang
---
 include/uapi/linux/virtio_balloon.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index ed627b2..630b0ef 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_CHUNK_TRANSFER	3 /* Transfer pages in chunks */
+#define VIRTIO_BALLOON_F_HOST_REQ_VQ	4 /* Host request virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -94,4 +95,25 @@ struct virtio_balloon_resp_hdr {
 	__le32 data_len; /* Payload len in bytes */
 };
 
+enum virtio_balloon_req_id {
+	/* Get unused page information */
+	BALLOON_GET_UNUSED_PAGES,
+};
+
+enum virtio_balloon_flag {
+	/* Have more data for a request */
+	BALLOON_FLAG_CONT,
+	/* No more data for a request */
+	BALLOON_FLAG_DONE,
+};
+
+struct virtio_balloon_req_hdr {
+	/* Used to distinguish different requests */
+	__le16 cmd;
+	/* Reserved */
+	__le16 reserved[3];
+	/* Request parameter */
+	__le64 param;
+};
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
2.7.4
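The proposed `struct virtio_balloon_req_hdr` is a fixed 16-byte little-endian layout: `cmd` at offset 0, `reserved[3]` at offsets 2-7, and `param` at offset 8. A minimal sketch of encoding and decoding that header byte by byte, independent of host endianness; the enum values are copied from the patch (they are not upstream UAPI), zeroing the reserved bytes is an assumption since the patch only labels them "Reserved", and the helper names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Values as proposed in the patch. */
enum { BALLOON_GET_UNUSED_PAGES = 0 };
enum { BALLOON_FLAG_CONT = 0, BALLOON_FLAG_DONE = 1 };

#define REQ_HDR_SIZE 16	/* __le16 cmd + __le16 reserved[3] + __le64 param */

static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
}

static void put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Host side: serialize a request header; reserved bytes are zeroed. */
void req_hdr_encode(uint8_t out[REQ_HDR_SIZE], uint16_t cmd, uint64_t param)
{
	memset(out, 0, REQ_HDR_SIZE);
	put_le16(out, cmd);
	put_le64(out + 8, param);
}

/* Guest side: parse a request header; reserved bytes are ignored. */
void req_hdr_decode(const uint8_t in[REQ_HDR_SIZE], uint16_t *cmd,
		    uint64_t *param)
{
	*cmd = get_le16(in);
	*param = get_le64(in + 8);
}
```

In the kernel itself the same conversion would be done with cpu_to_le16()/le64_to_cpu() on the struct fields; the byte-wise form here just makes the wire layout explicit.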
[PATCH v7 kernel 5/5] This patch contains two parts:
From: Liang Li

One is to add a new API to mm to get the unused page information.
The virtio balloon driver will use this new API to get the unused page
info and send it to the hypervisor (QEMU) to speed up live migration.
While the bitmap is being sent, some of the pages may be modified and
used by the guest; this inaccuracy can be corrected by the dirty page
logging mechanism.

The other is to add support for the request for the VM's unused page
information. QEMU can make use of the unused page information and the
dirty page logging mechanism to skip the transfer of some of these
unused pages, which helps reduce network traffic and speed up the live
migration process.

Signed-off-by: Liang Li
Signed-off-by: Wei Wang
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Michael S. Tsirkin
Cc: Paolo Bonzini
Cc: Cornelia Huck
Cc: Amit Shah
Cc: Dave Hansen
Cc: Andrea Arcangeli
Cc: David Hildenbrand
Cc: Liang Li
Cc: Wei Wang
---
 drivers/virtio/virtio_balloon.c | 137 ++--
 include/linux/mm.h | 3 +
 mm/page_alloc.c | 120 +++
 3 files changed, 255 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 4416370..9b6cf44f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -66,7 +66,7 @@ struct balloon_page_chunk_ext {
 
 struct virtio_balloon {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *host_req_vq;
 
 	/* The balloon servicing is delegated to a freezable workqueue. */
 	struct work_struct update_balloon_stats_work;
@@ -95,6 +95,8 @@ struct virtio_balloon {
 	unsigned int nr_page_bmap;
 	/* Used to record the processed pfn range */
 	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
+	/* Request header */
+	struct virtio_balloon_req_hdr req_hdr;
 	/*
 	 * The pages we've told the Host we're not using are enqueued
 	 * at vb_dev_info->pages list.
@@ -549,6 +551,80 @@ static void stats_handle_request(struct virtio_balloon *vb)
 	virtqueue_kick(vq);
 }
 
+static void __send_unused_pages(struct virtio_balloon *vb,
+	unsigned long req_id, unsigned int pos, bool done)
+{
+	struct virtio_balloon_resp_hdr *hdr = &vb->resp_hdr;
+	struct virtqueue *vq = vb->host_req_vq;
+
+	vb->resp_pos = pos;
+	hdr->cmd = BALLOON_GET_UNUSED_PAGES;
+	hdr->id = req_id;
+	if (!done)
+		hdr->flag = BALLOON_FLAG_CONT;
+	else
+		hdr->flag = BALLOON_FLAG_DONE;
+
+	if (pos > 0 || done)
+		send_resp_data(vb, vq, true);
+
+}
+
+static void send_unused_pages(struct virtio_balloon *vb,
+	unsigned long req_id)
+{
+	struct scatterlist sg_in;
+	unsigned int pos = 0;
+	struct virtqueue *vq = vb->host_req_vq;
+	int ret, order;
+	struct zone *zone = NULL;
+	bool part_fill = false;
+
+	mutex_lock(&vb->balloon_lock);
+
+	for (order = MAX_ORDER - 1; order >= 0; order--) {
+		ret = mark_unused_pages(&zone, order, vb->resp_data,
+			vb->resp_buf_size / sizeof(__le64),
+			&pos, VIRTIO_BALLOON_CHUNK_SIZE_SHIFT, part_fill);
+		if (ret == -ENOSPC) {
+			if (pos == 0) {
+				void *new_resp_data;
+
+				new_resp_data = kmalloc(2 * vb->resp_buf_size,
+							GFP_KERNEL);
+				if (new_resp_data) {
+					kfree(vb->resp_data);
+					vb->resp_data = new_resp_data;
+					vb->resp_buf_size *= 2;
+				} else {
+					part_fill = true;
+					dev_warn(&vb->vdev->dev,
+						 "%s: part fill order: %d\n",
+						 __func__, order);
+				}
+			} else {
+				__send_unused_pages(vb, req_id, pos, false);
+				pos = 0;
+			}
+
+			if (!part_fill) {
+				order++;
+				continue;
+			}
+		} else
+			zone = NULL;
+
[PATCH v7 kernel 5/5] This patch contains two parts:
From: Liang Li

One part adds a new API to mm to get the unused page information. The virtio balloon driver uses this new API to get the unused page info and send it to the hypervisor (QEMU) to speed up live migration. While the bitmap is being sent, some of the pages may be modified and used by the guest; this inaccuracy can be corrected by the dirty page logging mechanism.

The other part adds support for the request for the VM's unused page information. QEMU can use the unused page information and the dirty page logging mechanism to skip transferring some of these unused pages, which helps reduce network traffic and speed up the live migration process.

Signed-off-by: Liang Li
Signed-off-by: Wei Wang
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Michael S. Tsirkin
Cc: Paolo Bonzini
Cc: Cornelia Huck
Cc: Amit Shah
Cc: Dave Hansen
Cc: Andrea Arcangeli
Cc: David Hildenbrand
Cc: Liang Li
Cc: Wei Wang
---
 drivers/virtio/virtio_balloon.c | 137 ++--
 include/linux/mm.h              |   3 +
 mm/page_alloc.c                 | 120 +++
 3 files changed, 255 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 4416370..9b6cf44f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -66,7 +66,7 @@ struct balloon_page_chunk_ext {
 
 struct virtio_balloon {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *host_req_vq;
 
 	/* The balloon servicing is delegated to a freezable workqueue. */
 	struct work_struct update_balloon_stats_work;
@@ -95,6 +95,8 @@ struct virtio_balloon {
 	unsigned int nr_page_bmap;
 	/* Used to record the processed pfn range */
 	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
+	/* Request header */
+	struct virtio_balloon_req_hdr req_hdr;
 	/*
 	 * The pages we've told the Host we're not using are enqueued
 	 * at vb_dev_info->pages list.
@@ -549,6 +551,80 @@ static void stats_handle_request(struct virtio_balloon *vb)
 	virtqueue_kick(vq);
 }
 
+static void __send_unused_pages(struct virtio_balloon *vb,
+	unsigned long req_id, unsigned int pos, bool done)
+{
+	struct virtio_balloon_resp_hdr *hdr = &vb->resp_hdr;
+	struct virtqueue *vq = vb->host_req_vq;
+
+	vb->resp_pos = pos;
+	hdr->cmd = BALLOON_GET_UNUSED_PAGES;
+	hdr->id = req_id;
+	if (!done)
+		hdr->flag = BALLOON_FLAG_CONT;
+	else
+		hdr->flag = BALLOON_FLAG_DONE;
+
+	if (pos > 0 || done)
+		send_resp_data(vb, vq, true);
+
+}
+
+static void send_unused_pages(struct virtio_balloon *vb,
+	unsigned long req_id)
+{
+	struct scatterlist sg_in;
+	unsigned int pos = 0;
+	struct virtqueue *vq = vb->host_req_vq;
+	int ret, order;
+	struct zone *zone = NULL;
+	bool part_fill = false;
+
+	mutex_lock(&vb->balloon_lock);
+
+	for (order = MAX_ORDER - 1; order >= 0; order--) {
+		ret = mark_unused_pages(&zone, order, vb->resp_data,
+			vb->resp_buf_size / sizeof(__le64),
+			&pos, VIRTIO_BALLOON_CHUNK_SIZE_SHIFT, part_fill);
+		if (ret == -ENOSPC) {
+			if (pos == 0) {
+				void *new_resp_data;
+
+				new_resp_data = kmalloc(2 * vb->resp_buf_size,
+							GFP_KERNEL);
+				if (new_resp_data) {
+					kfree(vb->resp_data);
+					vb->resp_data = new_resp_data;
+					vb->resp_buf_size *= 2;
+				} else {
+					part_fill = true;
+					dev_warn(&vb->vdev->dev,
+						 "%s: part fill order: %d\n",
+						 __func__, order);
+				}
+			} else {
+				__send_unused_pages(vb, req_id, pos, false);
+				pos = 0;
+			}
+
+			if (!part_fill) {
+				order++;
+				continue;
+			}
+		} else
+			zone = NULL;
+
+		if (order == 0)
+			__send_unused_pages(vb, req_id, pos, true);
+
+	}
+
+	mutex_unlock(&vb->balloon_lock);
+	sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
+	virtqueue_add_inbuf(vq, &sg_in, 1, &vb->req_hdr, GFP_KERNEL);
+	virtqueue_kick(vq);
+}
+
+static void
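The buffering strategy in send_unused_pages() can be modelled roughly in userspace as follows. This is a simplified sketch, not the patch's code. `append_batch()` is a hypothetical stand-in for `mark_unused_pages()`. Each "batch" stands in for the entries produced for one order, and `send_msg()` replaces `send_resp_data()`. The model differs from the patch in two ways. It retries a batch after every flush, and it falls back to a truncated (partial-fill) batch only when an empty buffer cannot grow any further.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { FLAG_CONT = 0, FLAG_DONE = 1 }; /* same values as BALLOON_FLAG_CONT/DONE */

/* What the "host" would observe (hypothetical bookkeeping). */
struct sink {
	size_t msgs;      /* responses sent */
	size_t items;     /* entries delivered in total */
	int last_flag;    /* flag of the most recent response */
	size_t truncated; /* batches that had to be cut short */
};

static void send_msg(struct sink *s, size_t pos, int flag)
{
	s->msgs++;
	s->items += pos;
	s->last_flag = flag;
}

/* Stand-in for mark_unused_pages(): append one batch at *pos. Without
 * allow_partial the batch is all-or-nothing (-ENOSPC if it does not fit);
 * with it, as many entries as fit are written. */
static int append_batch(const uint64_t *batch, size_t len, uint64_t *buf,
			size_t cap, size_t *pos, bool allow_partial)
{
	size_t room = cap - *pos;

	if (len > room) {
		if (!allow_partial)
			return -ENOSPC;
		len = room;
	}
	memcpy(buf + *pos, batch, len * sizeof(*batch));
	*pos += len;
	return 0;
}

/* Flush a partly filled buffer with FLAG_CONT when it runs out of room,
 * double an empty-but-too-small buffer up to max_cap, and end with one
 * FLAG_DONE response (sent even if it carries no entries). */
int send_batches(const uint64_t *const *batches, const size_t *lens,
		 size_t nbatches, size_t cap, size_t max_cap, struct sink *s)
{
	uint64_t *buf = malloc(cap * sizeof(*buf));
	size_t pos = 0, i = 0;

	if (!buf)
		return -ENOMEM;
	while (i < nbatches) {
		if (append_batch(batches[i], lens[i], buf, cap, &pos, false) == 0) {
			i++;
			continue;
		}
		if (pos > 0) {                  /* out of room mid-stream: flush */
			send_msg(s, pos, FLAG_CONT);
			pos = 0;
			continue;                   /* retry batch i in an empty buffer */
		}
		if (cap * 2 <= max_cap) {       /* empty buffer too small: grow it */
			uint64_t *nbuf = malloc(cap * 2 * sizeof(*nbuf));

			if (nbuf) {
				free(buf);
				buf = nbuf;
				cap *= 2;
				continue;
			}
		}
		/* Cannot grow: send what fits (the patch's part_fill idea). */
		append_batch(batches[i], lens[i], buf, cap, &pos, true);
		s->truncated++;
		i++;
	}
	send_msg(s, pos, FLAG_DONE);
	free(buf);
	return 0;
}
```

Take batch sizes 3, 5 and 2, an initial capacity of 4 and a limit of 8. The model first sends one `FLAG_CONT` response with 3 entries. It then doubles the buffer and finishes with one `FLAG_DONE` response carrying the remaining 7 entries.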
[PATCH v7 kernel 3/5] virtio-balloon: implementation of VIRTIO_BALLOON_F_CHUNK_TRANSFER
From: Liang Li

The current virtio-balloon implementation is not very efficient, because pages are transferred to the host one by one. Here is the breakdown of the time (in percent) spent on each step of the balloon inflating process (inflating 7GB of an 8GB idle guest):

1) allocating pages (6.5%)
2) sending PFNs to host (68.3%)
3) address translation (6.1%)
4) madvise (19%)

The inflating process takes about 4126ms to complete. The above profiling shows that the bottlenecks are stage 2) and stage 4).

This patch optimizes step 2) by transferring pages to the host in chunks. A chunk consists of guest physically contiguous pages. It is offered to the host via a base PFN (i.e. the start PFN of those physically contiguous pages) and a size (i.e. the total number of pages). A normal chunk is formatted as below:

    -----------------------------------------------
    |         Base (52 bit)        | Size (12 bit)|
    -----------------------------------------------

For large chunks, an extended chunk format is used:

    -----------------------------------------------
    |                Base (64 bit)                |
    -----------------------------------------------
    -----------------------------------------------
    |                Size (64 bit)                |
    -----------------------------------------------

By doing so, step 4) can also be optimized by doing address translation and madvise() in chunks rather than page by page.

This optimization requires the negotiation of a new feature bit, VIRTIO_BALLOON_F_CHUNK_TRANSFER.

With this new feature, the above ballooning process takes ~590ms, an improvement of ~85%.

TODO: optimize stage 1) by allocating/freeing a chunk of pages instead of a single page each time.

Signed-off-by: Liang Li
Signed-off-by: Wei Wang
Suggested-by: Michael S. Tsirkin
Cc: Michael S. Tsirkin
Cc: Paolo Bonzini
Cc: Cornelia Huck
Cc: Amit Shah
Cc: Dave Hansen
Cc: Andrea Arcangeli
Cc: David Hildenbrand
Cc: Liang Li
Cc: Wei Wang
---
 drivers/virtio/virtio_balloon.c | 351 
 1 file changed, 323 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f59cb4f..4416370 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -42,6 +42,10 @@
 #define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+#define PAGE_BMAP_SIZE		(8 * PAGE_SIZE)
+#define PFNS_PER_PAGE_BMAP	(PAGE_BMAP_SIZE * BITS_PER_BYTE)
+#define PAGE_BMAP_COUNT_MAX	32
+
 static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
@@ -50,6 +54,16 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 static struct vfsmount *balloon_mnt;
 #endif
 
+struct balloon_page_chunk {
+	__le64 base : 52;
+	__le64 size : 12;
+};
+
+struct balloon_page_chunk_ext {
+	__le64 base;
+	__le64 size;
+};
+
 struct virtio_balloon {
 	struct virtio_device *vdev;
 	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
@@ -67,6 +81,20 @@ struct virtio_balloon {
 
 	/* Number of balloon pages we've told the Host we're not using. */
 	unsigned int num_pages;
+	/* Pointer to the response header. */
+	struct virtio_balloon_resp_hdr resp_hdr;
+	/* Pointer to the start address of response data. */
+	__le64 *resp_data;
+	/* Size of response data buffer. */
+	unsigned int resp_buf_size;
+	/* Pointer offset of the response data. */
+	unsigned int resp_pos;
+	/* Bitmap used to save the pfns info */
+	unsigned long *page_bitmap[PAGE_BMAP_COUNT_MAX];
+	/* Number of split page bitmaps */
+	unsigned int nr_page_bmap;
+	/* Used to record the processed pfn range */
+	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
 	/*
 	 * The pages we've told the Host we're not using are enqueued
 	 * at vb_dev_info->pages list.
@@ -110,20 +138,180 @@ static void balloon_ack(struct virtqueue *vq)
 	wake_up(&vb->acked);
 }
 
-static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
+static inline void init_bmap_pfn_range(struct virtio_balloon *vb)
 {
-	struct scatterlist sg;
+	vb->min_pfn = ULONG_MAX;
+	vb->max_pfn = 0;
+}
+
+static inline void update_bmap_pfn_range(struct virtio_balloon *vb,
+					 struct page *page)
+{
+	unsigned long balloon_pfn = page_to_balloon_pfn(page);
+
+	vb->min_pfn = min(balloon_pfn, vb->min_pfn);
+	vb->max_pfn = max(balloon_pfn, vb->max_pfn);
+}
+
+static void extend_page_bitmap(struct virtio_balloon *vb,
+			       unsigned long nr_pfn)
+{
+	int i, bmap_count;
+	unsigned long bmap_len;
+
+	bmap_len = ALIGN(nr_pfn, BITS_PER_LONG) / BITS_PER_BYTE;
+	bmap_len = ALIGN(bmap_len, PAGE_BMAP_SIZE);
+
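Some quick arithmetic on the defines above, assuming 4 KiB pages. PAGE_BMAP_SIZE is 8 * 4096 = 32768 bytes, so one bitmap tracks PFNS_PER_PAGE_BMAP = 262144 PFNs, or 1 GiB of guest memory. PAGE_BMAP_COUNT_MAX = 32 such bitmaps therefore cover 32 GiB. Going from 4126ms to ~590ms is a reduction of roughly 86%, which matches the ~85% quoted.

The sketch below shows one plausible way to emit the two chunk formats and to turn a PFN bitmap into runs of contiguous pages. It makes the following assumptions:

* `base` occupies bits 0..51 and `size` bits 52..63 of a normal chunk. This is what GCC typically produces for the bitfield struct on little-endian targets, but the patch does not spell it out.
* The function names are hypothetical.
* How the host tells the normal and extended formats apart is not shown here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BASE_BITS 52
#define CHUNK_BASE_MAX  ((1ULL << CHUNK_BASE_BITS) - 1)
#define CHUNK_SIZE_MAX  ((1ULL << 12) - 1)          /* 4095 pages */
#define BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)

/* Encode one run of pages. Returns the number of 64-bit words written:
 * 1 for the normal packed format, 2 for the extended {base, size} pair. */
int encode_chunk(uint64_t base, uint64_t size, uint64_t out[2])
{
	if (base <= CHUNK_BASE_MAX && size <= CHUNK_SIZE_MAX) {
		out[0] = base | (size << CHUNK_BASE_BITS);
		return 1;
	}
	out[0] = base;
	out[1] = size;
	return 2;
}

/* Split a normal (packed) chunk back into base PFN and page count. */
void decode_chunk(uint64_t word, uint64_t *base, uint64_t *size)
{
	*base = word & CHUNK_BASE_MAX;
	*size = word >> CHUNK_BASE_BITS;
}

static int test_bit_ul(const unsigned long *map, size_t nr)
{
	return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/* Bit i of map stands for PFN start_pfn + i. Each run of set bits is
 * reported as {base PFN, length}; at most max_runs runs are stored, but
 * the total number of runs found is returned. */
size_t bitmap_to_runs(const unsigned long *map, size_t nbits,
		      uint64_t start_pfn, uint64_t (*runs)[2], size_t max_runs)
{
	size_t n = 0, i = 0;

	while (i < nbits) {
		if (!test_bit_ul(map, i)) {
			i++;
			continue;
		}
		size_t j = i;

		while (j < nbits && test_bit_ul(map, j))
			j++;                    /* extend the contiguous run */
		if (n < max_runs) {
			runs[n][0] = start_pfn + i;
			runs[n][1] = j - i;
		}
		n++;
		i = j;
	}
	return n;
}
```

Each run from `bitmap_to_runs()` can be passed straight to `encode_chunk()`. Only runs longer than 4095 pages, or bases beyond 2^52, need the extended two-word form.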
Re: net/sctp: use-after-free in sctp_association_put
On Fri, Mar 3, 2017 at 3:21 AM, Dmitry Vyukov wrote:
> On Thu, Mar 2, 2017 at 9:06 AM, Xin Long wrote:
>> On Thu, Mar 2, 2017 at 3:18 AM, Dmitry Vyukov wrote:
>>> Hello,
>>>
>>> I've got the following report while running syzkaller fuzzer on
>>> linux-next/8813198236a044b76e251dcae937b180dd527999:
>>>
>>> BUG: KASAN: use-after-free in sctp_association_destroy net/sctp/associola.c:416 [inline] at addr 8801c0fa415c
>>> BUG: KASAN: use-after-free in sctp_association_put+0x294/0x300 net/sctp/associola.c:881 at addr 8801c0fa415c
>>> Read of size 1 by task syz-executor1/10956
>>> CPU: 1 PID: 10956 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170213 #1
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>> Call Trace:
>>>
>>>  __dump_stack lib/dump_stack.c:15 [inline]
>>>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>>>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
>>>  print_address_description mm/kasan/report.c:200 [inline]
>>>  kasan_report_error mm/kasan/report.c:289 [inline]
>>>  kasan_report.part.2+0x1e5/0x4b0 mm/kasan/report.c:311
>>>  kasan_report mm/kasan/report.c:329 [inline]
>>>  __asan_report_load1_noabort+0x29/0x30 mm/kasan/report.c:329
>>>  sctp_association_destroy net/sctp/associola.c:416 [inline]
>>>  sctp_association_put+0x294/0x300 net/sctp/associola.c:881
>>>  sctp_generate_timeout_event+0x115/0x360 net/sctp/sm_sideeffect.c:317
>>>  sctp_generate_t1_init_event+0x1a/0x20 net/sctp/sm_sideeffect.c:329
>>>  call_timer_fn+0x241/0x820 kernel/time/timer.c:1308
>>>  expire_timers kernel/time/timer.c:1348 [inline]
>>>  __run_timers+0x9e7/0xe90 kernel/time/timer.c:1642
>>>  run_timer_softirq+0x21/0x80 kernel/time/timer.c:1655
>>>  __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
>>>  invoke_softirq kernel/softirq.c:364 [inline]
>>>  irq_exit+0x1cc/0x200 kernel/softirq.c:405
>>>  exiting_irq arch/x86/include/asm/apic.h:658 [inline]
>>>  smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
>>>  apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
>>> RIP: 0010:arch_local_irq_enable arch/x86/include/asm/paravirt.h:788 [inline]
>>> RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
>>> RIP: 0010:_raw_spin_unlock_irq+0x56/0x70 kernel/locking/spinlock.c:199
>>> RSP: 0018:8801c280f178 EFLAGS: 0286 ORIG_RAX: ff10
>>> RAX: dc00 RBX: 8801dbf24a00 RCX: 0006
>>> RDX: 10a18d03 RSI: 8801d71c88e0 RDI: 850c6818
>>> RBP: 8801c280f180 R08: 0002 R09:
>>> R10: 0006 R11: R12: 8801c0f3a4c0
>>> R13: 110038501e38 R14: 8801d71c80c0 R15: 8801d71c80c0
>>>
>>>  finish_lock_switch kernel/sched/sched.h:1248 [inline]
>>>  finish_task_switch+0x1c2/0x720 kernel/sched/core.c:2792
>>>  context_switch kernel/sched/core.c:2928 [inline]
>>>  __schedule+0x893/0x2290 kernel/sched/core.c:3468
>>>  preempt_schedule_common+0x35/0x60 kernel/sched/core.c:3579
>>>  _cond_resched+0x17/0x20 kernel/sched/core.c:4977
>>>  slab_pre_alloc_hook mm/slab.h:427 [inline]
>>>  slab_alloc mm/slab.c:3390 [inline]
>>>  __do_kmalloc mm/slab.c:3730 [inline]
>>>  __kmalloc_track_caller+0x26a/0x690 mm/slab.c:3747
>>>  kstrdup+0x39/0x70 mm/util.c:54
>>>  snd_timer_instance_new+0xfc/0x5d0 sound/core/timer.c:110
>>>  snd_timer_open+0x878/0x1740 sound/core/timer.c:290
>>>  snd_timer_user_tselect sound/core/timer.c:1621 [inline]
>>>  __snd_timer_user_ioctl sound/core/timer.c:1901 [inline]
>>>  snd_timer_user_ioctl+0x9b1/0x34a0 sound/core/timer.c:1931
>>>  vfs_ioctl fs/ioctl.c:43 [inline]
>>>  do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
>>>  SYSC_ioctl fs/ioctl.c:698 [inline]
>>>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
>>>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>>> RIP: 0033:0x44fb59
>>> RSP: 002b:7f0dc184db58 EFLAGS: 0212 ORIG_RAX: 0010
>>> RAX: ffda RBX: 40345410 RCX: 0044fb59
>>> RDX: 20001000 RSI: 40345410 RDI: 0005
>>> RBP: 0005 R08: R09:
>>> R10: R11: 0212 R12: 00708000
>>> R13: 00a5fc57 R14: 7f0dc184e9c0 R15:
>>> Object at 8801c0fa4140, in cache kmalloc-4096 size: 4096
>>> Allocated:
>>> PID = 10965
>>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
>>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:504
>>>  set_track mm/kasan/kasan.c:516 [inline]
>>>  kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:607
>>>  kmem_cache_alloc_trace+0x10b/0x670 mm/slab.c:3634
>>>  kmalloc include/linux/slab.h:490 [inline]
>>>  kzalloc include/linux/slab.h:663 [inline]
>>>  sctp_association_new+0x114/0x2120 net/sctp/associola.c:306
>>>  sctp_sendmsg+0x1585/0x38f0 net/sctp/socket.c:1835
>>>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
>>>  sock_sendmsg_nosec net/socket.c:633 [inline]
>>>  sock_sendmsg+0xca/0x110 net/socket.c:643
Re: [PATCH 3/4] thp: fix MADV_DONTNEED vs. MADV_FREE race
On March 02, 2017 11:11 PM Kirill A. Shutemov wrote:
>
> Basically the same race as with numa balancing in change_huge_pmd(), but
> a bit simpler to mitigate: we don't need to preserve dirty/young flags
> here due to MADV_FREE functionality.
>
> Signed-off-by: Kirill A. Shutemov
> Cc: Minchan Kim
> ---
>  mm/huge_memory.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bb2b3646bd78..324217c31ec9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1566,8 +1566,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb,
> struct vm_area_struct *vma,
>  		deactivate_page(page);
>
>  	if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
> -		orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,
> -			tlb->fullmm);
>  		orig_pmd = pmd_mkold(orig_pmd);
>  		orig_pmd = pmd_mkclean(orig_pmd);
>

$ grep -n set_pmd_at linux-4.10/arch/powerpc/mm/pgtable-book3s64.c

/*
 * set a new huge pmd. We should not be called for updating
 * an existing pmd entry. That should go via pmd_hugepage_update.
 */
void set_pmd_at(struct mm_struct *mm, unsigned long addr,
[PATCH] blk: improve order of bio handling in generic_make_request()
[ Hi Jens,
  you might have seen assorted email threads recently about deadlocks, particularly in dm-snap or md/raid1/10. Also about the excess of rescuer threads.
  I think a big part of the problem is my ancient improvement to generic_make_request to queue bios and handle them in a strict FIFO order. As described below, that can cause problems which individual block devices cannot fix themselves without punting to various threads.
  This patch does not fix everything, but provides a basis that drivers can build on to create deadlock-free solutions without excess threads.
  If you accept this, I will look into improving at least md and bio_alloc_set() to be less dependent on rescuer threads.
  Thanks,
  NeilBrown
]

To avoid recursion on the kernel stack when stacked block devices are in use, generic_make_request() will, when called recursively, queue new requests for later handling. They will be handled when the make_request_fn for the current bio completes.

If any bios are submitted by a make_request_fn, these will ultimately be handled sequentially. If the handling of one of those generates further requests, they will be added to the end of the queue.

This strict first-in-first-out behaviour can lead to deadlocks in various ways, normally because a request might need to wait for a previous request to the same device to complete. This can happen when they share a mempool, and can happen due to interdependencies particular to the device. Both md and dm have examples where this happens.

These deadlocks can be eradicated by more selective ordering of bios, specifically by handling them in depth-first order. That is: when the handling of one bio generates one or more further bios, they are handled immediately after the parent, before any siblings of the parent.
That way, when generic_make_request() calls make_request_fn for some particular device, we can be certain that all previously submitted requests for that device have been completely handled and are not waiting for anything in the queue of requests maintained in generic_make_request(). An easy way to achieve this would be to use a last-in-first-out stack instead of a queue. However, this would change the order of consecutive bios submitted by a make_request_fn, which could have unexpected consequences. Instead we take a slightly more complex approach. A fresh queue is created for each call to a make_request_fn. After it completes, any bios for a different device are placed on the front of the main queue, followed by any bios for the same device, followed by all bios that were already on the queue before the make_request_fn was called. This provides the depth-first approach without reordering bios on the same level. This, by itself, is not enough to remove the deadlocks. It just makes it possible for drivers to take the extra step required themselves. To avoid deadlocks, drivers must never risk waiting for a request after submitting one to generic_make_request. This includes never allocating from a mempool twice in one call to a make_request_fn. A common pattern in drivers is to call bio_split() in a loop, handling the first part and then looping around to possibly split the next part. Instead, a driver that finds it needs to split a bio should queue (with generic_make_request) the second part, handle the first part, and then return. The new code in generic_make_request will ensure the requests to underlying bios are processed first, then the second bio that was split off. If it splits again, the same process happens. In each case one bio will be completely handled before the next one is attempted. 
With this in place, it should be possible to disable the punt_bios_to_recover() recovery thread for many block devices, and eventually it may be possible to remove it completely. Tested-by: Jinpu Wang Inspired-by: Lars Ellenberg Signed-off-by: NeilBrown --- block/blk-core.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/block/blk-core.c b/block/blk-core.c index b9e857f4afe8..ef55f210dd7c 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2018,10 +2018,32 @@ blk_qc_t generic_make_request(struct bio *bio) struct request_queue *q = bdev_get_queue(bio->bi_bdev); if (likely(blk_queue_enter(q, false) == 0)) { + struct bio_list hold; + struct bio_list lower, same; + + /* Create a fresh bio_list for all subordinate requests */ + bio_list_init(&hold); + bio_list_merge(&hold, &bio_list_on_stack); + bio_list_init(&bio_list_on_stack); ret = q->make_request_fn(q, bio); blk_queue_exit(q); + /* sort new bios into those for a lower level +* and those for the same level +*/ + bio_list_init(&lower); +
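To make the ordering rule concrete, here is a small userspace simulation in C. It is a sketch under stated assumptions, not kernel code. `struct sim_bio`, `struct sim_list` and the `list_*` helpers are hypothetical stand-ins for `struct bio` and the `bio_list` helpers, and a bio's "device" is just an integer compared in place of `bdev_get_queue()`. `dispatch_depth_first()` follows the description above: it creates a fresh list per `make_request_fn` call, then queues other-device bios, then same-device bios, then the previously queued ones. `dispatch_fifo()` models the old strict FIFO behaviour for comparison. The hypothetical `demo_make_request()` plays a stacked device 0 that splits bio 1 in the recommended way: it queues the remainder (bio 2), then submits the lower-level bio for the part it handles now to a leaf device 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#define SIM_MAX_BIOS 64

/* Hypothetical stand-in for struct bio: only the target device matters. */
struct sim_bio {
	int dev;	/* compared instead of bdev_get_queue(bio->bi_bdev) */
	int id;
};

/* Hypothetical FIFO list standing in for struct bio_list. */
struct sim_list {
	struct sim_bio items[SIM_MAX_BIOS];
	size_t len;
};

static void list_init(struct sim_list *l) { l->len = 0; }

static void list_add(struct sim_list *l, struct sim_bio b)
{
	assert(l->len < SIM_MAX_BIOS);
	l->items[l->len++] = b;
}

/* Append all of src to the tail of dst, preserving order. */
static void list_merge(struct sim_list *dst, const struct sim_list *src)
{
	for (size_t i = 0; i < src->len; i++)
		list_add(dst, src->items[i]);
}

/* Remove the head into *out; returns 0 if the list was empty. */
static int list_pop(struct sim_list *l, struct sim_bio *out)
{
	if (l->len == 0)
		return 0;
	*out = l->items[0];
	memmove(&l->items[0], &l->items[1], (l->len - 1) * sizeof(l->items[0]));
	l->len--;
	return 1;
}

/* A modelled make_request_fn appends any bios it submits to *out. */
typedef void (*sim_make_request_fn)(struct sim_bio bio, struct sim_list *out);

/* Depth-first order as described above; records handled ids in order[]. */
static size_t dispatch_depth_first(struct sim_bio bio, sim_make_request_fn fn,
				   int *order, size_t max)
{
	struct sim_list pending, hold, lower, same;
	struct sim_bio b;
	size_t n = 0;

	list_init(&pending);
	do {
		hold = pending;		/* bios queued before this call */
		list_init(&pending);	/* fresh list for subordinate bios */
		if (n < max)
			order[n++] = bio.id;
		fn(bio, &pending);

		list_init(&lower);
		list_init(&same);
		while (list_pop(&pending, &b))
			list_add(b.dev == bio.dev ? &same : &lower, b);

		/* pending is now empty: rebuild it as lower, same, hold. */
		list_merge(&pending, &lower);
		list_merge(&pending, &same);
		list_merge(&pending, &hold);
	} while (list_pop(&pending, &bio));
	return n;
}

/* The old behaviour: every new bio goes to the tail of one queue. */
static size_t dispatch_fifo(struct sim_bio bio, sim_make_request_fn fn,
			    int *order, size_t max)
{
	struct sim_list queue;
	size_t n = 0;

	list_init(&queue);
	do {
		if (n < max)
			order[n++] = bio.id;
		fn(bio, &queue);
	} while (list_pop(&queue, &bio));
	return n;
}

/* Hypothetical stack: device 0 splits bio 1 and remaps to leaf device 1. */
static void demo_make_request(struct sim_bio bio, struct sim_list *out)
{
	if (bio.dev != 0)
		return;		/* device 1 is a leaf: submits nothing */
	if (bio.id == 1)	/* queue the split-off remainder first */
		list_add(out, (struct sim_bio){ .dev = 0, .id = 2 });
	/* then submit the lower-level bio for the part handled now */
	list_add(out, (struct sim_bio){ .dev = 1, .id = bio.id + 10 });
}
```

Starting from bio 1 on device 0, the depth-first dispatcher handles bios in the order 1, 11, 2, 12. The lower-level bio for the first part of the split (11) finishes before the split-off remainder (2) reaches device 0 again. The FIFO dispatcher handles 1, 2, 11, 12. In that order the remainder re-enters device 0's make_request_fn while bio 11 is still queued. This is the kind of ordering the commit message describes as deadlock-prone, for example when both passes allocate from a shared mempool.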
[GIT PULL] Block fixes for 4.11-rc1
Hi Linus, A collection of fixes for this merge window, either fixes for existing issues, or parts that were waiting for acks to come in. This pull request contains: - Allocation of nvme queues on the right node from Shaohua. This was ready long before the merge window, but waiting on an ack from Bjorn on the PCI bit. Now that we have that, the three patches can go in. - Two fixes for blk-mq-sched with nvmeof, which uses hctx-specific request allocations. This caused an oops. One part from Sagi, one part from Omar. - A loop partition scan deadlock fix from Omar, fixing a regression in this merge window. - A 3-patch series from Keith, closing up a hole on clearing out requests on shutdown/resume. - A stable fix for nbd from Josef, fixing a leak of sockets. - Two fixes for a regression in this window from Jan, fixing a problem with one of his earlier patches dealing with queue vs bdi lifetimes. - A fix for a regression with virtio-blk, causing an IO stall if scheduling is used. From me. - A fix for an io context lock ordering problem. From me. Please pull! 
git://git.kernel.dk/linux-block.git for-linus Jan Kara (2): block: Initialize bd_bdi on inode initialization block: Move bdi_unregister() to del_gendisk() Jens Axboe (2): block: don't call ioc_exit_icq() with the queue lock held for blk-mq blk-mq: ensure that bd->last is always set correctly Josef Bacik (1): nbd: stop leaking sockets Keith Busch (3): blk-mq: Export blk_mq_freeze_queue_wait blk-mq: Provide freeze queue timeout nvme: Complete all stuck requests Omar Sandoval (4): blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request blk-mq: kill blk_mq_set_alloc_data() blk-mq: move update of tags->rqs to __blk_mq_alloc_request() loop: fix LO_FLAGS_PARTSCAN hang Sagi Grimberg (1): blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset Shaohua Li (3): blk-mq: allocate blk_mq_tags and requests in correct node PCI: add an API to get node from vector nvme: allocate nvme_queue in correct node block/blk-core.c | 1 - block/blk-ioc.c | 44 - block/blk-mq-sched.c | 16 +++ block/blk-mq-tag.c | 2 +- block/blk-mq-tag.h | 6 +++ block/blk-mq.c | 120 ++- block/blk-mq.h | 10 block/blk-sysfs.c| 2 - block/elevator.c | 2 - block/genhd.c| 5 ++ drivers/block/loop.c | 15 +++--- drivers/block/nbd.c | 4 +- drivers/nvme/host/core.c | 47 +++ drivers/nvme/host/nvme.h | 4 ++ drivers/nvme/host/pci.c | 45 ++ drivers/pci/msi.c| 16 +++ fs/block_dev.c | 6 ++- include/linux/blk-mq.h | 3 ++ include/linux/pci.h | 6 +++ 19 files changed, 265 insertions(+), 89 deletions(-) -- Jens Axboe
[PATCH 3/4] x86, pci: Add interface to force mmconfig
From: Andi Kleen This fills in the pci_bus_force_mmconfig interface that was added earlier for x86 to allow drivers to optimize config space accesses. The implementation is straightforward and uses the existing mmconfig access functions, just forcing mmconfig access. Signed-off-by: Andi Kleen --- arch/x86/pci/mmconfig-shared.c | 28 1 file changed, 28 insertions(+) diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c index dd30b7e08bc2..bb56533290aa 100644 --- a/arch/x86/pci/mmconfig-shared.c +++ b/arch/x86/pci/mmconfig-shared.c @@ -816,3 +816,31 @@ int pci_mmconfig_delete(u16 seg, u8 start, u8 end) return -ENOENT; } + +static int pci_mmconfig_read(struct pci_bus *bus, unsigned int devfn, +int where, int size, u32 *value) +{ + return raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number, +devfn, where, size, value); +} + +static int pci_mmconfig_write(struct pci_bus *bus, unsigned int devfn, + int where, int size, u32 value) +{ + return raw_pci_ext_ops->write(pci_domain_nr(bus), bus->number, + devfn, where, size, value); +} + +struct pci_ops pci_mmconfig_ops = { + .read = pci_mmconfig_read, + .write = pci_mmconfig_write, +}; + +/* Force all config accesses to go through mmconfig. */ +int pci_bus_force_mmconfig(struct pci_bus *bus) +{ + if (!raw_pci_ext_ops) + return -1; + bus->ops = &pci_mmconfig_ops; + return 0; +} -- 2.9.3
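The patch replaces the bus's whole `pci_ops` table with one whose callbacks forward to `raw_pci_ext_ops`. The following standalone C model sketches that pattern under stated assumptions. Every `model_*` type and function, and the fake accessor `fake_ext_read()`, are hypothetical stand-ins for `struct pci_bus`, `struct pci_ops`, `raw_pci_ext_ops` and a real mmconfig accessor. Only a `read` callback is modelled. As in the patch, forcing returns -1 when no extended accessor exists. Otherwise it leaves the bus pointing at the forwarding table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the raw extended (mmconfig) accessors. */
struct model_raw_ops {
	int (*read)(int domain, unsigned int busnr, unsigned int devfn,
		    int where, int size, unsigned int *value);
};

struct model_bus;

/* Hypothetical stand-in for struct pci_ops (read only). */
struct model_bus_ops {
	int (*read)(struct model_bus *bus, unsigned int devfn, int where,
		    int size, unsigned int *value);
};

/* Hypothetical stand-in for struct pci_bus. */
struct model_bus {
	int domain;
	unsigned int number;
	const struct model_bus_ops *ops;
};

/* NULL when no extended accessor was set up, like raw_pci_ext_ops. */
static const struct model_raw_ops *model_raw_ext_ops;

/* Forwarder: translate (bus, devfn) into the raw accessor's arguments. */
static int model_mmconfig_read(struct model_bus *bus, unsigned int devfn,
			       int where, int size, unsigned int *value)
{
	assert(model_raw_ext_ops != NULL);	/* only installed when set */
	return model_raw_ext_ops->read(bus->domain, bus->number, devfn,
				       where, size, value);
}

static const struct model_bus_ops model_mmconfig_ops = {
	.read = model_mmconfig_read,
};

/* Same shape as pci_bus_force_mmconfig(): swap the bus's ops table. */
static int model_bus_force_mmconfig(struct model_bus *bus)
{
	if (!model_raw_ext_ops)
		return -1;
	bus->ops = &model_mmconfig_ops;
	return 0;
}

/* Fake accessor for the example: encodes its arguments into *value. */
static int fake_ext_read(int domain, unsigned int busnr, unsigned int devfn,
			 int where, int size, unsigned int *value)
{
	(void)domain;
	(void)size;
	*value = (busnr << 16) | (devfn << 8) | (unsigned int)where;
	return 0;
}

static const struct model_raw_ops fake_ext_ops = { .read = fake_ext_read };
```

Once the table has been swapped, every config read goes straight through the forwarder without a per-access check. This relies on the assumption that the extended-ops pointer is not cleared afterwards. A caller only needs to check the return value once and keep the default accessors if forcing failed.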
Re: [RFC 04/11] mm: remove SWAP_MLOCK check for SWAP_SUCCESS in ttu
On Thu, Mar 02, 2017 at 08:21:46PM +0530, Anshuman Khandual wrote: > On 03/02/2017 12:09 PM, Minchan Kim wrote: > > If the page is mapped and rescue in ttuo, page_mapcount(page) == 0 cannot > > Nit: "ttuo" is very cryptic. Please expand it. No problem. > > > be true so page_mapcount check in ttu is enough to return SWAP_SUCCESS. > > IOW, SWAP_MLOCK check is redundant so remove it. > > Right, page_mapcount(page) should be enough to tell whether swapping > out happened successfully or the page is still mapped in some page > table. > Thanks for the review, Anshuman!
Re: [PATCH v4 1/3] mmc: dt-bindings: update Mediatek MMC bindings
On Fri, 2017-02-24 at 16:47 -0600, Rob Herring wrote: > On Fri, Feb 24, 2017 at 3:59 AM, Yong Mao wrote: > > Dear Rob, > > > > Could you please help to make comments for this patch? > > Thanks. > > I already did comment. It's still wrong as Ulf commented. So fix and > send a new version. It has to go to the DT list if you want to be in > my queue. > > Rob After reviewing the history, we assume the comments from Ulf that you mentioned are the following. "> +- mtk-hs200-cmd-int-delay: HS200 command internal delay setting. > + The value is an integer from 0 to 31 Please change to: mediatek,hs200-cmd-delay ... and if there is a unit, like ns or us, please add that a suffix. > +- mtk-hs400-cmd-int-delay: HS400 command internal delay setting > + The value is an integer from 0 to 31 mediatek,hs400-cmd-delay and add unit if applicable. > +- mtk-hs400-cmd-resp-sel: HS400 command response sample selection > + The value is an integer from 0 to 1 mediatek,hs400-cmd-resp-sel And make it a boolean value instead!" ==> We already addressed this comment in v4. We use "mediatek,hs200-cmd-int-delay" to replace "mtk-hs200-cmd-int-delay", but not "mediatek,hs200-cmd-delay", because "-int-" here means internal and should not be dropped. This field also has no unit; it only has 32 stages in total. We also changed the description in v4. For the comment about "mtk-hs400-cmd-resp-sel", we made it a boolean value in v4 and renamed it to "mediatek,hs400-cmd-resp-rising". Please point out what else we need to modify. Thanks.
Re: + mm-reclaim-madv_free-pages.patch added to -mm tree
Hi, On Tue, Feb 28, 2017 at 04:32:38PM -0800, a...@linux-foundation.org wrote: > > The patch titled > Subject: mm: reclaim MADV_FREE pages > has been added to the -mm tree. Its filename is > mm-reclaim-madv_free-pages.patch > > This patch should soon appear at > http://ozlabs.org/~akpm/mmots/broken-out/mm-reclaim-madv_free-pages.patch > and later at > http://ozlabs.org/~akpm/mmotm/broken-out/mm-reclaim-madv_free-pages.patch > > Before you just go and hit "reply", please: >a) Consider who else should be cc'ed >b) Prefer to cc a suitable mailing list as well >c) Ideally: find the original patch on the mailing list and do a > reply-to-all to that, adding suitable additional cc's > > *** Remember to use Documentation/SubmitChecklist when testing your code *** > > The -mm tree is included into linux-next and is updated > there every 3-4 working days > > -- > From: Shaohua Li > Subject: mm: reclaim MADV_FREE pages > > When memory pressure is high, we free MADV_FREE pages. If the pages are > not dirty in pte, the pages could be freed immediately. Otherwise we > can't reclaim them. We put the pages back to anonumous LRU list (by > setting SwapBacked flag) and the pages will be reclaimed in normal swapout > way. > > We use normal page reclaim policy. Since MADV_FREE pages are put into > inactive file list, such pages and inactive file pages are reclaimed > according to their age. This is expected, because we don't want to > reclaim too many MADV_FREE pages before used once pages. 
> > Based on Minchan's original patch > > Link: > http://lkml.kernel.org/r/14b8eb1d3f6bf6cc492833f183ac8c304e560484.1487965799.git.s...@fb.com > Signed-off-by: Shaohua Li > Acked-by: Minchan Kim > Acked-by: Michal Hocko > Acked-by: Johannes Weiner > Acked-by: Hillf Danton > Cc: Hugh Dickins > Cc: Rik van Riel > Cc: Mel Gorman > Signed-off-by: Andrew Morton > --- < snip > > @@ -1419,11 +1413,21 @@ static int try_to_unmap_one(struct page > VM_BUG_ON_PAGE(!PageSwapCache(page) && > PageSwapBacked(page), > page); > > - if (!PageDirty(page)) { > + /* > + * swapin page could be clean, it has data stored in > + * swap. We can't silently discard it without setting > + * swap entry in the page table. > + */ > + if (!PageDirty(page) && !PageSwapCache(page)) { > /* It's a freeable page by MADV_FREE */ > dec_mm_counter(mm, MM_ANONPAGES); > - rp->lazyfreed++; > goto discard; > + } else if (!PageSwapBacked(page)) { > + /* dirty MADV_FREE page */ > + set_pte_at(mm, address, pvmw.pte, pteval); > + ret = SWAP_DIRTY; > + page_vma_mapped_walk_done(&pvmw); > + break; > } There is no point in complicating this logic with the clean swapin-page case. Andrew, could you fold the patch below into mm-reclaim-madv_free-pages.patch if nobody objects? Thanks. >From 0c28f6560fbc4e65da4f4a8cc4664ab9f7b11cf3 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Fri, 3 Mar 2017 11:42:52 +0900 Subject: [PATCH] mm: clean up lazyfree page handling We can make this simpler to understand without needing to be aware of the clean swapin page. This patch just cleans up lazyfree page handling in try_to_unmap_one. 
Signed-off-by: Minchan Kim --- mm/rmap.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index bb45712..f7eab40 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1413,17 +1413,17 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma, VM_BUG_ON_PAGE(!PageSwapCache(page) && PageSwapBacked(page), page); - /* -* swapin page could be clean, it has data stored in -* swap. We can't silently discard it without setting -* swap entry in the page table. -*/ - if (!PageDirty(page) && !PageSwapCache(page)) { - /* It's a freeable page by MADV_FREE */ - dec_mm_counter(mm, MM_ANONPAGES); - goto discard; - } else if (!PageSwapBacked(page)) { - /* dirty MADV_FREE page */ + /* MADV_FREE page check */ + if (!PageSwapBacked(page)) { + if (!PageDirty(page)) { +
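To see why the cleanup can drop the `PageSwapCache()` test, the two versions of the branch can be modelled as pure functions of three page flags. This is a hedged sketch, not mm code. `struct page_flags`, `decide_v1()`, `decide_v2()` and the `LF_*` results are hypothetical names, and `LF_SWAP_OUT` stands for falling through to the normal swap-out path. Minchan's hunk is cut off above, so `decide_v2()` assumes that the dirty case inside the `!PageSwapBacked()` block keeps the SWAP_DIRTY handling from the quoted version. The `assert()` mirrors the `VM_BUG_ON_PAGE()` in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the anonymous-page branch being modelled. */
enum lazyfree_action {
	LF_DISCARD,	/* clean MADV_FREE page: drop without a swap entry */
	LF_SWAP_DIRTY,	/* redirtied MADV_FREE page: restore PTE, SWAP_DIRTY */
	LF_SWAP_OUT,	/* ordinary anon page: continue with normal swap-out */
};

/* Hypothetical snapshot of PageDirty/PageSwapCache/PageSwapBacked. */
struct page_flags {
	bool dirty;
	bool swapcache;
	bool swapbacked;
};

/* Shape of the quoted mm-reclaim-madv_free-pages.patch hunk. */
static enum lazyfree_action decide_v1(struct page_flags p)
{
	assert(!(!p.swapcache && p.swapbacked));	/* VM_BUG_ON_PAGE() */
	if (!p.dirty && !p.swapcache)
		return LF_DISCARD;
	if (!p.swapbacked)
		return LF_SWAP_DIRTY;
	return LF_SWAP_OUT;
}

/* Shape of the proposed cleanup: test SwapBacked first, then Dirty. */
static enum lazyfree_action decide_v2(struct page_flags p)
{
	assert(!(!p.swapcache && p.swapbacked));	/* VM_BUG_ON_PAGE() */
	if (!p.swapbacked)
		return p.dirty ? LF_SWAP_DIRTY : LF_DISCARD;
	return LF_SWAP_OUT;
}
```

The cleanup rests on one further assumption: a page that is no longer SwapBacked (a lazyfree page) is not in the swap cache, so the two flags always agree on this path. Under that assumption, both versions return the same action for every clean/dirty combination. The only input on which they differ is a clean page that is in the swap cache but not SwapBacked. That is the "clean swapin page" case which the cleanup argues needs no special handling.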
Re: [RFC 02/11] mm: remove unncessary ret in page_referenced
On Thu, Mar 02, 2017 at 08:03:16PM +0530, Anshuman Khandual wrote: > On 03/02/2017 12:09 PM, Minchan Kim wrote: > > Anyone doesn't use ret variable. Remove it. > > > > This change is correct. But not sure how this is related to > try_to_unmap() clean up though. A later patch in this series makes rmap_walk a void function, so this is a preparation step for it. > > > > Signed-off-by: Minchan Kim > > --- > > mm/rmap.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/mm/rmap.c b/mm/rmap.c > > index bb45712..8076347 100644 > > --- a/mm/rmap.c > > +++ b/mm/rmap.c > > @@ -805,7 +805,6 @@ int page_referenced(struct page *page, > > struct mem_cgroup *memcg, > > unsigned long *vm_flags) > > { > > - int ret; > > int we_locked = 0; > > struct page_referenced_arg pra = { > > .mapcount = total_mapcount(page), > > @@ -839,7 +838,7 @@ int page_referenced(struct page *page, > > rwc.invalid_vma = invalid_page_referenced_vma; > > } > > > > - ret = rmap_walk(page, &rwc); > > + rmap_walk(page, &rwc); > > *vm_flags = pra.vm_flags; > > > > if (we_locked) > >
[PATCH 4/4] staging: speakup: Alignment should match open parenthesis
Fix checkpatch issues: "CHECK: Alignment should match open parenthesis". Signed-off-by: Arushi Singhal --- drivers/staging/speakup/kobjects.c | 16 drivers/staging/speakup/main.c | 2 +- drivers/staging/speakup/selection.c | 2 +- drivers/staging/speakup/serialio.c | 2 +- drivers/staging/speakup/speakup_acntpc.c | 6 +++--- drivers/staging/speakup/speakup_apollo.c | 2 +- drivers/staging/speakup/speakup_decext.c | 4 ++-- 7 files changed, 17 insertions(+), 17 deletions(-) diff --git a/drivers/staging/speakup/kobjects.c b/drivers/staging/speakup/kobjects.c index 8c93188b832c..edde9e68779e 100644 --- a/drivers/staging/speakup/kobjects.c +++ b/drivers/staging/speakup/kobjects.c @@ -662,9 +662,9 @@ ssize_t spk_var_store(struct kobject *kobj, struct kobj_attribute *attr, var_data = param->data; value = var_data->u.n.value; spk_reset_default_value("pitch", synth->default_pitch, - value); + value); spk_reset_default_value("vol", synth->default_vol, - value); + value); } break; case VAR_STRING: @@ -679,7 +679,7 @@ ssize_t spk_var_store(struct kobject *kobj, struct kobj_attribute *attr, ret = spk_set_string_var(cp, param, len); if (ret == -E2BIG) pr_warn("value too long for %s\n", - param->name); + param->name); break; default: pr_warn("%s unknown type %d\n", @@ -699,7 +699,7 @@ EXPORT_SYMBOL_GPL(spk_var_store); */ static ssize_t message_show_helper(char *buf, enum msg_index_t first, - enum msg_index_t last) + enum msg_index_t last) { size_t bufsize = PAGE_SIZE; char *buf_pointer = buf; @@ -712,7 +712,7 @@ static ssize_t message_show_helper(char *buf, enum msg_index_t first, if (bufsize <= 1) break; printed = scnprintf(buf_pointer, bufsize, "%d\t%s\n", - index, spk_msg_get(cursor)); + index, spk_msg_get(cursor)); buf_pointer += printed; bufsize -= printed; } @@ -721,7 +721,7 @@ static ssize_t message_show_helper(char *buf, enum msg_index_t first, } static void report_msg_status(int reset, int received, int used, - int rejected, char *groupname) + int rejected, char *groupname) { 
int len; char buf[160]; @@ -742,7 +742,7 @@ static void report_msg_status(int reset, int received, int used, } static ssize_t message_store_helper(const char *buf, size_t count, - struct msg_group_t *group) +struct msg_group_t *group) { char *cp = (char *) buf; char *end = cp + count; @@ -843,7 +843,7 @@ static ssize_t message_show(struct kobject *kobj, } static ssize_t message_store(struct kobject *kobj, struct kobj_attribute *attr, - const char *buf, size_t count) + const char *buf, size_t count) { struct msg_group_t *group = spk_find_msg_group(attr->attr.name); diff --git a/drivers/staging/speakup/main.c b/drivers/staging/speakup/main.c index 25acebb9311f..01eabc19039c 100644 --- a/drivers/staging/speakup/main.c +++ b/drivers/staging/speakup/main.c @@ -1140,7 +1140,7 @@ static void spkup_write(const char *in_buf, int count) if (last_type & CH_RPT) { synth_printf(" "); synth_printf(spk_msg_get(MSG_REPEAT_DESC2), - ++rep_count); +++rep_count); synth_printf(" "); } rep_count = 0; diff --git a/drivers/staging/speakup/selection.c b/drivers/staging/speakup/selection.c index afd9a446a06f..3d15eec37163 100644 --- a/drivers/staging/speakup/selection.c +++ b/drivers/staging/speakup/selection.c @@ -75,7 +75,7 @@ int speakup_set_selection(struct tty_struct *tty) speakup_clear_selection(); spk_sel_cons = vc_cons[fg_console].d; dev_warn(tty->dev, - "Selection: mark console not the same as cut\n"); +"Selection: mark console not the same as cut\n"); return -EINVAL; } diff --git a/drivers/staging/speakup/serialio.c b/drivers/staging/speakup/serialio.c index aade52ee15a0..7e6bc3b05da3 100644 --- a/drivers/staging/speakup/serialio.c +++ b/drivers/staging/speakup/serialio.c @@ -118,7 +118,7 @@ static void start_serial_interrupt(int irq) pr_err("Unable
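Whitespace was lost when this patch was archived, so the removed and added lines above look identical. The checkpatch rule being fixed is that a continuation line of a parameter or argument list should start in the column just after the opening parenthesis. In kernel style that column is reached with tabs (8 columns wide) followed by spaces. The hypothetical helper below is not speakup code. It is laid out that way, and its comments give the column arithmetic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from speakup). "static int format_msg_line("
 * is 27 columns wide, so the continuation line is indented with
 * 3 tabs (24 columns) plus 3 spaces to start right after the '('.
 */
static int format_msg_line(char *buf, size_t size, int index,
			   const char *text)
{
	/* One tab plus "return snprintf(" is 24 columns: exactly 3 tabs. */
	return snprintf(buf, size, "%d\t%s\n",
			index, text);
}
```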
Re: [PATCH net] rxrpc: Fix potential NULL-pointer exception
David Howells wrote:

> Fix a potential NULL-pointer exception in rxrpc_do_sendmsg(). The call
> state check that I added should have gone into the else-body of the
> if-statement where we actually have a call to check.
>
> Found by CoverityScan CID#1414316 ("Dereference after null check").
>
> Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
> Reported-by: Colin Ian King
> Signed-off-by: David Howells

Please ignore this - there's another patch interposed that I haven't sent
upstream yet. Will rebase on net/master.

David
Re: [PATCH] mmc: core: fix changing bus witdh in hs400es mode
Hi Piotr,

On 2017/3/2 21:47, Piotr Sroka wrote:
> Fix the code to avoid changing bus width if HS400ES mode is selected.

Thanks for catching this, but Guenter posted a fix[1] already. :)

[1]: https://patchwork.kernel.org/patch/9599261/

> Signed-off-by: Piotr Sroka
> ---
>  drivers/mmc/core/mmc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
> index 7fd7228..c7d9c9f 100644
> --- a/drivers/mmc/core/mmc.c
> +++ b/drivers/mmc/core/mmc.c
> @@ -1730,7 +1730,7 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr,
>  		err = mmc_select_hs400(card);
>  		if (err)
>  			goto free_card;
> -	} else {
> +	} else if (!mmc_card_hs400(card)) {
>  		/* Select the desired bus width optionally */
>  		err = mmc_select_bus_width(card);
>  		if (err > 0 && mmc_card_hs(card)) {

--
Best Regards
Shawn Lin
[PATCH] staging: rtl8192u: fix spacing around if statements
Corrects the spacing around two if statements to fix these checkpatch.pl
errors:

ERROR: space required before the open brace '{'
ERROR: space prohibited after that open parenthesis '('

Signed-off-by: Robin Krahl
---
 drivers/staging/rtl8192u/r8192U_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rtl8192u/r8192U_core.c b/drivers/staging/rtl8192u/r8192U_core.c
index b631990b4969..b61ffa35579b 100644
--- a/drivers/staging/rtl8192u/r8192U_core.c
+++ b/drivers/staging/rtl8192u/r8192U_core.c
@@ -269,7 +269,7 @@ int write_nic_byte_E(struct net_device *dev, int indx, u8 data)
 				 indx | 0xfe00, 0, usbdata, 1, HZ / 2);
 	kfree(usbdata);
 
-	if (status < 0){
+	if (status < 0) {
 		netdev_err(dev, "write_nic_byte_E TimeOut! status: %d\n",
 			   status);
 		return status;
@@ -2519,7 +2519,7 @@ static int rtl8192_read_eeprom_info(struct net_device *dev)
 		for (i = 0; i < 3; i++) {
 			if (bLoad_From_EEPOM) {
 				ret = eprom_read(dev, (EEPROM_TxPwIndex_OFDM_24G + i) >> 1);
-				if ( ret < 0)
+				if (ret < 0)
 					return ret;
 				if (((EEPROM_TxPwIndex_OFDM_24G + i) % 2) == 0)
 					tmpValue = (u16)ret & 0x00ff;
-- 
2.12.0
Re: [PATCH] module: set __jump_table alignment to 8
Steven Rostedt writes:
> On Thu, 02 Mar 2017 22:18:30 +1100
> Michael Ellerman wrote:
>> Michael Ellerman writes:
>> > David Daney writes:
>> >> Strict alignment became necessary with commit 3821fd35b58d
>> >> ("jump_label: Reduce the size of struct static_key"), currently in
>> >> linux-next, which uses the two least significant bits of pointers to
>> >> __jump_table elements.
>> >
>> > It would obviously be nice if this could go in before the commit that
>> > exposes the breakage, but I guess that's problematic because Steve
>> > doesn't want to rebase the tracing tree.
>> >
>> > Steve I think you've already sent your pull request for this cycle? So I
>> > guess if this can go in your first batch of fixes?
>>
>> Ugh. Was looking at the wrong tree - Linus has already merged the commit
>> in question, so the above is all moot.
>
> No problem. I've got some other "fixes" to push to Linus. That's what
> the -rc releases are for. To fix up breakage from the merge window ;-)

Yep, no drama.

> I'll pull this into my tree.

Thanks.

cheers
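The alignment requirement follows from the pointer-tagging trick in the quoted commit: if the two least significant bits of a pointer carry flags, every element it can point to must be at least 4-byte aligned, or the flag bits and address bits collide. A minimal userspace sketch of that idea; `struct entry`, `tag_ptr()`, `untag_ptr()` and `tag_flags()` are hypothetical names, not the kernel's static_key code.

```c
#include <assert.h>
#include <stdint.h>

/* The two low bits of a stored pointer carry flags, so every entry the
 * pointer can refer to must be at least 4-byte aligned. */
#define TAG_MASK ((uintptr_t)0x3)

/* Hypothetical stand-in for a __jump_table element (three 64-bit words). */
struct entry {
	uint64_t code, target, key;
};

static uintptr_t tag_ptr(const struct entry *e, unsigned int flags)
{
	uintptr_t p = (uintptr_t)e;

	assert((p & TAG_MASK) == 0);      /* misaligned entry: bits would clash */
	assert((flags & ~TAG_MASK) == 0); /* flags must fit in the two low bits */
	return p | flags;
}

static const struct entry *untag_ptr(uintptr_t v)
{
	return (const struct entry *)(v & ~TAG_MASK);
}

static unsigned int tag_flags(uintptr_t v)
{
	return (unsigned int)(v & TAG_MASK);
}
```

A table placed at an address that is only 1- or 2-byte aligned would trip the first assertion here; forcing 8-byte alignment for `__jump_table` leaves the low bits free for flags.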
Re: [PATCH v2] hlist_add_tail_rcu disable sparse warning
On Mon, Feb 27, 2017 at 09:14:19PM +0200, Michael S. Tsirkin wrote:
> sparse is unhappy about this code in hlist_add_tail_rcu:
>
> 	struct hlist_node *i, *last = NULL;
>
> 	for (i = hlist_first_rcu(h); i; i = hlist_next_rcu(i))
> 		last = i;
>
> This is because hlist_next_rcu and hlist_next_rcu return
> __rcu pointers.
>
> It's a false positive - it's a write side primitive and so
> does not need to be called in a read side critical section.
>
> The following trivial patch disables the warning
> without changing the behaviour in any way.
>
> Note: __hlist_for_each_rcu would also remove the warning but it would be
> confusing since it calls rcu_derefence and is designed to run in the rcu
> read side critical section.
>
> Signed-off-by: Michael S. Tsirkin
> Reviewed-by: Steven Rostedt (VMware)

Queued for further review and testing, thank you both!

Thanx, Paul

> ---
>
> Comments from v2:
> 	add a comment as requested by Steven Rostedt
>
>  include/linux/rculist.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 4f7a956..b1fd8bf 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -509,7 +509,8 @@ static inline void hlist_add_tail_rcu(struct hlist_node *n,
>  {
>  	struct hlist_node *i, *last = NULL;
>  
> -	for (i = hlist_first_rcu(h); i; i = hlist_next_rcu(i))
> +	/* Note: write side code, so rcu accessors are not needed. */
> +	for (i = h->first; i; i = i->next)
>  		last = i;
>  
>  	if (last) {
> -- 
> MST
>
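The point of the patch is that the tail walk runs on the update side, so plain `->first`/`->next` loads are enough there; only publishing the new node needs RCU ordering. A minimal userspace sketch, assuming a single thread and therefore leaving out the barrier that `rcu_assign_pointer()` provides; `hlist_add_tail_sketch()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copies of the kernel's hlist types. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

/* Tail insertion in the shape of the patched hlist_add_tail_rcu().  The
 * walk uses plain ->first/->next loads, as the patch does, because it runs
 * on the update side.  Single-threaded sketch: the publication barrier of
 * rcu_assign_pointer() is deliberately left out. */
static void hlist_add_tail_sketch(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *i, *last = NULL;

	for (i = h->first; i; i = i->next)
		last = i;

	n->next = NULL;
	if (last) {
		n->pprev = &last->next;
		last->next = n;   /* kernel: rcu_assign_pointer(hlist_next_rcu(last), n) */
	} else {
		n->pprev = &h->first;
		h->first = n;     /* kernel: hlist_add_head_rcu(n, h) */
	}

	/* hlist invariant: pprev points at the pointer that points to n */
	assert(*n->pprev == n);
}
```

Readers never see this walk, which is why switching from the `__rcu` accessors to plain loads changes nothing at runtime and only silences sparse.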
Re: [PATCH v2 1/3] perf annotate: Get correct line numbers matched with addr
On 03/03/2017 11:40 AM, Namhyung Kim wrote:
> + Andi Kleen who wrote the code.
>
> On Thu, Mar 02, 2017 at 03:05:14PM +0900, Taeung Song wrote:
>>
>> On 03/01/2017 10:17 PM, Namhyung Kim wrote:
>>> Hi Taeung,
>>>
>>> On Wed, Mar 01, 2017 at 04:59:51AM +0900, Taeung Song wrote:
>>>> Currently perf-annotate show wrong line numbers.
>>>> For example,
>>>>
>>>> Actual source code is as below
>>>> ...
>>>> 21 };
>>>> 22
>>>> 23 unsigned int limited_wgt;
>>>> 24
>>>> 25 unsigned int get_cond_maxprice(int wgt)
>>>> 26 {
>>>> ...
>>>>
>>>> However, the output of perf-annotate is as below.
>>>>
>>>> 4 Disassembly of section .text:
>>>> 6 00400966 :
>>>> 7 get_cond_maxprice():
>>>> 26 };
>>>> 28 unsigned int limited_wgt;
>>>> 30 unsigned int get_cond_maxprice(int wgt)
>>>> 31 {
>>>>
>>>> The cause is the wrong way counting line numbers
>>>> in symbol__parse_objdump_line(). So remove wrong current code
>>>> counting line number and use other method for it
>>>> using functions related to addr2line instead of
>>>> the output of '-l' of objdump.
>>>
>>> Hmm.. do you think it's a bug of objdump or it's perf failing to
>>> parse the line number correctly? I'd like to see the output of
>>> `objdump -l`
>>
>> Both are ok. 'objdump -l' hasn't a bug related to line number
>> and perf's method parsing the line number is ok.
>>
>> But symbol__parse_objdump_line() wrongly count line numbers after parsing it
>> as below.
>>
>> 1172         /* /filename:linenr ? Save line number and ignore. */
>> 1173         if (regexec(_lineno, line, 2, match, 0) == 0) {
>> 1174                 *line_nr = atoi(line + match[1].rm_so);
>> 1175                 return 0;
>> 1176         }
>> ...
>> 1208         dl = disasm_line__new(offset, parsed_line, privsize, *line_nr, arch, map);
>> 1209         free(line);
>> 1210         (*line_nr)++;
>>
>> Increasing line_nr each asm line is wrong method.
>> Because 'line_nr' means actual source code line number.
>
> Hmm.. ok. It looks like that it should reuse the old line_nr as is.
>
>>
>> Sure, I can fix only the wrong counting way.
>> But the above parsing method(1172~1176) is never used
>> because of 'grep -v' in command as below.
>> (the grep already remove lines containing filename:linenr of output)
>
> Right, but only if filename is same as binary name.
>>
>> 1435         snprintf(command, sizeof(command),
>> 1436                  "%s %s%s --start-address=0x%016" PRIx64
>> 1437                  " --stop-address=0x%016" PRIx64
>> 1438                  " -l -d %s %s -C %s 2>/dev/null|grep -v %s|expand",
>> 1439                  objdump_path ? objdump_path : "objdump",
>> 1440                  disassembler_style ? "-M " : "",
>> 1441                  disassembler_style ? disassembler_style : "",
>> 1442                  map__rip_2objdump(map, sym->start),
>> 1443                  map__rip_2objdump(map, sym->end),
>> 1444                  symbol_conf.annotate_asm_raw ? "" : "--no-show-raw",
>> 1445                  symbol_conf.annotate_src ? "-S" : "",
>> 1446                  symfs_filename, symfs_filename);
>>
>> Therefore, I think it is better to do three things
>> 1) fix the wrong counting line number problem
>> 2) remove unused the line number parsing method
>> 3) In addtion, a bit reduce objdump dependency
>>    using functions related to addr2line of perf.
>>
>> What do you think about that ?
>> Is it bad idea ?
>
> I think we need to fix 1) definitely, but not sure about 2) and 3).
> If objdump could do all necessary works, why not use it? :)
>
> Thanks,
> Namhyung
>

Okey! I'll concentrate on fixing 1), not removing objdump -l :)

Thanks,
Taeung
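Namhyung's suggestion to reuse the old line_nr, instead of incrementing it for every instruction, can be sketched as follows. This is a hypothetical standalone illustration, not perf's code: it assumes `objdump -l` location lines look like `/path/file.c:N`, and the regex, `struct asm_line` and `tag_lines()` are illustrative choices.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

struct asm_line {
	const char *text;
	int line_nr;          /* source line of the instruction, -1 if unknown */
};

/* Copies instruction lines from `in` to `out`, tagging each with the line
 * number from the most recent "/file:N" marker.  The number is reused for
 * every following instruction, never incremented per instruction, because
 * one source line usually expands to several instructions.  Returns the
 * number of instruction lines written. */
static size_t tag_lines(const char *const *in, size_t n, struct asm_line *out)
{
	regex_t re;
	regmatch_t m[2];
	int line_nr = -1;
	size_t count = 0;

	assert(in != NULL && out != NULL);
	if (regcomp(&re, "^/[^:]+:([0-9]+)", REG_EXTENDED) != 0)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (regexec(&re, in[i], 2, m, 0) == 0) {
			line_nr = atoi(in[i] + m[1].rm_so);
			continue;     /* location marker, not an instruction */
		}
		out[count].text = in[i];
		out[count].line_nr = line_nr;
		count++;
	}

	regfree(&re);
	return count;
}
```

With this shape, two instructions that follow a single `file.c:26` marker both report line 26, instead of 26 and 27 as the per-instruction increment produced.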
Re: [PATCH 3/3] gpu: drm: drivers: Convert printk(KERN_ to pr_
On 28/02/17 14:55, Joe Perches wrote:
> Use a more common logging style.
>
> Miscellanea:
>
> o Coalesce formats and realign arguments
> o Neaten a few macros now using pr_
>
> Signed-off-by: Joe Perches

For omap:

Acked-by: Tomi Valkeinen

 Tomi
Re: [RFC 01/11] mm: use SWAP_SUCCESS instead of 0
On Thu, Mar 02, 2017 at 07:57:10PM +0530, Anshuman Khandual wrote:
> On 03/02/2017 12:09 PM, Minchan Kim wrote:
> > SWAP_SUCCESS defined value 0 can be changed always so don't rely on
> > it. Instead, use explict macro.
>
> Right. But should not we move the changes to the callers last in the
> patch series after doing the cleanup to the try_to_unmap() function
> as intended first.

I don't understand what you are pointing out. Could you elaborate it
a bit?

Thanks.

> >
> > Cc: Kirill A. Shutemov
> > Signed-off-by: Minchan Kim
> > ---
> >  mm/huge_memory.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 092cc5c..fe2ccd4 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2114,7 +2114,7 @@ static void freeze_page(struct page *page)
> >  		ttu_flags |= TTU_MIGRATION;
> >  
> >  	ret = try_to_unmap(page, ttu_flags);
> > -	VM_BUG_ON_PAGE(ret, page);
> > +	VM_BUG_ON_PAGE(ret != SWAP_SUCCESS, page);
> >  }
> >  
> >  static void unfreeze_page(struct page *page)
> >
Re: [PATCH 1/3] cpufreq: schedutil: move cached_raw_freq to struct sugov_policy
On 02-03-17, 23:05, Rafael J. Wysocki wrote:
> On Thursday, March 02, 2017 02:03:20 PM Viresh Kumar wrote:
> > cached_raw_freq applies to the entire cpufreq policy and not individual
> > CPUs. Apart from wasting per-cpu memory, it is actually wrong to keep it
> > in struct sugov_cpu as we may end up comparing next_freq with a stale
> > cached_raw_freq of a random CPU.
> >
> > Move cached_raw_freq to struct sugov_policy.
> >
> > Signed-off-by: Viresh Kumar
>
> Any chance for a Fixes: tag?

Fixes: 5cbea46984d6 ("cpufreq: schedutil: map raw required frequency to driver frequency")

Sorry to miss that in the first place.

-- 
viresh
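Why the cache belongs in the policy can be shown with a small userspace model. In this sketch (hypothetical names, and a made-up 100 MHz rounding in place of `cpufreq_driver_resolve_freq()`), the cached raw request and the resolved frequency live side by side in one per-policy structure, so a repeated request is always compared against the last request made for that policy, whichever CPU made it.

```c
#include <assert.h>

/* Hypothetical model of the per-policy cache after the patch.  Frequencies
 * are in kHz; cached_raw_freq == 0 means "nothing cached yet". */
struct policy_model {
	unsigned int cached_raw_freq;  /* last raw request, shared by all CPUs */
	unsigned int next_freq;        /* frequency resolved for that request */
	unsigned int resolve_calls;    /* how often the driver lookup ran */
};

/* Illustrative stand-in for cpufreq_driver_resolve_freq(): round up to
 * the next 100 MHz step. */
static unsigned int resolve(struct policy_model *p, unsigned int raw)
{
	p->resolve_calls++;
	return (raw + 99999) / 100000 * 100000;
}

static unsigned int get_next_freq(struct policy_model *p, unsigned int raw)
{
	assert(raw > 0);               /* 0 is reserved for "no cached value" */
	if (raw == p->cached_raw_freq)
		return p->next_freq;       /* same request as last time: reuse */
	p->cached_raw_freq = raw;
	p->next_freq = resolve(p, raw);
	return p->next_freq;
}
```

If `cached_raw_freq` instead lived in per-CPU data while `next_freq` stayed per-policy, a CPU whose stale cached value happened to equal the new request would return whatever `next_freq` another CPU last set, which is the mismatch the commit message describes.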
Re: [PATCH v17 2/3] usb: USB Type-C connector class
On 03/02/2017 07:22 AM, Mats Karrman wrote:
> Hi Heikki,
>
> Good to see things are happening with Type-C!
>
> On 2017-02-21 15:24, Heikki Krogerus wrote:
>
> ...
>
>> +When connected, the partner will be presented also as its own device under
>> +/sys/class/typec/. The parent of the partner device will always be the port it
>> +is attached to. The partner attached to port "port0" will be named
>> +"port0-partner". Full path to the device would be
>> +/sys/class/typec/port0/port0-partner/.
>
> A "/port0" too much?
>
>> +
>> +The cable and the two plugs on it may also be optionally presented as their own
>> +devices under /sys/class/typec/. The cable attached to the port "port0" port
>> +will be named port0-cable and the plug on the SOP Prime end (see USB Power
>> +Delivery Specification ch. 2.4) will be named "port0-plug0" and on the SOP
>> +Double Prime end "port0-plug1". The parent of a cable will always be the port,
>> +and the parent of the cable plugs will always be the cable.
>> +
>> +If the port, partner or cable plug support Alternate Modes, every supported
>> +Alternate Mode SVID will have their own device describing them. The Alternate
>> +Modes will not be attached to the typec class. The parent of an alternate mode
>> +will be the device that supports it, so for example an alternate mode of
>> +port0-partner will bees presented under /sys/class/typec/port0-partner/. Every
>
> bees?
>
>> +mode that is supported will have its own group under the Alternate Mode device
>> +named "mode", for example /sys/class/typec/port0//mode1/.
>> +The requests for entering/exiting a mode can be done with "active" attribute
>> +file in that group.
>> +
>
> ...
>
> I'm hoping to find time to upgrade the kernel and try these patches in my system.
>
> Looking forward, one thing I have run into is how to connect the typec driver
> with a driver for an alternate mode. E.g. the DisplayPort Alternate Mode
> specification includes the HPD (hot plug) and HPD-INT (hot plug interrupt)
> signals as bits in the Attention message.
> These signals are needed by the DisplayPort driver to know when to start
> negotiation etc. Have you got any thoughts on how to standardize such
> interfaces?
>

That really depends on the lower level driver. For Chromebooks, where the
Type-C Protocol Manager runs on the EC, we have an extcon driver which reports
the pin states to the graphics drivers and connects to the Type-C class code
using the Type-C class API. I still need to update, re-test, and publish that
code. The published code in
https://chromium.googlesource.com/chromiumos/third_party/kernel/, branch
chromeos-4.4, shows how it can be done, though that code currently still uses
the Android Type-C infrastructure.

Guenter
[PATCH] objtool: fix another gcc jump table detection issue
Arnd Bergmann reported a (false positive) objtool warning:

  drivers/infiniband/sw/rxe/rxe_resp.o: warning: objtool: rxe_responder()+0xfe: sibling call from callable instruction with changed frame pointer

The issue is in find_switch_table(). It tries to find a switch
statement's jump table by walking backwards from an indirect jump
instruction, looking for a relocation to the .rodata section. In this
case it stopped walking prematurely: the first .rodata relocation it
encountered was for a variable (resp_state_name) instead of a jump
table, so it just assumed there wasn't a jump table.

The fix is to ignore any .rodata relocation which refers to an ELF
object symbol. This works because the jump tables are anonymous and
have no symbols associated with them.

Reported-by: Arnd Bergmann
Fixes: 3732710ff6f2 ("objtool: Improve rare switch jump table pattern detection")
Signed-off-by: Josh Poimboeuf
---
 tools/objtool/builtin-check.c | 15 ++++++++++++---
 tools/objtool/elf.c           | 12 ++++++++++++
 tools/objtool/elf.h           |  1 +
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index 5fc52ee..c2a8518 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -805,11 +805,20 @@ static struct rela *find_switch_table(struct objtool_file *file,
 		     insn->jump_dest->offset > orig_insn->offset))
 			break;
 
+		/* look for a relocation which references .rodata */
 		text_rela = find_rela_by_dest_range(insn->sec, insn->offset,
 						    insn->len);
-		if (text_rela && text_rela->sym == file->rodata->sym)
-			return find_rela_by_dest(file->rodata,
-						 text_rela->addend);
+		if (!text_rela || text_rela->sym != file->rodata->sym)
+			continue;
+
+		/*
+		 * Make sure the .rodata address isn't associated with a
+		 * symbol. gcc jump tables are anonymous data.
+*/ + if (find_symbol_containing(file->rodata, text_rela->addend)) + continue; + + return find_rela_by_dest(file->rodata, text_rela->addend); } return NULL; diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c index 0d7983a..d897702 100644 --- a/tools/objtool/elf.c +++ b/tools/objtool/elf.c @@ -85,6 +85,18 @@ struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset) return NULL; } +struct symbol *find_symbol_containing(struct section *sec, unsigned long offset) +{ + struct symbol *sym; + + list_for_each_entry(sym, >symbol_list, list) + if (sym->type != STT_SECTION && + offset >= sym->offset && offset < sym->offset + sym->len) + return sym; + + return NULL; +} + struct rela *find_rela_by_dest_range(struct section *sec, unsigned long offset, unsigned int len) { diff --git a/tools/objtool/elf.h b/tools/objtool/elf.h index aa1ff65..731973e 100644 --- a/tools/objtool/elf.h +++ b/tools/objtool/elf.h @@ -79,6 +79,7 @@ struct elf { struct elf *elf_open(const char *name); struct section *find_section_by_name(struct elf *elf, const char *name); struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset); +struct symbol *find_symbol_containing(struct section *sec, unsigned long offset); struct rela *find_rela_by_dest(struct section *sec, unsigned long offset); struct rela *find_rela_by_dest_range(struct section *sec, unsigned long offset, unsigned int len); -- 2.7.4
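To make the fix concrete, here is a minimal user-space sketch of the idea, assuming simplified stand-in types (sketch_sym, sketch_symbol_containing(), sketch_pick_jump_table() are hypothetical and not objtool's real structures). It applies the same range test as find_symbol_containing() in the patch and shows how skipping addends that land inside a named object (such as resp_state_name) lets the backwards walk continue to the anonymous jump table. The real find_switch_table() also stops walking on other conditions that are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for objtool's ELF structures. */
enum sketch_sym_type { SKETCH_OBJECT, SKETCH_FUNC, SKETCH_SECTION };

struct sketch_sym {
	const char *name;
	enum sketch_sym_type type;
	unsigned long offset;	/* start offset within .rodata */
	unsigned long len;	/* size in bytes */
};

/*
 * Same test as find_symbol_containing() in the patch: return the
 * non-section symbol whose [offset, offset + len) range covers @offset.
 */
static const struct sketch_sym *
sketch_symbol_containing(const struct sketch_sym *syms, size_t nsyms,
			 unsigned long offset)
{
	for (size_t i = 0; i < nsyms; i++)
		if (syms[i].type != SKETCH_SECTION &&
		    offset >= syms[i].offset &&
		    offset < syms[i].offset + syms[i].len)
			return &syms[i];
	return NULL;
}

/*
 * @addends holds the .rodata relocation addends met while walking
 * backwards from the indirect jump, nearest first. Skip any addend that
 * lands inside a named object and return the first anonymous one,
 * or -1 if there is none.
 */
static long sketch_pick_jump_table(const struct sketch_sym *syms, size_t nsyms,
				   const unsigned long *addends, size_t naddends)
{
	for (size_t i = 0; i < naddends; i++) {
		if (sketch_symbol_containing(syms, nsyms, addends[i]))
			continue;	/* named data, not a jump table */
		return (long)addends[i];
	}
	return -1;
}
```

Note that the section symbol itself never counts as "containing" an address, which is why the STT_SECTION check is needed: every .rodata offset lies inside the section symbol's range.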
Re: [RFC PATCH 2/2] mtd: devices: m25p80: Enable spi-nor bounce buffer support
On Thu, 2 Mar 2017 17:00:41 + Mark Brown wrote: > On Thu, Mar 02, 2017 at 03:29:21PM +0100, Boris Brezillon wrote: > > Vignesh R wrote: > > > > Or SPI core can be extended in a way similar to this RFC. That is, SPI > > > master driver will set a flag to request SPI core to use of bounce > > > buffer for vmalloc'd buffers. And spi_map_buf() just uses bounce buffer > > > in case buf does not belong to kmalloc region based on the flag. > > > That's a better approach IMHO. Note that the decision should not only > > I don't understand how the driver is supposed to tell if it might need a > bounce buffer due to where the memory is allocated and the caches used > by the particular system it is used on? That's true, but if the SPI controller driver can't decide that, how could a SPI device driver guess? We could patch dma_map_sg() to create a bounce buffer when it's given a vmalloc-ed buffer and we are running on a system using VIVT or VIPT caches (it already allocates bounce buffers when the peripheral device cannot access the memory region, so why not in this case?). This still leaves 3 problems: 1/ for big transfers, dynamically allocating a bounce buffer on demand (and freeing it after the DMA operation) might fail, or might induce some latency, especially when the system is under high memory pressure. Allocating these bounce buffers once during the SPI device driver ->probe() guarantees that the bounce buffer will always be available when needed, but OTOH, we don't know if it's really needed. 2/ only the SPI controller and/or DMA engine know when using DMA with a bounce buffer is better than using PIO mode. The threshold is probably different from the plain DMA vs PIO one (dma_min_len < dma_bounce_min_len). Thanks to ->can_dma() we can let drivers decide when preparing the buffer for a DMA transfer is needed. 3/ if the DMA engine does not support chaining DMA descriptors, and the vmalloc-ed buffer spans several non-contiguous pages, doing DMA is simply not possible.
That one can probably be handled with the ->can_dma() hook too. > The suggestion to pass via > scatterlists seems a bit more likely to work but even then I'm not clear > that drivers doing PIO would play well. You mean that SPI device drivers would directly pass an sg list instead of a virtual pointer? Not sure that would help: we'd just be moving the decision one level up without providing more information to help decide what to do. > > > be based on the buffer type, but also on the transfer length and/or > > whether the controller supports transferring non physically contiguous > > buffers. > > The reason most drivers only look at the transfer length when deciding > that they can DMA is that most controllers are paired with DMA > controllers that are sensibly implemented, the only factor they're > selecting on is the copybreak for performance. Of course, the checks I mentioned (especially the physically contiguous one) are SPI controller and/or DMA engine dependent. Some of them might be irrelevant.
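The factors listed above (buffer origin and cache type, physical contiguity vs descriptor chaining, and two different length thresholds) can be combined into one per-transfer decision. The following is a minimal sketch of one possible policy, assuming hypothetical types and names (sketch_xfer, sketch_ctrl_caps, sketch_choose_mode()); none of these exist in the SPI core. In a real driver, logic of this kind would presumably live behind the ->can_dma() hook. Problem 1/ (when to preallocate the bounce buffer) is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one transfer; not an SPI core structure. */
struct sketch_xfer {
	size_t len;
	bool vmalloc_buf;	/* buffer came from vmalloc() */
	bool cache_aliasing;	/* system has VIVT/VIPT caches */
	bool phys_contiguous;	/* buffer is physically contiguous */
};

/* Hypothetical controller/DMA engine capabilities. */
struct sketch_ctrl_caps {
	size_t dma_min_len;		/* copybreak for direct DMA */
	size_t dma_bounce_min_len;	/* larger copybreak when a copy is needed */
	bool dma_chaining;		/* engine can chain descriptors */
};

enum sketch_mode { SKETCH_PIO, SKETCH_DMA, SKETCH_DMA_BOUNCE };

static enum sketch_mode sketch_choose_mode(const struct sketch_xfer *x,
					   const struct sketch_ctrl_caps *c)
{
	/* Direct DMA needs a buffer that is safe to map on this cache type
	 * and that the engine can reach without unsupported chaining. */
	bool direct_ok = !(x->vmalloc_buf && x->cache_aliasing) &&
			 (x->phys_contiguous || c->dma_chaining);

	if (direct_ok)
		return x->len >= c->dma_min_len ? SKETCH_DMA : SKETCH_PIO;

	/* Direct DMA is ruled out: copying into a bounce buffer only pays
	 * off above a higher threshold (dma_min_len < dma_bounce_min_len). */
	return x->len >= c->dma_bounce_min_len ? SKETCH_DMA_BOUNCE : SKETCH_PIO;
}
```

The two-threshold design reflects point 2/: the extra memcpy makes bounced DMA worthwhile only for longer transfers than plain DMA.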
[PATCH] memblock: fix memblock_next_valid_pfn()
Obviously, we should not access memblock.memory.regions[right] if 'right' is outside of [0, memblock.memory.cnt). Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible") Signed-off-by: AKASHI Takahiro --- mm/memblock.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/memblock.c b/mm/memblock.c index b64b47803e52..696f06d17c4e 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1118,7 +1118,10 @@ unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn, } } while (left < right); - return min(PHYS_PFN(type->regions[right].base), max_pfn); + if (right == type->cnt) + return max_pfn; + else + return min(PHYS_PFN(type->regions[right].base), max_pfn); } /** -- 2.11.1
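The out-of-bounds case is easiest to see in a standalone model of the binary search. This is a simplified sketch, not the kernel function: it assumes a hypothetical sorted, non-overlapping region table expressed directly in PFN units (the kernel searches physical addresses via PFN_PHYS()/PHYS_PFN()), and it assumes at least one region, as the do/while reads r[mid] once before testing. When the next pfn lies past the last region, the search ends with right == cnt, which is exactly the index the patch stops dereferencing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical region table entry, in PFN units (simplification). */
struct sketch_region {
	unsigned long base;
	unsigned long size;
};

static unsigned long sketch_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Return the next valid pfn after @pfn, capped at @max_pfn. */
static unsigned long sketch_next_valid_pfn(const struct sketch_region *r,
					   size_t cnt, unsigned long pfn,
					   unsigned long max_pfn)
{
	size_t left = 0, right = cnt, mid;
	unsigned long next = pfn + 1;

	assert(cnt > 0);	/* the loop body reads r[mid] at least once */
	do {
		mid = (left + right) / 2;
		if (next < r[mid].base)
			right = mid;
		else if (next >= r[mid].base + r[mid].size)
			left = mid + 1;
		else
			return sketch_min(next, max_pfn);	/* inside a region */
	} while (left < right);

	/* The fix: right == cnt means @next is beyond the last region,
	 * so r[right] would be one element past the end of the table. */
	if (right == cnt)
		return max_pfn;
	return sketch_min(r[right].base, max_pfn);
}
```

For example, with regions covering pfns [10, 20) and [30, 40), asking for the pfn after 50 ends the search with right == 2 == cnt; without the check, the original code would read regions[2].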
Re: [PATCH 02/26] rewrite READ_ONCE/WRITE_ONCE
On 03/02/2017 06:55 PM, Arnd Bergmann wrote: > On Thu, Mar 2, 2017 at 5:51 PM, Christian Borntraeger > wrote: >> On 03/02/2017 05:38 PM, Arnd Bergmann wrote: >>> >>> This attempts a rewrite of the two macros, using a simpler implementation >>> for the most common case of having a naturally aligned 1, 2, 4, or (on >>> 64-bit architectures) 8 byte object that can be accessed with a single >>> instruction. For these, we go back to a volatile pointer dereference >>> that we had with the ACCESS_ONCE macro. >> >> We had changed that back then because gcc 4.6 and 4.7 had a bug that could >> removed the volatile statement on aggregate types like the following one >> >> union ipte_control { >> unsigned long val; >> struct { >> unsigned long k : 1; >> unsigned long kh : 31; >> unsigned long kg : 32; >> }; >> }; >> >> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 >> >> If I see that right, your __ALIGNED_WORD(x) >> macro would say that for above structure sizeof(x) == sizeof(long)) is true, >> so it would fall back to the old volatile cast and might reintroduce the >> old compiler bug? Oh dear, I should double-check my sentences in emails before sending. Anyway, the full story is referenced in commit 60815cf2e05057db5b78e398d9734c493560b11e (Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux), which has a pointer to http://marc.info/?i=54611D86.4040306%40de.ibm.com containing the full story. > > Ah, right, that's the missing piece. For some reason I didn't find > the reference in the source or the git log. > >> Could you maybe you fence your simple macro for anything older than 4.9? >> After >> all there was no kasan support anyway on these older gcc version. > > Yes, that should work, thanks!
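A rough user-space sketch of the fencing suggested above: use the plain volatile dereference only on GCC 4.9 or newer, and otherwise fall back to a path that performs the volatile access on a same-size scalar, so the volatile never applies to an aggregate such as union ipte_control. All SKETCH_* macros and sketch_read_once_size() are hypothetical names. The fallback is only loosely in the style of the kernel's size-switch helper, not a copy of it, and, like kernel code, it relies on -fno-strict-aliasing-style tolerance for reading the union through a scalar pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The aggregate from the thread: word-sized, so a "naturally aligned
 * word" test would send it down the simple path. */
union ipte_control {
	unsigned long val;
	struct {
		unsigned long k : 1;
		unsigned long kh : 31;
		unsigned long kg : 32;
	};
};

/* Simple path: volatile access through a pointer to the object's own type. */
#define SKETCH_READ_ONCE_SIMPLE(x) (*(const volatile __typeof__(x) *)&(x))

/* Fallback path: volatile access on a scalar of the same size, then copy
 * the bytes out, so volatile is never applied to an aggregate type. */
static inline void sketch_read_once_size(const volatile void *p, void *res,
					 size_t size)
{
	switch (size) {
	case 1: { uint8_t t = *(const volatile uint8_t *)p; memcpy(res, &t, 1); break; }
	case 2: { uint16_t t = *(const volatile uint16_t *)p; memcpy(res, &t, 2); break; }
	case 4: { uint32_t t = *(const volatile uint32_t *)p; memcpy(res, &t, 4); break; }
	case 8: { uint64_t t = *(const volatile uint64_t *)p; memcpy(res, &t, 8); break; }
	default:
		__asm__ __volatile__("" ::: "memory");	/* compiler barrier */
		memcpy(res, (const void *)p, size);
		__asm__ __volatile__("" ::: "memory");
		break;
	}
}

#define SKETCH_READ_ONCE_FALLBACK(x) ({					\
	union { __typeof__(x) val; unsigned char c[sizeof(x)]; } u_;	\
	sketch_read_once_size(&(x), u_.c, sizeof(x));			\
	u_.val;								\
})

/* Fence the simple form off for GCC older than 4.9, as suggested above. */
#if defined(__GNUC__) && !defined(__clang__) && \
	(__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 9))
#define SKETCH_READ_ONCE(x) SKETCH_READ_ONCE_FALLBACK(x)
#else
#define SKETCH_READ_ONCE(x) SKETCH_READ_ONCE_SIMPLE(x)
#endif
```

Both paths return the same value. The difference is only in which type carries the volatile qualifier, and that is what the gcc bug referenced above was sensitive to.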
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On 03/02/2017 08:38 AM, Tobias Klauser wrote: On 2017-03-01 at 20:45:21 +0100, Guenter Roeck wrote: On Wed, Mar 01, 2017 at 07:58:17PM +0100, Sven Schmidt wrote: Hi Guenter, Tobias and Sandra, thanks for your effort here. On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote: On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote: On 02/28/2017 08:53 AM, Tobias Klauser wrote: (adding Sandra Loosemore to Cc due to possible relation to gcc/binutils for nios2) On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote: Hi Sven, my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib: update LZ4 compressor module"). The test hangs early during boot before any console output is seen. Reverting the offending patch as well as the subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4 and with it other LZ4 options also fixes it (as does adding "return -EINVAL;" at the top of the LZ4 decompression code). For reference, bisect log is attached. I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0 and binutils 2.26.1. Scripts used to run the tests are available at https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2. Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied. Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and binutils 2.24.51 (both from Sourcery CodeBench Lite 2014.05) I can get a kernel booting on latest master branch. AFAICT, none of the LZ4_decompress_* functions are called during boot. It seems a bit strange that code which is not actually called causes problems like that. Yes, it is, though it is always possible. The code isn't exactly easy to understand; there may be some hidden caveats such as global variables.
It may also be that some jump target exceeds its range (though why that would only be seen with the LZ4 code is another question), or that the compiler gets confused by the forced inlines (disabling that didn't make a difference, though, nor did disabling -O3). Please let me know if and how I may help you figure out what's happening, especially regarding the differences between the previous LZ4 and the current implementation. For my part I am all but clueless. Unless someone has an idea, we may have to disable LZ4 support for nios2 for the time being. Does anyone have thoughts on that? Of course, that would not help if the problem also affects recent gcc/binutils versions on other architectures. After some further investigation, I'd say this isn't "caused" by LZ4 specifically but by a more general problem with one of the nios2 arch specific tools involved. I manually enabled random additional CONFIG_* options and in some cases I got the kernel to boot (with CONFIG_RD_LZ4 enabled and no return -EINVAL in place) while in others I didn't. So I'd rather suspect this problem to be connected to the size or structure of the generated vmlinux image. Or could this even be a problem with qemu? Did anyone already verify this on the 10m50 devboard? (Unfortunately I don't have any nios2 devboard available right now, otherwise I would have done this...) That is of course always possible. Other than that I'm also becoming all but clueless... One option I thought of was using the QEMU monitor to dump the CPU state after the hang but so far I didn't manage to get it to work (hints appreciated ;) Something like qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot -dtb arch/nios2/boot/dts/10m50_devboard.dtb --append "rdinit=/sbin/init" -initrd busybox-nios2.cpio gives you a qemu monitor window. Use "info registers" to see registers. Looks like it is stuck in init_bootmem_core, or at least that is what it shows for me. Guenter
Re: [RFC 03/11] mm: remove SWAP_DIRTY in ttu
Hi Hillf, On Thu, Mar 02, 2017 at 03:34:45PM +0800, Hillf Danton wrote: > > On March 02, 2017 2:39 PM Minchan Kim wrote: > > @@ -1424,7 +1424,8 @@ static int try_to_unmap_one(struct page *page, struct > > vm_area_struct *vma, > > } else if (!PageSwapBacked(page)) { > > /* dirty MADV_FREE page */ > > Nit: enrich the comment please. I guess the logic you'd like clarified isn't something this patch introduces but code that was already merged, so I just sent a small cleanup patch against the version merged into mmotm to make that logic clear. You are already Cc'ed on it, so you can see it. I hope it helps. If you'd like anything else changed, please tell me and I will make it clearer. Thanks for the review. > > set_pte_at(mm, address, pvmw.pte, pteval); > > - ret = SWAP_DIRTY; > > + SetPageSwapBacked(page); > > + ret = SWAP_FAIL; > > page_vma_mapped_walk_done(); > > break; > > }
Re: Conversion of w83627ehf to hwmon_device_register_with_info ?
Hi Peter, On 03/02/2017 04:33 PM, Peter Hüwe wrote: Hi, is anybody else working on the conversion of the w83627ehf to the new hwmon_device_register_with_info interface? I don't think so. Otherwise I will probably update the driver to this interface within the next days - but since it's a lot of work I wanted to check for duplication first. Go ahead. I would suggest dropping nct6775/nct6776 support to simplify the code when you do that. Maybe as a separate commit, though. Do you think it makes sense to introduce a hwmon_sensor_types for "intrusion" as well? - there are currently 8 drivers who offer that interface. I don't really like the idea of introducing another type for just one attribute, but it might be the easiest and most consistent approach. Feel free to submit a patch to add it. Guenter
Re: [Outreachy kernel] Re: [PATCH 3/5] staging: lustre: lustre: Remove unnecessary cast on void pointer
On Fri, 3 Mar 2017, SIMRAN SINGHAL wrote: > On Fri, Mar 3, 2017 at 3:29 AM, Joe Perches wrote: > > On Fri, 2017-03-03 at 03:25 +0530, SIMRAN SINGHAL wrote: > >> On Fri, Mar 3, 2017 at 3:13 AM, Joe Perches wrote: > >> > On Fri, 2017-03-03 at 02:49 +0530, simran singhal wrote: > >> > > The following Coccinelle script was used to detect this: > >> > > @r@ > >> > > expression x; > >> > > void* e; > >> > > type T; > >> > > identifier f; > >> > > @@ > >> > > ( > >> > > *((T *)e) > >> > > > > >> > > > >> > > ((T *)x)[...] > >> > > > > >> > > > >> > > ((T*)x)->f > >> > > > > >> > > > >> > > - (T*) > >> > > e > >> > > ) > >> > > >> > NAK. > >> > > >> > Nice, but you still have to verify correctness > >> > before submitting these patches. > >> > > >> > > diff --git a/drivers/staging/lustre/lustre/mgc/mgc_request.c > >> > > b/drivers/staging/lustre/lustre/mgc/mgc_request.c > >> > > >> > [] > >> > > @@ -1034,7 +1034,7 @@ static int mgc_set_info_async(const struct > >> > > lu_env *env, struct obd_export *exp, > >> > > rc = sptlrpc_parse_flavor(val, ); > >> > > if (rc) { > >> > > CERROR("invalid sptlrpc flavor %s to MGS\n", > >> > > -(char *)val); > >> > > +val); > >> > > >> > Try compiling this. > >> > > >> > >> I compiled it before sending. > > > > Did you look at the warnings?
> > > > CC [M] drivers/staging/lustre/lustre/mgc/mgc_request.o > > drivers/staging/lustre/lustre/mgc/mgc_request.c: In function > > ‘mgc_set_info_async’: > > drivers/staging/lustre/lustre/mgc/mgc_request.c:1036:115: warning: format > > ‘%s’ expects argument of type ‘char *’, but argument 3 has type ‘void *’ > > [-Wformat=] > > CERROR("invalid sptlrpc flavor %s to MGS\n", > > > > I again compiled it and this is what I got :- > > CHK include/config/kernel.release > CHK include/generated/uapi/linux/version.h > CHK include/generated/utsrelease.h > CHK include/generated/timeconst.h > CHK include/generated/bounds.h > CHK include/generated/asm-offsets.h > CALL scripts/checksyscalls.sh > CHK include/generated/compile.h > LD arch/x86/boot/compressed/vmlinux > ZOFFSET arch/x86/boot/zoffset.h > AS arch/x86/boot/header.o > LD arch/x86/boot/setup.elf > OBJCOPY arch/x86/boot/setup.bin > OBJCOPY arch/x86/boot/vmlinux.bin > BUILD arch/x86/boot/bzImage > Setup is 17500 bytes (padded to 17920 bytes). > System is 7128 kB > CRC 37713343 > Kernel: arch/x86/boot/bzImage is ready (#4) > Building modules, stage 2. > MODPOST 4541 modules > > I am not getting any warning. Did you touch the .c file before compiling it? Warnings still allow the creation of a .o, and once there is a .o that is more recent than the .c, make won't compile it again. I got a whole host of warnings, including the ones Joe showed. julia
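The warning Joe quoted comes from GCC's format checking: when a logging function is declared printf-like (via the format attribute), -Wformat compares each argument with its conversion, and a void * passed for %s is flagged. That is why the (char *) cast in mgc_set_info_async() is not "unnecessary". A minimal user-space sketch, assuming a hypothetical sketch_log_error() standing in for a printf-checked macro such as CERROR (it is not Lustre's implementation):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical printf-checked logger standing in for CERROR. The format
 * attribute makes GCC/Clang check the arguments against @fmt (-Wformat). */
__attribute__((format(printf, 3, 4)))
static int sketch_log_error(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/* Same shape as the mgc_set_info_async() error path: the value arrives
 * as void *. Dropping the (char *) cast below triggers the -Wformat
 * warning quoted in this thread ("%s expects argument of type char *"). */
static int sketch_report_bad_flavor(char *buf, size_t len, void *val)
{
	assert(val != NULL);
	return sketch_log_error(buf, len, "invalid sptlrpc flavor %s to MGS\n",
				(char *)val);
}
```

To see the warning again on an already-built tree, the source file has to be rebuilt, for instance after touching the .c file, because make skips objects that are newer than their sources.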
[PATCH v4 0/5] perf report: Show inline stack
v4:
Remove the options "--inline-line" and "--inline-name". Just use a new
option "--inline" to print the inline function information. The policy
is: if the inline function name can be resolved, print the name in
priority; if the name can't be resolved, print the source line number.

For example:

perf report --stdio --inline

  0.69%  0.00%  inline  ld-2.23.so  [.] dl_main
          |
          ---dl_main
             |
              --0.56%--_dl_relocate_object
                        |
                        ---_dl_relocate_object (inline)
                           elf_dynamic_do_Rela (inline)

The following 3 patches are updated according to this change:
  perf report: Show inline stack in browser mode
  perf report: Show inline stack in stdio mode
  perf report: Create new inline option

The following are not changed:
  perf report: Find the inline stack for a given address
  perf report: Refactor common code in srcline.c

v3:
Iterate on the RIPs of all callchain entries to check whether each RIP is
in an inline function.
Reverse the order of the inliner printout if necessary.
Provide new options "--inline-line" / "--inline-name" to print the inline
function name or the inline function source line.

v2:
Thanks so much for Arnaldo's comments!
The modifications are:

1. Divide v1 patch "perf report: Find the inline stack for a given
   address" into 2 patches:
   a. perf report: Refactor common code in srcline.c
   b. perf report: Find the inline stack for a given address

   Some function names are changed:
   dso_name_get -> dso__name
   ilist_apend -> inline_list__append
   get_inline_node -> dso__parse_addr_inlines
   free_inline_node -> inline_node__delete

2. Since the function names are changed, update the following patches
   accordingly:
   a. perf report: Show inline stack in stdio mode
   b. perf report: Show inline stack in browser mode

3. Rebase to the latest perf/core branch. This patch is impacted:
   a. perf report: Create a new option "--inline"

v1:
Initial post

It would be useful for perf to support a mode to query the inline stack
for callgraph addresses. This would simplify finding the right code in
code that does a lot of inlining.
For example, the C code:

#include <stdio.h>
#include <sys/time.h>
#include <time.h>

static inline void f3(void)
{
	int i;

	for (i = 0; i < 1000;) {
		if (i % 2)
			i++;
		else
			i++;
	}
	printf("hello f3\n");	/* D */
}

/* < CALLCHAIN: f2 <- f1 > */
static inline void f2(void)
{
	int i;

	for (i = 0; i < 100; i++) {
		f3();	/* C */
	}
}

/* < CALLCHAIN: f1 <- main > */
static inline void f1(void)
{
	int i;

	for (i = 0; i < 100; i++) {
		f2();	/* B */
	}
}

/* < CALLCHAIN: main <- TOP > */
int main()
{
	struct timeval tv;
	time_t start, end;

	gettimeofday(&tv, NULL);
	start = end = tv.tv_sec;
	while ((end - start) < 5) {
		f1();	/* A */
		gettimeofday(&tv, NULL);
		end = tv.tv_sec;
	}

	return 0;
}

The printed inline stack is:

  0.05%  test2  test2  [.] main
         |
         ---/home/perf-dev/lck-2867/test/test2.c:27 (inline)
            /home/perf-dev/lck-2867/test/test2.c:35 (inline)
            /home/perf-dev/lck-2867/test/test2.c:45 (inline)
            /home/perf-dev/lck-2867/test/test2.c:61 (inline)

I tagged A/B/C/D in the C code above to indicate the source lines; the
inline stack is actually equivalent to:

  0.05%  test2  test2  [.] main
         |
         ---D
            C
            B
            A

Jin Yao (5):
  perf report: Refactor common code in srcline.c
  perf report: Find the inline stack for a given address
  perf report: Create new inline option
  perf report: Show inline stack in stdio mode
  perf report: Show inline stack in browser mode

 tools/perf/Documentation/perf-report.txt |   4 +
 tools/perf/builtin-report.c              |   2 +
 tools/perf/ui/browsers/hists.c           | 168 --
 tools/perf/ui/stdio/hist.c               |  76 +-
 tools/perf/util/hist.c                   |   5 +
 tools/perf/util/sort.h                   |   1 +
 tools/perf/util/srcline.c                | 237 +++
 tools/perf/util/symbol-elf.c             |   5 +
 tools/perf/util/symbol.h                 |   5 +-
 tools/perf/util/util.h                   |  16 +++
 10 files changed, 481 insertions(+), 38 deletions(-)

--
2.7.4
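The v4 output policy (print the inline function name when it can be resolved, otherwise fall back to the source file and line) can be sketched in plain C as below. The `struct inline_frame` type and `format_inline_frame()` are hypothetical illustrations chosen for this sketch; they are not the data structures or helpers added by this series.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one inlined frame (illustrative field names). */
struct inline_frame {
	const char *funcname;	/* NULL if the name could not be resolved */
	const char *filename;	/* source file reported for this frame */
	unsigned int line_nr;	/* source line reported for this frame */
};

/*
 * Format one frame following the policy in the cover letter: prefer the
 * inline function name; fall back to "file:line" when no name is known.
 */
static int format_inline_frame(char *buf, size_t len,
			       const struct inline_frame *f)
{
	if (f->funcname)
		return snprintf(buf, len, "%s (inline)", f->funcname);
	return snprintf(buf, len, "%s:%u (inline)", f->filename, f->line_nr);
}
```

With a resolved name this yields lines like `elf_dynamic_do_Rela (inline)` from the v4 example; without one it yields the `test2.c:27 (inline)` style shown for v1.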
[PATCH v4 1/5] perf report: Refactor common code in srcline.c
Introduce dso__name() and filename_split() out of existing code because
this code will be used in several places in the next patch.

filename_split() may also fix a potential memory leak in the existing
code. In the existing addr2line():

	sep = strchr(filename, ':');
	if (sep) {
		*sep++ = '\0';
		*file = filename;
		*line_nr = strtoul(sep, NULL, 0);
		ret = 1;
	}
out:
	pclose(fp);
	return ret;

If sep is NULL, filename is neither freed nor returned via file.

Signed-off-by: Jin Yao
Tested-by: Milian Wolff
---
 tools/perf/util/srcline.c | 68 +++
 1 file changed, 45 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index b4db3f4..2953c9f 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -12,6 +12,24 @@
 
 bool srcline_full_filename;
 
+static const char *dso__name(struct dso *dso)
+{
+	const char *dso_name;
+
+	if (dso->symsrc_filename)
+		dso_name = dso->symsrc_filename;
+	else
+		dso_name = dso->long_name;
+
+	if (dso_name[0] == '[')
+		return NULL;
+
+	if (!strncmp(dso_name, "/tmp/perf-", 10))
+		return NULL;
+
+	return dso_name;
+}
+
 #ifdef HAVE_LIBBFD_SUPPORT
 
 /*
@@ -207,6 +225,27 @@ void dso__free_a2l(struct dso *dso)
 
 #else /* HAVE_LIBBFD_SUPPORT */
 
+static int filename_split(char *filename, unsigned int *line_nr)
+{
+	char *sep;
+
+	sep = strchr(filename, '\n');
+	if (sep)
+		*sep = '\0';
+
+	if (!strcmp(filename, "??:0"))
+		return 0;
+
+	sep = strchr(filename, ':');
+	if (sep) {
+		*sep++ = '\0';
+		*line_nr = strtoul(sep, NULL, 0);
+		return 1;
+	}
+
+	return 0;
+}
+
 static int addr2line(const char *dso_name, u64 addr,
 		     char **file, unsigned int *line_nr,
 		     struct dso *dso __maybe_unused,
@@ -216,7 +255,6 @@ static int addr2line(const char *dso_name, u64 addr,
 	char cmd[PATH_MAX];
 	char *filename = NULL;
 	size_t len;
-	char *sep;
 	int ret = 0;
 
 	scnprintf(cmd, sizeof(cmd), "addr2line -e %s %016"PRIx64,
@@ -233,23 +271,14 @@ static int addr2line(const char *dso_name, u64 addr,
 		goto out;
 	}
 
-	sep = strchr(filename, '\n');
-	if (sep)
-		*sep = '\0';
-
-	if (!strcmp(filename, "??:0")) {
-		pr_debug("no debugging info in %s\n", dso_name);
+	ret = filename_split(filename, line_nr);
+	if (ret != 1) {
 		free(filename);
 		goto out;
 	}
 
-	sep = strchr(filename, ':');
-	if (sep) {
-		*sep++ = '\0';
-		*file = filename;
-		*line_nr = strtoul(sep, NULL, 0);
-		ret = 1;
-	}
+	*file = filename;
+
 out:
 	pclose(fp);
 	return ret;
@@ -278,15 +307,8 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 	if (!dso->has_srcline)
 		goto out;
 
-	if (dso->symsrc_filename)
-		dso_name = dso->symsrc_filename;
-	else
-		dso_name = dso->long_name;
-
-	if (dso_name[0] == '[')
-		goto out;
-
-	if (!strncmp(dso_name, "/tmp/perf-", 10))
+	dso_name = dso__name(dso);
+	if (dso_name == NULL)
 		goto out;
 
 	if (!addr2line(dso_name, addr, , , dso, unwind_inlines))
--
2.7.4
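Because filename_split() has no perf dependencies, it can be exercised on its own. The sketch below copies the function from the patch with only comments added; the sample strings in the usage note are made-up addr2line-style outputs. The ownership rule the refactor relies on is visible here: filename_split() only edits the buffer in place, so when it returns 0 the caller still owns the buffer and must free() it, as the new addr2line() code does.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Copied from the patch above: split addr2line output "file:line\n" in
 * place. Returns 1 and sets *line_nr on success; returns 0 for "??:0"
 * (no debug info) or when no ':' separator is present.
 */
static int filename_split(char *filename, unsigned int *line_nr)
{
	char *sep;

	/* Drop the trailing newline printed by addr2line. */
	sep = strchr(filename, '\n');
	if (sep)
		*sep = '\0';

	/* addr2line prints "??:0" when it cannot map the address. */
	if (!strcmp(filename, "??:0"))
		return 0;

	sep = strchr(filename, ':');
	if (sep) {
		*sep++ = '\0';	/* terminate the file name at the ':' */
		*line_nr = strtoul(sep, NULL, 0);
		return 1;
	}

	return 0;
}
```

For example, `"/src/foo.c:42\n"` becomes `"/src/foo.c"` with line 42, and a suffix such as `" (discriminator 2)"` after the number is ignored because strtoul() stops at the first non-digit.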
Re: [PATCH net v4 0/2] net: ethernet: bgmac: bug fixes
On Thu, Mar 02, 2017 at 12:56:05PM -0800, David Miller wrote:
> From: David Miller
> Date: Thu, 02 Mar 2017 12:50:15 -0800 (PST)
>
> > From: Jon Mason
> > Date: Tue, 28 Feb 2017 13:41:49 -0500
> >
> >> Changes in v4:
> >> * Added the udelays from the previous code (per David Miller)
> >>
> >> Changes in v3:
> >> * Reworked the init sequence patch to only remove the device reset if
> >> the device is actually in reset. Given that this code doesn't bear
> >> much resemblance to the original code, I'm changing the author of the
> >> patch. This was tested on NS2 SVK.
> >>
> >> Changes in v2:
> >> * Reworked the first match to make it more obvious what portions of the
> >> register were being preserved (Per Rafal Mileki)
> >> * Style change to reorder the function variables in patch 2 (per Sergei
> >> Shtylyov)
> >>
> >> Bug fixes for bgmac driver
> >
> > Series applied.
>
> Actually, this doesn't even compile. Reverted...
>
> [davem@kkuri net]$ make -s -j4
> drivers/net/ethernet/broadcom/bgmac.c: In function ‘bgmac_set_mac_address’:
> drivers/net/ethernet/broadcom/bgmac.c:1233:23: error: ‘struct bgmac’ has no member named ‘mac_addr’; did you mean ‘phyaddr’?
>   ether_addr_copy(bgmac->mac_addr, sa->sa_data);
>                        ^~
> drivers/net/ethernet/broadcom/bgmac.c:1234:38: error: ‘struct bgmac’ has no member named ‘mac_addr’; did you mean ‘phyaddr’?
>   bgmac_write_mac_address(bgmac, bgmac->mac_addr);
>                                       ^~

Well, this is embarrassing. I didn't rebase, even though I acked the
patch that changed it out from under me. Sorry, I should have known
better. A rebased, compiled, and tested patch is coming shortly.

I appreciate your patience.

Thanks,
Jon
linux-next: Tree for Mar 3
Hi all,

Please do not add any material intended for v4.12 to your linux-next
included branches until after v4.11-rc1 has been released.

Changes since 20170302:

The vfs tree still had its build failure for which I added a fix patch.

Non-merge commits (relative to Linus' tree): 646
 792 files changed, 41526 insertions(+), 11435 deletions(-)

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one. You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source. There are also quilt-import.log and merge.log
files in the Next directory. Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 253 trees (counting Linus' and 37 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next . If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul
Gortmaker for triage and bug fixes.
--
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (54d7989f476c Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost)
Merging fixes/master (c470abd4fde4 Linux 4.10)
Merging kbuild-current/rc-fixes (c7858bf16c0b asm-prototypes: Clear any CPP defines before declaring the functions)
Merging arc-current/for-curr (8ba605b607b7 ARC: [plat-*] ARC_HAS_COH_CACHES no longer relevant)
Merging arm-current/fixes (9e3440481845 ARM: 8658/1: uaccess: fix zeroing of 64-bit get_user())
Merging m68k-current/for-linus (3dfe33020ca8 m68k/sun3: Remove dead code in paging_init())
Merging metag-fixes/fixes (35d04077ad96 metag: Only define atomic_dec_if_positive conditionally)
Merging powerpc-fixes/fixes (c470abd4fde4 Linux 4.10)
Merging sparc/master (f8e6859ea9d0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files)
Merging net/master (a2d35d0b9412 Merge branch 'amd-xgbe-fixes')
Merging ipsec/master (e3dc847a5f85 vti6: Don't report path MTU below IPV6_MIN_MTU.)
Merging netfilter/master (29e09229d9f2 netfilter: use skb_to_full_sk in ip_route_me_harder)
Merging ipvs/master (045169816b31 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging wireless-drivers/master (52f5631a4c05 rtlwifi: rtl8192ce: Fix loading of incorrect firmware)
Merging mac80211/master (eb1e011a1474 average: change to declare precision, not factor)
Merging sound-current/for-linus (f3ac9f737603 ALSA: seq: Fix link corruption by event error handling)
Merging pci-current/for-linus (2a7275a3d867 PCI: altera: Fix TLP_CFG_DW0 for TLP write)
Merging driver-core.current/driver-core-linus (bc49a7831b11 Merge branch 'akpm' (patches from Andrew))
Merging tty.current/tty-linus (bc49a7831b11 Merge branch 'akpm' (patches from Andrew))
Merging usb.current/usb-linus (bc49a7831b11 Merge branch 'akpm' (patches from Andrew))
Merging usb-gadget-fixes/fixes (efe357f4633a usb: dwc2: host: fix Wmaybe-uninitialized warning)
Merging usb-serial-fixes/usb-linus (d07830db1bdb USB: serial: pl2303: add ATEN device ID)
Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move the lock initialization to core file)
Merging phy/fixes (7ce7d89f4883 Linux 4.10-rc1)
Merging staging.current/staging-linus (a45e47f4b342 staging: fsl-mc: fix warning in DT ranges parser)
Merging char-misc.current/char-misc-linus (bc49a7831b11 Merge branch 'akpm' (patches from Andrew))
Merging input-current/for-linus (6e11617fcff3 Merge branch 'next' into for-linus)
Merging crypto-current/master (5839f555fa57 crypto: vmx - Use skcipher for xts fallback)
Merging ide/master (96297aee8bce ide: palm_bk3710: add __initdata to palm_bk3710_port_info)
Merging vfio-fixes/for-linus (930a42ded3fe vfio/spapr_tce: Set window when adding additional groups to containe
Re: [RFC 00/11] make try_to_unmap simple
Hi Anshuman,

On Thu, Mar 02, 2017 at 07:52:27PM +0530, Anshuman Khandual wrote:
> On 03/02/2017 12:09 PM, Minchan Kim wrote:
> > Currently, try_to_unmap returns various return value(SWAP_SUCCESS,
> > SWAP_FAIL, SWAP_AGAIN, SWAP_DIRTY and SWAP_MLOCK). When I look into
> > that, it's unncessary complicated so this patch aims for cleaning
> > it up. Change ttu to boolean function so we can remove SWAP_AGAIN,
> > SWAP_DIRTY, SWAP_MLOCK.
>
> It may be a trivial question but apart from being a cleanup does it
> help in improving it's callers some way ? Any other benefits ?

If you mean performance, I don't think so. It just aims for a cleanup so
callers don't need to think much about the return value of try_to_unmap.
All they should consider is "success/fail". The rest will be handled in
the isolate/putback friends, which makes the API simple and easy to use.

Thanks.
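To illustrate the calling convention Minchan describes, here is a toy model of a boolean unmap API and a caller that only branches on success or failure. Every name below is hypothetical and the "page" is reduced to two fields; this does not reproduce mm/rmap.c, mm/vmscan.c, or the actual patch series.

```c
#include <stdbool.h>

/*
 * Toy model only: hypothetical names, not kernel code. A "page" is
 * reduced to the two facts this example needs.
 */
struct toy_page {
	int mapcount;	/* number of remaining mappings */
	bool mlocked;	/* mlocked pages cannot be unmapped for reclaim */
};

/* Boolean-style API: the caller only learns success or failure. */
static bool toy_try_to_unmap(struct toy_page *page)
{
	if (page->mlocked)
		return false;
	page->mapcount = 0;	/* pretend every mapping was removed */
	return true;
}

/*
 * Reclaim-like caller: one branch instead of a switch over several
 * status codes. Returns true if the page could be freed; the reason for
 * a failure is assumed to be dealt with elsewhere (e.g. putback).
 */
static bool toy_shrink_one(struct toy_page *page)
{
	if (!toy_try_to_unmap(page))
		return false;	/* keep the page */
	return true;
}
```

The design trade-off is that the caller loses the failure reason; per the reply, that information is meant to be handled by the isolate/putback paths instead.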
Re: [PATCH v2 1/3] perf annotate: Get correct line numbers matched with addr
+ Andi Kleen who wrote the code.

On Thu, Mar 02, 2017 at 03:05:14PM +0900, Taeung Song wrote:
>
>
> On 03/01/2017 10:17 PM, Namhyung Kim wrote:
> > Hi Taeung,
> >
> > On Wed, Mar 01, 2017 at 04:59:51AM +0900, Taeung Song wrote:
> > > Currently perf-annotate show wrong line numbers.
> > >
> > > For example,
> > > Actual source code is as below
> > >
> > > ...
> > > 21 };
> > > 22
> > > 23 unsigned int limited_wgt;
> > > 24
> > > 25 unsigned int get_cond_maxprice(int wgt)
> > > 26 {
> > > ...
> > >
> > > However, the output of perf-annotate is as below.
> > >
> > > 4 Disassembly of section .text:
> > >
> > > 6 00400966 :
> > > 7 get_cond_maxprice():
> > > 26 };
> > >
> > > 28 unsigned int limited_wgt;
> > >
> > > 30 unsigned int get_cond_maxprice(int wgt)
> > > 31 {
> > >
> > > The cause is the wrong way counting line numbers
> > > in symbol__parse_objdump_line().
> > > So remove wrong current code counting line number and
> > > use other method for it using functions related to addr2line
> > > instead of the output of '-l' of objdump.
> >
> > Hmm.. do you think it's a bug of objdump or it's perf failing to parse
> > the line number correctly? I'd like to see the output of `objdump -l`
> >
>
> Both are ok.
> 'objdump -l' hasn't a bug related to line number
> and perf's method parsing the line number is ok.
>
> But symbol__parse_objdump_line() wrongly count line numbers
> after parsing it as below.
>
> 1172         /* /filename:linenr ? Save line number and ignore. */
> 1173         if (regexec(_lineno, line, 2, match, 0) == 0) {
> 1174                 *line_nr = atoi(line + match[1].rm_so);
> 1175                 return 0;
> 1176         }
> ...
> 1208         dl = disasm_line__new(offset, parsed_line, privsize, *line_nr, arch, map);
> 1209         free(line);
> 1210         (*line_nr)++;
>
> Increasing line_nr each asm line is wrong method.
> Because 'line_nr' means actual source code line number.

Hmm.. ok. It looks like it should reuse the old line_nr as is.

>
> Sure, I can fix only the wrong counting way.
> But the above parsing method(1172~1176) is never used because of 'grep -v'
> in command as below.
> (the grep already remove lines containing filename:linenr of output)

Right, but only if the filename is the same as the binary name.

>
> 1435         snprintf(command, sizeof(command),
> 1436                  "%s %s%s --start-address=0x%016" PRIx64
> 1437                  " --stop-address=0x%016" PRIx64
> 1438                  " -l -d %s %s -C %s 2>/dev/null|grep -v %s|expand",
> 1439                  objdump_path ? objdump_path : "objdump",
> 1440                  disassembler_style ? "-M " : "",
> 1441                  disassembler_style ? disassembler_style : "",
> 1442                  map__rip_2objdump(map, sym->start),
> 1443                  map__rip_2objdump(map, sym->end),
> 1444                  symbol_conf.annotate_asm_raw ? "" : "--no-show-raw",
> 1445                  symbol_conf.annotate_src ? "-S" : "",
> 1446                  symfs_filename, symfs_filename);
>
> Therefore, I think it is better to do three things
>
> 1) fix the wrong counting line number problem
> 2) remove unused the line number parsing method
> 3) In addtion, a bit reduce objdump dependency
>    using functions related to addr2line of perf.
>
> What do you think about that ?
> Is it bad idea ?

I think we definitely need to fix 1), but I'm not sure about 2) and 3).
If objdump can do all the necessary work, why not use it? :)

Thanks,
Namhyung

>
> >
> > >
> > > However, despite the correct line numbers,
> > > we can't show proper source code view
> > > because of limitations from output of 'objdump -S'.
> > >
> > > So, next commit will resolve the limitations from 'objdump -S'
> > > with the new source code view.
> >
> > It seems not related with this commit..
> >
>
> Okey, will remove the mention.
>
> Thanks,
> Taeung
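Both points in this exchange can be modelled in a few lines of C. `tag_line()` is a toy, not perf's symbol__parse_objdump_line(). A `/path/file.c:NN` marker sets the current source line, and each instruction is tagged with it. The `buggy` flag reproduces the per-instruction `(*line_nr)++` criticised above, while the fixed path reuses the old line_nr as is. `survives_grep_v()` approximates the `grep -v <symfs_filename>` stage with a substring test. Under that approximation, a marker such as `/home/u/prog.c:26` is dropped when the binary is `/home/u/prog`, because the binary path is a prefix of the source path. That is one reading of "only if the filename is the same as the binary name". All paths and names here are made up for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy model: a line like "/path/file.c:26" (objdump -l style) sets the
 * current source line and returns -1; any other line is treated as an
 * instruction and returns the current source line as its tag. With
 * 'buggy' set, line_nr is also incremented after every instruction.
 */
static int tag_line(const char *line, int *line_nr, bool buggy)
{
	const char *colon = strrchr(line, ':');

	if (line[0] == '/' && colon && colon[1] >= '0' && colon[1] <= '9') {
		*line_nr = atoi(colon + 1);
		return -1;
	}

	int tag = *line_nr;
	if (buggy)
		(*line_nr)++;	/* the behaviour the thread calls wrong */
	return tag;
}

/*
 * Approximates "| grep -v <binary_path>": keep a line only if it does not
 * contain the binary's path. grep really matches a basic regular
 * expression (so '.' matches any character); substring search is a
 * simplification.
 */
static bool survives_grep_v(const char *line, const char *binary_path)
{
	return strstr(line, binary_path) == NULL;
}
```

With the fix, two consecutive instructions after a `:26` marker are both tagged 26; the buggy variant tags the second one 27.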
Re: [PATCH v4 1/2] dt-bindings: mmc: add DT binding for S3C24XX MMC/SD/SDIO controller
On 03/02/2017 10:18 AM, Sergio Prado wrote: > Adds the device tree bindings description for Samsung S3C24XX > MMC/SD/SDIO controller, used as a connectivity interface with external > MMC, SD and SDIO storage mediums. > > Signed-off-by: Sergio Prado > --- > .../devicetree/bindings/mmc/samsung,s3cmci.txt | 42 > ++ > 1 file changed, 42 insertions(+) > create mode 100644 Documentation/devicetree/bindings/mmc/samsung,s3cmci.txt > > diff --git a/Documentation/devicetree/bindings/mmc/samsung,s3cmci.txt > b/Documentation/devicetree/bindings/mmc/samsung,s3cmci.txt > new file mode 100644 > index ..5f68feb9f9d6 > --- /dev/null > +++ b/Documentation/devicetree/bindings/mmc/samsung,s3cmci.txt > @@ -0,0 +1,42 @@ > +* Samsung's S3C24XX MMC/SD/SDIO controller device tree bindings > + > +Samsung's S3C24XX MMC/SD/SDIO controller is used as a connectivity interface > +with external MMC, SD and SDIO storage mediums. > + > +This file documents differences between the core mmc properties described by > +mmc.txt and the properties used by the Samsung S3C24XX MMC/SD/SDIO controller > +implementation. > + > +Required SoC Specific Properties: > +- compatible: should be one of the following > + - "samsung,s3c2410-sdi": for controllers compatible with s3c2410 > + - "samsung,s3c2412-sdi": for controllers compatible with s3c2412 > + - "samsung,s3c2440-sdi": for controllers compatible with s3c2440 > +- reg: register location and length > +- interrupts: mmc controller interrupt > +- clocks: Should reference the controller clock > +- clock-names: Should contain "sdi" > + > +Required Board Specific Properties: > +- pinctrl-0: Should specify pin control groups used for this controller. > +- pinctrl-names: Should contain only one value - "default". > + > +Optional Properties: > +- bus-width: number of data lines (see mmc.txt) > +- cd-gpios: gpio for card detection (see mmc.txt) > +- wp-gpios: gpio for write protection (see mmc.txt) I think these properties don't need to be described here.
They are common properties. Best Regards, Jaehoon Chung > + > +Example: > + > + mmc0: mmc@5a00 { > + compatible = "samsung,s3c2440-sdi"; > + pinctrl-names = "default"; > + pinctrl-0 = <_pins>; > + reg = <0x5a00 0x10>; > + interrupts = <0 0 21 3>; > + clocks = < PCLK_SDI>; > + clock-names = "sdi"; > + bus-width = <4>; > + cd-gpios = < 8 GPIO_ACTIVE_LOW>; > + wp-gpios = < 8 GPIO_ACTIVE_LOW>; > + }; >
Re: [PATCH v3 0/5] perf report: Show inline stack
Hi Wolff, Thanks so much for your testing. I also wish this feature could be upstreamed. I will send a v4 series soon. In v4, it removes the options "--inline-line" and "--inline-name" and just uses a new option "--inline" to print the inline function information. The policy is: if the inline function name can be resolved, print the function name; otherwise, print the source line number. For example: perf report --stdio --inline It prints: 0.69% 0.00% inline ld-2.23.so [.] dl_main | ---dl_main | --0.56%--_dl_relocate_object | ---_dl_relocate_object (inline) elf_dynamic_do_Rela (inline) Thanks Jin Yao On 3/3/2017 5:42 AM, Milian Wolff wrote: On Tuesday, 21 February 2017 01:28:17 CET Jin, Yao wrote: Hi, Any comments for this patch series? Sorry for the delay. I just tested it again. Overall, this is a clear improvement, so I'm all for getting this functionality in. But from a usability point of view, I still have some of the issues that I have raised in the past: a) --inline should be a boolean setting that enables inline resolution on demand b) the other callgraph settings and formatting should be used for inlined frames, i.e. - instead of `perf report --inline-name` it should be: `perf report --inline -g function` and since `-g function` is the default, it would be the same as: `perf report --inline` - instead of `perf report --inline-line -g address` it should be: `perf report --inline -g address` Again: As a user of `perf report`, I do not care whether a frame is an inlined one or a non-inlined one - both should be grouped and displayed the same way. I.e. this is unusable (imo): ~ perf report --inline-line --stdio 99.81% 35.99% cpp-inlining cpp-inlining [.]
main | |--63.82%--main | ---/home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline) /usr/include/c++/6.3.1/complex:664 (inline) | | | |--63.19%--hypot | | | | | --58.04%--__hypot_finite | | | --0.62%--cabs ~ Ditto for this: ~ perf report --stdio --inline-name -g address --stdio 99.81% 35.99% cpp-inlining cpp-inlining [.] main | |--63.82%--main complex:655 | ---main (inline) std::norm (inline) ~ But, again, even with these gripes, I think it's a very useful feature and I would like to see it integrated upstream. From my POV, you can add Tested-by: Milian Wolff to all patches in this series. Many thanks!
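A small sketch of the display policy discussed above: Jin Yao's v4 rule (print the function name when it resolves, otherwise fall back to the source line) combined with Milian's request that `-g function` / `-g address` apply to inlined frames exactly as to regular ones. `enum cg_key`, `struct frame` and `format_frame()` are hypothetical, not perf internals; treating "??" as unresolved follows addr2line's convention for unknown names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Call-graph key, mirroring the idea of `-g function` vs `-g address`. */
enum cg_key { CG_KEY_FUNCTION, CG_KEY_ADDRESS };

/* Hypothetical frame record; inlined and regular frames share it. */
struct frame {
	const char *func;	/* NULL or "??" when unresolved */
	const char *srcfile;
	int srcline;
	bool inlined;
};

static bool func_resolved(const struct frame *f)
{
	return f->func != NULL && strcmp(f->func, "??") != 0;
}

/* Format @f the same way whether or not it was inlined; inlining only
 * adds an " (inline)" suffix. In function mode, fall back to
 * file:line when the name is unknown. Returns snprintf()'s result. */
static int format_frame(const struct frame *f, enum cg_key key,
			char *buf, size_t len)
{
	const char *suffix;

	assert(f != NULL && buf != NULL);
	suffix = f->inlined ? " (inline)" : "";
	if (key == CG_KEY_FUNCTION && func_resolved(f))
		return snprintf(buf, len, "%s%s", f->func, suffix);
	return snprintf(buf, len, "%s:%d%s", f->srcfile, f->srcline, suffix);
}
```

The only difference between an inlined and a regular frame here is the suffix, which is the "group and display them the same way" behaviour Milian describes.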
Re: [PATCH v4 2/2] mmc: host: s3cmci: allow probing from device tree
On 03/02/2017 10:18 AM, Sergio Prado wrote: > Allows configuring Samsung S3C24XX MMC/SD/SDIO controller using a device > tree. > > Signed-off-by: Sergio Prado > --- > drivers/mmc/host/s3cmci.c | 298 > -- > drivers/mmc/host/s3cmci.h | 3 +- > 2 files changed, 158 insertions(+), 143 deletions(-) > > diff --git a/drivers/mmc/host/s3cmci.c b/drivers/mmc/host/s3cmci.c > index 7a173f8c455b..d066dbdb957c 100644 > --- a/drivers/mmc/host/s3cmci.c > +++ b/drivers/mmc/host/s3cmci.c > @@ -24,6 +24,10 @@ > #include > #include > #include > +#include > +#include > +#include > +#include > > #include > #include > @@ -128,6 +132,22 @@ enum dbg_channels { > dbg_conf = (1 << 8), > }; > > +struct s3cmci_variant_data { > + int s3c2440_compatible; > +}; I don't understand why this structure is needed. Before this patch: host->is2440; after this patch: host->variant->s3c2440_compatible. It just adds one more pointer for checking s3c2440 compatibility.. Is it really meaningful? (I didn't read the previous comments fully.) Best Regards, Jaehoon Chung > + > +static const struct s3cmci_variant_data s3c2410_s3cmci_variant_data = { > + .s3c2440_compatible = 0, > +}; > + > +static const struct s3cmci_variant_data s3c2412_s3cmci_variant_data = { > + .s3c2440_compatible = 1, > +}; > + > +static const struct s3cmci_variant_data s3c2440_s3cmci_variant_data = { > + .s3c2440_compatible = 1, > +}; > + > static const int dbgmap_err = dbg_fail; > static const int dbgmap_info = dbg_info | dbg_conf; > static const int dbgmap_debug = dbg_err | dbg_debug; > @@ -731,7 +751,7 @@ static irqreturn_t s3cmci_irq(int irq, void *dev_id) > goto clear_status_bits; > > /* Check for FIFO failure */ > - if (host->is2440) { > + if (host->variant->s3c2440_compatible) { > if (mci_fsta & S3C2440_SDIFSTA_FIFOFAIL) { > dbg(host, dbg_err, "FIFO failure\n"); > host->mrq->data->error = -EILSEQ; > @@ -807,21 +827,6 @@ static irqreturn_t s3cmci_irq(int irq, void *dev_id) > > } > > -/* > - * ISR for the CardDetect Pin > -*/ > - > -static
irqreturn_t s3cmci_irq_cd(int irq, void *dev_id) > -{ > - struct s3cmci_host *host = (struct s3cmci_host *)dev_id; > - > - dbg(host, dbg_irq, "card detect\n"); > - > - mmc_detect_change(host->mmc, msecs_to_jiffies(500)); > - > - return IRQ_HANDLED; > -} > - > static void s3cmci_dma_done_callback(void *arg) > { > struct s3cmci_host *host = arg; > @@ -913,7 +918,7 @@ static void finalize_request(struct s3cmci_host *host) > if (s3cmci_host_usedma(host)) > dmaengine_terminate_all(host->dma); > > - if (host->is2440) { > + if (host->variant->s3c2440_compatible) { > /* Clear failure register and reset fifo. */ > writel(S3C2440_SDIFSTA_FIFORESET | > S3C2440_SDIFSTA_FIFOFAIL, > @@ -1026,7 +1031,7 @@ static int s3cmci_setup_data(struct s3cmci_host *host, > struct mmc_data *data) > dcon |= S3C2410_SDIDCON_XFER_RXSTART; > } > > - if (host->is2440) { > + if (host->variant->s3c2440_compatible) { > dcon |= S3C2440_SDIDCON_DS_WORD; > dcon |= S3C2440_SDIDCON_DATSTART; > } > @@ -1045,7 +1050,7 @@ static int s3cmci_setup_data(struct s3cmci_host *host, > struct mmc_data *data) > > /* write TIMER register */ > > - if (host->is2440) { > + if (host->variant->s3c2440_compatible) { > writel(0x007F, host->base + S3C2410_SDITIMER); > } else { > writel(0x, host->base + S3C2410_SDITIMER); > @@ -1177,19 +1182,6 @@ static void s3cmci_send_request(struct mmc_host *mmc) > s3cmci_enable_irq(host, true); > } > > -static int s3cmci_card_present(struct mmc_host *mmc) > -{ > - struct s3cmci_host *host = mmc_priv(mmc); > - struct s3c24xx_mci_pdata *pdata = host->pdata; > - int ret; > - > - if (pdata->no_detect) > - return -ENOSYS; > - > - ret = gpio_get_value(pdata->gpio_detect) ? 
0 : 1; > - return ret ^ pdata->detect_invert; > -} > - > static void s3cmci_request(struct mmc_host *mmc, struct mmc_request *mrq) > { > struct s3cmci_host *host = mmc_priv(mmc); > @@ -1198,7 +1190,7 @@ static void s3cmci_request(struct mmc_host *mmc, struct > mmc_request *mrq) > host->cmd_is_stop = 0; > host->mrq = mrq; > > - if (s3cmci_card_present(mmc) == 0) { > + if (mmc_gpio_get_cd(mmc) == 0) { > dbg(host, dbg_err, "%s: no medium present\n", __func__); > host->mrq->cmd->error = -ENOMEDIUM; > mmc_request_done(mmc, mrq); > @@ -1242,22 +1234,24 @@ static void s3cmci_set_ios(struct mmc_host *mmc, > struct mmc_ios *ios) > case MMC_POWER_ON: > case MMC_POWER_UP: >
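To make Jaehoon's question concrete, here is a userspace sketch of what the variant struct buys: each compatible string from the binding maps to a pointer to per-SoC data. In the kernel this role is usually played by `struct of_device_id` entries with a `.data` pointer, fetched with `of_device_get_match_data()`; `struct compat_entry`, `s3cmci_match[]` and `s3cmci_lookup_variant()` below are simplified, hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Per-SoC data, shaped like the patch's struct s3cmci_variant_data. */
struct s3cmci_variant_data {
	int s3c2440_compatible;
};

static const struct s3cmci_variant_data s3c2410_variant = { .s3c2440_compatible = 0 };
static const struct s3cmci_variant_data s3c2412_variant = { .s3c2440_compatible = 1 };
static const struct s3cmci_variant_data s3c2440_variant = { .s3c2440_compatible = 1 };

/* Simplified userspace stand-in for a struct of_device_id table. */
struct compat_entry {
	const char *compatible;
	const struct s3cmci_variant_data *data;
};

static const struct compat_entry s3cmci_match[] = {
	{ "samsung,s3c2410-sdi", &s3c2410_variant },
	{ "samsung,s3c2412-sdi", &s3c2412_variant },
	{ "samsung,s3c2440-sdi", &s3c2440_variant },
	{ NULL, NULL },		/* sentinel */
};

/* Return the variant for @compatible, or NULL if it is not listed. */
static const struct s3cmci_variant_data *
s3cmci_lookup_variant(const char *compatible)
{
	const struct compat_entry *e;

	assert(compatible != NULL);
	for (e = s3cmci_match; e->compatible != NULL; e++)
		if (strcmp(e->compatible, compatible) == 0)
			return e->data;
	return NULL;
}
```

With only one field, the struct carries no more information than a flag; one possible middle ground is to copy `variant->s3c2440_compatible` into `host->is2440` once at probe time so hot paths keep a plain field read. The struct mainly pays off if more per-SoC differences are expected later.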
[patch] mm, zoneinfo: print non-populated zones
Initscripts can use the information (protection levels) from /proc/zoneinfo to configure vm.lowmem_reserve_ratio at boot. vm.lowmem_reserve_ratio is an array of ratios for each configured zone on the system. If a zone is not populated on an arch, /proc/zoneinfo suppresses its output. This results in there not being a 1:1 mapping between the set of zones emitted by /proc/zoneinfo and the zones configured by vm.lowmem_reserve_ratio. This patch shows statistics for non-populated zones in /proc/zoneinfo. The zones exist and hold a spot in the vm.lowmem_reserve_ratio array. Without this patch, it is not possible to determine which index in the array controls which zone if one or more zones on the system are not populated. Remaining users of walk_zones_in_node() are unchanged. Files such as /proc/pagetypeinfo require certain zone data to be initialized properly for display, which is not done for unpopulated zones. Signed-off-by: David Rientjes --- mm/vmstat.c | 22 +- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/mm/vmstat.c b/mm/vmstat.c --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1121,8 +1121,12 @@ static void frag_stop(struct seq_file *m, void *arg) { } -/* Walk all the zones in a node and print using a callback */ +/* + * Walk zones in a node and print using a callback. + * If @populated is true, only use callback for zones that are populated.
+ */ static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat, + bool populated, void (*print)(struct seq_file *m, pg_data_t *, struct zone *)) { struct zone *zone; @@ -1130,7 +1134,7 @@ static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat, unsigned long flags; for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) { - if (!populated_zone(zone)) + if (populated && !populated_zone(zone)) continue; spin_lock_irqsave(&zone->lock, flags); @@ -1158,7 +1162,7 @@ static void frag_show_print(struct seq_file *m, pg_data_t *pgdat, static int frag_show(struct seq_file *m, void *arg) { pg_data_t *pgdat = (pg_data_t *)arg; - walk_zones_in_node(m, pgdat, frag_show_print); + walk_zones_in_node(m, pgdat, true, frag_show_print); return 0; } @@ -1199,7 +1203,7 @@ static int pagetypeinfo_showfree(struct seq_file *m, void *arg) seq_printf(m, "%6d ", order); seq_putc(m, '\n'); - walk_zones_in_node(m, pgdat, pagetypeinfo_showfree_print); + walk_zones_in_node(m, pgdat, true, pagetypeinfo_showfree_print); return 0; } @@ -1251,7 +1255,7 @@ static int pagetypeinfo_showblockcount(struct seq_file *m, void *arg) for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) seq_printf(m, "%12s ", migratetype_names[mtype]); seq_putc(m, '\n'); - walk_zones_in_node(m, pgdat, pagetypeinfo_showblockcount_print); + walk_zones_in_node(m, pgdat, true, pagetypeinfo_showblockcount_print); return 0; } @@ -1277,7 +1281,7 @@ static void pagetypeinfo_showmixedcount(struct seq_file *m, pg_data_t *pgdat) seq_printf(m, "%12s ", migratetype_names[mtype]); seq_putc(m, '\n'); - walk_zones_in_node(m, pgdat, pagetypeinfo_showmixedcount_print); + walk_zones_in_node(m, pgdat, true, pagetypeinfo_showmixedcount_print); #endif /* CONFIG_PAGE_OWNER */ } @@ -1434,7 +1438,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat, static int zoneinfo_show(struct seq_file *m, void *arg) { pg_data_t *pgdat = (pg_data_t *)arg; - walk_zones_in_node(m, pgdat, zoneinfo_show_print); +
walk_zones_in_node(m, pgdat, false, zoneinfo_show_print); return 0; } @@ -1853,7 +1857,7 @@ static int unusable_show(struct seq_file *m, void *arg) if (!node_state(pgdat->node_id, N_MEMORY)) return 0; - walk_zones_in_node(m, pgdat, unusable_show_print); + walk_zones_in_node(m, pgdat, true, unusable_show_print); return 0; } @@ -1905,7 +1909,7 @@ static int extfrag_show(struct seq_file *m, void *arg) { pg_data_t *pgdat = (pg_data_t *)arg; - walk_zones_in_node(m, pgdat, extfrag_show_print); + walk_zones_in_node(m, pgdat, true, extfrag_show_print); return 0; }
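A toy userspace model of the new `populated` parameter, assuming a four-zone configuration purely for illustration (the real MAX_NR_ZONES depends on the kernel config). With `populated == false`, the zoneinfo case, every zone is visited, so the position of a zone in the output always equals its zone index; with `true`, empty zones are skipped and positions shift. `struct toy_zone`, `toy_walk_zones()` and the page counts in the usage are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_NR_ZONES 4	/* illustrative; MAX_NR_ZONES depends on the config */

/* Minimal zone model: a name and a page count, nothing else. */
struct toy_zone {
	const char *name;
	unsigned long present_pages;
};

static bool toy_populated_zone(const struct toy_zone *zone)
{
	return zone->present_pages != 0;
}

/* Mirrors the patched walk_zones_in_node(): with @populated true, empty
 * zones are skipped; with it false every zone is visited, so output
 * position always equals zone index. Returns the number of zones
 * handed to @print. */
static int toy_walk_zones(const struct toy_zone *zones, bool populated,
			  void (*print)(int idx, const struct toy_zone *zone))
{
	int visited = 0;

	assert(zones != NULL && print != NULL);
	for (int i = 0; i < TOY_NR_ZONES; i++) {
		if (populated && !toy_populated_zone(&zones[i]))
			continue;
		print(i, &zones[i]);
		visited++;
	}
	return visited;
}

static void toy_print_zone(int idx, const struct toy_zone *zone)
{
	printf("zone %d: %-8s present %lu\n", idx, zone->name,
	       zone->present_pages);
}
```

For zones { DMA, DMA32 (empty), Normal, Movable (empty) }, the populated-only walk prints two zones, while the zoneinfo-style walk prints all four in index order.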
Re: [PATCH v19 0/4] Introduce usb charger framework to deal with the usb gadget power negotation
On Mon, Feb 20 2017, Baolin Wang wrote: > Currently the Linux kernel does not provide any standard integration of this > feature that integrates the USB subsystem with the system power regulation > provided by PMICs meaning that either vendors must add this in their kernels > or USB gadget devices based on Linux (such as mobile phones) may not behave > as they should. Thus provide a standard framework for doing this in kernel. > > Now introduce one user with wm831x_power to support and test the usb charger. > Another user introduced to support charger detection by Jun Li: > https://www.spinics.net/lists/linux-usb/msg139425.html > Moreover there may be other potential users will use it in future. > > 1. Before v19 patchset we've fixed below issues in extcon subsystem and usb > phy driver, now all were merged. (Thanks for Neil's suggestion) > (1) Have fixed the inconsistencies with USB connector types in extcon > subsystem > by following links: > https://lkml.org/lkml/2016/12/21/13 > https://lkml.org/lkml/2016/12/21/15 > https://lkml.org/lkml/2016/12/21/79 > https://lkml.org/lkml/2017/1/3/13 > > (2) Instead of using 'set_power' callback in phy drivers, we will introduce > USB charger to set PMIC current drawn from USB configuration, moreover some > 'set_power' callbacks did not implement anything to set PMIC current, thus > remove them by following links: > https://lkml.org/lkml/2017/1/18/436 > https://lkml.org/lkml/2017/1/18/439 > https://lkml.org/lkml/2017/1/18/438 > Now only two phy drivers (phy-isp1301-omap.c and phy-gpio-vbus-usb.c) still > used 'set_power' callback to set current, we can remove them in future. (I > have no platform with enabling these two phy drivers, so I can not test them > if I converted 'set_power' callback to USB charger.) > > 2. 
Some issues pointed by Neil Brown were still kept in this v19 patchset, and > I explained each issue and may be need discuss again: > (1) Change all usb phys to register an extcon and to send appropriate > notifications. > Firstly, now only 3 USB phy drivers (phy-qcom-8x16-usb.c, phy-omap-otg.c and > phy-msm-usb.c) had registered an extcon, mostly did not. I can not change all > usb phys to register an extcon, since there are no extcon device to register > for these different phy drivers. You don't have to change every driver. You just need to make it easy and obvious how to change drivers in a consistent, coherent way. For a start you would add a 'struct extcon_dev' to 'struct usb_phy', and possibly add or extend some 'static inline's in linux/usb/phy.h to send notification on that extcon (if it is non-NULL). e.g. usb_phy_vbus_on() could send an extcon notification. Then any phy driver which adds support for setting phy->extcon_dev appropriately immediately gets the relevant notifications sent. > Secondly, I also agreed with Peter's comments: Not only USB PHY to register > an extcon, but also for the drivers which can detect USB charger type, it may > be USB controller driver, USB type-c driver, pmic driver, and these drivers > may not have an extcon device since the internal part can finish the vbus > detect. Whichever part can detect vbus, the driver for that part must be able to find the extcon and trigger a notification. Maybe one part can detect VBUS, another can measure the resistance on ID and a third can work through the state machine to determine if D+ and D- are shorted together. Somehow these three need to work together to determine what is plugged in to the external connection port. Somewhere there must be an 'extcon' device which represents that port and which the three devices can find and interact with. I think it makes sense for the usb_phy to be the connection point. Each of the devices can get to the phy, and the phy can get to the extcon.
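To make the proposal above concrete, here is a minimal userspace sketch of its shape, assuming a pointer field named after the `phy->extcon_dev` mentioned above. Everything here is a hypothetical stand-in, not kernel code: `struct extcon_dev`, `struct usb_phy`, the `CABLE_*` identifiers and `model_extcon_set_state()` (loosely modelled on the kernel's `extcon_set_state_sync()`) are simplified models, and `usb_phy_report_cable()` is an invented helper showing how a PMIC, Type-C or controller driver that has reached the phy could report a cable without owning an extcon itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cable identifiers, standing in for EXTCON_USB etc. */
enum cable_id { CABLE_USB, CABLE_USB_HOST, CABLE_CHG_USB_DCP, CABLE_COUNT };

/* Stand-in for struct extcon_dev: one attached/detached bit per cable. */
struct extcon_dev {
	bool state[CABLE_COUNT];
	int notifications;          /* state changes delivered to listeners */
};

/* Loosely modelled on extcon_set_state_sync(): notify only on a change. */
static void model_extcon_set_state(struct extcon_dev *edev,
				   enum cable_id id, bool attached)
{
	if (edev->state[id] == attached)
		return;
	edev->state[id] = attached;
	edev->notifications++;
}

/* Stand-in for struct usb_phy with the proposed extcon pointer added. */
struct usb_phy {
	struct extcon_dev *extcon_dev;  /* proposed field; NULL if not wired up */
	int (*set_vbus)(struct usb_phy *x, int on);
};

/* Shape of the existing inline helper, plus the proposed notification. */
static inline int usb_phy_vbus_on(struct usb_phy *x)
{
	int ret = 0;

	if (!x)
		return 0;
	if (x->set_vbus)
		ret = x->set_vbus(x, true);
	/* Assumption: the phy supplying VBUS means we are acting as host. */
	if (ret == 0 && x->extcon_dev)
		model_extcon_set_state(x->extcon_dev, CABLE_USB_HOST, true);
	return ret;
}

/* Hypothetical helper any detector could call once it has the phy. */
static inline void usb_phy_report_cable(struct usb_phy *x,
					enum cable_id id, bool attached)
{
	if (x && x->extcon_dev)
		model_extcon_set_state(x->extcon_dev, id, attached);
}

/* Trivial set_vbus implementation standing in for real phy hardware. */
static int fake_set_vbus(struct usb_phy *x, int on)
{
	(void)x;
	(void)on;
	return 0;
}
```

A phy whose `extcon_dev` is NULL keeps working exactly as before, which is the point of the suggestion: drivers opt in one at a time simply by filling in the pointer, whether they create the extcon themselves or look it up.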
It doesn't matter very much if the usb phy driver creates the extcon, or if something else creates the extcon and the phy driver performs a lookup to find it (e.g. based on devicetree info). The point is that there is obviously an external physical connection, and so there should be an 'extcon' device that represents it. > > (2) Change the notifier of usb_phy to be used consistently. > Now only 3 phy drivers (phy-generic.c, phy-ab8500-usb.c and phy-gpio-vbus-usb.c) > used the notifier of usb_phy. phy-generic.c and phy-gpio-vbus-usb.c were used to > send out the connect events, and phy-ab8500-usb.c also was used to send out the > MUSB connect events. There are no phy drivers will notify 'vbus_draw' information > by the notifier of usb_phy, which was used consistently now. > Moreover it is difficult to change the notifier of usb_phy to be used only to > communicate the 'vbus_draw' information, since we need to refactor and test these > related phy drivers, power drivers or some mfd drivers, which is a > huge workload. You missed drivers/usb/musb/omap2430.c in your list, but that hardly matters. phy-ab8500-usb.c appears to send vbus_draw information. I understand your
Re: [PATCH v4 14/36] [media] v4l2-mc: add a function to inherit controls from a pipeline
On 03/02/2017 03:48 PM, Steve Longerbeam wrote: On 03/02/2017 08:02 AM, Sakari Ailus wrote: Hi Steve, On Wed, Feb 15, 2017 at 06:19:16PM -0800, Steve Longerbeam wrote: v4l2_pipeline_inherit_controls() will add the v4l2 controls from all subdev entities in a pipeline to a given video device. Signed-off-by: Steve Longerbeam --- drivers/media/v4l2-core/v4l2-mc.c | 48 +++ include/media/v4l2-mc.h | 25 2 files changed, 73 insertions(+) diff --git a/drivers/media/v4l2-core/v4l2-mc.c b/drivers/media/v4l2-core/v4l2-mc.c index 303980b..09d4d97 100644 --- a/drivers/media/v4l2-core/v4l2-mc.c +++ b/drivers/media/v4l2-core/v4l2-mc.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -238,6 +239,53 @@ int v4l_vb2q_enable_media_source(struct vb2_queue *q) } EXPORT_SYMBOL_GPL(v4l_vb2q_enable_media_source); +int __v4l2_pipeline_inherit_controls(struct video_device *vfd, + struct media_entity *start_entity) I have a few concerns / questions: - What's the purpose of this patch? Why not to access the sub-device node directly? I don't really understand what you are trying to say. Actually I think I understand what you mean now. Yes, the user can always access a subdev's control directly from its /dev/v4l-subdevXX. I'm only providing this feature as a convenience to the user, so that all controls in a pipeline can be accessed from one place, i.e. the main capture device node. Steve
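The convenience Steve describes (exposing every sub-device control on the main capture node) amounts to walking the pipeline and merging each entity's controls into the video device's set. The following is only a userspace sketch of that idea under stated assumptions: a strictly linear pipeline with a single upstream link per entity (the real patch walks the media graph, whose code is not shown here), and hypothetical `struct entity`, `struct video_dev` and `pipeline_inherit_controls()` stand-ins for the kernel types. Duplicate control IDs are skipped, roughly as `v4l2_ctrl_add_handler()` ignores controls already present in the target handler, so with this walk order the entity nearest the video node wins.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CTRLS 16

/* Hypothetical pipeline entity: a few control IDs and one upstream link. */
struct entity {
	const char *name;
	const unsigned int *ctrl_ids;
	size_t n_ctrls;
	struct entity *upstream;    /* NULL at the sensor end */
};

/* Hypothetical stand-in for the video device's control handler. */
struct video_dev {
	unsigned int ctrl_ids[MAX_CTRLS];
	size_t n_ctrls;
};

static int vdev_has_ctrl(const struct video_dev *vfd, unsigned int id)
{
	for (size_t i = 0; i < vfd->n_ctrls; i++)
		if (vfd->ctrl_ids[i] == id)
			return 1;
	return 0;
}

/* Walk from start towards the source, adding each entity's controls to
 * vfd and skipping IDs already present. Returns the number of controls
 * added, or -1 if the table fills up. */
static int pipeline_inherit_controls(struct video_dev *vfd,
				     struct entity *start)
{
	int added = 0;

	for (struct entity *e = start; e; e = e->upstream) {
		for (size_t i = 0; i < e->n_ctrls; i++) {
			if (vdev_has_ctrl(vfd, e->ctrl_ids[i]))
				continue;
			if (vfd->n_ctrls == MAX_CTRLS)
				return -1;
			vfd->ctrl_ids[vfd->n_ctrls++] = e->ctrl_ids[i];
			added++;
		}
	}
	return added;
}
```

Calling it a second time adds nothing, since every control is already present; sub-device nodes remain the direct path to each control, as Steve notes.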