Re: [PATCH] staging: ion: remove from the tree
On 8/27/20 8:36 AM, Greg Kroah-Hartman wrote: The ION android code has long been marked to be removed, now that we dma-buf support merged into the real part of the kernel. It was thought that we could wait to remove the ion kernel at a later time, but as the out-of-tree Android fork of the ion code has diverged quite a bit, and any Android device using the ion interface uses that forked version and not this in-tree version, the in-tree copy of the code is abandonded and not used by anyone. Combine this abandoned codebase with the need to make changes to it in order to keep the kernel building properly, which then causes merge issues when merging those changes into the out-of-tree Android code, and you end up with two different groups of people (the in-kernel-tree developers, and the Android kernel developers) who are both annoyed at the current situation. Because of this problem, just drop the in-kernel copy of the ion code now, as it's not used, and is only causing problems for everyone involved. Cc: "Arve Hjønnevåg" Cc: "Christian König" Cc: Christian Brauner Cc: Christoph Hellwig Cc: Hridya Valsaraju Cc: Joel Fernandes Cc: John Stultz Cc: Laura Abbott Cc: Martijn Coenen Cc: Shuah Khan Cc: Sumit Semwal Cc: Suren Baghdasaryan Cc: Todd Kjos Cc: de...@driverdev.osuosl.org Cc: dri-de...@lists.freedesktop.org Cc: linaro-mm-...@lists.linaro.org Signed-off-by: Greg Kroah-Hartman We discussed this at the Android MC on Monday and the plan was to remove it after the next LTS release.
Re: [Ksummit-discuss] Linux Foundation 2020 Technical Advisory Board election: call for nominations
On 8/3/20 5:36 PM, Jonathan Corbet wrote: The election for the Linux Foundation Technical Advisory Board (TAB) will be held virtually during the 2020 Kernel Summit and Linux Plumbers Conference, August 24-28 2020. Nominations for candidates interested in serving on the TAB are currently being sought. The TAB serves as the interface between the kernel development community and the Linux Foundation, advising the Foundation on kernel-related matters, helping member companies learn to work with the community, and working to resolve community-related problems before they get out of hand. We also support the Code of Conduct committee in their mission. Over the last year, matters tended to by the TAB include proposals for developer workflow improvement, overseeing the Linux Plumbers Conference, moving toward more inclusive terminology in the kernel, and more. Minutes from TAB meetings can be found here: https://git.kernel.org/pub/scm/docs/tab/tab.git/tree/minutes The board has ten members, one of whom sits on the Linux Foundation board of directors. Half of the board (five members) is elected every year to serve a two-year term. The members whose terms are expiring this year are: Chris Mason (chair) Dan Williams Kees Cook Laura Abbott Olof Johansson The remaining members' terms will expire in 2021: Greg Kroah-Hartman Jonathan Corbet Sasha Levin Steven Rostedt Ted Ts'o Anyone is eligible to stand for election; simply send your nomination to: tech-board-disc...@lists.linux-foundation.org With your nomination, please include a short (<= 200 words) candidate statement focusing on why you are running and what you hope to accomplish on the TAB. We will be collecting these statements and making them publicly available. The deadline for receiving nominations is 9:00AM GMT-4 (US/Eastern) on August 24 (the first day of Kernel Summit). Due to the use of electronic voting, this will be a hard deadline! As always, please let us know if you have questions (the TAB can be reached at tech-bo...@lists.linuxfoundation.org), and please do consider running. As a reminder, the deadline for nominations is coming up. Please get your nominations in! Thanks, Laura
Re: [Ksummit-discuss] Linux Foundation Technical Advisory Board Elections -- voting procedures
On 8/13/20 10:31 AM, Johannes Berg wrote: Hi Laura, Seeing your reminder reminded me :) We will be using the electronic voting method that we used in 2019. All Linux Plumbers Attendees will automatically receive a ballot. Anyone who is otherwise eligible to vote should e-mail tab-electi...@lists.linuxfoundation.org to request a ballot. The deadline for requesting a ballot is August 17, 00:00 UTC (one week before Linux Plumbers) Will you be sending out some kind of voting tokens for the ballot? And if so, when is that supposed to happen? I (believe I) have requested a ballot, but didn't get a response so far. Thanks, Johannes The voting itself will be taking place during the week of Linux Plumbers/Kernel summit, August 24-28. Keep an eye out for the ballot during that time period. We'll send out another e-mail when the ballots go out. Thanks, Laura
Re: Linux Foundation Technical Advisory Board Elections -- voting procedures
On 7/27/20 5:31 PM, Laura Abbott wrote: On behalf of the Linux Foundation Technical Advisory Board (TAB), I'd like to announce the voting procedures for the 2020 TAB elections. The pool of eligible voters will consist of the following: 1) All attendees of the Linux Plumbers conference (i.e. kernel summit) 2) Anyone who is not a kernel summit attendee will also be eligible to vote if the following criteria are met: -- There exists three kernel commits in a mainline or stable released kernel that --- Have a commit date in the year 2019 or 2020 --- Contain an e-mail address in one of the following tags or merged tags (e.g. Reviewed-and-tested-by) Signed-off-by Tested-by Reported-by Reviewed-by Acked-by We will be using the electronic voting method that we used in 2019. All Linux Plumbers Attendees will automatically receive a ballot. Anyone who is otherwise eligible to vote should e-mail tab-electi...@lists.linuxfoundation.org to request a ballot. The deadline for requesting a ballot is August 17, 00:00 UTC (one week before Linux Plumbers) For those who would like to know the thought process behind this: Last year, we successfully used electronic voting for the TAB elections. Given the circumstances of this year, we have no other reasonable option for voting. While we could continue to limit voting to kernel summit attendees, one of the goals of moving away from in person voting was to potentially expand the voter pool. Since kernel summit is not being held in person this year, it makes sense to expand the voting pool at the same time. We will be sending a call for nominations and announcements about when voting will start at a later date. If you have any questions, feel free to reach out to the tab at t...@lists.linuxfoundation.org As a reminder, this vote is coming up. Some FAQs from last year on virtual voting: Q: What's the software used for voting? A: We will be using the hosted version of the Condorcet Internet Voting Service (CIVS) at https://civs.cs.cornell.edu Q: Is this code open source? A: Yes. The code is available under a BSD-like research license Q: Is this method of voting secure? A: Privacy and security is a focus of CIVS. See https://civs.cs.cornell.edu/sec_priv.html for more information. Q: The website mentions ranked choice voting. What is this? A: In ranked choice voting, you rank your preferred choices from most to least liked. The theory is this results in a more accurate representation of what the voter pool wants. Q: The description mentions an 'election supervisor'. What is this role? A: The election supervisor's role is to start and stop the poll, send links to voters, and set various options for the poll. A single e-mail address is used to e-mail the link to manage the election, after which anyone with the link can manage the poll. Q: Who is the election supervisor for the TAB elections? A: We have created a mailing list for election management, tab-electi...@lists.linuxfoundation.org Q: What if I lose the e-mail before I vote? A: Please e-mail tab-electi...@lists.linuxfoundation.org Q: What if I want to change my vote? A: This is not possible, please make sure you've made your final choices when you click submit. Q: What if I want to practice voting? A: CIVS has a number of sample polls available. Feel free to vote in those to see how the process works.
Linux Foundation Technical Advisory Board Elections -- voting procedures
On behalf of the Linux Foundation Technical Advisory Board (TAB), I'd like to announce the voting procedures for the 2020 TAB elections. The pool of eligible voters will consist of the following: 1) All attendees of the Linux Plumbers conference (i.e. kernel summit) 2) Anyone who is not a kernel summit attendee will also be eligible to vote if the following criteria are met: -- There exists three kernel commits in a mainline or stable released kernel that --- Have a commit date in the year 2019 or 2020 --- Contain an e-mail address in one of the following tags or merged tags (e.g. Reviewed-and-tested-by) Signed-off-by Tested-by Reported-by Reviewed-by Acked-by We will be using the electronic voting method that we used in 2019. All Linux Plumbers Attendees will automatically receive a ballot. Anyone who is otherwise eligible to vote should e-mail tab-electi...@lists.linuxfoundation.org to request a ballot. The deadline for requesting a ballot is August 17, 00:00 UTC (one week before Linux Plumbers) For those who would like to know the thought process behind this: Last year, we successfully used electronic voting for the TAB elections. Given the circumstances of this year, we have no other reasonable option for voting. While we could continue to limit voting to kernel summit attendees, one of the goals of moving away from in person voting was to potentially expand the voter pool. Since kernel summit is not being held in person this year, it makes sense to expand the voting pool at the same time. We will be sending a call for nominations and announcements about when voting will start at a later date. If you have any questions, feel free to reach out to the tab at t...@lists.linuxfoundation.org
Re: [PATCH v3] CodingStyle: Inclusive Terminology
On 7/8/20 2:14 PM, Dan Williams wrote: Linux maintains a coding-style and its own idiomatic set of terminology. Update the style guidelines to recommend replacements for the terms master/slave and blacklist/whitelist. Link: http://lore.kernel.org/r/159389297140.2210796.13590142254668787525.st...@dwillia2-desk3.amr.corp.intel.com Acked-by: Randy Dunlap Acked-by: Dave Airlie Acked-by: SeongJae Park Acked-by: Christian Brauner Acked-by: James Bottomley Reviewed-by: Mark Brown Signed-off-by: Theodore Ts'o Signed-off-by: Shuah Khan Signed-off-by: Dan Carpenter Signed-off-by: Kees Cook Signed-off-by: Olof Johansson Signed-off-by: Jonathan Corbet Signed-off-by: Chris Mason Signed-off-by: Greg Kroah-Hartman Signed-off-by: Dan Williams --- Changes since v2 [1]: - Pick up missed sign-offs and acks from Jon, Shuah, and Christian (sorry about missing those earlier). - Reformat the replacement list to make it easier to read. - Add 'controller' as a suggested replacement (Kees and Mark) - Fix up the paired term for 'performer' to be 'director' (Kees) - Collect some new acks, reviewed-by's, and sign-offs for v2. - Fix up Chris's email [1]: http://lore.kernel.org/r/159419296487.2464622.863943877093636532.st...@dwillia2-desk3.amr.corp.intel.com Documentation/process/coding-style.rst | 20 1 file changed, 20 insertions(+) diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst index 2657a55c6f12..1bee6f8affdb 100644 --- a/Documentation/process/coding-style.rst +++ b/Documentation/process/coding-style.rst @@ -319,6 +319,26 @@ If you are afraid to mix up your local variable names, you have another problem, which is called the function-growth-hormone-imbalance syndrome. See chapter 6 (Functions). +For symbol names and documentation, avoid introducing new usage of +'master / slave' (or 'slave' independent of 'master') and 'blacklist / +whitelist'. + +Recommended replacements for 'master / slave' are: +'{primary,main} / {secondary,replica,subordinate}' +'{initiator,requester} / {target,responder}' +'{controller,host} / {device,worker,proxy}' +'leader / follower' +'director / performer' + +Recommended replacements for 'blacklist/whitelist' are: +'denylist / allowlist' +'blocklist / passlist' + +Exceptions for introducing new usage is to maintain a userspace ABI/API, +or when updating code for an existing (as of 2020) hardware or protocol +specification that mandates those terms. For new specifications +translate specification usage of the terminology to the kernel coding +standard where possible. 5) Typedefs --- Acked-by: Laura Abbott
Re: [PATCH v2] rtlwifi: Fix potential overflow on P2P code
On 10/19/19 6:51 AM, Kalle Valo wrote: Laura Abbott writes: Nicolas Waisman noticed that even though noa_len is checked for a compatible length it's still possible to overrun the buffers of p2pinfo since there's no check on the upper bound of noa_num. Bound noa_num against P2P_MAX_NOA_NUM. Reported-by: Nicolas Waisman Signed-off-by: Laura Abbott --- v2: Use P2P_MAX_NOA_NUM instead of erroring out. --- drivers/net/wireless/realtek/rtlwifi/ps.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/wireless/realtek/rtlwifi/ps.c b/drivers/net/wireless/realtek/rtlwifi/ps.c index 70f04c2f5b17..fff8dda14023 100644 --- a/drivers/net/wireless/realtek/rtlwifi/ps.c +++ b/drivers/net/wireless/realtek/rtlwifi/ps.c @@ -754,6 +754,9 @@ static void rtl_p2p_noa_ie(struct ieee80211_hw *hw, void *data, return; } else { noa_num = (noa_len - 2) / 13; + if (noa_num > P2P_MAX_NOA_NUM) + noa_num = P2P_MAX_NOA_NUM; + } noa_index = ie[3]; if (rtlpriv->psc.p2p_ps_info.p2p_ps_mode == @@ -848,6 +851,9 @@ static void rtl_p2p_action_ie(struct ieee80211_hw *hw, void *data, return; } else { noa_num = (noa_len - 2) / 13; + if (noa_num > P2P_MAX_NOA_NUM) + noa_num = P2P_MAX_NOA_NUM; IMHO using min() would be cleaner, but I'm fine with this as well. Up to you. I believe the intention is to re-write this anyway so I'd prefer to just get this in given the uptick this issue seems to have gotten. Thanks, Laura
[PATCH] tools: iio: Correctly add make dependency for iio_utils
iio tools fail to build correctly with make parallelization: $ make -s -j24 fixdep: error opening depfile: ./.iio_utils.o.d: No such file or directory make[1]: *** [/home/labbott/linux_upstream/tools/build/Makefile.build:96: iio_utils.o] Error 2 make: *** [Makefile:43: iio_event_monitor-in.o] Error 2 make: *** Waiting for unfinished jobs This is because iio_utils.o is used across multiple targets. Fix this by making iio_utils.o a proper dependency. Signed-off-by: Laura Abbott --- I realize that we don't really need the parallelization for tools because it's so small but when building with the distro we want to use the same make command and -j wherever possible. This same issue also appears in the gpio tools so if this looks like an okay approach I'll fix it there as well. --- tools/iio/Build| 1 + tools/iio/Makefile | 10 +++--- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/tools/iio/Build b/tools/iio/Build index f74cbda64710..8d0f3af3723f 100644 --- a/tools/iio/Build +++ b/tools/iio/Build @@ -1,3 +1,4 @@ +iio_utils-y += iio_utils.o lsiio-y += lsiio.o iio_utils.o iio_event_monitor-y += iio_event_monitor.o iio_utils.o iio_generic_buffer-y += iio_generic_buffer.o iio_utils.o diff --git a/tools/iio/Makefile b/tools/iio/Makefile index e22378dba244..3de763d9ab70 100644 --- a/tools/iio/Makefile +++ b/tools/iio/Makefile @@ -32,20 +32,24 @@ $(OUTPUT)include/linux/iio: ../../include/uapi/linux/iio prepare: $(OUTPUT)include/linux/iio +IIO_UTILS_IN := $(OUTPUT)iio_utils-in.o +$(IIO_UTILS_IN): prepare FORCE + $(Q)$(MAKE) $(build)=iio_utils + LSIIO_IN := $(OUTPUT)lsiio-in.o -$(LSIIO_IN): prepare FORCE +$(LSIIO_IN): prepare FORCE $(OUTPUT)iio_utils-in.o $(Q)$(MAKE) $(build)=lsiio $(OUTPUT)lsiio: $(LSIIO_IN) $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@ IIO_EVENT_MONITOR_IN := $(OUTPUT)iio_event_monitor-in.o -$(IIO_EVENT_MONITOR_IN): prepare FORCE +$(IIO_EVENT_MONITOR_IN): prepare FORCE $(OUTPUT)iio_utils-in.o $(Q)$(MAKE) $(build)=iio_event_monitor $(OUTPUT)iio_event_monitor: $(IIO_EVENT_MONITOR_IN) $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@ IIO_GENERIC_BUFFER_IN := $(OUTPUT)iio_generic_buffer-in.o -$(IIO_GENERIC_BUFFER_IN): prepare FORCE +$(IIO_GENERIC_BUFFER_IN): prepare FORCE $(OUTPUT)iio_utils-in.o $(Q)$(MAKE) $(build)=iio_generic_buffer $(OUTPUT)iio_generic_buffer: $(IIO_GENERIC_BUFFER_IN) $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@ -- 2.21.0
[PATCH v2] rtlwifi: Fix potential overflow on P2P code
Nicolas Waisman noticed that even though noa_len is checked for a compatible length it's still possible to overrun the buffers of p2pinfo since there's no check on the upper bound of noa_num. Bound noa_num against P2P_MAX_NOA_NUM. Reported-by: Nicolas Waisman Signed-off-by: Laura Abbott --- v2: Use P2P_MAX_NOA_NUM instead of erroring out. --- drivers/net/wireless/realtek/rtlwifi/ps.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/wireless/realtek/rtlwifi/ps.c b/drivers/net/wireless/realtek/rtlwifi/ps.c index 70f04c2f5b17..fff8dda14023 100644 --- a/drivers/net/wireless/realtek/rtlwifi/ps.c +++ b/drivers/net/wireless/realtek/rtlwifi/ps.c @@ -754,6 +754,9 @@ static void rtl_p2p_noa_ie(struct ieee80211_hw *hw, void *data, return; } else { noa_num = (noa_len - 2) / 13; + if (noa_num > P2P_MAX_NOA_NUM) + noa_num = P2P_MAX_NOA_NUM; + } noa_index = ie[3]; if (rtlpriv->psc.p2p_ps_info.p2p_ps_mode == @@ -848,6 +851,9 @@ static void rtl_p2p_action_ie(struct ieee80211_hw *hw, void *data, return; } else { noa_num = (noa_len - 2) / 13; + if (noa_num > P2P_MAX_NOA_NUM) + noa_num = P2P_MAX_NOA_NUM; + } noa_index = ie[3]; if (rtlpriv->psc.p2p_ps_info.p2p_ps_mode == -- 2.21.0
[PATCH] rtlwifi: Fix potential overflow on P2P code
Nicolas Waisman noticed that even though noa_len is checked for a compatible length it's still possible to overrun the buffers of p2pinfo since there's no check on the upper bound of noa_num. Bounds check noa_num against P2P_MAX_NOA_NUM. Reported-by: Nicolas Waisman Signed-off-by: Laura Abbott --- Compile tested only as this was reported to the security list. --- drivers/net/wireless/realtek/rtlwifi/ps.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/net/wireless/realtek/rtlwifi/ps.c b/drivers/net/wireless/realtek/rtlwifi/ps.c index 70f04c2f5b17..c5cff598383d 100644 --- a/drivers/net/wireless/realtek/rtlwifi/ps.c +++ b/drivers/net/wireless/realtek/rtlwifi/ps.c @@ -754,6 +754,13 @@ static void rtl_p2p_noa_ie(struct ieee80211_hw *hw, void *data, return; } else { noa_num = (noa_len - 2) / 13; + if (noa_num > P2P_MAX_NOA_NUM) { + RT_TRACE(rtlpriv, COMP_INIT, DBG_LOUD, +"P2P notice of absence: invalid noa_num.%d\n", +noa_num); + return; + } + } noa_index = ie[3]; if (rtlpriv->psc.p2p_ps_info.p2p_ps_mode == @@ -848,6 +855,13 @@ static void rtl_p2p_action_ie(struct ieee80211_hw *hw, void *data, return; } else { noa_num = (noa_len - 2) / 13; + if (noa_num > P2P_MAX_NOA_NUM) { + RT_TRACE(rtlpriv, COMP_FW, DBG_LOUD, +"P2P notice of absence: invalid noa_len.%d\n", +noa_len); + return; + + } } noa_index = ie[3]; if (rtlpriv->psc.p2p_ps_info.p2p_ps_mode == -- 2.21.0
Re: mount on tmpfs failing to parse context option
On 10/7/19 9:26 PM, Al Viro wrote: On Mon, Oct 07, 2019 at 05:50:31PM -0700, Hugh Dickins wrote: [sorry for being MIA - had been sick through the last week, just digging myself from under piles of mail; my apologies] (tmpfs, very tiresomely, supports a NUMA "mpol" mount option which can have commas in it e.g "mpol=bind:0,2": which makes all its comma parsing awkward. I assume that where the new mount API commits bend over to accommodate that peculiarity, they end up mishandling the comma in the context string above.) Dumber than that, I'm afraid. mpol is the reason for having ->parse_monolithic() in the first place, all right, but the problem is simply the lack of security_sb_eat_lsm_opts() call in it. Could you check if the following fixes that one? diff --git a/mm/shmem.c b/mm/shmem.c index 0f7fd4a85db6..8dcc8d04cbaf 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3482,6 +3482,12 @@ static int shmem_parse_options(struct fs_context *fc, void *data) { char *options = data; + if (options) { + int err = security_sb_eat_lsm_opts(options, >security); + if (err) + return err; + } + while (options != NULL) { char *this_char = options; for (;;) { Yes the reporter says that works. Thanks, Laura
Re: mount on tmpfs failing to parse context option
On 9/30/19 12:07 PM, Laura Abbott wrote: Hi, Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1757104 of a failure to parse options with the context mount option. From the reporter: $ unshare -rm mount -t tmpfs tmpfs /tmp -o 'context="system_u:object_r:container_file_t:s0:c475,c690"' mount: /tmp: wrong fs type, bad option, bad superblock on tmpfs, missing codepage or helper program, or other error. Sep 30 16:50:42 kernel: tmpfs: Unknown parameter 'c690"' I haven't asked the reporter to bisect yet but I'm suspecting one of the conversion to the new mount API: $ git log --oneline v5.3..origin/master mm/shmem.c edf445ad7c8d Merge branch 'hugepage-fallbacks' (hugepatch patches from David Rientjes) 19deb7695e07 Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" 28eb3c808719 shmem: fix obsolete comment in shmem_getpage_gfp() 4101196b19d7 mm: page cache: store only head pages in i_pages d8c6546b1aea mm: introduce compound_nr() f32356261d44 vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API 626c3920aeb4 shmem_parse_one(): switch to use of fs_parse() e04dc423ae2c shmem_parse_options(): take handling a single option into a helper f6490b7fbb82 shmem_parse_options(): don't bother with mpol in separate variable 0b5071dd323d shmem_parse_options(): use a separate structure to keep the results 7e30d2a5eb0b make shmem_fill_super() static I didn't find another report or a fix yet. Is it worth asking the reporter to bisect? Thanks, Laura Ping again, I never heard anything back and I didn't see anything come in with -rc2
mount on tmpfs failing to parse context option
Hi, Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1757104 of a failure to parse options with the context mount option. From the reporter: $ unshare -rm mount -t tmpfs tmpfs /tmp -o 'context="system_u:object_r:container_file_t:s0:c475,c690"' mount: /tmp: wrong fs type, bad option, bad superblock on tmpfs, missing codepage or helper program, or other error. Sep 30 16:50:42 kernel: tmpfs: Unknown parameter 'c690"' I haven't asked the reporter to bisect yet but I'm suspecting one of the conversion to the new mount API: $ git log --oneline v5.3..origin/master mm/shmem.c edf445ad7c8d Merge branch 'hugepage-fallbacks' (hugepatch patches from David Rientjes) 19deb7695e07 Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" 28eb3c808719 shmem: fix obsolete comment in shmem_getpage_gfp() 4101196b19d7 mm: page cache: store only head pages in i_pages d8c6546b1aea mm: introduce compound_nr() f32356261d44 vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API 626c3920aeb4 shmem_parse_one(): switch to use of fs_parse() e04dc423ae2c shmem_parse_options(): take handling a single option into a helper f6490b7fbb82 shmem_parse_options(): don't bother with mpol in separate variable 0b5071dd323d shmem_parse_options(): use a separate structure to keep the results 7e30d2a5eb0b make shmem_fill_super() static I didn't find another report or a fix yet. Is it worth asking the reporter to bisect? Thanks, Laura
Re: [PATCH] arm64: use generic free_initrd_mem()
On 9/16/19 8:21 AM, Mike Rapoport wrote: From: Mike Rapoport arm64 calls memblock_free() for the initrd area in its implementation of free_initrd_mem(), but this call has no actual effect that late in the boot process. By the time initrd is freed, all the reserved memory is managed by the page allocator and the memblock.reserved is unused, so there is no point to update it. People like to use memblock for keeping track of memory even if it has no actual effect. We made this change explicitly (see 05c58752f9dc ("arm64: To remove initrd reserved area entry from memblock") That said, moving to the generic APIs would be nice. Maybe we can find another place to update the accounting? Without the memblock_free() call the only difference between arm64 and the generic versions of free_initrd_mem() is the memory poisoning. Switching arm64 to the generic version will enable the poisoning. Signed-off-by: Mike Rapoport --- I've boot tested it on qemu and I've checked that kexec works. arch/arm64/mm/init.c | 8 1 file changed, 8 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index f3c7952..8ad2934 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -567,14 +567,6 @@ void free_initmem(void) unmap_kernel_range((u64)__init_begin, (u64)(__init_end - __init_begin)); } -#ifdef CONFIG_BLK_DEV_INITRD -void __init free_initrd_mem(unsigned long start, unsigned long end) -{ - free_reserved_area((void *)start, (void *)end, 0, "initrd"); - memblock_free(__virt_to_phys(start), end - start); -} -#endif - /* * Dump out memory limit information on panic. */
Results: Linux Foundation Technical Advisory Board Election 2019
The following are the results of the 2019 Technical Advisory Board (TAB) elections: 1. Greg Kroah-Hartman 2. Jonathan Corbet 3. Steven Rostedt 4. Ted Ts’o 5. Sasha Levin Thank you to all of the candidates for stepping up and running this year. We appreciate your willingness to serve the kernel community. We had 174 people vote this year. Because of the way CIVS works, we don't have the full count of votes for each person, only the aggregated score and who "won" against each person. Results are available to those who are interested. A big thank you to everyone who helped with the electronic voting this year, whether participating in a test vote or proofreading my e-mails explaining voting procedures. Congratulations to all the elected candidates! Thanks, Laura
Re: Linux Foundation Technical Advisory Board Elections -- Call for nominations
> On Aug 28, 2019, at 1:48 PM, Laura Abbott wrote: > > >> On Aug 9, 2019, at 2:26 AM, Laura Abbott wrote: >> >> Hello everyone, >> >> Friendly reminder that the TAB elections are coming soon: >> >> The Linux Foundation Technical Advisory Board (TAB) serves as the >> interface between the kernel development community and the Linux >> Foundation. The TAB advises the Foundation on kernel-related matters, >> helps member companies learn to work with the community, and works to >> resolve community-related problems before they get out of hand. We >> also support the Code of Conduct committee in their mission. >> >> The board has ten members, one of whom sits on the Linux Foundation >> board of directors. >> >> The election to select five TAB members will be held at the 2019 Kernel >> Summit >> in Lisbon, Portugal September 9-11. As has been announced[2], we are moving >> to >> an electronic voting system this year. Further details about the exact voting >> procedures will be coming soon. Anyone is eligible to stand for election, >> simply send your nomination to: >> >> tech-board-discuss at lists.linux-foundation.org >> >> With your nomination, please include a short candidate statement. This >> candidate >> statement should focus on why you are running and what you hope to accomplish >> on the TAB. We will be collecting these statements and making them publicly >> available. >> >> The deadline for receiving nominations is 9am GMT+1 on September 9th (the >> first >> day of Kernel Summit). Due to the use of electronic voting, this will be a >> hard >> deadline! >> >> Current TAB members, and their election year: >> >> Jon Corbet 2017 >> Greg Kroah-Hartman 2017 >> Steven Rostedt 2017 >> Ted Tso 2017 >> Tim Bird 2017 >> >> Chris Mason 2018 >> Laura Abbott 2018 >> Olof Johansson 2018 >> Kees Cook2018 >> Dan Williams 2018 >> >> The five slots from 2017 are all up for election. As always, please >> let us know if you have questions, and please do consider running. >> >> Thanks, >> Laura >> >> [1] TAB members sit for a term of two years, and half of the board is >> up for election every year. Five of the seats are up for election now. >> The other five are halfway through their term and will be up for >> election next year. >> >> [2] >> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-July/006582.html > > Reminder to send in your candidate statements, the current ones are > available at > > https://docs.google.com/document/d/1E3_W1c-xJMx9o2PCnKiGt3vqs-mPh77yNO4GSqNipOQ Final reminder, the deadline is Monday September 9th at 9am UTC+1 (that's 9am Lisbon time). Because we are doing electronic voting this is a hard deadline!
Re: Linux Foundation Technical Advisory Board Elections -- voting procedures
> On Sep 3, 2019, at 7:13 AM, Laura Abbott wrote: > > Hi, > > On behalf of the Linux Foundation Technical Advisory Board (TAB), I'd like to > take this opportunity to announce the voting procedures for the 2019 TAB > elections. As was announced[1], this year we are moving to electronic voting. > > Everyone who is registered for kernel summit (co-located with Linux Plumbers > Conference in Lisbon this year) by September 8th 2019 is eligible to vote in > this year's TAB elections. This includes everyone registered for Plumbers and > Maintainers summit. All eligible voters will receive a link from > Condorcet Internet Voting Service (https://civs.cs.cornell.edu) by the > start of the first Plumbers session (September 9th 10am UTC+1). The voting > will run until September 11th at 10am UTC+1. > > The list of all candidates and their platform is available at the following > Google doc > > https://docs.google.com/document/d/1E3_W1c-xJMx9o2PCnKiGt3vqs-mPh77yNO4GSqNipOQ/edit?usp=sharing > > We will also be hosting an open TAB session at Plumbers on Monday > September 9th at 18:30. A more detailed FAQ about voting procedures is > below. > > If you have any questions, feel free to reach out to > t...@lists.linux-foundation.org . > > Thanks, > Laura > > P.S. Please consider this a reminder to send in your TAB nominations! > > [1] > https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-July/006582.html > > --- > > Q: Why are we making this change? > A: As explained in the previous announcement, > https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-July/006582.html > In person voting has a number of limitations. We'd like to move to electronic > voting with the objective of giving more members of our community a voice in > the membership of the TAB > > Q: Who is eligible to vote? > A: All registered attendees of Plumbers and Kernel Maintainers Summit are > eligible. > > Q: If I am registered for Plumbers but not attending can I still vote? > A: We will be sending the e-mail to all registered attendees before confirming > they are present. > > Q: Can I register for Plumbers just to vote? > A: Plumbers is sold out this year. > > Q: Why bother with electronic voting if the voting pool is still conference > attendees? > A: The kernel philosophy is small incremental changes. Based on discussions > with the TAB, changing the voting method and widening the voting pool > simultaneously was too much for one year. The goal is to run the electronic > voting this year with the same voting pool and then discuss how voting will > work in subsequent years. > > Q: When does voting start? > A: E-mails with the voting link will be sent out before the start of the > first Plumbers session on Monday September 9th at 10am UTC+1 > > Q: When does voting end? > A: Voting ends on September 11th at 10am UTC+1 > > Q: What's the software used for voting? > A: We will be using the hosted version of the Condorcet Internet Voting > Service > (CIVS) at https://civs.cs.cornell.edu > > Q: Is this code open source? > A: Yes. The code is available under a BSD-like research license > > Q: How do I vote? > A: You will receive an e-mail by Monday September 9th at 10am UTC+1 with > a link to vote. > > Q: Is this method of voting secure? > A: Privacy and security is a focus of CIVS. See > https://civs.cs.cornell.edu/sec_priv.html for more information. > > Q: The website mentions ranked choice voting. What is this? > A: In ranked choice voting, you rank your preferred choices from most > to least liked. The theory is this results in a more accurate representation > of what the voter pool wants. This is a different method than we've used > for TAB elections in the past where you indicated your preferred $n out > of $m candidates. Because we are using the hosted version of CIVS, we did > not have the option to use our old method of voting. > > Q: The description mentions an 'election supervisor'. What is this role? > A: The election supervisor's role is to start and stop the poll, send > links to voters, and set various options for the poll. A single e-mail > address is used to e-mail the link to manage the election, after which > anyone with the link can manage the poll. > > Q: Who is the election supervisor for the TAB elections? > A: We have created a mailing list for election management. This mailing > list contains individuals from the kernel community who are not running > for the TAB this year, similar to in-person proctors from past years. > We are still working on getting the mailing list set up, the address will > be announced when it is ready. > > Q: What if I lose the e
Linux Foundation Technical Advisory Board Elections -- voting procedures
Hi, On behalf of the Linux Foundation Technical Advisory Board (TAB), I'd like to take this opportunity to announce the voting procedures for the 2019 TAB elections. As was announced[1], this year we are moving to electronic voting. Everyone who is registered for kernel summit (co-located with Linux Plumbers Conference in Lisbon this year) by September 8th 2019 is eligible to vote in this year's TAB elections. This includes everyone registered for Plumbers and Maintainers summit. All eligible voters will receive a link from Condorcet Internet Voting Service (https://civs.cs.cornell.edu) by the start of the first Plumbers session (September 9th 10am UTC+1). The voting will run until September 11th at 10am UTC+1. The list of all candidates and their platform is available at the following Google doc https://docs.google.com/document/d/1E3_W1c-xJMx9o2PCnKiGt3vqs-mPh77yNO4GSqNipOQ/edit?usp=sharing We will also be hosting an open TAB session at Plumbers on Monday September 9th at 18:30. A more detailed FAQ about voting procedures is below. If you have any questions, feel free to reach out to t...@lists.linux-foundation.org . Thanks, Laura P.S. Please consider this a reminder to send in your TAB nominations! [1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-July/006582.html --- Q: Why are we making this change? A: As explained in the previous announcement, https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-July/006582.html In person voting has a number of limitations. We'd like to move to electronic voting with the objective of giving more members of our community a voice in the membership of the TAB Q: Who is eligible to vote? A: All registered attendees of Plumbers and Kernel Maintainers Summit are eligible. Q: If I am registered for Plumbers but not attending can I still vote? A: We will be sending the e-mail to all registered attendees before confirming they are present. Q: Can I register for Plumbers just to vote? A: Plumbers is sold out this year. Q: Why bother with electronic voting if the voting pool is still conference attendees? A: The kernel philosophy is small incremental changes. Based on discussions with the TAB, changing the voting method and widening the voting pool simultaneously was too much for one year. The goal is to run the electronic voting this year with the same voting pool and then discuss how voting will work in subsequent years. Q: When does voting start? A: E-mails with the voting link will be sent out before the start of the first Plumbers session on Monday September 9th at 10am UTC+1 Q: When does voting end? A: Voting ends on September 11th at 10am UTC+1 Q: What's the software used for voting? A: We will be using the hosted version of the Condorcet Internet Voting Service (CIVS) at https://civs.cs.cornell.edu Q: Is this code open source? A: Yes. The code is available under a BSD-like research license Q: How do I vote? A: You will receive an e-mail by Monday September 9th at 10am UTC+1 with a link to vote. Q: Is this method of voting secure? A: Privacy and security is a focus of CIVS. See https://civs.cs.cornell.edu/sec_priv.html for more information. Q: The website mentions ranked choice voting. What is this? A: In ranked choice voting, you rank your preferred choices from most to least liked. The theory is this results in a more accurate representation of what the voter pool wants. This is a different method than we've used for TAB elections in the past where you indicated your preferred $n out of $m candidates. Because we are using the hosted version of CIVS, we did not have the option to use our old method of voting. Q: The description mentions an 'election supervisor'. What is this role? A: The election supervisor's role is to start and stop the poll, send links to voters, and set various options for the poll. A single e-mail address is used to e-mail the link to manage the election, after which anyone with the link can manage the poll. Q: Who is the election supervisor for the TAB elections? A: We have created a mailing list for election management. This mailing list contains individuals from the kernel community who are not running for the TAB this year, similar to in-person proctors from past years. We are still working on getting the mailing list set up, the address will be announced when it is ready. Q: What if I lose the e-mail before I vote? A: Please e-mail the election list, address to be announced Q: What if I want to change my vote? A: This is not possible, please make sure you've made your final choices when you click submit. Q: What if I want to practice voting? A: CIVS has a number of sample polls available. Feel free to vote in those to see how the process works. Q: What if something unforeseen happens with electronic voting and we don't end up with results? A: We will arrange an in person vote similar to previous years. Q: What if I have questions not addressed here? A: E-mail
Re: Linux Foundation Technical Advisory Board Elections -- Call for nominations
> On Aug 9, 2019, at 2:26 AM, Laura Abbott wrote: > > Hello everyone, > > Friendly reminder that the TAB elections are coming soon: > > The Linux Foundation Technical Advisory Board (TAB) serves as the > interface between the kernel development community and the Linux > Foundation. The TAB advises the Foundation on kernel-related matters, > helps member companies learn to work with the community, and works to > resolve community-related problems before they get out of hand. We > also support the Code of Conduct committee in their mission. > > The board has ten members, one of whom sits on the Linux Foundation > board of directors. > > The election to select five TAB members will be held at the 2019 Kernel Summit > in Lisbon, Portugal September 9-11. As has been announced[2], we are moving to > an electronic voting system this year. Further details about the exact voting > procedures will be coming soon. Anyone is eligible to stand for election, > simply send your nomination to: > > tech-board-discuss at lists.linux-foundation.org > > With your nomination, please include a short candidate statement. This > candidate > statement should focus on why you are running and what you hope to accomplish > on the TAB. We will be collecting these statements and making them publicly > available. > > The deadline for receiving nominations is 9am GMT+1 on September 9th (the > first > day of Kernel Summit). Due to the use of electronic voting, this will be a > hard > deadline! > > Current TAB members, and their election year: > > Jon Corbet2017 > Greg Kroah-Hartman2017 > Steven Rostedt 2017 > Ted Tso 2017 > Tim Bird 2017 > > Chris Mason 2018 > Laura Abbott 2018 > Olof Johansson2018 > Kees Cook 2018 > Dan Williams 2018 > > The five slots from 2017 are all up for election. As always, please > let us know if you have questions, and please do consider running. > > Thanks, > Laura > > [1] TAB members sit for a term of two years, and half of the board is > up for election every year. Five of the seats are up for election now. > The other five are halfway through their term and will be up for > election next year. > > [2] > https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-July/006582.html Reminder to send in your candidate statements, the current ones are available at https://docs.google.com/document/d/1E3_W1c-xJMx9o2PCnKiGt3vqs-mPh77yNO4GSqNipOQ Thanks, Laura
Linux Foundation Technical Advisory Board Elections -- Call for nominations
Hello everyone, Friendly reminder that the TAB elections are coming soon: The Linux Foundation Technical Advisory Board (TAB) serves as the interface between the kernel development community and the Linux Foundation. The TAB advises the Foundation on kernel-related matters, helps member companies learn to work with the community, and works to resolve community-related problems before they get out of hand. We also support the Code of Conduct committee in their mission. The board has ten members, one of whom sits on the Linux Foundation board of directors. The election to select five TAB members will be held at the 2019 Kernel Summit in Lisbon, Portugal September 9-11. As has been announced[2], we are moving to an electronic voting system this year. Further details about the exact voting procedures will be coming soon. Anyone is eligible to stand for election, simply send your nomination to: tech-board-discuss at lists.linux-foundation.org With your nomination, please include a short candidate statement. This candidate statement should focus on why you are running and what you hope to accomplish on the TAB. We will be collecting these statements and making them publicly available. The deadline for receiving nominations is 9am GMT+1 on September 9th (the first day of Kernel Summit). Due to the use of electronic voting, this will be a hard deadline! Current TAB members, and their election year: Jon Corbet 2017 Greg Kroah-Hartman 2017 Steven Rostedt 2017 Ted Tso 2017 Tim Bird2017 Chris Mason 2018 Laura Abbott2018 Olof Johansson 2018 Kees Cook 2018 Dan Williams2018 The five slots from 2017 are all up for election. As always, please let us know if you have questions, and please do consider running. Thanks, Laura [1] TAB members sit for a term of two years, and half of the board is up for election every year. Five of the seats are up for election now. The other five are halfway through their term and will be up for election next year. [2] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2019-July/006582.html
[PATCH] mm: slub: Fix slab walking for init_on_free
To properly clear the slab on free with slab_want_init_on_free, we walk the list of free objects using get_freepointer/set_freepointer. The value we get from get_freepointer may not be valid. This isn't an issue since an actual value will get written later but this means there's a chance of triggering a bug if we use this value with set_freepointer: [4.478342] kernel BUG at mm/slub.c:306! [4.482437] invalid opcode: [#1] PREEMPT PTI [4.485750] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-05754-g6471384a #4 [4.490635] RIP: 0010:kfree+0x58a/0x5c0 [4.493679] Code: 48 83 05 78 37 51 02 01 0f 0b 48 83 05 7e 37 51 02 01 48 83 05 7e 37 51 02 01 48 83 05 7e 37 51 02 01 48 83 05 d6 37 51 02 01 <0f> 0b 48 83 05 d4 37 51 02 01 48 83 05 d4 37 51 02 01 48 83 05 d4 [4.506827] RSP: :82603d90 EFLAGS: 00010002 [4.510475] RAX: 8c3976c04320 RBX: 8c3976c04300 RCX: [4.515420] RDX: 8c3976c04300 RSI: RDI: 8c3976c04320 [4.520331] RBP: 82603db8 R08: R09: [4.525288] R10: 8c3976c04320 R11: 8289e1e0 R12: d52cc8db0100 [4.530180] R13: 8c3976c01a00 R14: 810f10d4 R15: 8c3976c04300 [4.535079] FS: () GS:8266b000() knlGS: [4.540628] CS: 0010 DS: ES: CR0: 80050033 [4.544593] CR2: 8c397000 CR3: 00012502 CR4: 000406b0 [4.549558] Call Trace: [4.551266] apply_wqattrs_prepare+0x154/0x280 [4.554357] apply_workqueue_attrs_locked+0x4e/0xe0 [4.557728] apply_workqueue_attrs+0x36/0x60 [4.560654] alloc_workqueue+0x25a/0x6d0 [4.563381] ? kmem_cache_alloc_trace+0x1e3/0x500 [4.566628] ? __mutex_unlock_slowpath+0x44/0x3f0 [4.569875] workqueue_init_early+0x246/0x348 [4.573025] start_kernel+0x3c7/0x7ec [4.575558] x86_64_start_reservations+0x40/0x49 [4.578738] x86_64_start_kernel+0xda/0xe4 [4.581600] secondary_startup_64+0xb6/0xc0 [4.584473] Modules linked in: [4.586620] ---[ end trace f67eb9af4d8d492b ]--- Fix this by ensuring the value we set with set_freepointer is either NULL or another value in the chain. Reported-by: kernel test robot Signed-off-by: Laura Abbott --- mm/slub.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index e6c030e47364..8834563cdb4b 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1432,7 +1432,9 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s, void *old_tail = *tail ? *tail : *head; int rsize; - if (slab_want_init_on_free(s)) + if (slab_want_init_on_free(s)) { + void *p = NULL; + do { object = next; next = get_freepointer(s, object); @@ -1445,8 +1447,10 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s, : 0; memset((char *)object + s->inuse, 0, s->size - s->inuse - rsize); - set_freepointer(s, object, next); + set_freepointer(s, object, p); + p = object; } while (object != old_tail); + } /* * Compiler cannot detect this function can be removed if slab_free_hook() -- 2.21.0
Re: [mm] 6471384af2: kernel_BUG_at_mm/slub.c
On 7/29/19 5:43 AM, kernel test robot wrote: FYI, we noticed the following commit (built with gcc-7): commit: 6471384af2a6530696fc0203bafe4de41a23c9ef ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options") https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux.git master in testcase: trinity with following parameters: runtime: 300s test-description: Trinity is a linux system call fuzz tester. test-url: http://codemonkey.org.uk/projects/trinity/ on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): +--+++ | | ba5c5e4a5d | 6471384af2 | +--+++ | boot_successes | 8 | 0 | | boot_failures| 2 | 15 | | invoked_oom-killer:gfp_mask=0x | 1 || | Mem-Info | 1 || | kernel_BUG_at_security/keys/keyring.c| 1 || | invalid_opcode:#[##] | 1 | 15 | | RIP:__key_link_begin | 1 || | Kernel_panic-not_syncing:Fatal_exception | 1 | 15 | | kernel_BUG_at_mm/slub.c | 0 | 15 | | RIP:kfree| 0 | 15 | +--+++ If you fix the issue, kindly add following tag Reported-by: kernel test robot [4.478342] kernel BUG at mm/slub.c:306! [4.482437] invalid opcode: [#1] PREEMPT PTI [4.485750] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-05754-g6471384a #4 [4.490635] RIP: 0010:kfree+0x58a/0x5c0 [4.493679] Code: 48 83 05 78 37 51 02 01 0f 0b 48 83 05 7e 37 51 02 01 48 83 05 7e 37 51 02 01 48 83 05 7e 37 51 02 01 48 83 05 d6 37 51 02 01 <0f> 0b 48 83 05 d4 37 51 02 01 48 83 05 d4 37 51 02 01 48 83 05 d4 [4.506827] RSP: :82603d90 EFLAGS: 00010002 [4.510475] RAX: 8c3976c04320 RBX: 8c3976c04300 RCX: [4.515420] RDX: 8c3976c04300 RSI: RDI: 8c3976c04320 [4.520331] RBP: 82603db8 R08: R09: [4.525288] R10: 8c3976c04320 R11: 8289e1e0 R12: d52cc8db0100 [4.530180] R13: 8c3976c01a00 R14: 810f10d4 R15: 8c3976c04300 [4.535079] FS: () GS:8266b000() knlGS: [4.540628] CS: 0010 DS: ES: CR0: 80050033 [4.544593] CR2: 8c397000 CR3: 00012502 CR4: 000406b0 [4.549558] Call Trace: [4.551266] apply_wqattrs_prepare+0x154/0x280 [4.554357] apply_workqueue_attrs_locked+0x4e/0xe0 [4.557728] apply_workqueue_attrs+0x36/0x60 [4.560654] alloc_workqueue+0x25a/0x6d0 [4.563381] ? kmem_cache_alloc_trace+0x1e3/0x500 [4.566628] ? __mutex_unlock_slowpath+0x44/0x3f0 [4.569875] workqueue_init_early+0x246/0x348 [4.573025] start_kernel+0x3c7/0x7ec [4.575558] x86_64_start_reservations+0x40/0x49 [4.578738] x86_64_start_kernel+0xda/0xe4 [4.581600] secondary_startup_64+0xb6/0xc0 [4.584473] Modules linked in: [4.586620] ---[ end trace f67eb9af4d8d492b ]--- I think this is catching an edge case with the freelist walking code in slab_free_freelist_hook. If we're not doing a bulk free, getting the free pointer from the object is going to be bogus so there's a chance it could trigger the bug in set_freepointer even if we don't actually care about it since it's going to get overwritten when we actually free. It's probably more robust to make sure we're terminating it with NULL. Lightly tested: diff --git a/mm/slub.c b/mm/slub.c index e6c030e47364..8834563cdb4b 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1432,7 +1432,9 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s, void *old_tail = *tail ? *tail : *head; int rsize; - if (slab_want_init_on_free(s)) + if (slab_want_init_on_free(s)) { + void *p = NULL; + do { object = next; next = get_freepointer(s, object); @@ -1445,8 +1447,10 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s, : 0; memset((char *)object + s->inuse, 0, s->size - s->inuse - rsize); - set_freepointer(s, object, next); + set_freepointer(s, object, p); + p = object; } while (object != old_tail); + } /* * Compiler cannot detect this
Re: Limits for ION Memory Allocator
On 7/17/19 12:31 PM, Alexander Popov wrote: Hello! The syzkaller [1] has a trouble with fuzzing the Linux kernel with ION Memory Allocator. Syzkaller uses several methods [2] to limit memory consumption of the userspace processes calling the syscalls for testing the kernel: - setrlimit(), - cgroups, - various sysctl. But these methods don't work for ION Memory Allocator, so any userspace process that has access to /dev/ion can bring the system to the out-of-memory state. An example of a program doing that: #include #include #include #include #include #include #define ION_IOC_MAGIC 'I' #define ION_IOC_ALLOC _IOWR(ION_IOC_MAGIC, 0, \ struct ion_allocation_data) struct ion_allocation_data { __u64 len; __u32 heap_id_mask; __u32 flags; __u32 fd; __u32 unused; }; int main(void) { unsigned long i = 0; int fd = -1; struct ion_allocation_data data = { .len = 0x13f65d8c, .heap_id_mask = 1, .flags = 0, .fd = -1, .unused = 0 }; fd = open("/dev/ion", 0); if (fd == -1) { perror("[-] open /dev/ion"); return 1; } while (1) { printf("iter %lu\n", i); ioctl(fd, ION_IOC_ALLOC, ); i++; } return 0; } I looked through the code of ion_alloc() and didn't find any limit checks. Is it currently possible to limit ION kernel allocations for some process? If not, is it a right idea to do that? Thanks! Yes, I do think that's the right approach. We're working on moving Ion out of staging and this is something I mentioned to John Stultz. I don't think we've thought too hard about how to do the actual limiting so suggestions are welcome. Thanks, Laura Best regards, Alexander [1]: https://github.com/google/syzkaller [2]: https://github.com/google/syzkaller/blob/master/executor/common_linux.h
Re: [PATCH] arm64/sve: fix genksyms generation
On 6/18/19 10:15 AM, Arnd Bergmann wrote: On Tue, Jun 18, 2019 at 2:03 PM Will Deacon wrote: From 6e004b8824d4eb6a4e61cd794fbc3a761b50135b Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Tue, 18 Jun 2019 12:56:49 +0100 Subject: [PATCH] genksyms: Teach parse about __uint128_t built-in type __uint128_t crops up in a few files that export symbols to modules, so teach genksyms about it so that we don't end up skipping the CRC generation for some symbols due to the parser failing to spot them: | WARNING: EXPORT symbol "kernel_neon_begin" [vmlinux] version | generation failed, symbol will not be versioned. | ld: arch/arm64/kernel/fpsimd.o: relocation R_AARCH64_ABS32 against | `__crc_kernel_neon_begin' can not be used when making a shared | object | ld: arch/arm64/kernel/fpsimd.o:(.data+0x0): dangerous relocation: | unsupported relocation Cc: Ard Biesheuvel Reported-by: Arnd Bergmann Signed-off-by: Will Deacon Looks good to me, Acked-by: Arnd Bergmann I've added this to my randconfig build setup, replacing my earlier patch, and will let you know if any problems with it remain. Arnd I just hit this on 5ad18b2e60b75c7297a998dea702451d33a052ed on Linus' branch. The fix works for me (feel free to add Tested-by). Is this already queued up for Linus? Thanks, Laura
Re: [PATCH] staging: android: ion: Bail out upon SIGKILL when allocating memory.
On 7/1/19 5:02 PM, Tetsuo Handa wrote: On 2019/07/01 23:02, Sumit Semwal wrote: Acked-by: Laura Abbott fwiw, Acked-by: Sumit Semwal Thank you for responding. You can carry this patch via whichever tree you like. By the way, is "memory allocation from ion_system_heap_allocate() is calling ion_system_heap_shrink()" ( https://lkml.kernel.org/r/03763360-a7de-de87-eb90-ba7838143...@i-love.sakura.ne.jp ) what we want? Such memory allocations might not want to call shrinkers... For what Ion gets typically used for we do want to be calling shrinkers. I've had discussions with people in the past about the risk of Ion as DoS vector due to its ability to allocate large amounts of memory. At the very least, we probably shouldn't be trying to call the Ion shrinker when we're in the middle of an ion allocation since shrinking won't do us any good. We're in the process of re-working Ion at the moment so this might be a good topic to bring up again. Thanks, Laura
Re: [PATCH] ARM: mm: only adjust sections of valid mm structures
On 6/27/19 5:32 PM, Doug Berger wrote: A timing hazard exists when an early fork/exec thread begins exiting and sets its mm pointer to NULL while a separate core tries to update the section information. This commit ensures that the mm pointer is not NULL before setting its section parameters. The arguments provided by commit 11ce4b33aedc ("ARM: 8672/1: mm: remove tasklist locking from update_sections_early()") are equally valid for not requiring grabbing the task_lock around this check. Fixes: 08925c2f124f ("ARM: 8464/1: Update all mm structures with section adjustments") Signed-off-by: Doug Berger --- arch/arm/mm/init.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index be0b42937888..bdc70dff477b 100644 --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -616,7 +616,8 @@ static void update_sections_early(struct section_perm perms[], int n) if (t->flags & PF_KTHREAD) continue; for_each_thread(t, s) - set_section_perms(perms, n, true, s->mm); + if (s->mm) + set_section_perms(perms, n, true, s->mm); } set_section_perms(perms, n, true, current->active_mm); set_section_perms(perms, n, true, _mm); Acked-by: Laura Abbott
Re: [PATCH] staging: android: ion: Bail out upon SIGKILL when allocating memory.
On 7/1/19 6:55 AM, Tetsuo Handa wrote: Andrew, can you pick up this patch? No response from Laura Abbott nor Sumit Semwal. On 2019/06/21 18:58, Tetsuo Handa wrote: From e0758655727044753399fb4f7c5f3eb25ac5cccd Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Fri, 21 Jun 2019 11:22:51 +0900 Subject: [PATCH] staging: android: ion: Bail out upon SIGKILL when allocating memory. syzbot found that a thread can stall for minutes inside ion_system_heap_allocate() after that thread was killed by SIGKILL [1]. Let's check for SIGKILL before doing memory allocation. [1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e Signed-off-by: Tetsuo Handa Reported-by: syzbot --- drivers/staging/android/ion/ion_page_pool.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/staging/android/ion/ion_page_pool.c b/drivers/staging/android/ion/ion_page_pool.c index fd4995fb676e..f85ec5b16b65 100644 --- a/drivers/staging/android/ion/ion_page_pool.c +++ b/drivers/staging/android/ion/ion_page_pool.c @@ -8,11 +8,14 @@ #include #include #include +#include #include "ion.h" static inline struct page *ion_page_pool_alloc_pages(struct ion_page_pool *pool) { + if (fatal_signal_pending(current)) + return NULL; return alloc_pages(pool->gfp_mask, pool->order); } Acked-by: Laura Abbott
Re: perf build failure with newer glibc headers
On 6/13/19 11:19 AM, Arnaldo Carvalho de Melo wrote: Em Wed, Jun 12, 2019 at 05:56:11PM -0300, Arnaldo Carvalho de Melo escreveu: So, we'll have to have a feature test, that defines some HAVE_GETTID that then ifdefs out our inline copy, working on it. This should take care of it, please check, perhaps providing a Tested-by: to add to this, built okay on the tree that previously failed Tested-by: Laura Abbott Thanks, - Arnaldo commit a04ef2eb0a66d9479e75e536d919c8c9cd618ee3 Author: Arnaldo Carvalho de Melo Date: Thu Jun 13 12:04:19 2019 -0300 tools build: Check if gettid() is available before providing helper Laura reported that the perf build failed in fedora when we got a glibc that provides gettid(), which I reproduced using fedora rawhide with the glibc-devel-2.29.9000-26.fc31.x86_64 package. Add a feature check to avoid providing a gettid() helper in such systems. On a fedora rawhide system with this patch applied we now get: [root@7a5f55352234 perf]# grep gettid /tmp/build/perf/FEATURE-DUMP feature-gettid=1 [root@7a5f55352234 perf]# cat /tmp/build/perf/feature/test-gettid.make.output [root@7a5f55352234 perf]# ldd /tmp/build/perf/feature/test-gettid.bin linux-vdso.so.1 (0x7ffc6b1f6000) libc.so.6 => /lib64/libc.so.6 (0x7f04e0a74000) /lib64/ld-linux-x86-64.so.2 (0x7f04e0c47000) [root@7a5f55352234 perf]# nm /tmp/build/perf/feature/test-gettid.bin | grep -w gettid U gettid@@GLIBC_2.30 [root@7a5f55352234 perf]# While on a fedora:29 system: [acme@quaco perf]$ grep gettid /tmp/build/perf/FEATURE-DUMP feature-gettid=0 [acme@quaco perf]$ cat /tmp/build/perf/feature/test-gettid.make.output test-gettid.c: In function ‘main’: test-gettid.c:8:9: error: implicit declaration of function ‘gettid’; did you mean ‘getgid’? [-Werror=implicit-function-declaration] return gettid(); ^~ getgid cc1: all warnings being treated as errors [acme@quaco perf]$ Reported-by: Laura Abbott Cc: Adrian Hunter Cc: Florian Weimer Cc: Jiri Olsa Cc: Namhyung Kim Cc: Stephane Eranian Link: https://lkml.kernel.org/n/tip-yfy3ch53agmklwu9o7rlg...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature index 3b24231c58a2..50377cc2f5f9 100644 --- a/tools/build/Makefile.feature +++ b/tools/build/Makefile.feature @@ -36,6 +36,7 @@ FEATURE_TESTS_BASIC := \ fortify-source \ sync-compare-and-swap \ get_current_dir_name\ +gettid \ glibc \ gtk2\ gtk2-infobar\ diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile index 4b8244ee65ce..523ee42db0c8 100644 --- a/tools/build/feature/Makefile +++ b/tools/build/feature/Makefile @@ -54,6 +54,7 @@ FILES= \ test-get_cpuid.bin \ test-sdt.bin \ test-cxx.bin \ + test-gettid.bin \ test-jvmti.bin \ test-jvmti-cmlr.bin \ test-sched_getcpu.bin\ @@ -267,6 +268,9 @@ $(OUTPUT)test-sdt.bin: $(OUTPUT)test-cxx.bin: $(BUILDXX) -std=gnu++11 +$(OUTPUT)test-gettid.bin: + $(BUILD) + $(OUTPUT)test-jvmti.bin: $(BUILD) diff --git a/tools/build/feature/test-all.c b/tools/build/feature/test-all.c index a59c53705093..3b3d5d72124a 100644 --- a/tools/build/feature/test-all.c +++ b/tools/build/feature/test-all.c @@ -38,6 +38,10 @@ # include "test-get_current_dir_name.c" #undef main +#define main main_test_gettid +# include "test-gettid.c" +#undef main + #define main main_test_glibc # include "test-glibc.c" #undef main @@ -195,6 +199,7 @@ int main(int argc, char *argv[]) main_test_libelf(); main_test_libelf_mmap(); main_test_get_current_dir_name(); + main_test_gettid(); main_test_glibc(); main_test_dwarf(); main_test_dwarf_getlocations(); diff --git a/tools/build/feature/test-gettid.c b/tools/build/feature/test-gettid.c new file mode 100644 index ..ef24e42d3f1b --- /dev/null +++ b/tools/build/feature/test-gettid.c @@ -0,0 +1,11 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (C) 2019, Red Hat Inc, Arnaldo Carvalho de Melo +#define _GNU_SOURCE +#include + +int main(void) +{ + return gettid(); +} + +#undef _GNU_SOURCE diff --git a/tools/
perf build failure with newer glibc headers
Hi, While doing some build experiments, I found a compile failure with perf and jvmti: BUILDSTDERR: gcc -Wp,-MD,./.xsk.o.d -Wp,-MT,xsk.o -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-jvmti/jvmti_agent.c:48:21: error: static declaration of 'gettid' follows non-static declaration BUILDSTDERR:48 | static inline pid_t gettid(void) BUILDSTDERR: | ^~ BUILDSTDERR: In file included from /usr/include/unistd.h:1170, BUILDSTDERR: from jvmti/jvmti_agent.c:33: BUILDSTDERR: /usr/include/bits/unistd_ext.h:40:16: note: previous declaration of 'gettid' was here BUILDSTDERR:40 | extern __pid_t gettid (void) __THROW; BUILDSTDERR: |^~ This is with the newer glibc headers that came into Fedora earlier this week (glibc-2.29.9000-27.fc31) It looks like the newer headers now define gettid so the in file gettid no longer works. Note this was a custom build with jvmti enabled as regular Fedora doesn't have it enabled which is why this wasn't reported elsewhere. I don't know enough about either the glibc headers or perf to make a suggestion on how to fix this but I'm happy to test. Thanks, Laura
Re: [PATCH v3] tpm: Actually fail on TPM errors during "get random"
On 4/3/19 1:52 PM, Jarkko Sakkinen wrote: On Tue, Apr 02, 2019 at 07:13:52PM +, Winkler, Tomas wrote: On Tue, Apr 02, 2019 at 02:46:25AM +0300, Jarkko Sakkinen wrote: On Mon, Apr 01, 2019 at 12:06:07PM -0700, Kees Cook wrote: A "get random" may fail with a TPM error, but those codes were returned as-is to the caller, which assumed the result was the number of bytes that had been written to the target buffer, which could lead to a kernel heap memory exposure and over-read. This fixes tpm1_get_random() to mask positive TPM errors into -EIO, as before. [ 18.092103] tpm tpm0: A TPM error (379) occurred attempting get random [ 18.092106] usercopy: Kernel memory exposure attempt detected from SLUB object 'kmalloc-64' (offset 0, size 379)! Link: https://bugzilla.redhat.com/show_bug.cgi?id=1650989 Reported-by: Phil Baker Reported-by: Craig Robson Fixes: 7aee9c52d7ac ("tpm: tpm1: rewrite tpm1_get_random() using tpm_buf structure") Cc: Laura Abbott Cc: Tomas Winkler Cc: Jarkko Sakkinen Cc: sta...@vger.kernel.org Signed-off-by: Kees Cook --- v3: fix never-succeed, limit checks to tpm cmd return (James, Jason) v2: also fix tpm2 implementation (Jason Gunthorpe) --- drivers/char/tpm/tpm1-cmd.c | 7 +-- drivers/char/tpm/tpm2-cmd.c | 7 +-- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/char/tpm/tpm1-cmd.c b/drivers/char/tpm/tpm1-cmd.c index 85dcf2654d11..faacbe1ffa1a 100644 --- a/drivers/char/tpm/tpm1-cmd.c +++ b/drivers/char/tpm/tpm1-cmd.c @@ -510,7 +510,7 @@ struct tpm1_get_random_out { * * Return: * * number of bytes read - * * -errno or a TPM return code otherwise + * * -errno (positive TPM return codes are masked to -EIO) */ int tpm1_get_random(struct tpm_chip *chip, u8 *dest, size_t max) { @@ -531,8 +531,11 @@ int tpm1_get_random(struct tpm_chip *chip, u8 *dest, size_t max) rc = tpm_transmit_cmd(chip, , sizeof(out->rng_data_len), "attempting get random"); - if (rc) + if (rc) { + if (rc > 0) + rc = -EIO; goto out; + } out = (struct tpm1_get_random_out *)[TPM_HEADER_SIZE]; diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c index e74c5b7b64bf..8ffa6af61580 100644 --- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -301,7 +301,7 @@ struct tpm2_get_random_out { * * Return: * size of the buffer on success, - * -errno otherwise + * -errno otherwise ((positive TPM return codes are masked to -EIO) */ int tpm2_get_random(struct tpm_chip *chip, u8 *dest, size_t max) { @@ -328,8 +328,11 @@ int tpm2_get_random(struct tpm_chip *chip, u8 *dest, size_t max) offsetof(struct tpm2_get_random_out, buffer), "attempting get random"); - if (err) + if (err) { + if (err > 0) + err = -EIO; goto out; + } out = (struct tpm2_get_random_out *) [TPM_HEADER_SIZE]; -- 2.17.1 -- Kees Cook Reviewed-by: Jarkko Sakkinen Applied to my master branch. Jason, Tomas, do you want me to add reviewed- by's? Sure, it fixes my patch. Great, I'll add it. Thank you. Just want to be explicit with these things as I consider them as if I was asking a signature from someone :-) /Jarkko Was this intended to go in for 5.2? I still don't see it in the tree. Thanks, Laura
Re: [RESEND PATCH v3 04/11] s390/cpacf: mark scpacf_query() as __always_inline
On 4/22/19 8:49 PM, Masahiro Yamada wrote: This prepares to move CONFIG_OPTIMIZE_INLINING from x86 to a common place. We need to eliminate potential issues beforehand. If it is enabled for s390, the following error is reported: In file included from arch/s390/crypto/des_s390.c:19: ./arch/s390/include/asm/cpacf.h: In function 'cpacf_query': ./arch/s390/include/asm/cpacf.h:170:2: warning: asm operand 3 probably doesn't match constraints asm volatile( ^~~ ./arch/s390/include/asm/cpacf.h:170:2: error: impossible constraint in 'asm' This also seems to still be broken, again with gcc 9.1.1 BUILDSTDERR: In file included from arch/s390/crypto/prng.c:29: BUILDSTDERR: ./arch/s390/include/asm/cpacf.h: In function 'cpacf_query_func': BUILDSTDERR: ./arch/s390/include/asm/cpacf.h:170:2: warning: asm operand 3 probably doesn't match constraints BUILDSTDERR: 170 | asm volatile( BUILDSTDERR: | ^~~ BUILDSTDERR: ./arch/s390/include/asm/cpacf.h:170:2: error: impossible constraint in 'asm' I realized we're still carrying a patch to add -fno-section-anchors but it's a similar failure to powerpc. Thanks, Laura (config attached, full build log at https://kojipkgs.fedoraproject.org//work/tasks/6330/34886330/build.log) Signed-off-by: Masahiro Yamada --- Changes in v3: None Changes in v2: - split into a separate patch arch/s390/include/asm/cpacf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/include/asm/cpacf.h b/arch/s390/include/asm/cpacf.h index 3cc52e37b4b2..f316de40e51b 100644 --- a/arch/s390/include/asm/cpacf.h +++ b/arch/s390/include/asm/cpacf.h @@ -202,7 +202,7 @@ static inline int __cpacf_check_opcode(unsigned int opcode) } } -static inline int cpacf_query(unsigned int opcode, cpacf_mask_t *mask) +static __always_inline int cpacf_query(unsigned int opcode, cpacf_mask_t *mask) { if (__cpacf_check_opcode(opcode)) { __cpacf_query(opcode, mask); # s390 # # Automatically generated file; DO NOT EDIT. # Linux/s390 5.1.0 Kernel Configuration # # # Compiler: gcc (GCC) 9.0.1 20190312 (Red Hat 9.0.1-0.10) # CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=90001 CONFIG_CLANG_VERSION=0 CONFIG_CC_HAS_ASM_GOTO=y CONFIG_CC_HAS_WARN_MAYBE_UNINITIALIZED=y CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_BUILD_SALT="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y CONFIG_HAVE_KERNEL_UNCOMPRESSED=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set # CONFIG_KERNEL_UNCOMPRESSED is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y # CONFIG_USELIB is not set CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_IRQ_DOMAIN=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_SPARSE_IRQ=y # CONFIG_GENERIC_IRQ_DEBUGFS is not set CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y # CONFIG_PREEMPT_NONE is not set CONFIG_PREEMPT_VOLUNTARY=y # CONFIG_PREEMPT is not set CONFIG_PREEMPT_COUNT=y # # CPU/Task time and stats accounting # CONFIG_VIRT_CPU_ACCOUNTING=y CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y CONFIG_BSD_PROCESS_ACCT=y CONFIG_BSD_PROCESS_ACCT_V3=y CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y CONFIG_PSI=y # CONFIG_PSI_DEFAULT_DISABLED is not set # CONFIG_CPU_ISOLATION is not set # # RCU Subsystem # CONFIG_TREE_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y CONFIG_TREE_SRCU=y CONFIG_TASKS_RCU=y CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_NEED_SEGCBLIST=y CONFIG_BUILD_BIN2C=y # CONFIG_IKCONFIG is not set CONFIG_IKHEADERS_PROC=m CONFIG_LOG_BUF_SHIFT=16 CONFIG_LOG_CPU_MAX_BUF_SHIFT=12 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=12 CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y CONFIG_NUMA_BALANCING=y # CONFIG_NUMA_BALANCING_DEFAULT_ENABLED is not set CONFIG_CGROUPS=y CONFIG_PAGE_COUNTER=y CONFIG_MEMCG=y CONFIG_MEMCG_SWAP=y CONFIG_MEMCG_SWAP_ENABLED=y CONFIG_MEMCG_KMEM=y CONFIG_BLK_CGROUP=y CONFIG_DEBUG_BLK_CGROUP=y CONFIG_CGROUP_WRITEBACK=y CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_CFS_BANDWIDTH=y # CONFIG_RT_GROUP_SCHED is not set CONFIG_CGROUP_PIDS=y # CONFIG_CGROUP_RDMA is not set CONFIG_CGROUP_FREEZER=y # CONFIG_CGROUP_HUGETLB is not set CONFIG_CPUSETS=y CONFIG_PROC_PID_CPUSET=y CONFIG_CGROUP_DEVICE=y CONFIG_CGROUP_CPUACCT=y CONFIG_CGROUP_PERF=y CONFIG_CGROUP_BPF=y # CONFIG_CGROUP_DEBUG
Re: [PATCH] arm64: vdso: Explicitly add build-id option
On 5/16/19 3:46 AM, Will Deacon wrote: On Thu, May 16, 2019 at 01:58:56PM +0900, Masahiro Yamada wrote: On Thu, May 16, 2019 at 4:51 AM Laura Abbott wrote: Commit 691efbedc60d ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO") switched to using LD explicitly. The --build-id option needs to be passed explicitly, similar to x86. Add this option. Fixes: 691efbedc60d ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO") Signed-off-by: Laura Abbott --- arch/arm64/kernel/vdso/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index 744b9dbaba03..ca209103cd06 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -13,6 +13,7 @@ targets := $(obj-vdso) vdso.so vdso.so.dbg obj-vdso := $(addprefix $(obj)/, $(obj-vdso)) ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 \ + $(call ld-option, --build-id) \ $(call ld-option, --hash-style=sysv) -n -T # Disable gcov profiling for VDSO code I missed that. Sorry. You can add --build-id without $(call ld-option,...) because it is supported by our minimal version of toolchain. See commit log of 1e0221374e for example. Ok, so I'm ok folding in the diff below on top? Will --->8 diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index ca209103cd06..fa230ff09aa1 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -12,9 +12,8 @@ obj-vdso := gettimeofday.o note.o sigreturn.o targets := $(obj-vdso) vdso.so vdso.so.dbg obj-vdso := $(addprefix $(obj)/, $(obj-vdso)) -ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 \ - $(call ld-option, --build-id) \ - $(call ld-option, --hash-style=sysv) -n -T +ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \ + --build-id -n -T # Disable gcov profiling for VDSO code GCOV_PROFILE := n Looks good to me. Thanks, Laura
[PATCH] arm64: vdso: Explicitly add build-id option
Commit 691efbedc60d ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO") switched to using LD explicitly. The --build-id option needs to be passed explicitly, similar to x86. Add this option. Fixes: 691efbedc60d ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO") Signed-off-by: Laura Abbott --- arch/arm64/kernel/vdso/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index 744b9dbaba03..ca209103cd06 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -13,6 +13,7 @@ targets := $(obj-vdso) vdso.so vdso.so.dbg obj-vdso := $(addprefix $(obj)/, $(obj-vdso)) ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 \ + $(call ld-option, --build-id) \ $(call ld-option, --hash-style=sysv) -n -T # Disable gcov profiling for VDSO code -- 2.21.0
Re: [PATCH] usercopy: Remove HARDENED_USERCOPY_PAGESPAN
On 5/10/19 3:43 PM, Kees Cook wrote: This feature continues to cause more problems than it solves[1]. Its intention was to check the bounds of page-allocator allocations by using __GFP_COMP, for which we would need to find all missing __GFP_COMP markings. This work has been on hold and there is an argument[2] that such markings are not even the correct signal for checking for same-allocation pages. Instead of depending on BROKEN, this just removes it entirely. It can be trivially reverted if/when a better solution for tracking page allocator sizes is found. [1] https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg37479.html [2] https://lkml.kernel.org/r/20190415022412.ga29...@bombadil.infradead.org Suggested-by: Eric Biggers Signed-off-by: Kees Cook --- mm/usercopy.c| 67 security/Kconfig | 11 2 files changed, 78 deletions(-) diff --git a/mm/usercopy.c b/mm/usercopy.c index 14faadcedd06..15dc1bf03303 100644 --- a/mm/usercopy.c +++ b/mm/usercopy.c @@ -159,70 +159,6 @@ static inline void check_bogus_address(const unsigned long ptr, unsigned long n, usercopy_abort("null address", NULL, to_user, ptr, n); } -/* Checks for allocs that are marked in some way as spanning multiple pages. */ -static inline void check_page_span(const void *ptr, unsigned long n, - struct page *page, bool to_user) -{ -#ifdef CONFIG_HARDENED_USERCOPY_PAGESPAN - const void *end = ptr + n - 1; - struct page *endpage; - bool is_reserved, is_cma; - - /* -* Sometimes the kernel data regions are not marked Reserved (see -* check below). And sometimes [_sdata,_edata) does not cover -* rodata and/or bss, so check each range explicitly. -*/ - - /* Allow reads of kernel rodata region (if not marked as Reserved). */ - if (ptr >= (const void *)__start_rodata && - end <= (const void *)__end_rodata) { - if (!to_user) - usercopy_abort("rodata", NULL, to_user, 0, n); - return; - } - - /* Allow kernel data region (if not marked as Reserved). */ - if (ptr >= (const void *)_sdata && end <= (const void *)_edata) - return; - - /* Allow kernel bss region (if not marked as Reserved). */ - if (ptr >= (const void *)__bss_start && - end <= (const void *)__bss_stop) - return; - I agree the page spanning is broken but is it worth keeping the checks against __rodata __bss etc.? - /* Is the object wholly within one base page? */ - if (likely(((unsigned long)ptr & (unsigned long)PAGE_MASK) == - ((unsigned long)end & (unsigned long)PAGE_MASK))) - return; - - /* Allow if fully inside the same compound (__GFP_COMP) page. */ - endpage = virt_to_head_page(end); - if (likely(endpage == page)) - return; - - /* -* Reject if range is entirely either Reserved (i.e. special or -* device memory), or CMA. Otherwise, reject since the object spans -* several independently allocated pages. -*/ - is_reserved = PageReserved(page); - is_cma = is_migrate_cma_page(page); - if (!is_reserved && !is_cma) - usercopy_abort("spans multiple pages", NULL, to_user, 0, n); - - for (ptr += PAGE_SIZE; ptr <= end; ptr += PAGE_SIZE) { - page = virt_to_head_page(ptr); - if (is_reserved && !PageReserved(page)) - usercopy_abort("spans Reserved and non-Reserved pages", - NULL, to_user, 0, n); - if (is_cma && !is_migrate_cma_page(page)) - usercopy_abort("spans CMA and non-CMA pages", NULL, - to_user, 0, n); - } -#endif -} - static inline void check_heap_object(const void *ptr, unsigned long n, bool to_user) { @@ -236,9 +172,6 @@ static inline void check_heap_object(const void *ptr, unsigned long n, if (PageSlab(page)) { /* Check slab allocator for flags and size. */ __check_heap_object(ptr, n, page, to_user); - } else { - /* Verify object does not incorrectly span multiple pages. */ - check_page_span(ptr, n, page, to_user); } } diff --git a/security/Kconfig b/security/Kconfig index 353cfef71d4e..8392647f5a4c 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -176,17 +176,6 @@ config HARDENED_USERCOPY_FALLBACK Booting with "slab_common.usercopy_fallback=Y/N" can change this setting. -config HARDENED_USERCOPY_PAGESPAN - bool "Refuse to copy allocations that span multiple pages" - depends on HARDENED_USERCOPY - depends on EXPERT - help - When a multi-page allocation is done
panic from iwl_mvm_vif_dbgfs_register with 5.0
Hi, Fedora got a bug report of a panic with kernels > 5.0: Mar 20 10:52:38 kernel: BUG: unable to handle kernel NULL pointer dereference at 0043 Mar 20 10:52:38 kernel: #PF error: [normal kernel read fault] Mar 20 10:52:38 kernel: PGD 8003de1d7067 P4D 8003de1d7067 PUD 3de1db067 PMD 0 Mar 20 10:52:38 kernel: Oops: [#1] SMP PTI Mar 20 10:52:38 kernel: CPU: 0 PID: 2071 Comm: hostapd Not tainted 5.1.0-0.rc0.git9.1.fc31.x86_64 #1 Mar 20 10:52:38 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKT56AUS 11/18/2013 Mar 20 10:52:38 kernel: RIP: 0010:dentry_name+0x9e/0x210 Mar 20 10:52:38 kernel: Code: 6b ff 5a 85 c0 74 0d 80 3d 73 60 e2 00 00 0f 84 04 01 00 00 45 85 ff 0f 8e 5d 01 00 00 31 ff eb 09 48 83 c7 01 41 39 ff 7e 2f <48> 8b 43 50 48 89 da 89 fe 48 89 c3 48 8b 42 60 48 89 04 fc 48 39 Mar 20 10:52:38 kernel: RSP: 0018:a78d82f07920 EFLAGS: 00010246 Mar 20 10:52:38 kernel: RAX: 0001 RBX: fff3 RCX: f20b42fd Mar 20 10:52:38 kernel: RDX: a8ac3b70 RSI: 98e572c28d60 RDI: Mar 20 10:52:38 kernel: RBP: a78d82f07a75 R08: 0033dcd044f0 R09: Mar 20 10:52:38 kernel: R10: 0001 R11: 0002 R12: 0a00ff05 Mar 20 10:52:38 kernel: R13: R14: a78d82f07ad0 R15: 0003 Mar 20 10:52:38 kernel: FS: 7f8b47f4c740() GS:98e58e60() knlGS: Mar 20 10:52:38 kernel: CS: 0010 DS: ES: CR0: 80050033 Mar 20 10:52:38 kernel: CR2: 0043 CR3: 0003de26e005 CR4: 001606f0 Mar 20 10:52:38 kernel: Call Trace: Mar 20 10:52:38 kernel: ? __lock_acquire+0x252/0x1060 Mar 20 10:52:38 kernel: pointer+0x14e/0x370 Mar 20 10:52:38 kernel: vsnprintf+0x1ff/0x520 Mar 20 10:52:38 kernel: snprintf+0x49/0x60 Mar 20 10:52:38 kernel: ? __debugfs_create_file+0x3a/0x130 Mar 20 10:52:38 kernel: iwl_mvm_vif_dbgfs_register+0x231/0x310 [iwlmvm] Mar 20 10:52:38 kernel: ? iwl_mvm_find_free_sta_id+0x87/0x100 [iwlmvm] Mar 20 10:52:38 kernel: iwl_mvm_mac_add_interface+0x221/0x2a0 [iwlmvm] Mar 20 10:52:38 kernel: drv_add_interface+0x77/0x230 [mac80211] Mar 20 10:52:38 kernel: ieee80211_do_open+0x13e/0x910 [mac80211] Mar 20 10:52:38 kernel: ? ieee80211_check_concurrent_iface+0x151/0x1c0 [mac80211] Mar 20 10:52:38 kernel: __dev_open+0xd4/0x170 Mar 20 10:52:38 kernel: __dev_change_flags+0x1a7/0x200 Mar 20 10:52:38 kernel: dev_change_flags+0x21/0x60 Mar 20 10:52:38 kernel: devinet_ioctl+0x644/0x7c0 Mar 20 10:52:38 kernel: inet_ioctl+0x15a/0x220 Mar 20 10:52:38 kernel: ? avc_has_extended_perms+0x252/0x5c0 Mar 20 10:52:38 kernel: sock_do_ioctl+0x47/0x140 Mar 20 10:52:38 kernel: sock_ioctl+0x1c5/0x360 Mar 20 10:52:38 kernel: do_vfs_ioctl+0x408/0x750 Mar 20 10:52:38 kernel: ksys_ioctl+0x5e/0x90 Mar 20 10:52:38 kernel: __x64_sys_ioctl+0x16/0x20 Mar 20 10:52:38 kernel: do_syscall_64+0x5c/0xa0 Mar 20 10:52:38 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe Mar 20 10:52:38 kernel: RIP: 0033:0x7f8b4808709b Mar 20 10:52:38 kernel: Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48 Mar 20 10:52:38 kernel: RSP: 002b:7ffd44090bb8 EFLAGS: 0246 ORIG_RAX: 0010 Mar 20 10:52:38 kernel: RAX: ffda RBX: RCX: 7f8b4808709b Mar 20 10:52:38 kernel: RDX: 7ffd44090bc0 RSI: 8914 RDI: 0008 Mar 20 10:52:38 kernel: RBP: 0008 R08: 7ffd44090bcf R09: 0007 Mar 20 10:52:38 kernel: R10: 014920b0 R11: 0246 R12: 7ffd44090bc0 Mar 20 10:52:38 kernel: R13: 01490d90 R14: 0001 R15: Mar 20 10:52:38 kernel: Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc nf_nat_ftp nf_conntrack_ftp nf_nat_tftp nf_conntrack_tftp xt_pkttype xt_state xt_conntrack xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat mei_wdt mei_hdcp intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp arc4 kvm_intel kvm irqbypass iwlmvm mac80211 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf iTCO_wdt snd_hda_codec_realtek iTCO_vendor_support snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio iwlwifi snd_hda_intel joydev snd_hda_codec wmi_bmof snd_hda_core snd_hwdep snd_seq cfg80211 i2c_i801 snd_seq_device snd_pcm rfkill snd_timer snd soundcore mei_me pcc_cpufreq lpc_ich mei raid1 i915 crc32c_intel i2c_algo_bit Mar 20 10:52:38 kernel: drm_kms_helper drm ums_realtek uas r8169 usb_storage e1000e wmi video Mar 20 10:52:38 kernel:
Re: [RFC][PATCH 5/5 v2] kselftests: Add dma-heap test
On 3/6/19 9:01 AM, John Stultz wrote: On Wed, Mar 6, 2019 at 8:14 AM Benjamin Gaignard wrote: Le mar. 5 mars 2019 à 21:54, John Stultz a écrit : + + printf("Allocating 1 MEG\n"); + ret = dmabuf_heap_alloc(heap_fd, ONE_MEG, 0, _fd); + if (ret) + goto out; + + /* DO SOMETHING WITH THE DMABUF HERE? */ You can do a call to mmap and write a pattern in the buffer. Yea. I can also do some invalid allocations to make sure things fail properly. But I was talking a bit w/ Sumit about the lack of any general dmabuf tests, and am curious if we need to have a importer device driver that can validate its a real dmabuf and exercise more of the dmabuf ops. thanks -john There's the vgem driver in drm. I did some work to clean that up so it could take an import af33a9190d02 ("drm/vgem: Enable dmabuf import interfaces") . I mostly used it for private tests and never ended up upstreaming any of the tests. Thanks, Laura
Participation in Distro kernel microconference at Plumbers?
Hi, I'm working on organizing a microconference for Linux Plumbers 2019 focused on distribution kernels. The focus is on collaboration on common problems for maintaining and distributing a kernel for any kind of distribution (embedded, enterprise, community, internal, extrnal). Ideally, this would be a topic that could be presented and discussed in 20-30 minutes (we can be flexible depending on the number of topics though). If you have something that might be a good fit for this, please reach out to me off list. Feel free to pass this around to anyone who might be interested. Thanks, Laura
Re: [PATCH] ipc: Fix building compat mode without sysvipc
On 3/6/19 6:29 AM, Arnd Bergmann wrote: As John Stultz noticed, my y2038 syscall series caused a link failure when CONFIG_SYSVIPC is enabled but CONFIG_COMPAT is enabled: is this supposed to be "CONFIG_SYSVIPC is disabled" to match the subject? arch/arm64/kernel/sys32.o:(.rodata+0x960): undefined reference to `__arm64_compat_sys_old_semctl' arch/arm64/kernel/sys32.o:(.rodata+0x980): undefined reference to `__arm64_compat_sys_old_msgctl' arch/arm64/kernel/sys32.o:(.rodata+0x9a0): undefined reference to `__arm64_compat_sys_old_shmctl' Add the missing entries in kernel/sys_ni.c for the new system calls. Cc: Laura Abbott Cc: John Stultz Cc: Thomas Gleixner Signed-off-by: Arnd Bergmann --- I'm about to send off my pull requests for arm-soc, so I'd just send another one with just this common from my y2038 tree. --- kernel/sys_ni.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 85e5ccec0955..62a6c8707799 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -202,6 +202,7 @@ COND_SYSCALL(msgget); COND_SYSCALL(old_msgctl); COND_SYSCALL(msgctl); COND_SYSCALL_COMPAT(msgctl); +COND_SYSCALL_COMPAT(old_msgctl); COND_SYSCALL(msgrcv); COND_SYSCALL_COMPAT(msgrcv); COND_SYSCALL(msgsnd); @@ -212,6 +213,7 @@ COND_SYSCALL(semget); COND_SYSCALL(old_semctl); COND_SYSCALL(semctl); COND_SYSCALL_COMPAT(semctl); +COND_SYSCALL_COMPAT(old_semctl); COND_SYSCALL(semtimedop); COND_SYSCALL(semtimedop_time32); COND_SYSCALL(semop); @@ -221,6 +223,7 @@ COND_SYSCALL(shmget); COND_SYSCALL(old_shmctl); COND_SYSCALL(shmctl); COND_SYSCALL_COMPAT(shmctl); +COND_SYSCALL_COMPAT(old_shmctl); COND_SYSCALL(shmat); COND_SYSCALL_COMPAT(shmat); COND_SYSCALL(shmdt);
Re: [PATCH v2] drm/vgem: fix use-after-free when drm_gem_handle_create() fails
On 2/26/19 1:44 PM, Eric Biggers wrote: From: Eric Biggers If drm_gem_handle_create() fails in vgem_gem_create(), then the drm_vgem_gem_object is freed twice: once when the reference is dropped by drm_gem_object_put_unlocked(), and again by __vgem_gem_destroy(). This was hit by syzkaller using fault injection. Fix it by skipping the second free. Reported-by: syzbot+e73f2fb5ed5a5df36...@syzkaller.appspotmail.com Fixes: af33a9190d02 ("drm/vgem: Enable dmabuf import interfaces") Reviewed-by: Chris Wilson Cc: Laura Abbott Cc: Daniel Vetter Cc: sta...@vger.kernel.org Signed-off-by: Eric Biggers --- drivers/gpu/drm/vgem/vgem_drv.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c index 5930facd6d2d8..11a8f99ba18c5 100644 --- a/drivers/gpu/drm/vgem/vgem_drv.c +++ b/drivers/gpu/drm/vgem/vgem_drv.c @@ -191,13 +191,9 @@ static struct drm_gem_object *vgem_gem_create(struct drm_device *dev, ret = drm_gem_handle_create(file, >base, handle); drm_gem_object_put_unlocked(>base); if (ret) - goto err; + return ERR_PTR(ret); return >base; - -err: - __vgem_gem_destroy(obj); - return ERR_PTR(ret); } static int vgem_gem_dumb_create(struct drm_file *file, struct drm_device *dev, Acked-by: Laura Abbott
Re: panic with CONFIG_FAIL_MMC_REQUEST and cqhci
On 2/22/19 5:42 AM, Ritesh Harjani wrote: Hi Laura, On 2/19/2019 12:59 AM, Laura Abbott wrote: Hi, Fedora got report of a panic when I accidentally left debugging enabled on a build https://bugzilla.redhat.com/show_bug.cgi?id=1677438 It looks like a panic from code in CONFIG_FAIL_MMC_REQUEST from the cqhci driver because there isn't a command (high level overview) With CQHCI, in case of non-DCMD (data) requests, mrq->cmd can be NULL. Is this crash happening always (100% on bootup) with CQHCI & CONFIG_FAIL_MMC_REQUEST enabled? Sure, I will role out a patch to handle this case. It will be great, if you could also confirm it from your side. Thanks, I'll have to follow up with the reporter and see if he gets back to me. Regards Ritesh (gdb) list *(mmc_should_fail_request+0xa) 0x149a is in mmc_should_fail_request (drivers/mmc/core/core.c:98). 93 }; 94 95 if (!data) 96 return; 97 98 if (cmd->error || data->error || 99 !should_fail(>fail_mmc_request, data->blksz * data->blocks)) 100 return; 101 102 data->error = data_errors[prandom_u32() % ARRAY_SIZE(data_errors)]; (gdb) (gdb) list *(mmc_cqe_request_done+0x1c) 0x2a6c is in mmc_cqe_request_done (drivers/mmc/core/core.c:505). 500 void mmc_cqe_request_done(struct mmc_host *host, struct mmc_request *mrq) 501 { 502 mmc_should_fail_request(host, mrq); 503 504 /* Flag re-tuning needed on CRC errors */ 505 if ((mrq->cmd && mrq->cmd->error == -EILSEQ) || 506 (mrq->data && mrq->data->error == -EILSEQ)) 507 mmc_retune_needed(host); 508 509 trace_mmc_request_done(host, mrq); (gdb) list *(cqhci_irq+0x1d2) 0x1172 is in cqhci_irq (drivers/mmc/host/cqhci.c:747). 742 data->bytes_xfered = 0; 743 else 744 data->bytes_xfered = data->blksz * data->blocks; 745 } 746 747 mmc_cqe_request_done(mmc, mrq); 748 } 749 750 irqreturn_t cqhci_irq(struct mmc_host *mmc, u32 intmask, int cmd_error, 751 int data_error) This can be worked around by turning off the option but it seems like something to fix up. Thanks, Laura
Re: [PATCH] s390/jump_label: Correct asm contraint
On 2/20/19 12:58 AM, Heiko Carstens wrote: On Sat, Feb 09, 2019 at 12:34:20PM -0800, Laura Abbott wrote: On 2/5/19 12:43 PM, Heiko Carstens wrote: On Tue, Jan 29, 2019 at 08:25:58AM +0100, Laura Abbott wrote: On 1/23/19 5:24 AM, Heiko Carstens wrote: On Wed, Jan 23, 2019 at 01:55:13PM +0100, Laura Abbott wrote: There's a build failure with gcc9: ./arch/s390/include/asm/jump_label.h: Assembler messages: ./arch/s390/include/asm/jump_label.h:23: Error: bad expression ./arch/s390/include/asm/jump_label.h:23: Error: junk at end of line, first unrecognized character is `r' make[1]: *** [scripts/Makefile.build:277: init/main.o] Error 1 ... I've had to turn off s390 in Fedora until this gets fixed :( Laura, the patch below should fix this (temporarily). If possible, could you give it a try? It seems to work for me. Subject: [PATCH] s390: disable section anchors Tested-by: Laura Abbott < The patch won't be used. In the meantime Ilya provided a gcc 9 and kernel patch which should fix this. The kernel patch is available here https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features=146448524bddbf6dfc62de31957e428de001cbda and will go upstream during the next merge window. Note: this obviously also requires to update the gcc 9 version in Fedora, so it contains Ilya's patch, to be able to compile the kernel. Thanks, Heiko Thanks. I'll keep an eye out for that during the next merge window.
Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap
On 1/24/19 8:44 AM, Brian Starkey wrote: On Thu, Jan 24, 2019 at 10:04:46AM -0600, Andrew F. Davis wrote: On 1/23/19 11:11 AM, Brian Starkey wrote: [snip] I'm very new to all this, so any pointers to history in this area are appreciated. [snip] In case you didn't come across it already, the effort which seems to have gained the most "air-time" recently is https://github.com/cubanismo/allocator, which is still a userspace module (perhaps some concepts from there could go into the kernel?), but makes some attempts at generic constraint solving. It's also not really moving anywhere at the moment. Very interesting, I'm going to have to stare at this for a bit. In which case, some reading material that might be of interest :-) https://www.x.org/wiki/Events/XDC2016/Program/Unix_Device_Memory_Allocation.pdf https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html -Brian In some respects this is more a question of "what is the purpose of Ion". Once upon a time, Ion was going to do constraint solving but that never really happened and I figured Ion would get deprecated. People keep coming out of the woodwork with new uses for Ion so its stuck around. This is why I've primarily focused on Ion as a framework for exposing available memory types to userspace and leave the constraint solving to someone else, since that's what most users seem to want out of Ion ("I know I want memory type X please give it to me"). That's not to say that this was a perfect or even the correct approach, just what made the most sense based on users. Thanks, Laura
Re: [PATCH] staging: android: ion: fix sys heap pool's gfp_flags
On 1/31/19 10:59 PM, Qing Xia wrote: In the first loop, gfp_flags will be modified to high_order_gfp_flags, and there will be no chance to change back to low_order_gfp_flags. Fixes: e7f63771 ("ION: Sys_heap: Add cached pool to spead up cached buffer alloc") Signed-off-by: Qing Xia --- drivers/staging/android/ion/ion_system_heap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/android/ion/ion_system_heap.c b/drivers/staging/android/ion/ion_system_heap.c index 0383f75..20f2103 100644 --- a/drivers/staging/android/ion/ion_system_heap.c +++ b/drivers/staging/android/ion/ion_system_heap.c @@ -223,10 +223,10 @@ static void ion_system_heap_destroy_pools(struct ion_page_pool **pools) static int ion_system_heap_create_pools(struct ion_page_pool **pools) { int i; - gfp_t gfp_flags = low_order_gfp_flags; for (i = 0; i < NUM_ORDERS; i++) { struct ion_page_pool *pool; + gfp_t gfp_flags = low_order_gfp_flags; if (orders[i] > 4) gfp_flags = high_order_gfp_flags; Acked-by: Laura Abbott
Re: [PATCH v2] staging: android: ion: Allocate from heap ID directly without mask
On 2/15/19 11:01 AM, John Stultz wrote: On Fri, Feb 15, 2019 at 2:51 AM Brian Starkey wrote: Hi John, On Thu, Feb 14, 2019 at 09:38:29AM -0800, John Stultz wrote: [snip] Some thoughts, as this ABI break has the potential to be pretty painful. 1) Unfortunately, this ABI is exposed *through* libion via ion_alloc/ion_alloc_fd out to gralloc implementations. Which means it will have a wider impact to vendor userland code. I figured libion could fairly easily loop through all the set bits in heap_mask and call the ioctl for each until it succeeds. That preserves the old behaviour from the libion clients' perspective. Potentially, though that implicitly still caps the heaps to 32. So I'm not sure what the net benefit would be. 2) For patches that cause ABI breaks, it might be good to make it clear in the commit what the userland impact looks like in userspace, possibly with an example, so the poor folks who bisect down the change as breaking their system in a year or so have a clear example as to what they need to change in their code. 3) Also, its not clear how a given userland should distinguish between the different ABIs. We already have logic in libion to distinguish between pre-4.12 legacy and post-4.12 implementations (using implicit ion_free() behavior). I don't see any such check we can make with this code. Adding another ABI version may require we provide an actual interface version ioctl. A slightly fragile/ugly approach might be to attempt a small allocation with a heap_mask of 0x. On an "old" implementation, you'd expect that to succeed, whereas it would/could be made to fail in the "new" one. Yea I think having a proper ION_IOC_VERSION is going to be necessary. I'm hoping to send out an ugly first stab at the kernel side for switching to per-heap devices (with a config for keeping /dev/ion for legacy compat), which I hope will address the core issue this patch does (moving away from heap masks to specifically requested heaps). thanks -john Arnd/Greg said no to this last time I tried back in 2016 https://lore.kernel.org/lkml/1472769644-11039-4-git-send-email-labb...@redhat.com/T/#u
panic with CONFIG_FAIL_MMC_REQUEST and cqhci
Hi, Fedora got report of a panic when I accidentally left debugging enabled on a build https://bugzilla.redhat.com/show_bug.cgi?id=1677438 It looks like a panic from code in CONFIG_FAIL_MMC_REQUEST from the cqhci driver because there isn't a command (high level overview) (gdb) list *(mmc_should_fail_request+0xa) 0x149a is in mmc_should_fail_request (drivers/mmc/core/core.c:98). 93 }; 94 95 if (!data) 96 return; 97 98 if (cmd->error || data->error || 99 !should_fail(>fail_mmc_request, data->blksz * data->blocks)) 100 return; 101 102 data->error = data_errors[prandom_u32() % ARRAY_SIZE(data_errors)]; (gdb) (gdb) list *(mmc_cqe_request_done+0x1c) 0x2a6c is in mmc_cqe_request_done (drivers/mmc/core/core.c:505). 500 void mmc_cqe_request_done(struct mmc_host *host, struct mmc_request *mrq) 501 { 502 mmc_should_fail_request(host, mrq); 503 504 /* Flag re-tuning needed on CRC errors */ 505 if ((mrq->cmd && mrq->cmd->error == -EILSEQ) || 506 (mrq->data && mrq->data->error == -EILSEQ)) 507 mmc_retune_needed(host); 508 509 trace_mmc_request_done(host, mrq); (gdb) list *(cqhci_irq+0x1d2) 0x1172 is in cqhci_irq (drivers/mmc/host/cqhci.c:747). 742 data->bytes_xfered = 0; 743 else 744 data->bytes_xfered = data->blksz * data->blocks; 745 } 746 747 mmc_cqe_request_done(mmc, mrq); 748 } 749 750 irqreturn_t cqhci_irq(struct mmc_host *mmc, u32 intmask, int cmd_error, 751 int data_error) This can be worked around by turning off the option but it seems like something to fix up. Thanks, Laura
Re: [PATCH] staging: android: ion: Use low_order_gfp_flags for smaller allocations
On 2/11/19 11:09 PM, Jing Xia wrote: gfp_flags is always set high_order_gfp_flags even if allocations of order 0 are made.But for smaller allocations, the system should be able to reclaim some memory. Signed-off-by: Jing Xia Reviewed-by: Yuming Han Reviewed-by: Zhaoyang Huang Reviewed-by: Orson Zhai --- drivers/staging/android/ion/ion_system_heap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/android/ion/ion_system_heap.c b/drivers/staging/android/ion/ion_system_heap.c index 0383f75..20f2103 100644 --- a/drivers/staging/android/ion/ion_system_heap.c +++ b/drivers/staging/android/ion/ion_system_heap.c @@ -223,10 +223,10 @@ static void ion_system_heap_destroy_pools(struct ion_page_pool **pools) static int ion_system_heap_create_pools(struct ion_page_pool **pools) { int i; - gfp_t gfp_flags = low_order_gfp_flags; for (i = 0; i < NUM_ORDERS; i++) { struct ion_page_pool *pool; + gfp_t gfp_flags = low_order_gfp_flags; if (orders[i] > 4) gfp_flags = high_order_gfp_flags; This was already submitted in https://lore.kernel.org/lkml/1549004386-38778-1-git-send-email-saberlily@hisilicon.com/T/#u (I'm also very behind on Ion e-mail and need to catch up...) Laura
Re: [RFC PATCH v7 05/16] arm64/mm: Add support for XPFO
On 2/12/19 7:52 AM, Khalid Aziz wrote: On 1/23/19 7:24 AM, Konrad Rzeszutek Wilk wrote: On Thu, Jan 10, 2019 at 02:09:37PM -0700, Khalid Aziz wrote: From: Juerg Haefliger Enable support for eXclusive Page Frame Ownership (XPFO) for arm64 and provide a hook for updating a single kernel page table entry (which is required by the generic XPFO code). v6: use flush_tlb_kernel_range() instead of __flush_tlb_one() CC: linux-arm-ker...@lists.infradead.org Signed-off-by: Juerg Haefliger Signed-off-by: Tycho Andersen Signed-off-by: Khalid Aziz --- arch/arm64/Kconfig | 1 + arch/arm64/mm/Makefile | 2 ++ arch/arm64/mm/xpfo.c | 58 ++ 3 files changed, 61 insertions(+) create mode 100644 arch/arm64/mm/xpfo.c diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index ea2ab0330e3a..f0a9c0007d23 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -171,6 +171,7 @@ config ARM64 select SWIOTLB select SYSCTL_EXCEPTION_TRACE select THREAD_INFO_IN_TASK + select ARCH_SUPPORTS_XPFO help ARM 64-bit (AArch64) Linux support. diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile index 849c1df3d214..cca3808d9776 100644 --- a/arch/arm64/mm/Makefile +++ b/arch/arm64/mm/Makefile @@ -12,3 +12,5 @@ KASAN_SANITIZE_physaddr.o += n obj-$(CONFIG_KASAN) += kasan_init.o KASAN_SANITIZE_kasan_init.o := n + +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/arch/arm64/mm/xpfo.c b/arch/arm64/mm/xpfo.c new file mode 100644 index ..678e2be848eb --- /dev/null +++ b/arch/arm64/mm/xpfo.c @@ -0,0 +1,58 @@ +/* + * Copyright (C) 2017 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#include +#include + +#include + +/* + * Lookup the page table entry for a virtual address and return a pointer to + * the entry. Based on x86 tree. + */ +static pte_t *lookup_address(unsigned long addr) +{ + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + + pgd = pgd_offset_k(addr); + if (pgd_none(*pgd)) + return NULL; + + pud = pud_offset(pgd, addr); + if (pud_none(*pud)) + return NULL; + + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) + return NULL; + + return pte_offset_kernel(pmd, addr); +} + +/* Update a single kernel page table entry */ +inline void set_kpte(void *kaddr, struct page *page, pgprot_t prot) +{ + pte_t *pte = lookup_address((unsigned long)kaddr); + + set_pte(pte, pfn_pte(page_to_pfn(page), prot)); Thought on the other hand.. what if the page is PMD? Do you really want to do this? What if 'pte' is NULL? +} + +inline void xpfo_flush_kernel_tlb(struct page *page, int order) +{ + unsigned long kaddr = (unsigned long)page_address(page); + unsigned long size = PAGE_SIZE; + + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * size); Ditto here. You are assuming it is PTE, but it may be PMD or such. Or worts - the lookup_address could be NULL. +} -- 2.17.1 Hi Konrad, This makes sense. x86 version of set_kpte() checks pte for NULL and also checks if the page is PMD. Now what you said about adding level to lookup_address() for arm makes more sense. Can someone with knowledge of arm64 mmu make recommendations here? Thanks, Khalid arm64 can't split larger pages and requires everything must be mapped as pages (see [RFC PATCH v7 08/16] arm64/mm: disable section/contiguous mappings if XPFO is enabled) . Any situation where we would get something other than a pte would be a bug. Thanks, Laura
Re: [PATCH] s390/jump_label: Correct asm contraint
On 2/5/19 12:43 PM, Heiko Carstens wrote: On Tue, Jan 29, 2019 at 08:25:58AM +0100, Laura Abbott wrote: On 1/23/19 5:24 AM, Heiko Carstens wrote: On Wed, Jan 23, 2019 at 01:55:13PM +0100, Laura Abbott wrote: There's a build failure with gcc9: ./arch/s390/include/asm/jump_label.h: Assembler messages: ./arch/s390/include/asm/jump_label.h:23: Error: bad expression ./arch/s390/include/asm/jump_label.h:23: Error: junk at end of line, first unrecognized character is `r' make[1]: *** [scripts/Makefile.build:277: init/main.o] Error 1 ... I've had to turn off s390 in Fedora until this gets fixed :( Laura, the patch below should fix this (temporarily). If possible, could you give it a try? It seems to work for me. rom 4067027c2ccc8d3f1dc3bb19fe2d00da0c65bcd8 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Tue, 5 Feb 2019 13:21:56 +0100 Subject: [PATCH] s390: disable section anchors Disable section anchors to allow to compile with the current gcc 9 experimental version. The section anchors is a new feature for s390 with gcc 9, however it breaks our current usage of the 'X' constraint within the asm goto construct within our jump label implementation. Fixing this seems to be non-trivial, therefore (hopefully) temporarily disable section anchors. We will hopefully have a better solution before gcc 9 is released, so that this can be removed again. Reported-by: Laura Abbott Suggested-by: Ilya Leoshkevich Signed-off-by: Heiko Carstens --- arch/s390/Makefile | 8 1 file changed, 8 insertions(+) diff --git a/arch/s390/Makefile b/arch/s390/Makefile index e21053e5e0da..1eac75bc3a29 100644 --- a/arch/s390/Makefile +++ b/arch/s390/Makefile @@ -62,6 +62,14 @@ cflags-y += -Wa,-I$(srctree)/arch/$(ARCH)/include # cflags-$(CONFIG_FRAME_POINTER) += -fno-optimize-sibling-calls +# +# Disable section anchors. This gcc 9 feature currently breaks the 'X' +# constraint like it is used in the asm goto construct. +# +ifeq ($(call cc-option-yn,-fno-section-anchors),y) +cflags-y += -fno-section-anchors +endif + ifeq ($(call cc-option-yn,-mpacked-stack),y) cflags-$(CONFIG_PACK_STACK) += -mpacked-stack -D__PACK_STACK aflags-$(CONFIG_PACK_STACK) += -D__PACK_STACK Tested-by: Laura Abbott <
Re: [PATCH 0/3] Clean the new GCC 9 -Wmissing-attributes warnings
On 2/9/19 12:08 AM, Miguel Ojeda wrote: The upcoming GCC 9 release extends the -Wmissing-attributes warnings (enabled by -Wall) to C and aliases: it warns when particular function attributes are missing in the aliases but not in their target, e.g.: void __cold f(void) {} void __alias("f") g(void); diagnoses: warning: 'g' specifies less restrictive attribute than its target 'f': 'cold' [-Wmissing-attributes] These patch series clean these new warnings. Most of them are caused by the module_init/exit macros. The first patch has been in -next for a long time already, and an alternative solution (only __cold) for module.h as well. However, since we decided to go with the new __copy attribute, I will leave the series for a few days again and send the PR for -rc7. Link: https://lore.kernel.org/lkml/20190125104353.2791-1-labb...@redhat.com/ Miguel Ojeda (3): lib/crc32.c: mark crc32_le_base/__crc32c_le_base aliases as __pure Compiler Attributes: add support for __copy (gcc >= 9) include/linux/module.h: copy __init/__exit attrs to init/cleanup_module include/linux/compiler_attributes.h | 14 ++ include/linux/module.h | 4 ++-- lib/crc32.c | 4 ++-- 3 files changed, 18 insertions(+), 4 deletions(-) For the entire series: Tested-by: Laura Abbott You can look at the full build logs at https://koji.fedoraproject.org/koji/taskinfo?taskID=326911 . Thanks, Laura
Re: Userspace regression with 6baca7601bde ("scsi: target: drop unused pi_prot_format attribute storage")
On 2/4/19 1:40 AM, David Disseldorp wrote: Hi Laura, Thanks for the report... On Sun, 3 Feb 2019 17:56:00 +0100, Laura Abbott wrote: Fedora got a bug report of a new permission denied error with 5.0-rc2: File "/usr/lib/python3.7/site-packages/rtslib_fb/utils.py", line 100, in fread with open(path, 'r') as file_fd: PermissionError: [Errno 13] Permission denied: '/sys/kernel/config/target/core/fileio_28/xxx/attrib/pi_prot_format' This looks like an intentional behavior change with commit 6baca7601bdee2e57f20c45d63eb53b89b33e816 Author: David Disseldorp Date: Fri Nov 23 18:36:11 2018 +0100 scsi: target: drop unused pi_prot_format attribute storage On write, the pi_prot_format configfs attribute invokes the device format_prot() callback if present. Read dumps the contents of se_dev_attrib.pi_prot_format which is always zero. Make the configfs attribute write-only, and drop the always zero se_dev_attrib.pi_prot_format storage. Signed-off-by: David Disseldorp Reviewed-by: Christoph Hellwig Signed-off-by: Martin K. Petersen Unfortunately, existing code that's opening with read permissions is now broken. Can this be reverted? Full bug at https://bugzilla.redhat.com/show_bug.cgi?id=1667505 Lee (cc'ed) pinged me a couple of days ago about the same issue. My preference would be to add back a dummy read handler without the corresponding (unused) se_dev_attrib.pi_prot_format member. I'll prepare something tomorrow with this, but if it's urgent then I'd also be okay with a straight revert. Cheers, David A fix is fine by me. Thanks for the prompt response. Thanks, Laura
Userspace regression with 6baca7601bde ("scsi: target: drop unused pi_prot_format attribute storage")
Hi, Fedora got a bug report of a new permission denied error with 5.0-rc2: File "/usr/lib/python3.7/site-packages/rtslib_fb/utils.py", line 100, in fread with open(path, 'r') as file_fd: PermissionError: [Errno 13] Permission denied: '/sys/kernel/config/target/core/fileio_28/xxx/attrib/pi_prot_format' This looks like an intentional behavior change with commit 6baca7601bdee2e57f20c45d63eb53b89b33e816 Author: David Disseldorp Date: Fri Nov 23 18:36:11 2018 +0100 scsi: target: drop unused pi_prot_format attribute storage On write, the pi_prot_format configfs attribute invokes the device format_prot() callback if present. Read dumps the contents of se_dev_attrib.pi_prot_format which is always zero. Make the configfs attribute write-only, and drop the always zero se_dev_attrib.pi_prot_format storage. Signed-off-by: David Disseldorp Reviewed-by: Christoph Hellwig Signed-off-by: Martin K. Petersen Unfortunately, existing code that's opening with read permissions is now broken. Can this be reverted? Full bug at https://bugzilla.redhat.com/show_bug.cgi?id=1667505 Thanks, Laura
Re: [PATCH] s390/jump_label: Correct asm contraint
On 1/23/19 5:24 AM, Heiko Carstens wrote: On Wed, Jan 23, 2019 at 01:55:13PM +0100, Laura Abbott wrote: There's a build failure with gcc9: ./arch/s390/include/asm/jump_label.h: Assembler messages: ./arch/s390/include/asm/jump_label.h:23: Error: bad expression ./arch/s390/include/asm/jump_label.h:23: Error: junk at end of line, first unrecognized character is `r' make[1]: *** [scripts/Makefile.build:277: init/main.o] Error 1 According to the toolchain people, the actual issue is the use of "X" constraint which is too permissive. Switch to using "i" instead. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1668703 Signed-off-by: Laura Abbott --- arch/s390/include/asm/jump_label.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/s390/include/asm/jump_label.h b/arch/s390/include/asm/jump_label.h index e2d3e6c43395..41dabfd8518d 100644 --- a/arch/s390/include/asm/jump_label.h +++ b/arch/s390/include/asm/jump_label.h @@ -22,7 +22,7 @@ static inline bool arch_static_branch(struct static_key *key, bool branch) ".long 0b-.,%l[label]-.\n" ".quad %0-.\n" ".popsection\n" - : : "X" (&((char *)key)[branch]) : : label); + : : "i" (&((char *)key)[branch]) : : label); return false; label: return true; Hmmm, this works only for the kernel image, but not for modules, which we compile with "-fPIC", which again doesn't work as described in the referenced bugzilla: In file included from ././include/linux/compiler_types.h:68, from : ./arch/s390/include/asm/jump_label.h: In function 'kvm_vcpu_ioctl': ./include/linux/compiler-gcc.h:124:38: warning: asm operand 0 probably doesn't match constraints #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0) ^~~ ./arch/s390/include/asm/jump_label.h:19:2: note: in expansion of macro 'asm_volatile_goto' asm_volatile_goto("0: brcl 0,"__stringify(JUMP_LABEL_NOP_OFFSET)"\n" ^ ./include/linux/compiler-gcc.h:124:38: error: impossible constraint in 'asm' #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0) ^~~ ./arch/s390/include/asm/jump_label.h:19:2: note: in expansion of macro 'asm_volatile_goto' asm_volatile_goto("0: brcl 0,"__stringify(JUMP_LABEL_NOP_OFFSET)"\n" Andreas, Ilya, any idea how to fix this? I've had to turn off s390 in Fedora until this gets fixed :(
Re: [RFC][PATCH] Update -Wattribute-alias for gcc9
On 1/28/19 5:28 AM, Bernd Edlinger wrote: On 1/25/19 1:24 PM, Bernd Edlinger wrote: On 1/25/19 12:39 PM, Miguel Ojeda wrote: On Fri, Jan 25, 2019 at 11:58 AM Arnd Bergmann wrote: On Fri, Jan 25, 2019 at 11:43 AM Laura Abbott wrote: Commit bee20031772a ("disable -Wattribute-alias warning for SYSCALL_DEFINEx()") disabled -Wattribute-alias with gcc8. gcc9 changed the format of -Wattribute-alias to take a parameter. This doesn't quite match with the existing disabling mechanism so update for gcc9 to match with the default (-Wattribute-alias=1). Signed-off-by: Laura Abbott --- This is RFC because it feels ugly. I went ahead and did the obvious fixup but it's worth discussing if we're going to end up with an explosion or if there's a better way to handle this in one macro. Bernd Edlinger has sent a patch to gcc for this: https://gcc.gnu.org/ml/gcc-patches/2019-01/msg01120.html and Miguel Ojeda said he wanted to send a patch for it to the kernel as well, not sure if he wanted to take a different approach there, so adding both to Cc here. Thanks Arnd (I was working with Martin on the expanded -Wmissing-attribute warnings, not on this, but thanks nevertheless :). Martin/Bernd: from the GCC mailing list I am not sure if we should expect the old behavior to be maintained or not. I believe it is not intentional to break the old syntax of the pragma. There will be new -Wattribute-alias=1 and -Wattribute-alias=2 and -Wattribute-alias is easy to retain as an alias for -Wattribute-alias=1. That is what my patch will do. Okay, I committed the -Wattribute-alias patch to gcc trunk, now as https://gcc.gnu.org/viewcvs/gcc?view=revision=268336 . So there will be no need for a workaround on your side. Also fixed a few false positive -Waddress-of-packed-member warnings with https://gcc.gnu.org/viewcvs/gcc?view=revision=268118 and https://gcc.gnu.org/viewcvs/gcc?view=revision=268337 . However there remain a lot of warnings from -Waddress-of-packed-member, that look more or less valid, has anybody an idea how to handle these? Bernd. Thanks, I'll keep an eye out when this lands in Fedora gcc. I think once everything lands it will be easier to work on the remaining -Waddress-of-packed-member. Laura
Re: [PATCH] include/linux/module.h: mark init/cleanup_module aliases as __cold
On 1/23/19 9:37 AM, Miguel Ojeda wrote: The upcoming GCC 9 release adds the -Wmissing-attributes warnings (enabled by -Wall), which trigger for all the init/cleanup_module aliases in the kernel (defined by the module_init/exit macros), ending up being very noisy. These aliases point to the __init/__exit functions of a module, which are defined as __cold (among other attributes). However, the aliases themselves do not have the __cold attribute. Since the compiler behaves differently when compiling a __cold function as well as when compiling paths leading to calls to __cold functions, the warning is trying to point out the possibly-forgotten attribute in the alias. In order to keep the warning enabled, we choose to silence the warning by marking the aliases as __cold. This is possible marking either the extern declaration, the definition, or both. In order to avoid changing the behavior of callers, we do it only in the definition of the aliases (since those are not seen by any other TU). Suggested-by: Martin Sebor Signed-off-by: Miguel Ojeda --- Note that an alternative is using the new copy attribute introduced by GCC 9 (Martin told me about it, as well as the new warning). What I am concerned about using __copy is that I am not sure we should be copying all the attributes (even if some are blacklisted by the copy itself), since: - We have unknown-to-GCC attributes (e.g. from plugins). - We wouldn't enjoy the fix for older compilers (e.g. if the fix had an actual impact). So here I took the conservative approach for the moment, and we can discuss/apply whether another solution is best. Jessica: please review what I explain in the commit message. Do we actually want the __cold attribute in the declaration as well? If yes, AFAIK, GCC would assume paths that end up calling the __init/__exit functions are not meant to be taken (but when we are asked to load modules, that is the expected path, no?). I will put this in the compiler-attributes tree and get some time in linux-next, unless you want to pick it up! include/linux/module.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/module.h b/include/linux/module.h index 8fa38d3e7538..c4e805e87628 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -129,13 +129,13 @@ extern void cleanup_module(void); #define module_init(initfn) \ static inline initcall_t __maybe_unused __inittest(void) \ { return initfn; } \ - int init_module(void) __attribute__((alias(#initfn))); + int init_module(void) __cold __attribute__((alias(#initfn))); /* This is only required if you want to be unloadable. */ #define module_exit(exitfn) \ static inline exitcall_t __maybe_unused __exittest(void) \ { return exitfn; } \ - void cleanup_module(void) __attribute__((alias(#exitfn))); + void cleanup_module(void) __cold __attribute__((alias(#exitfn))); #endif Tested-by: Laura Abbott
Re: [PATCH] lib/crc32.c: mark crc32_le_base/__crc32c_le_base aliases as __pure
On 1/24/19 7:44 AM, Miguel Ojeda wrote: The upcoming GCC 9 release extends the -Wmissing-attributes warnings (enabled by -Wall) to C and aliases: it warns when particular function attributes are missing in the aliases but not in their target. In particular, it triggers here because crc32_le_base/__crc32c_le_base aren't __pure while their target crc32_le/__crc32c_le are. These aliases are used by architectures as a fallback in accelerated versions of CRC32. See commit 9784d82db3eb ("lib/crc32: make core crc32() routines weak so they can be overridden"). Therefore, being fallbacks, it is likely that even if the aliases were called from C, there wouldn't be any optimizations possible. Currently, the only user is arm64, which calls this from asm. Still, marking the aliases as __pure makes sense and is a good idea for documentation purposes and possible future optimizations, which also silences the warning. Signed-off-by: Miguel Ojeda --- I am picking this up through the compiler-attributes tree and putting it into -next along with the other cleanup for -Wmissing-attributes (unless some other maintainer wants it). lib/crc32.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/crc32.c b/lib/crc32.c index 45b1d67a1767..4a20455d1f61 100644 --- a/lib/crc32.c +++ b/lib/crc32.c @@ -206,8 +206,8 @@ u32 __pure __weak __crc32c_le(u32 crc, unsigned char const *p, size_t len) EXPORT_SYMBOL(crc32_le); EXPORT_SYMBOL(__crc32c_le); -u32 crc32_le_base(u32, unsigned char const *, size_t) __alias(crc32_le); -u32 __crc32c_le_base(u32, unsigned char const *, size_t) __alias(__crc32c_le); +u32 __pure crc32_le_base(u32, unsigned char const *, size_t) __alias(crc32_le); +u32 __pure __crc32c_le_base(u32, unsigned char const *, size_t) __alias(__crc32c_le); /* * This multiplies the polynomials x and y modulo the given modulus. Tested-by: Laura Abbott
[RFC][PATCH] Update -Wattribute-alias for gcc9
Commit bee20031772a ("disable -Wattribute-alias warning for SYSCALL_DEFINEx()") disabled -Wattribute-alias with gcc8. gcc9 changed the format of -Wattribute-alias to take a parameter. This doesn't quite match with the existing disabling mechanism so update for gcc9 to match with the default (-Wattribute-alias=1). Signed-off-by: Laura Abbott --- This is RFC because it feels ugly. I went ahead and did the obvious fixup but it's worth discussing if we're going to end up with an explosion or if there's a better way to handle this in one macro. --- include/linux/compat.h | 2 ++ include/linux/compiler-gcc.h | 7 ++- include/linux/syscalls.h | 2 ++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/linux/compat.h b/include/linux/compat.h index 056be0d03722..d5d7700f26a6 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -75,6 +75,8 @@ __diag_push(); \ __diag_ignore(GCC, 8, "-Wattribute-alias", \ "Type aliasing is used to sanitize syscall arguments");\ + __diag_ignore(GCC, 9, "-Wattribute-alias=1", \ + "Type aliasing is used to sanitize syscall arguments");\ asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ __attribute__((alias(__stringify(__se_compat_sys##name; \ diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index e8579412ad21..75079668344c 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -165,8 +165,13 @@ #define __diag_str(s) __diag_str1(s) #define __diag(s) _Pragma(__diag_str(GCC diagnostic s)) -#if GCC_VERSION >= 8 +#if GCC_VERSION >= 9 +#define __diag_GCC_8(s) +#define __diag_GCC_9(s)__diag(s) +#elif GCC_VERSION >= 8 #define __diag_GCC_8(s)__diag(s) +#define __diag_GCC_9(s) #else #define __diag_GCC_8(s) +#define __diag_GCC_9(s) #endif diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 257cccba3062..a08059e1ccf4 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -236,6 +236,8 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event) __diag_push(); \ __diag_ignore(GCC, 8, "-Wattribute-alias", \ "Type aliasing is used to sanitize syscall arguments");\ + __diag_ignore(GCC, 9, "-Wattribute-alias=1",\ + "Type aliasing is used to sanitize syscall arguments");\ asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ __attribute__((alias(__stringify(__se_sys##name;\ ALLOW_ERROR_INJECTION(sys##name, ERRNO);\ -- 2.20.1
New warnings with gcc9
As an FYI for anyone who is interested, gcc9 landed in Fedora rawhide and I built the kernel as usual. There's a very large number of warnings from -Waddress-of-packed-member e.g. BUILDSTDERR: drivers/scsi/ipr.c: In function 'ipr_handle_config_change': BUILDSTDERR: drivers/scsi/ipr.c:1453:22: warning: taking address of packed member of 'struct ipr_hostrcb_cfg_ch_not' may result in an unaligned pointer value [-Waddress-of-packed-member] BUILDSTDERR: 1453 | cfgtew.u.cfgte64 = >hcam.u.ccn.u.cfgte64; BUILDSTDERR: | There's also some missing attribute warnings. The majority are caused by a missing attribute in module_init/module_exit BUILDSTDERR: In file included from drivers/scsi/hptiop.c:18: BUILDSTDERR: ./include/linux/module.h:132:6: warning: 'init_module' specifies less restrictive attribute than its target 'hptiop_module_init': 'cold' [-Wmissing-attributes] BUILDSTDERR: 132 | int init_module(void) __attribute__((alias(#initfn))); BUILDSTDERR: | ^~~ BUILDSTDERR: drivers/scsi/hptiop.c:1704:1: note: in expansion of macro 'module_init' BUILDSTDERR: 1704 | module_init(hptiop_module_init); BUILDSTDERR: | ^~~ BUILDSTDERR: drivers/scsi/hptiop.c:1692:19: note: 'init_module' target declared here BUILDSTDERR: 1692 | static int __init hptiop_module_init(void) BUILDSTDERR: | I also hit a build failure on s390, This looks to be an issue with the assembly, I'm in discussion with the s390 people about the proper fix. BUILDSTDERR: ./arch/s390/include/asm/jump_label.h: Assembler messages: BUILDSTDERR: ./arch/s390/include/asm/jump_label.h:23: Error: bad expression BUILDSTDERR: ./arch/s390/include/asm/jump_label.h:23: Error: junk at end of line, first unrecognized character is `r' BUILDSTDERR: make[1]: *** [scripts/Makefile.build:277: init/main.o] Error 1 This looks like a new warning due to hvclock_page being declared u8 and then cast to a struct BUILDSTDERR: arch/x86/entry/vdso/vclock_gettime.c: In function 'do_hres': BUILDSTDERR: ./include/linux/compiler.h:182:26: warning: array subscript 1 is outside array bounds of 'u8[1]' {aka 'unsigned char[1]'} [-Warray-bounds] BUILDSTDERR: 182 | case 8: *(__u64 *)res = *(volatile __u64 *)p; break; \ BUILDSTDERR: | ^~~~ BUILDSTDERR: ./include/linux/compiler.h:193:2: note: in expansion of macro '__READ_ONCE_SIZE' BUILDSTDERR: 193 | __READ_ONCE_SIZE; BUILDSTDERR: | ^~~~ BUILDSTDERR: arch/x86/entry/vdso/vclock_gettime.c:37:11: note: while referencing 'hvclock_page' BUILDSTDERR:37 | extern u8 hvclock_page BUILDSTDERR: | ^~~~ This looks like we just need to make the attributes match BUILDSTDERR: lib/crc32.c:209:5: warning: 'crc32_le_base' specifies less restrictive attribute than its target 'crc32_le': 'pure' [-Wmissing-attributes] BUILDSTDERR: 209 | u32 crc32_le_base(u32, unsigned char const *, size_t) __alias(crc32_le); BUILDSTDERR: | ^ BUILDSTDERR: lib/crc32.c:195:19: note: 'crc32_le_base' target declared here BUILDSTDERR: 195 | u32 __pure __weak crc32_le(u32 crc, unsigned char const *p, size_t len) BUILDSTDERR: | ^~~~ BUILDSTDERR: lib/crc32.c:210:5: warning: '__crc32c_le_base' specifies less restrictive attribute than its target '__crc32c_le': 'pure' [-Wmissing-attributes] BUILDSTDERR: 210 | u32 __crc32c_le_base(u32, unsigned char const *, size_t) __alias(__crc32c_le); BUILDSTDERR: | ^~~~ BUILDSTDERR: lib/crc32.c:200:19: note: '__crc32c_le_base' target declared here BUILDSTDERR: 200 | u32 __pure __weak __crc32c_le(u32 crc, unsigned char const *p, size_t len) BUILDSTDERR: | I think this is a corner case in the code flow so it should be fixed up BUILDSTDERR: drivers/i2c/i2c-core-base.c: In function 'i2c_generic_scl_recovery': BUILDSTDERR: drivers/i2c/i2c-core-base.c:235:5: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized] BUILDSTDERR: 235 | if (ret == -EOPNOTSUPP) BUILDSTDERR: | ^ There's also a bunch of warnings from -Wpragma that show up on powerpc and arm32 BUILDSTDERR: In file included from ././include/linux/compiler_types.h:68, BUILDSTDERR: from : BUILDSTDERR: ./include/linux/compiler-gcc.h:166:20: warning: unknown option after '#pragma GCC diagnostic' kind [-Wpragmas] BUILDSTDERR: 166 | #define __diag(s) _Pragma(__diag_str(GCC diagnostic s)) BUILDSTDERR: |^~~ BUILDSTDERR: ./include/linux/compiler-gcc.h:169:26: note: in expansion of macro '__diag' BUILDSTDERR: 169 | #define __diag_GCC_8(s) __diag(s) BUILDSTDERR: | ^~ BUILDSTDERR: ./include/linux/compiler-gcc.h:157:2: note: in expansion of macro '__diag_GCC_8' BUILDSTDERR: 157 | __diag_GCC_ ## version(__diag_GCC_ ## severity s) BUILDSTDERR: | ^~~ BUILDSTDERR: ././include/linux/compiler_types.h:212:2: note: in
[PATCH] s390/jump_label: Correct asm contraint
There's a build failure with gcc9: ./arch/s390/include/asm/jump_label.h: Assembler messages: ./arch/s390/include/asm/jump_label.h:23: Error: bad expression ./arch/s390/include/asm/jump_label.h:23: Error: junk at end of line, first unrecognized character is `r' make[1]: *** [scripts/Makefile.build:277: init/main.o] Error 1 According to the toolchain people, the actual issue is the use of "X" constraint which is too permissive. Switch to using "i" instead. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1668703 Signed-off-by: Laura Abbott --- arch/s390/include/asm/jump_label.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/s390/include/asm/jump_label.h b/arch/s390/include/asm/jump_label.h index e2d3e6c43395..41dabfd8518d 100644 --- a/arch/s390/include/asm/jump_label.h +++ b/arch/s390/include/asm/jump_label.h @@ -22,7 +22,7 @@ static inline bool arch_static_branch(struct static_key *key, bool branch) ".long0b-.,%l[label]-.\n" ".quad%0-.\n" ".popsection\n" - : : "X" (&((char *)key)[branch]) : : label); + : : "i" (&((char *)key)[branch]) : : label); return false; label: return true; @@ -36,7 +36,7 @@ static inline bool arch_static_branch_jump(struct static_key *key, bool branch) ".long0b-.,%l[label]-.\n" ".quad%0-.\n" ".popsection\n" - : : "X" (&((char *)key)[branch]) : : label); + : : "i" (&((char *)key)[branch]) : : label); return false; label: return true; -- 2.20.1
Re: [PATCH 2/4] arm: dump: no need to check return value of debugfs_create functions
On 1/22/19 6:41 AM, Greg Kroah-Hartman wrote: When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Russell King Cc: Jinbum Park Cc: Kees Cook Cc: Laura Abbott Acked-by: Laura Abbott Cc: linux-arm-ker...@lists.infradead.org Signed-off-by: Greg Kroah-Hartman --- arch/arm/include/asm/ptdump.h | 9 +++-- arch/arm/mm/dump.c| 4 ++-- arch/arm/mm/ptdump_debugfs.c | 8 ++-- 3 files changed, 7 insertions(+), 14 deletions(-) diff --git a/arch/arm/include/asm/ptdump.h b/arch/arm/include/asm/ptdump.h index 3ebf9718288d..0c2d3d0d4cc6 100644 --- a/arch/arm/include/asm/ptdump.h +++ b/arch/arm/include/asm/ptdump.h @@ -21,13 +21,10 @@ struct ptdump_info { void ptdump_walk_pgd(struct seq_file *s, struct ptdump_info *info); #ifdef CONFIG_ARM_PTDUMP_DEBUGFS -int ptdump_debugfs_register(struct ptdump_info *info, const char *name); +void ptdump_debugfs_register(struct ptdump_info *info, const char *name); #else -static inline int ptdump_debugfs_register(struct ptdump_info *info, - const char *name) -{ - return 0; -} +static inline void ptdump_debugfs_register(struct ptdump_info *info, + const char *name) { } #endif /* CONFIG_ARM_PTDUMP_DEBUGFS */ void ptdump_check_wx(void); diff --git a/arch/arm/mm/dump.c b/arch/arm/mm/dump.c index 084779c5c893..eb385a500ed0 100644 --- a/arch/arm/mm/dump.c +++ b/arch/arm/mm/dump.c @@ -450,7 +450,7 @@ void ptdump_check_wx(void) static int ptdump_init(void) { ptdump_initialize(); - return ptdump_debugfs_register(_ptdump_info, - "kernel_page_tables"); + ptdump_debugfs_register(_ptdump_info, "kernel_page_tables"); + return 0; } __initcall(ptdump_init); diff --git a/arch/arm/mm/ptdump_debugfs.c b/arch/arm/mm/ptdump_debugfs.c index be8d87be4b93..598b636615a2 100644 --- a/arch/arm/mm/ptdump_debugfs.c +++ b/arch/arm/mm/ptdump_debugfs.c @@ -24,11 +24,7 @@ static const struct file_operations ptdump_fops = { .release= single_release, }; -int ptdump_debugfs_register(struct ptdump_info *info, const char *name) +void ptdump_debugfs_register(struct ptdump_info *info, const char *name) { - struct dentry *pe; - - pe = debugfs_create_file(name, 0400, NULL, info, _fops); - return pe ? 0 : -ENOMEM; - + debugfs_create_file(name, 0400, NULL, info, _fops); }
Re: [PATCH 3/4] dma-buf: add support for mapping with dma mapping attributes
On 1/19/19 2:25 AM, Christoph Hellwig wrote: On Fri, Jan 18, 2019 at 10:37:46AM -0800, Liam Mark wrote: Add support for configuring dma mapping attributes when mapping and unmapping memory through dma_buf_map_attachment and dma_buf_unmap_attachment. Signed-off-by: Liam Mark And who is going to decide which ones to pass? And who documents which ones are safe? I'd much rather have explicit, well documented dma-buf flags that might get translated to the DMA API flags, which are not error checked, not very well documented and way to easy to get wrong. I'm not sure having flags in dma-buf really solves anything given drivers can use the attributes directly with dma_map anyway, which is what we're looking to do. The intention is for the driver creating the dma_buf attachment to have the knowledge of which flags to use. Thanks, Laura
Re: [PATCH 3/4] dma-buf: add support for mapping with dma mapping attributes
On 1/18/19 1:32 PM, Liam Mark wrote: On Fri, 18 Jan 2019, Laura Abbott wrote: On 1/18/19 10:37 AM, Liam Mark wrote: Add support for configuring dma mapping attributes when mapping and unmapping memory through dma_buf_map_attachment and dma_buf_unmap_attachment. Signed-off-by: Liam Mark --- include/linux/dma-buf.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 58725f890b5b..59bf33e09e2d 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -308,6 +308,8 @@ struct dma_buf { * @dev: device attached to the buffer. * @node: list of dma_buf_attachment. * @priv: exporter specific attachment data. + * @dma_map_attrs: DMA mapping attributes to be used in + *dma_buf_map_attachment() and dma_buf_unmap_attachment(). * * This structure holds the attachment information between the dma_buf buffer * and its user device(s). The list contains one attachment struct per device @@ -323,6 +325,7 @@ struct dma_buf_attachment { struct device *dev; struct list_head node; void *priv; + unsigned long dma_map_attrs; }; /** Did you miss part of this patch? This only adds it to the structure but doesn't add it to any API. The same commment applies to the follow up patch, I don't quite see how it's being used. Were you asking for a cleaner DMA-buf API to set this field or were you asking for a change to an upstream client to make use of this field? I have clients set the dma_map_attrs field directly on their dma_buf_attachment struct before calling dma_buf_map_attachment (if they need this functionality). Of course this is all being used in Android for out of tree drivers, but I assume it is just as useful to everyone else who has cached ION buffers which aren't always accessed by the CPU. My understanding is that AOSP Android on Hikey 960 also is currently suffering from too many CMOs due to dma_map_attachemnt always applying CMOs, so this support should help them avoid it. A I see how you intend this to be used now! I was missing that clients would do attachment->dma_map_attrs = blah and that was how it would be stored as opposed to passing it in at the top level for dma_buf_map. I'll give this some more thought but I think it could work if Sumit is okay with the approach. Thanks, Laura
Re: [PATCH 3/4] dma-buf: add support for mapping with dma mapping attributes
On 1/18/19 10:37 AM, Liam Mark wrote: Add support for configuring dma mapping attributes when mapping and unmapping memory through dma_buf_map_attachment and dma_buf_unmap_attachment. Signed-off-by: Liam Mark --- include/linux/dma-buf.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 58725f890b5b..59bf33e09e2d 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -308,6 +308,8 @@ struct dma_buf { * @dev: device attached to the buffer. * @node: list of dma_buf_attachment. * @priv: exporter specific attachment data. + * @dma_map_attrs: DMA mapping attributes to be used in + *dma_buf_map_attachment() and dma_buf_unmap_attachment(). * * This structure holds the attachment information between the dma_buf buffer * and its user device(s). The list contains one attachment struct per device @@ -323,6 +325,7 @@ struct dma_buf_attachment { struct device *dev; struct list_head node; void *priv; + unsigned long dma_map_attrs; }; /** Did you miss part of this patch? This only adds it to the structure but doesn't add it to any API. The same commment applies to the follow up patch, I don't quite see how it's being used. Thanks, Laura
Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap
On 1/18/19 12:43 PM, Andrew F. Davis wrote: On 1/18/19 2:31 PM, Laura Abbott wrote: On 1/17/19 8:13 AM, Andrew F. Davis wrote: On 1/16/19 4:48 PM, Liam Mark wrote: On Wed, 16 Jan 2019, Andrew F. Davis wrote: On 1/15/19 1:05 PM, Laura Abbott wrote: On 1/15/19 10:38 AM, Andrew F. Davis wrote: On 1/15/19 11:45 AM, Liam Mark wrote: On Tue, 15 Jan 2019, Andrew F. Davis wrote: On 1/14/19 11:13 AM, Liam Mark wrote: On Fri, 11 Jan 2019, Andrew F. Davis wrote: Buffers may not be mapped from the CPU so skip cache maintenance here. Accesses from the CPU to a cached heap should be bracketed with {begin,end}_cpu_access calls so maintenance should not be needed anyway. Signed-off-by: Andrew F. Davis --- drivers/staging/android/ion/ion.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 14e48f6eb734..09cb5a8e2b09 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment, table = a->table; - if (!dma_map_sg(attachment->dev, table->sgl, table->nents, - direction)) + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents, + direction, DMA_ATTR_SKIP_CPU_SYNC)) Unfortunately I don't think you can do this for a couple reasons. You can't rely on {begin,end}_cpu_access calls to do cache maintenance. If the calls to {begin,end}_cpu_access were made before the call to dma_buf_attach then there won't have been a device attached so the calls to {begin,end}_cpu_access won't have done any cache maintenance. That should be okay though, if you have no attachments (or all attachments are IO-coherent) then there is no need for cache maintenance. Unless you mean a sequence where a non-io-coherent device is attached later after data has already been written. Does that sequence need supporting? Yes, but also I think there are cases where CPU access can happen before in Android, but I will focus on later for now. DMA-BUF doesn't have to allocate the backing memory until map_dma_buf() time, and that should only happen after all the devices have attached so it can know where to put the buffer. So we shouldn't expect any CPU access to buffers before all the devices are attached and mapped, right? Here is an example where CPU access can happen later in Android. Camera device records video -> software post processing -> video device (who does compression of raw data) and writes to a file In this example assume the buffer is cached and the devices are not IO-coherent (quite common). This is the start of the problem, having cached mappings of memory that is also being accessed non-coherently is going to cause issues one way or another. On top of the speculative cache fills that have to be constantly fought back against with CMOs like below; some coherent interconnects behave badly when you mix coherent and non-coherent access (snoop filters get messed up). The solution is to either always have the addresses marked non-coherent (like device memory, no-map carveouts), or if you really want to use regular system memory allocated at runtime, then all cached mappings of it need to be dropped, even the kernel logical address (area as painful as that would be). I agree it's broken, hence my desire to remove it :) The other problem is that uncached buffers are being used for performance reason so anything that would involve getting rid of the logical address would probably negate any performance benefit. I wouldn't go as far as to remove them just yet.. Liam seems pretty adamant that they have valid uses. I'm just not sure performance is one of them, maybe in the case of software locks between devices or something where there needs to be a lot of back and forth interleaved access on small amounts of data? I wasn't aware that ARM considered this not supported, I thought it was supported but they advised against it because of the potential performance impact. Not sure what you mean by "this" being not supported, do you mean mixed attribute mappings? If so, it will certainly cause problems, and the problems will change from platform to platform, avoid at all costs is my understanding of ARM's position. This is after all supported in the DMA APIs and up until now devices have been successfully commercializing with this configurations, and I think they will continue to commercialize with these configurations for quite a while. Use of uncached memory mappings are almost always wrong in my experience and are used to work around some bug or because the user doesn't want to implement proper CMOs. Counter examples welcome. It would be really unfortunate if support was removed as I think that would drive clients away from using upstream ION. I'm not petitioning to remove su
Re: [PATCH 1/4] staging: android: ion: Support cpu access during dma_buf_detach
On 1/18/19 10:37 AM, Liam Mark wrote: Often userspace doesn't know when the kernel will be calling dma_buf_detach on the buffer. If userpace starts its CPU access at the same time as the sg list is being freed it could end up accessing the sg list after it has been freed. Thread AThread B - DMA_BUF_IOCTL_SYNC IOCT - ion_dma_buf_begin_cpu_access - list_for_each_entry - ion_dma_buf_detatch - free_duped_table - dma_sync_sg_for_cpu Fix this by getting the ion_buffer lock before freeing the sg table memory. Fixes: 2a55e7b5e544 ("staging: android: ion: Call dma_map_sg for syncing and mapping") Signed-off-by: Liam Mark --- drivers/staging/android/ion/ion.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index a0802de8c3a1..6f5afab7c1a1 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -248,10 +248,10 @@ static void ion_dma_buf_detatch(struct dma_buf *dmabuf, struct ion_dma_buf_attachment *a = attachment->priv; struct ion_buffer *buffer = dmabuf->priv; - free_duped_table(a->table); mutex_lock(>lock); list_del(>list); mutex_unlock(>lock); + free_duped_table(a->table); kfree(a); } Acked-by: Laura Abbott
Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap
On 1/17/19 8:13 AM, Andrew F. Davis wrote: On 1/16/19 4:48 PM, Liam Mark wrote: On Wed, 16 Jan 2019, Andrew F. Davis wrote: On 1/15/19 1:05 PM, Laura Abbott wrote: On 1/15/19 10:38 AM, Andrew F. Davis wrote: On 1/15/19 11:45 AM, Liam Mark wrote: On Tue, 15 Jan 2019, Andrew F. Davis wrote: On 1/14/19 11:13 AM, Liam Mark wrote: On Fri, 11 Jan 2019, Andrew F. Davis wrote: Buffers may not be mapped from the CPU so skip cache maintenance here. Accesses from the CPU to a cached heap should be bracketed with {begin,end}_cpu_access calls so maintenance should not be needed anyway. Signed-off-by: Andrew F. Davis --- drivers/staging/android/ion/ion.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 14e48f6eb734..09cb5a8e2b09 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment, table = a->table; - if (!dma_map_sg(attachment->dev, table->sgl, table->nents, - direction)) + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents, + direction, DMA_ATTR_SKIP_CPU_SYNC)) Unfortunately I don't think you can do this for a couple reasons. You can't rely on {begin,end}_cpu_access calls to do cache maintenance. If the calls to {begin,end}_cpu_access were made before the call to dma_buf_attach then there won't have been a device attached so the calls to {begin,end}_cpu_access won't have done any cache maintenance. That should be okay though, if you have no attachments (or all attachments are IO-coherent) then there is no need for cache maintenance. Unless you mean a sequence where a non-io-coherent device is attached later after data has already been written. Does that sequence need supporting? Yes, but also I think there are cases where CPU access can happen before in Android, but I will focus on later for now. DMA-BUF doesn't have to allocate the backing memory until map_dma_buf() time, and that should only happen after all the devices have attached so it can know where to put the buffer. So we shouldn't expect any CPU access to buffers before all the devices are attached and mapped, right? Here is an example where CPU access can happen later in Android. Camera device records video -> software post processing -> video device (who does compression of raw data) and writes to a file In this example assume the buffer is cached and the devices are not IO-coherent (quite common). This is the start of the problem, having cached mappings of memory that is also being accessed non-coherently is going to cause issues one way or another. On top of the speculative cache fills that have to be constantly fought back against with CMOs like below; some coherent interconnects behave badly when you mix coherent and non-coherent access (snoop filters get messed up). The solution is to either always have the addresses marked non-coherent (like device memory, no-map carveouts), or if you really want to use regular system memory allocated at runtime, then all cached mappings of it need to be dropped, even the kernel logical address (area as painful as that would be). I agree it's broken, hence my desire to remove it :) The other problem is that uncached buffers are being used for performance reason so anything that would involve getting rid of the logical address would probably negate any performance benefit. I wouldn't go as far as to remove them just yet.. Liam seems pretty adamant that they have valid uses. I'm just not sure performance is one of them, maybe in the case of software locks between devices or something where there needs to be a lot of back and forth interleaved access on small amounts of data? I wasn't aware that ARM considered this not supported, I thought it was supported but they advised against it because of the potential performance impact. Not sure what you mean by "this" being not supported, do you mean mixed attribute mappings? If so, it will certainly cause problems, and the problems will change from platform to platform, avoid at all costs is my understanding of ARM's position. This is after all supported in the DMA APIs and up until now devices have been successfully commercializing with this configurations, and I think they will continue to commercialize with these configurations for quite a while. Use of uncached memory mappings are almost always wrong in my experience and are used to work around some bug or because the user doesn't want to implement proper CMOs. Counter examples welcome. It would be really unfortunate if support was removed as I think that would drive clients away from using upstream ION. I'm not petitioning to remove support, but at very least lets reverse the ION_FLAG_CACHED flag. Ion should hand out cached normal m
Re: [PATCH 00/14] Misc ION cleanups and adding unmapped heap
On 1/16/19 8:05 AM, Andrew F. Davis wrote: On 1/15/19 12:58 PM, Laura Abbott wrote: On 1/15/19 9:47 AM, Andrew F. Davis wrote: On 1/14/19 8:39 PM, Laura Abbott wrote: On 1/11/19 10:05 AM, Andrew F. Davis wrote: Hello all, This is a set of (hopefully) non-controversial cleanups for the ION framework and current set of heaps. These were found as I start to familiarize myself with the framework to help in whatever way I can in getting all this up to the standards needed for de-staging. I would like to get some ideas of what is left to work on to get ION out of staging. Has there been some kind of agreement on what ION should eventually end up being? To me it looks like it is being whittled away at to it's most core functions. To me that is looking like being a DMA-BUF user-space front end, simply advertising available memory backings in a system and providing allocations as DMA-BUF handles. If this is the case then it looks close to being ready to me at least, but I would love to hear any other opinions and concerns. Yes, at this point the only functionality that people are really depending on is the ability to allocate a dma_buf easily from userspace. Back to this patchset, the last patch may be a bit different than the others, it adds an unmapped heaps type and creation helper. I wanted to get this in to show off another heap type and maybe some issues we may have with the current ION framework. The unmapped heap is used when the backing memory should not (or cannot) be touched. Currently this kind of heap is used for firewalled secure memory that can be allocated like normal heap memory but only used by secure devices (OP-TEE, crypto HW, etc). It is basically just copied from the "carveout" heap type with the only difference being it is not mappable to userspace and we do not clear the memory (as we should not map it either). So should this really be a new heap type? Or maybe advertised as a carveout heap but with an additional allocation flag? Perhaps we do away with "types" altogether and just have flags, coherent/non-coherent, mapped/unmapped, etc. Maybe more thinking will be needed afterall.. So the cleanup looks okay (I need to finish reviewing) but I'm not a fan of adding another heaptype without solving the problem of adding some sort of devicetree binding or other method of allocating and placing Ion heaps. That plus uncached buffers are one of the big open problems that need to be solved for destaging Ion. See https://lore.kernel.org/lkml/20181120164636.jcw7li2uaa3cmwc3@DESKTOP-E1NTVVP.localdomain/ for some background on that problem. I'm under the impression that adding heaps like carveouts/chunk will be rather system specific and so do not lend themselves well to a universal DT style exporter. For instance a carveout memory space can be reported by a device at runtime, then the driver managing that device should go and use the carveout heap helpers to export that heap. If this is the case then I'm not sure it is a problem for the ION core framework to solve, but rather the users of it to figure out how best to create the various heaps. All Ion needs to do is allow exporting and advertising them IMHO. I think it is a problem for the Ion core framework to take care of. Ion is useless if you don't actually have the heaps. Nobody has actually gotten a full Ion solution end-to-end with a carveout heap working in mainline because any proposals have been rejected. I think we need at least one example in mainline of how creating a carveout heap would work. In our evil vendor trees we have several examples. The issue being that Ion is still staging and attempts for generic DT heap definitions haven't seemed to go so well. So for now we just keep it specific to our platforms until upstream makes a direction decision. Yeah, it's been a bit of a chicken and egg in that this has been blocking Ion getting out of staging but we don't actually have in-tree users because it's still in staging. Thanks for the background thread link, I've been looking for some info on current status of all this and "ion" is a bit hard to search the lists for. The core reason for posting this cleanup series is to throw my hat into the ring of all this Ion work and start getting familiar with the pending issues. The last two patches are not all that important to get in right now. In that thread you linked above, it seems we may have arrived at a similar problem for different reasons. I think the root issue is the Ion core makes too many assumptions about the heap memory. My proposal would be to allow the heap exporters more control over the DMA-BUF ops, maybe even going as far as letting them provide their own complete struct dma_buf_ops. Let me give an example where I think this is going to be useful. We have the classic constraint solving problem on our SoCs. Our SoCs are full of various coherent and non-coherent devices, some require contiguous memory allocations,
Re: [PATCH 12/14] staging: android: ion: Declare helpers for carveout and chunk heaps
On 1/18/19 1:59 AM, Greg Kroah-Hartman wrote: On Fri, Jan 11, 2019 at 12:05:21PM -0600, Andrew F. Davis wrote: When enabled the helpers functions for creating carveout and chunk heaps should have declarations in the ION header. Why? No one calls these from what I can tell. Which makes me believe we should just delete the drivers/staging/android/ion/ion_carveout_heap.c and drivers/staging/android/ion/ion_chunk_heap.c files as there are no in-tree users? Any objection to me doing that? thanks, greg k-h I'd rather not delete it quite yet. Part of this entire thread is a discussion on how to let those heaps and associated function actually be called in some way in tree. I expect them to either get called in tree or be replaced. Thanks, Laura
Re: [PATCH 11/14] staging: android: ion: Allow heap name to be null
On 1/16/19 9:12 AM, Andrew F. Davis wrote: On 1/16/19 9:28 AM, Brian Starkey wrote: Hi Andrew, On Fri, Jan 11, 2019 at 12:05:20PM -0600, Andrew F. Davis wrote: The heap name can be used for debugging but otherwise does not seem to be required and no other part of the code will fail if left NULL except here. We can make it required and check for it at some point, for now lets just prevent this from causing a NULL pointer exception. I'm not so keen on this one. In the "new" API with heap querying, the name string is the only way to identify the heap. I think Laura mentioned at XDC2017 that it was expected that userspace should use the strings to find the heap they want. Right now the names are only for debug. I accidentally left the name null once and got a kernel crash. This is the only spot where it is needed so I fixed it up. The other option is to make the name mandatory and properly error out, I don't want to do that right now until the below discussion is had to see if names really do matter or not. Yes, the heap names are part of the query API and are the expected way to identify individual heaps for the API at the moment so having a null heap name is incorrect. The heap name seemed like the best way to identify heaps to userspace but if you have an alternative proposal I'd be interested. Thanks, Laura
Re: [RFC PATCH v7 00/16] Add support for eXclusive Page Frame Ownership
On 1/10/19 1:09 PM, Khalid Aziz wrote: I am continuing to build on the work Juerg, Tycho and Julian have done on XPFO. After the last round of updates, we were seeing very significant performance penalties when stale TLB entries were flushed actively after an XPFO TLB update. Benchmark for measuring performance is kernel build using parallel make. To get full protection from ret2dir attackes, we must flush stale TLB entries. Performance penalty from flushing stale TLB entries goes up as the number of cores goes up. On a desktop class machine with only 4 cores, enabling TLB flush for stale entries causes system time for "make -j4" to go up by a factor of 2.614x but on a larger machine with 96 cores, system time with "make -j60" goes up by a factor of 26.366x! I have been working on reducing this performance penalty. I implemented a solution to reduce performance penalty and that has had large impact. When XPFO code flushes stale TLB entries, it does so for all CPUs on the system which may include CPUs that may not have any matching TLB entries or may never be scheduled to run the userspace task causing TLB flush. Problem is made worse by the fact that if number of entries being flushed exceeds tlb_single_page_flush_ceiling, it results in a full TLB flush on every CPU. A rogue process can launch a ret2dir attack only from a CPU that has dual mapping for its pages in physmap in its TLB. We can hence defer TLB flush on a CPU until a process that would have caused a TLB flush is scheduled on that CPU. I have added a cpumask to task_struct which is then used to post pending TLB flush on CPUs other than the one a process is running on. This cpumask is checked when a process migrates to a new CPU and TLB is flushed at that time. I measured system time for parallel make with unmodified 4.20 kernel, 4.20 with XPFO patches before this optimization and then again after applying this optimization. Here are the results: Hardware: 96-core Intel Xeon Platinum 8160 CPU @ 2.10GHz, 768 GB RAM make -j60 all 4.20915.183s 4.20+XPFO 24129.354s 26.366x 4.20+XPFO+Deferred flush1216.987s1.330xx Hardware: 4-core Intel Core i5-3550 CPU @ 3.30GHz, 8G RAM make -j4 all 4.20607.671s 4.20+XPFO 1588.646s 2.614x 4.20+XPFO+Deferred flush794.473s1.307xx 30+% overhead is still very high and there is room for improvement. Dave Hansen had suggested batch updating TLB entries and Tycho had created an initial implementation but I have not been able to get that to work correctly. I am still working on it and I suspect we will see a noticeable improvement in performance with that. In the code I added, I post a pending full TLB flush to all other CPUs even when number of TLB entries being flushed on current CPU does not exceed tlb_single_page_flush_ceiling. There has to be a better way to do this. I just haven't found an efficient way to implemented delayed limited TLB flush on other CPUs. I am not entirely sure if switch_mm_irqs_off() is indeed the right place to perform the pending TLB flush for a CPU. Any feedback on that will be very helpful. Delaying full TLB flushes on other CPUs seems to help tremendously, so if there is a better way to implement the same thing than what I have done in patch 16, I am open to ideas. Performance with this patch set is good enough to use these as starting point for further refinement before we merge it into main kernel, hence RFC. Since not flushing stale TLB entries creates a false sense of security, I would recommend making TLB flush mandatory and eliminate the "xpfotlbflush" kernel parameter (patch "mm, x86: omit TLB flushing by default for XPFO page table modifications"). What remains to be done beyond this patch series: 1. Performance improvements 2. Remove xpfotlbflush parameter 3. Re-evaluate the patch "arm64/mm: Add support for XPFO to swiotlb" from Juerg. I dropped it for now since swiotlb code for ARM has changed a lot in 4.20. 4. Extend the patch "xpfo, mm: Defer TLB flushes for non-current CPUs" to other architectures besides x86. - Juerg Haefliger (5): mm, x86: Add support for eXclusive Page Frame Ownership (XPFO) swiotlb: Map the buffer if it was unmapped by XPFO arm64/mm: Add support for XPFO arm64/mm, xpfo: temporarily map dcache regions lkdtm: Add test for XPFO Julian Stecklina (4): mm, x86: omit TLB flushing by default for XPFO page table modifications xpfo, mm: remove dependency on CONFIG_PAGE_EXTENSION xpfo, mm: optimize spinlock usage in xpfo_kunmap EXPERIMENTAL: xpfo, mm: optimize spin lock usage in xpfo_kmap Khalid Aziz (2): xpfo, mm: Fix hang when booting with "xpfotlbflush" xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only) Tycho Andersen (5): mm: add MAP_HUGETLB support to vm_mmap x86: always set IF before
Re: [RFC PATCH v7 14/16] EXPERIMENTAL: xpfo, mm: optimize spin lock usage in xpfo_kmap
On 1/10/19 1:09 PM, Khalid Aziz wrote: From: Julian Stecklina We can reduce spin lock usage in xpfo_kmap to the 0->1 transition of the mapcount. This means that xpfo_kmap() can now race and that we get spurious page faults. The page fault handler helps the system make forward progress by fixing the page table instead of allowing repeated page faults until the right xpfo_kmap went through. Model-checked with up to 4 concurrent callers with Spin. This needs the spurious check for arm64 as well. This at least gets me booting but could probably use more review: diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 7d9571f4ae3d..8f425848cbb9 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -32,6 +32,7 @@ #include #include #include +#include #include #include @@ -289,6 +290,9 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr, if (!is_el1_instruction_abort(esr) && fixup_exception(regs)) return; + if (xpfo_spurious_fault(addr)) + return; + if (is_el1_permission_fault(addr, esr, regs)) { if (esr & ESR_ELx_WNR) msg = "write to read-only memory"; Signed-off-by: Julian Stecklina Cc: x...@kernel.org Cc: kernel-harden...@lists.openwall.com Cc: Vasileios P. Kemerlis Cc: Juerg Haefliger Cc: Tycho Andersen Cc: Marco Benatto Cc: David Woodhouse Signed-off-by: Khalid Aziz --- arch/x86/mm/fault.c | 4 include/linux/xpfo.h | 4 mm/xpfo.c| 50 +--- 3 files changed, 51 insertions(+), 7 deletions(-) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index ba51652fbd33..207081dcd572 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -18,6 +18,7 @@ #include /* faulthandler_disabled() */ #include /* efi_recover_from_page_fault()*/ #include +#include #include /* boot_cpu_has, ... */ #include /* dotraplinkage, ... */ @@ -1218,6 +1219,9 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code, if (kprobes_fault(regs)) return; + if (xpfo_spurious_fault(address)) + return; + /* * Note, despite being a "bad area", there are quite a few * acceptable reasons to get here, such as erratum fixups diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h index ea512f49..58dd243637d2 100644 --- a/include/linux/xpfo.h +++ b/include/linux/xpfo.h @@ -54,6 +54,8 @@ bool xpfo_enabled(void); phys_addr_t user_virt_to_phys(unsigned long addr); +bool xpfo_spurious_fault(unsigned long addr); + #else /* !CONFIG_XPFO */ static inline void xpfo_init_single_page(struct page *page) { } @@ -81,6 +83,8 @@ static inline bool xpfo_enabled(void) { return false; } static inline phys_addr_t user_virt_to_phys(unsigned long addr) { return 0; } +static inline bool xpfo_spurious_fault(unsigned long addr) { return false; } + #endif /* CONFIG_XPFO */ #endif /* _LINUX_XPFO_H */ diff --git a/mm/xpfo.c b/mm/xpfo.c index dbf20efb0499..85079377c91d 100644 --- a/mm/xpfo.c +++ b/mm/xpfo.c @@ -119,6 +119,16 @@ void xpfo_free_pages(struct page *page, int order) } } +static void xpfo_do_map(void *kaddr, struct page *page) +{ + spin_lock(>xpfo_lock); + if (PageXpfoUnmapped(page)) { + set_kpte(kaddr, page, PAGE_KERNEL); + ClearPageXpfoUnmapped(page); + } + spin_unlock(>xpfo_lock); +} + void xpfo_kmap(void *kaddr, struct page *page) { if (!static_branch_unlikely(_inited)) @@ -127,17 +137,12 @@ void xpfo_kmap(void *kaddr, struct page *page) if (!PageXpfoUser(page)) return; - spin_lock(>xpfo_lock); - /* * The page was previously allocated to user space, so map it back * into the kernel. No TLB flush required. */ - if ((atomic_inc_return(>xpfo_mapcount) == 1) && - TestClearPageXpfoUnmapped(page)) - set_kpte(kaddr, page, PAGE_KERNEL); - - spin_unlock(>xpfo_lock); + if (atomic_inc_return(>xpfo_mapcount) == 1) + xpfo_do_map(kaddr, page); } EXPORT_SYMBOL(xpfo_kmap); @@ -204,3 +209,34 @@ void xpfo_temp_unmap(const void *addr, size_t size, void **mapping, kunmap_atomic(mapping[i]); } EXPORT_SYMBOL(xpfo_temp_unmap); + +bool xpfo_spurious_fault(unsigned long addr) +{ + struct page *page; + bool spurious; + int mapcount; + + if (!static_branch_unlikely(_inited)) + return false; + + /* XXX Is this sufficient to guard against calling virt_to_page() on a +* virtual address that has no corresponding struct page? */ + if (!virt_addr_valid(addr)) + return false; + + page = virt_to_page(addr); + mapcount =
Re: kernel BUG at kernel/cred.c:825!
On 1/7/19 11:18 AM, Dave Jones wrote: [ 53.980701] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory [ 53.981216] NFSD: starting 45-second grace period (net f098) [ 54.006802] CRED: Invalid credentials [ 54.006880] CRED: At ./include/linux/cred.h:253 [ 54.006899] CRED: Specified credentials: 5daa4529 [ 54.006916] CRED: ->magic=0, put_addr= (null) [ 54.006927] CRED: ->usage=1, subscr=0 [ 54.006935] CRED: ->*uid = { 0,0,0,0 } [ 54.006944] CRED: ->*gid = { 0,0,0,0 } [ 54.006954] [ cut here ] [ 54.006964] kernel BUG at kernel/cred.c:825! [ 54.006977] invalid opcode: [#1] SMP RIP: __invalid_creds+0x48/0x50 [ 54.006987] CPU: 2 PID: 814 Comm: mount.nfs Tainted: GW 5.0.0-rc1-backup+ #1 [ 54.006997] Hardware name: ASUS All Series/Z97-DELUXE, BIOS 2602 08/18/2015 [ 54.007171] RIP: 0010:__invalid_creds+0x48/0x50 [ 54.007184] Code: 44 89 e2 48 89 ee 48 c7 c7 37 3e 53 ba e8 f7 8f 03 00 48 c7 c6 49 3e 53 ba 48 89 df 65 48 8b 14 25 80 4e 01 00 e8 48 fd ff ff <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 49 89 fe [ 54.007207] RSP: 0018:c9e33a30 EFLAGS: 00010286 [ 54.007219] RAX: 001a RBX: ba960300 RCX: 0006 [ 54.007234] RDX: RSI: 8884276f8818 RDI: 88842f895710 [ 54.007246] RBP: ba5274c3 R08: 0001 R09: [ 54.007254] R10: c9e33a50 R11: R12: 00fd [ 54.007261] R13: 88842c1a6a08 R14: ba960300 R15: c9e33d60 [ 54.007269] FS: 7f73770cb140() GS:88842f88() knlGS: [ 54.007277] CS: 0010 DS: ES: CR0: 80050033 [ 54.007283] CR2: 5557d17d1000 CR3: 0004122ba006 CR4: 001606e0 [ 54.007359] Call Trace: [ 54.007366] nfs4_discover_server_trunking+0x286/0x310 [ 54.007376] nfs4_init_client+0xe8/0x260 [ 54.007389] ? nfs_get_client+0x519/0x610 [ 54.007401] ? _raw_spin_unlock+0x24/0x30 [ 54.007412] ? nfs_get_client+0x519/0x610 [ 54.007424] nfs4_set_client+0xb8/0x100 [ 54.007439] nfs4_create_server+0xfe/0x270 [ 54.007451] ? pcpu_alloc+0x611/0x8a0 [ 54.007462] nfs4_remote_mount+0x28/0x50 [ 54.007474] mount_fs+0xf/0x80 [ 54.007487] vfs_kern_mount+0x62/0x160 [ 54.007498] nfs_do_root_mount+0x7f/0xc0 [ 54.007510] nfs4_try_mount+0x3f/0xc0 [ 54.007521] ? get_nfs_version+0x11/0x50 [ 54.007536] nfs_fs_mount+0x61b/0xbd0 [ 54.007548] ? rcu_read_lock_sched_held+0x66/0x70 [ 54.007560] ? nfs_clone_super+0x70/0x70 [ 54.007571] ? nfs_destroy_inode+0x20/0x20 [ 54.007585] ? mount_fs+0xf/0x80 [ 54.007595] mount_fs+0xf/0x80 [ 54.007606] vfs_kern_mount+0x62/0x160 [ 54.007618] do_mount+0x1d1/0xd40 [ 54.007631] ? copy_mount_options+0xd2/0x170 [ 54.007643] ksys_mount+0x7e/0xd0 [ 54.007654] __x64_sys_mount+0x21/0x30 [ 54.007665] do_syscall_64+0x6d/0x660 [ 54.007677] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 54.007690] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 54.007702] RIP: 0033:0x7f7377e97a1a [ 54.007713] Code: 48 8b 0d 71 e4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3e e4 0b 00 f7 d8 64 89 01 48 [ 54.007736] RSP: 002b:7ffc73d9b4a8 EFLAGS: 0202 ORIG_RAX: 00a5 [ 54.007751] RAX: ffda RBX: RCX: 7f7377e97a1a [ 54.007764] RDX: 5632beb51b50 RSI: 5632beb51b70 RDI: 5632beb53880 [ 54.007780] RBP: 7ffc73d9b600 R08: 5632beb556b0 R09: 33643a303036343a [ 54.007794] R10: 0c00 R11: 0202 R12: 7ffc73d9b600 [ 54.007807] R13: 5632beb548a0 R14: 001c R15: 7ffc73d9b510 Fedora still seems to be hitting this as of -rc2
Re: [RFC PATCH v7 00/16] Add support for eXclusive Page Frame Ownership
On 1/10/19 1:09 PM, Khalid Aziz wrote: I am continuing to build on the work Juerg, Tycho and Julian have done on XPFO. After the last round of updates, we were seeing very significant performance penalties when stale TLB entries were flushed actively after an XPFO TLB update. Benchmark for measuring performance is kernel build using parallel make. To get full protection from ret2dir attackes, we must flush stale TLB entries. Performance penalty from flushing stale TLB entries goes up as the number of cores goes up. On a desktop class machine with only 4 cores, enabling TLB flush for stale entries causes system time for "make -j4" to go up by a factor of 2.614x but on a larger machine with 96 cores, system time with "make -j60" goes up by a factor of 26.366x! I have been working on reducing this performance penalty. I implemented a solution to reduce performance penalty and that has had large impact. When XPFO code flushes stale TLB entries, it does so for all CPUs on the system which may include CPUs that may not have any matching TLB entries or may never be scheduled to run the userspace task causing TLB flush. Problem is made worse by the fact that if number of entries being flushed exceeds tlb_single_page_flush_ceiling, it results in a full TLB flush on every CPU. A rogue process can launch a ret2dir attack only from a CPU that has dual mapping for its pages in physmap in its TLB. We can hence defer TLB flush on a CPU until a process that would have caused a TLB flush is scheduled on that CPU. I have added a cpumask to task_struct which is then used to post pending TLB flush on CPUs other than the one a process is running on. This cpumask is checked when a process migrates to a new CPU and TLB is flushed at that time. I measured system time for parallel make with unmodified 4.20 kernel, 4.20 with XPFO patches before this optimization and then again after applying this optimization. Here are the results: Hardware: 96-core Intel Xeon Platinum 8160 CPU @ 2.10GHz, 768 GB RAM make -j60 all 4.20915.183s 4.20+XPFO 24129.354s 26.366x 4.20+XPFO+Deferred flush1216.987s1.330xx Hardware: 4-core Intel Core i5-3550 CPU @ 3.30GHz, 8G RAM make -j4 all 4.20607.671s 4.20+XPFO 1588.646s 2.614x 4.20+XPFO+Deferred flush794.473s1.307xx 30+% overhead is still very high and there is room for improvement. Dave Hansen had suggested batch updating TLB entries and Tycho had created an initial implementation but I have not been able to get that to work correctly. I am still working on it and I suspect we will see a noticeable improvement in performance with that. In the code I added, I post a pending full TLB flush to all other CPUs even when number of TLB entries being flushed on current CPU does not exceed tlb_single_page_flush_ceiling. There has to be a better way to do this. I just haven't found an efficient way to implemented delayed limited TLB flush on other CPUs. I am not entirely sure if switch_mm_irqs_off() is indeed the right place to perform the pending TLB flush for a CPU. Any feedback on that will be very helpful. Delaying full TLB flushes on other CPUs seems to help tremendously, so if there is a better way to implement the same thing than what I have done in patch 16, I am open to ideas. Performance with this patch set is good enough to use these as starting point for further refinement before we merge it into main kernel, hence RFC. Since not flushing stale TLB entries creates a false sense of security, I would recommend making TLB flush mandatory and eliminate the "xpfotlbflush" kernel parameter (patch "mm, x86: omit TLB flushing by default for XPFO page table modifications"). What remains to be done beyond this patch series: 1. Performance improvements 2. Remove xpfotlbflush parameter 3. Re-evaluate the patch "arm64/mm: Add support for XPFO to swiotlb" from Juerg. I dropped it for now since swiotlb code for ARM has changed a lot in 4.20. 4. Extend the patch "xpfo, mm: Defer TLB flushes for non-current CPUs" to other architectures besides x86. - Juerg Haefliger (5): mm, x86: Add support for eXclusive Page Frame Ownership (XPFO) swiotlb: Map the buffer if it was unmapped by XPFO arm64/mm: Add support for XPFO arm64/mm, xpfo: temporarily map dcache regions lkdtm: Add test for XPFO Julian Stecklina (4): mm, x86: omit TLB flushing by default for XPFO page table modifications xpfo, mm: remove dependency on CONFIG_PAGE_EXTENSION xpfo, mm: optimize spinlock usage in xpfo_kunmap EXPERIMENTAL: xpfo, mm: optimize spin lock usage in xpfo_kmap Khalid Aziz (2): xpfo, mm: Fix hang when booting with "xpfotlbflush" xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only) Tycho Andersen (5): mm: add MAP_HUGETLB support to vm_mmap x86: always set IF before
Re: [PATCH 14/14] staging: android: ion: Add UNMAPPED heap type and helper
On 1/15/19 10:43 AM, Laura Abbott wrote: On 1/15/19 7:58 AM, Andrew F. Davis wrote: On 1/14/19 8:32 PM, Laura Abbott wrote: On 1/11/19 10:05 AM, Andrew F. Davis wrote: The "unmapped" heap is very similar to the carveout heap except the backing memory is presumed to be unmappable by the host, in my specific case due to firewalls. This memory can still be allocated from and used by devices that do have access to the backing memory. Based originally on the secure/unmapped heap from Linaro for the OP-TEE SDP implementation, this was re-written to match the carveout heap helper code. Suggested-by: Etienne Carriere Signed-off-by: Andrew F. Davis --- drivers/staging/android/ion/Kconfig | 10 ++ drivers/staging/android/ion/Makefile | 1 + drivers/staging/android/ion/ion.h | 16 +++ .../staging/android/ion/ion_unmapped_heap.c | 123 ++ drivers/staging/android/uapi/ion.h | 3 + 5 files changed, 153 insertions(+) create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c diff --git a/drivers/staging/android/ion/Kconfig b/drivers/staging/android/ion/Kconfig index 0fdda6f62953..a117b8b91b14 100644 --- a/drivers/staging/android/ion/Kconfig +++ b/drivers/staging/android/ion/Kconfig @@ -42,3 +42,13 @@ config ION_CMA_HEAP Choose this option to enable CMA heaps with Ion. This heap is backed by the Contiguous Memory Allocator (CMA). If your system has these regions, you should say Y here. + +config ION_UNMAPPED_HEAP + bool "ION unmapped heap support" + depends on ION + help + Choose this option to enable UNMAPPED heaps with Ion. This heap is + backed in specific memory pools, carveout from the Linux memory. + Unlike carveout heaps these are assumed to be not mappable by + kernel or user-space. + Unless you know your system has these regions, you should say N here. diff --git a/drivers/staging/android/ion/Makefile b/drivers/staging/android/ion/Makefile index 17f3a7569e3d..c71a1f3de581 100644 --- a/drivers/staging/android/ion/Makefile +++ b/drivers/staging/android/ion/Makefile @@ -4,3 +4,4 @@ obj-$(CONFIG_ION_SYSTEM_HEAP) += ion_system_heap.o ion_page_pool.o obj-$(CONFIG_ION_CARVEOUT_HEAP) += ion_carveout_heap.o obj-$(CONFIG_ION_CHUNK_HEAP) += ion_chunk_heap.o obj-$(CONFIG_ION_CMA_HEAP) += ion_cma_heap.o +obj-$(CONFIG_ION_UNMAPPED_HEAP) += ion_unmapped_heap.o diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h index 97b2876b165a..ce74332018ba 100644 --- a/drivers/staging/android/ion/ion.h +++ b/drivers/staging/android/ion/ion.h @@ -341,4 +341,20 @@ static inline struct ion_heap *ion_chunk_heap_create(phys_addr_t base, size_t si } #endif +#ifdef CONFIG_ION_UNMAPPED_HEAP +/** + * ion_unmapped_heap_create + * @base: base address of carveout memory + * @size: size of carveout memory region + * + * Creates an unmapped ion_heap using the passed in data + */ +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size); +#else +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size) +{ + return ERR_PTR(-ENODEV); +} +#endif + #endif /* _ION_H */ diff --git a/drivers/staging/android/ion/ion_unmapped_heap.c b/drivers/staging/android/ion/ion_unmapped_heap.c new file mode 100644 index ..7602b659c2ec --- /dev/null +++ b/drivers/staging/android/ion/ion_unmapped_heap.c @@ -0,0 +1,123 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * ION Memory Allocator unmapped heap helper + * + * Copyright (C) 2015-2016 Texas Instruments Incorporated - http://www.ti.com/ + * Andrew F. Davis + * + * ION "unmapped" heaps are physical memory heaps not by default mapped into + * a virtual address space. The buffer owner can explicitly request kernel + * space mappings but the underlying memory may still not be accessible for + * various reasons, such as firewalls. + */ + +#include +#include +#include +#include + +#include "ion.h" + +#define ION_UNMAPPED_ALLOCATE_FAIL -1 + +struct ion_unmapped_heap { + struct ion_heap heap; + struct gen_pool *pool; +}; + +static phys_addr_t ion_unmapped_allocate(struct ion_heap *heap, + unsigned long size) +{ + struct ion_unmapped_heap *unmapped_heap = + container_of(heap, struct ion_unmapped_heap, heap); + unsigned long offset; + + offset = gen_pool_alloc(unmapped_heap->pool, size); + if (!offset) + return ION_UNMAPPED_ALLOCATE_FAIL; + + return offset; +} + +static void ion_unmapped_free(struct ion_heap *heap, phys_addr_t addr, + unsigned long size) +{ + struct ion_unmapped_heap *unmapped_heap = + container_of(heap, struct ion_unmapped_heap, heap); + + gen_pool_free(unmapped_heap->pool, addr, size); +} + +static int ion_unmapped_heap_allocate(struct ion_heap *heap, + struct ion_buffer *buf
Re: [PATCH 13/14] staging: android: ion: Do not sync CPU cache on map/unmap
On 1/15/19 10:38 AM, Andrew F. Davis wrote: On 1/15/19 11:45 AM, Liam Mark wrote: On Tue, 15 Jan 2019, Andrew F. Davis wrote: On 1/14/19 11:13 AM, Liam Mark wrote: On Fri, 11 Jan 2019, Andrew F. Davis wrote: Buffers may not be mapped from the CPU so skip cache maintenance here. Accesses from the CPU to a cached heap should be bracketed with {begin,end}_cpu_access calls so maintenance should not be needed anyway. Signed-off-by: Andrew F. Davis --- drivers/staging/android/ion/ion.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 14e48f6eb734..09cb5a8e2b09 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -261,8 +261,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment, table = a->table; - if (!dma_map_sg(attachment->dev, table->sgl, table->nents, - direction)) + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents, + direction, DMA_ATTR_SKIP_CPU_SYNC)) Unfortunately I don't think you can do this for a couple reasons. You can't rely on {begin,end}_cpu_access calls to do cache maintenance. If the calls to {begin,end}_cpu_access were made before the call to dma_buf_attach then there won't have been a device attached so the calls to {begin,end}_cpu_access won't have done any cache maintenance. That should be okay though, if you have no attachments (or all attachments are IO-coherent) then there is no need for cache maintenance. Unless you mean a sequence where a non-io-coherent device is attached later after data has already been written. Does that sequence need supporting? Yes, but also I think there are cases where CPU access can happen before in Android, but I will focus on later for now. DMA-BUF doesn't have to allocate the backing memory until map_dma_buf() time, and that should only happen after all the devices have attached so it can know where to put the buffer. So we shouldn't expect any CPU access to buffers before all the devices are attached and mapped, right? Here is an example where CPU access can happen later in Android. Camera device records video -> software post processing -> video device (who does compression of raw data) and writes to a file In this example assume the buffer is cached and the devices are not IO-coherent (quite common). This is the start of the problem, having cached mappings of memory that is also being accessed non-coherently is going to cause issues one way or another. On top of the speculative cache fills that have to be constantly fought back against with CMOs like below; some coherent interconnects behave badly when you mix coherent and non-coherent access (snoop filters get messed up). The solution is to either always have the addresses marked non-coherent (like device memory, no-map carveouts), or if you really want to use regular system memory allocated at runtime, then all cached mappings of it need to be dropped, even the kernel logical address (area as painful as that would be). I agree it's broken, hence my desire to remove it :) The other problem is that uncached buffers are being used for performance reason so anything that would involve getting rid of the logical address would probably negate any performance benefit. ION buffer is allocated. //Camera device records video dma_buf_attach dma_map_attachment (buffer needs to be cleaned) Why does the buffer need to be cleaned here? I just got through reading the thread linked by Laura in the other reply. I do like +Brian's suggestion of tracking if the buffer has had CPU access since the last time and only flushing the cache if it has. As unmapped heaps never get CPU mapped this would never be the case for unmapped heaps, it solves my problem. [camera device writes to buffer] dma_buf_unmap_attachment (buffer needs to be invalidated) It doesn't know there will be any further CPU access, it could get freed after this for all we know, the invalidate can be saved until the CPU requests access again. dma_buf_detach (device cannot stay attached because it is being sent down the pipeline and Camera doesn't know the end of the use case) This seems like a broken use-case, I understand the desire to keep everything as modular as possible and separate the steps, but at this point no one owns this buffers backing memory, not the CPU or any device. I would go as far as to say DMA-BUF should be free now to de-allocate the backing storage if it wants, that way it could get ready for the next attachment, which may change the required backing memory completely. All devices should attach before the first mapping, and only let go after the task is complete, otherwise this buffers data needs copied off to a different location or the CPU needs to take ownership in-between. Maybe it's broken but it's the status quo and we spent a good
Re: [PATCH 00/14] Misc ION cleanups and adding unmapped heap
On 1/15/19 9:47 AM, Andrew F. Davis wrote: On 1/14/19 8:39 PM, Laura Abbott wrote: On 1/11/19 10:05 AM, Andrew F. Davis wrote: Hello all, This is a set of (hopefully) non-controversial cleanups for the ION framework and current set of heaps. These were found as I start to familiarize myself with the framework to help in whatever way I can in getting all this up to the standards needed for de-staging. I would like to get some ideas of what is left to work on to get ION out of staging. Has there been some kind of agreement on what ION should eventually end up being? To me it looks like it is being whittled away at to it's most core functions. To me that is looking like being a DMA-BUF user-space front end, simply advertising available memory backings in a system and providing allocations as DMA-BUF handles. If this is the case then it looks close to being ready to me at least, but I would love to hear any other opinions and concerns. Yes, at this point the only functionality that people are really depending on is the ability to allocate a dma_buf easily from userspace. Back to this patchset, the last patch may be a bit different than the others, it adds an unmapped heaps type and creation helper. I wanted to get this in to show off another heap type and maybe some issues we may have with the current ION framework. The unmapped heap is used when the backing memory should not (or cannot) be touched. Currently this kind of heap is used for firewalled secure memory that can be allocated like normal heap memory but only used by secure devices (OP-TEE, crypto HW, etc). It is basically just copied from the "carveout" heap type with the only difference being it is not mappable to userspace and we do not clear the memory (as we should not map it either). So should this really be a new heap type? Or maybe advertised as a carveout heap but with an additional allocation flag? Perhaps we do away with "types" altogether and just have flags, coherent/non-coherent, mapped/unmapped, etc. Maybe more thinking will be needed afterall.. So the cleanup looks okay (I need to finish reviewing) but I'm not a fan of adding another heaptype without solving the problem of adding some sort of devicetree binding or other method of allocating and placing Ion heaps. That plus uncached buffers are one of the big open problems that need to be solved for destaging Ion. See https://lore.kernel.org/lkml/20181120164636.jcw7li2uaa3cmwc3@DESKTOP-E1NTVVP.localdomain/ for some background on that problem. I'm under the impression that adding heaps like carveouts/chunk will be rather system specific and so do not lend themselves well to a universal DT style exporter. For instance a carveout memory space can be reported by a device at runtime, then the driver managing that device should go and use the carveout heap helpers to export that heap. If this is the case then I'm not sure it is a problem for the ION core framework to solve, but rather the users of it to figure out how best to create the various heaps. All Ion needs to do is allow exporting and advertising them IMHO. I think it is a problem for the Ion core framework to take care of. Ion is useless if you don't actually have the heaps. Nobody has actually gotten a full Ion solution end-to-end with a carveout heap working in mainline because any proposals have been rejected. I think we need at least one example in mainline of how creating a carveout heap would work. Thanks for the background thread link, I've been looking for some info on current status of all this and "ion" is a bit hard to search the lists for. The core reason for posting this cleanup series is to throw my hat into the ring of all this Ion work and start getting familiar with the pending issues. The last two patches are not all that important to get in right now. In that thread you linked above, it seems we may have arrived at a similar problem for different reasons. I think the root issue is the Ion core makes too many assumptions about the heap memory. My proposal would be to allow the heap exporters more control over the DMA-BUF ops, maybe even going as far as letting them provide their own complete struct dma_buf_ops. Let me give an example where I think this is going to be useful. We have the classic constraint solving problem on our SoCs. Our SoCs are full of various coherent and non-coherent devices, some require contiguous memory allocations, others have in-line IOMMUs so can operate on non-contiguous, etc.. DMA-BUF has a solution designed in for this we can use, namely allocation at map time after all the attachments have come in. The checking of each attached device to find the right backing memory is something the DMA-BUF exporter has to do, and so this SoC specific logic would have to be added to each exporting framework (DRM, V4L2, etc), unless we have one unified system exporter everyone uses, Ion. That's how dmabuf is supposed to work in theory but in
Re: [PATCH 14/14] staging: android: ion: Add UNMAPPED heap type and helper
On 1/15/19 7:58 AM, Andrew F. Davis wrote: On 1/14/19 8:32 PM, Laura Abbott wrote: On 1/11/19 10:05 AM, Andrew F. Davis wrote: The "unmapped" heap is very similar to the carveout heap except the backing memory is presumed to be unmappable by the host, in my specific case due to firewalls. This memory can still be allocated from and used by devices that do have access to the backing memory. Based originally on the secure/unmapped heap from Linaro for the OP-TEE SDP implementation, this was re-written to match the carveout heap helper code. Suggested-by: Etienne Carriere Signed-off-by: Andrew F. Davis --- drivers/staging/android/ion/Kconfig | 10 ++ drivers/staging/android/ion/Makefile | 1 + drivers/staging/android/ion/ion.h | 16 +++ .../staging/android/ion/ion_unmapped_heap.c | 123 ++ drivers/staging/android/uapi/ion.h | 3 + 5 files changed, 153 insertions(+) create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c diff --git a/drivers/staging/android/ion/Kconfig b/drivers/staging/android/ion/Kconfig index 0fdda6f62953..a117b8b91b14 100644 --- a/drivers/staging/android/ion/Kconfig +++ b/drivers/staging/android/ion/Kconfig @@ -42,3 +42,13 @@ config ION_CMA_HEAP Choose this option to enable CMA heaps with Ion. This heap is backed by the Contiguous Memory Allocator (CMA). If your system has these regions, you should say Y here. + +config ION_UNMAPPED_HEAP + bool "ION unmapped heap support" + depends on ION + help + Choose this option to enable UNMAPPED heaps with Ion. This heap is + backed in specific memory pools, carveout from the Linux memory. + Unlike carveout heaps these are assumed to be not mappable by + kernel or user-space. + Unless you know your system has these regions, you should say N here. diff --git a/drivers/staging/android/ion/Makefile b/drivers/staging/android/ion/Makefile index 17f3a7569e3d..c71a1f3de581 100644 --- a/drivers/staging/android/ion/Makefile +++ b/drivers/staging/android/ion/Makefile @@ -4,3 +4,4 @@ obj-$(CONFIG_ION_SYSTEM_HEAP) += ion_system_heap.o ion_page_pool.o obj-$(CONFIG_ION_CARVEOUT_HEAP) += ion_carveout_heap.o obj-$(CONFIG_ION_CHUNK_HEAP) += ion_chunk_heap.o obj-$(CONFIG_ION_CMA_HEAP) += ion_cma_heap.o +obj-$(CONFIG_ION_UNMAPPED_HEAP) += ion_unmapped_heap.o diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h index 97b2876b165a..ce74332018ba 100644 --- a/drivers/staging/android/ion/ion.h +++ b/drivers/staging/android/ion/ion.h @@ -341,4 +341,20 @@ static inline struct ion_heap *ion_chunk_heap_create(phys_addr_t base, size_t si } #endif +#ifdef CONFIG_ION_UNMAPPED_HEAP +/** + * ion_unmapped_heap_create + * @base: base address of carveout memory + * @size: size of carveout memory region + * + * Creates an unmapped ion_heap using the passed in data + */ +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size); +#else +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size) +{ + return ERR_PTR(-ENODEV); +} +#endif + #endif /* _ION_H */ diff --git a/drivers/staging/android/ion/ion_unmapped_heap.c b/drivers/staging/android/ion/ion_unmapped_heap.c new file mode 100644 index ..7602b659c2ec --- /dev/null +++ b/drivers/staging/android/ion/ion_unmapped_heap.c @@ -0,0 +1,123 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * ION Memory Allocator unmapped heap helper + * + * Copyright (C) 2015-2016 Texas Instruments Incorporated - http://www.ti.com/ + * Andrew F. Davis + * + * ION "unmapped" heaps are physical memory heaps not by default mapped into + * a virtual address space. The buffer owner can explicitly request kernel + * space mappings but the underlying memory may still not be accessible for + * various reasons, such as firewalls. + */ + +#include +#include +#include +#include + +#include "ion.h" + +#define ION_UNMAPPED_ALLOCATE_FAIL -1 + +struct ion_unmapped_heap { + struct ion_heap heap; + struct gen_pool *pool; +}; + +static phys_addr_t ion_unmapped_allocate(struct ion_heap *heap, + unsigned long size) +{ + struct ion_unmapped_heap *unmapped_heap = + container_of(heap, struct ion_unmapped_heap, heap); + unsigned long offset; + + offset = gen_pool_alloc(unmapped_heap->pool, size); + if (!offset) + return ION_UNMAPPED_ALLOCATE_FAIL; + + return offset; +} + +static void ion_unmapped_free(struct ion_heap *heap, phys_addr_t addr, + unsigned long size) +{ + struct ion_unmapped_heap *unmapped_heap = + container_of(heap, struct ion_unmapped_heap, heap); + + gen_pool_free(unmapped_heap->pool, addr, size); +} + +static int ion_unmapped_heap_allocate(struct ion_heap *heap, + struct ion_buffer *buffer, + unsigned long si
Re: [PATCH 00/14] Misc ION cleanups and adding unmapped heap
On 1/11/19 10:05 AM, Andrew F. Davis wrote: Hello all, This is a set of (hopefully) non-controversial cleanups for the ION framework and current set of heaps. These were found as I start to familiarize myself with the framework to help in whatever way I can in getting all this up to the standards needed for de-staging. I would like to get some ideas of what is left to work on to get ION out of staging. Has there been some kind of agreement on what ION should eventually end up being? To me it looks like it is being whittled away at to it's most core functions. To me that is looking like being a DMA-BUF user-space front end, simply advertising available memory backings in a system and providing allocations as DMA-BUF handles. If this is the case then it looks close to being ready to me at least, but I would love to hear any other opinions and concerns. Yes, at this point the only functionality that people are really depending on is the ability to allocate a dma_buf easily from userspace. Back to this patchset, the last patch may be a bit different than the others, it adds an unmapped heaps type and creation helper. I wanted to get this in to show off another heap type and maybe some issues we may have with the current ION framework. The unmapped heap is used when the backing memory should not (or cannot) be touched. Currently this kind of heap is used for firewalled secure memory that can be allocated like normal heap memory but only used by secure devices (OP-TEE, crypto HW, etc). It is basically just copied from the "carveout" heap type with the only difference being it is not mappable to userspace and we do not clear the memory (as we should not map it either). So should this really be a new heap type? Or maybe advertised as a carveout heap but with an additional allocation flag? Perhaps we do away with "types" altogether and just have flags, coherent/non-coherent, mapped/unmapped, etc. Maybe more thinking will be needed afterall.. So the cleanup looks okay (I need to finish reviewing) but I'm not a fan of adding another heaptype without solving the problem of adding some sort of devicetree binding or other method of allocating and placing Ion heaps. That plus uncached buffers are one of the big open problems that need to be solved for destaging Ion. See https://lore.kernel.org/lkml/20181120164636.jcw7li2uaa3cmwc3@DESKTOP-E1NTVVP.localdomain/ for some background on that problem. Thanks, Laura Thanks, Andrew Andrew F. Davis (14): staging: android: ion: Add proper header information staging: android: ion: Remove empty ion_ioctl_dir() function staging: android: ion: Merge ion-ioctl.c into ion.c staging: android: ion: Remove leftover comment staging: android: ion: Remove struct ion_platform_heap staging: android: ion: Fixup some white-space issues staging: android: ion: Sync comment docs with struct ion_buffer staging: android: ion: Remove base from ion_carveout_heap staging: android: ion: Remove base from ion_chunk_heap staging: android: ion: Remove unused headers staging: android: ion: Allow heap name to be null staging: android: ion: Declare helpers for carveout and chunk heaps staging: android: ion: Do not sync CPU cache on map/unmap staging: android: ion: Add UNMAPPED heap type and helper drivers/staging/android/ion/Kconfig | 10 ++ drivers/staging/android/ion/Makefile | 3 +- drivers/staging/android/ion/ion-ioctl.c | 98 -- drivers/staging/android/ion/ion.c | 93 +++-- drivers/staging/android/ion/ion.h | 87 - .../staging/android/ion/ion_carveout_heap.c | 19 +-- drivers/staging/android/ion/ion_chunk_heap.c | 25 ++-- drivers/staging/android/ion/ion_cma_heap.c| 6 +- drivers/staging/android/ion/ion_heap.c| 8 +- drivers/staging/android/ion/ion_page_pool.c | 2 +- drivers/staging/android/ion/ion_system_heap.c | 8 +- .../staging/android/ion/ion_unmapped_heap.c | 123 ++ drivers/staging/android/uapi/ion.h| 3 + 13 files changed, 307 insertions(+), 178 deletions(-) delete mode 100644 drivers/staging/android/ion/ion-ioctl.c create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c
Re: [PATCH 14/14] staging: android: ion: Add UNMAPPED heap type and helper
On 1/11/19 10:05 AM, Andrew F. Davis wrote: The "unmapped" heap is very similar to the carveout heap except the backing memory is presumed to be unmappable by the host, in my specific case due to firewalls. This memory can still be allocated from and used by devices that do have access to the backing memory. Based originally on the secure/unmapped heap from Linaro for the OP-TEE SDP implementation, this was re-written to match the carveout heap helper code. Suggested-by: Etienne Carriere Signed-off-by: Andrew F. Davis --- drivers/staging/android/ion/Kconfig | 10 ++ drivers/staging/android/ion/Makefile | 1 + drivers/staging/android/ion/ion.h | 16 +++ .../staging/android/ion/ion_unmapped_heap.c | 123 ++ drivers/staging/android/uapi/ion.h| 3 + 5 files changed, 153 insertions(+) create mode 100644 drivers/staging/android/ion/ion_unmapped_heap.c diff --git a/drivers/staging/android/ion/Kconfig b/drivers/staging/android/ion/Kconfig index 0fdda6f62953..a117b8b91b14 100644 --- a/drivers/staging/android/ion/Kconfig +++ b/drivers/staging/android/ion/Kconfig @@ -42,3 +42,13 @@ config ION_CMA_HEAP Choose this option to enable CMA heaps with Ion. This heap is backed by the Contiguous Memory Allocator (CMA). If your system has these regions, you should say Y here. + +config ION_UNMAPPED_HEAP + bool "ION unmapped heap support" + depends on ION + help + Choose this option to enable UNMAPPED heaps with Ion. This heap is + backed in specific memory pools, carveout from the Linux memory. + Unlike carveout heaps these are assumed to be not mappable by + kernel or user-space. + Unless you know your system has these regions, you should say N here. diff --git a/drivers/staging/android/ion/Makefile b/drivers/staging/android/ion/Makefile index 17f3a7569e3d..c71a1f3de581 100644 --- a/drivers/staging/android/ion/Makefile +++ b/drivers/staging/android/ion/Makefile @@ -4,3 +4,4 @@ obj-$(CONFIG_ION_SYSTEM_HEAP) += ion_system_heap.o ion_page_pool.o obj-$(CONFIG_ION_CARVEOUT_HEAP) += ion_carveout_heap.o obj-$(CONFIG_ION_CHUNK_HEAP) += ion_chunk_heap.o obj-$(CONFIG_ION_CMA_HEAP) += ion_cma_heap.o +obj-$(CONFIG_ION_UNMAPPED_HEAP) += ion_unmapped_heap.o diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h index 97b2876b165a..ce74332018ba 100644 --- a/drivers/staging/android/ion/ion.h +++ b/drivers/staging/android/ion/ion.h @@ -341,4 +341,20 @@ static inline struct ion_heap *ion_chunk_heap_create(phys_addr_t base, size_t si } #endif +#ifdef CONFIG_ION_UNMAPPED_HEAP +/** + * ion_unmapped_heap_create + * @base: base address of carveout memory + * @size: size of carveout memory region + * + * Creates an unmapped ion_heap using the passed in data + */ +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size); +#else +struct ion_heap *ion_unmapped_heap_create(phys_addr_t base, size_t size) +{ + return ERR_PTR(-ENODEV); +} +#endif + #endif /* _ION_H */ diff --git a/drivers/staging/android/ion/ion_unmapped_heap.c b/drivers/staging/android/ion/ion_unmapped_heap.c new file mode 100644 index ..7602b659c2ec --- /dev/null +++ b/drivers/staging/android/ion/ion_unmapped_heap.c @@ -0,0 +1,123 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * ION Memory Allocator unmapped heap helper + * + * Copyright (C) 2015-2016 Texas Instruments Incorporated - http://www.ti.com/ + * Andrew F. Davis + * + * ION "unmapped" heaps are physical memory heaps not by default mapped into + * a virtual address space. The buffer owner can explicitly request kernel + * space mappings but the underlying memory may still not be accessible for + * various reasons, such as firewalls. + */ + +#include +#include +#include +#include + +#include "ion.h" + +#define ION_UNMAPPED_ALLOCATE_FAIL -1 + +struct ion_unmapped_heap { + struct ion_heap heap; + struct gen_pool *pool; +}; + +static phys_addr_t ion_unmapped_allocate(struct ion_heap *heap, +unsigned long size) +{ + struct ion_unmapped_heap *unmapped_heap = + container_of(heap, struct ion_unmapped_heap, heap); + unsigned long offset; + + offset = gen_pool_alloc(unmapped_heap->pool, size); + if (!offset) + return ION_UNMAPPED_ALLOCATE_FAIL; + + return offset; +} + +static void ion_unmapped_free(struct ion_heap *heap, phys_addr_t addr, + unsigned long size) +{ + struct ion_unmapped_heap *unmapped_heap = + container_of(heap, struct ion_unmapped_heap, heap); + + gen_pool_free(unmapped_heap->pool, addr, size); +} + +static int ion_unmapped_heap_allocate(struct ion_heap *heap, + struct ion_buffer *buffer, + unsigned long
Re: [PATCH] tools uapi asm: Update asm-generic/unistd.h
On 1/9/19 5:15 AM, Arnaldo Carvalho de Melo wrote: Em Tue, Jan 08, 2019 at 02:17:58PM -0800, Laura Abbott escreveu: Commit 4e21565b7fd4 ("asm-generic: add kexec_file_load system call to unistd.h") added the system call to the generic header but not to the perf copy resulting a compile failure on aarch64 Humm, that shouldn't happen, i.e. kernel developers don't have to update anything in tools/ as part of their normal kernel development workflows. They can if they wish, but are not required, so the build failure was due to something else or a pre-existing bug where tools/ living stuff used things outside tools/. I think that's the issue, I couldn't reproduce the issue until I installed the 5.0-rc1 headers. This is the invocation of the script (paths trimmed) /tools/perf/arch/arm64/entry/syscalls//mksyscalltbl' 'gcc' 'gcc' /tools /tools/arch/arm64/include/uapi/asm/unistd.h > arch/arm64/include/generated/asm/syscalls.c In tools/perf/arch/arm64/entry/syscalls/mksyscalltbl, $hostcc -I $incpath/include/uapi -o $create_table_exe -x c - This was the gcc command that was failing. tools/arch/arm64/include/uapi/asm/unistd.h includes asm-generic/unistd.h so I think that's what's leaking from the system. When running mksyscalltbl Lemme try this on my Orangi PI zero... - Arnaldo >> BUILDSTDERR: : In function 'main': BUILDSTDERR: :273:44: error: '__NR_kexec_file_load' undeclared (first use in this function) BUILDSTDERR: :273:44: note: each undeclared identifier is reported only once for each function it appears in Fix this by syncing up. Fixes: 4e21565b7fd4 ("asm-generic: add kexec_file_load system call to unistd.h") Signed-off-by: Laura Abbott --- Found this on Fedora when compiling 5.0-rc1, I hadn't seen a patch queued yet. --- tools/include/uapi/asm-generic/unistd.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h index c7f3321fbe43..d90127298f12 100644 --- a/tools/include/uapi/asm-generic/unistd.h +++ b/tools/include/uapi/asm-generic/unistd.h @@ -738,9 +738,11 @@ __SYSCALL(__NR_statx, sys_statx) __SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents) #define __NR_rseq 293 __SYSCALL(__NR_rseq, sys_rseq) +#define __NR_kexec_file_load 294 +__SYSCALL(__NR_kexec_file_load, sys_kexec_file_load) #undef __NR_syscalls -#define __NR_syscalls 294 +#define __NR_syscalls 295 /* * 32 bit systems traditionally used different -- 2.20.1
[PATCH] tools uapi asm: Update asm-generic/unistd.h
Commit 4e21565b7fd4 ("asm-generic: add kexec_file_load system call to unistd.h") added the system call to the generic header but not to the perf copy resulting a compile failure on aarch64 When running mksyscalltbl BUILDSTDERR: : In function 'main': BUILDSTDERR: :273:44: error: '__NR_kexec_file_load' undeclared (first use in this function) BUILDSTDERR: :273:44: note: each undeclared identifier is reported only once for each function it appears in Fix this by syncing up. Fixes: 4e21565b7fd4 ("asm-generic: add kexec_file_load system call to unistd.h") Signed-off-by: Laura Abbott --- Found this on Fedora when compiling 5.0-rc1, I hadn't seen a patch queued yet. --- tools/include/uapi/asm-generic/unistd.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h index c7f3321fbe43..d90127298f12 100644 --- a/tools/include/uapi/asm-generic/unistd.h +++ b/tools/include/uapi/asm-generic/unistd.h @@ -738,9 +738,11 @@ __SYSCALL(__NR_statx, sys_statx) __SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents) #define __NR_rseq 293 __SYSCALL(__NR_rseq, sys_rseq) +#define __NR_kexec_file_load 294 +__SYSCALL(__NR_kexec_file_load, sys_kexec_file_load) #undef __NR_syscalls -#define __NR_syscalls 294 +#define __NR_syscalls 295 /* * 32 bit systems traditionally used different -- 2.20.1
Re: [PATCH] vfio_pci: Add local source directory as include
On 1/7/19 12:58 AM, Michael Ellerman wrote: Laura Abbott writes: Commit 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") introduced a trace.h file in the local directory but missed adding the local include path, resulting in compilation failures with tracepoints: In file included from drivers/vfio/pci/trace.h:102, from drivers/vfio/pci/vfio_pci_nvlink2.c:29: ./include/trace/define_trace.h:89:42: fatal error: ./trace.h: No such file or directory #include TRACE_INCLUDE(TRACE_INCLUDE_FILE) Fix this by adjusting the include path. Fixes: 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") Signed-off-by: Laura Abbott --- I'd still like to echo my sentiment that this should not be a def_bool. We hit this error on our internal testing and we couldn't even turn off the driver until we fixed this. I assume there's some reason you can't commit a patch to your tree to change it to bool, or turn it off entirely? That would change the SHA which is perhaps reason enough. In general we have far too many options and most of them never get turned off (or on), so it just creates testing/bug surface for not much benefit. This is one that will probably be turned on in all distro kernels for example. But I have no real objection to making it user configurable. Alex I assume you'll merge this fix via the vfio tree? cheers Well now that everything is fixed, we will probably keep it on. My gripe is that when things break like this it's difficult to work around vs. the configuration option at least makes it possible to work around in a fast way. This may also just be a strong preference of mine for working around problems. Thanks, Laura diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index 9662c063a6b1..08d4676a8495 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -1,3 +1,4 @@ +ccflags-y += -I$(src) vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o -- 2.20.1
[PATCH] vfio_pci: Add local source directory as include
Commit 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") introduced a trace.h file in the local directory but missed adding the local include path, resulting in compilation failures with tracepoints: In file included from drivers/vfio/pci/trace.h:102, from drivers/vfio/pci/vfio_pci_nvlink2.c:29: ./include/trace/define_trace.h:89:42: fatal error: ./trace.h: No such file or directory #include TRACE_INCLUDE(TRACE_INCLUDE_FILE) Fix this by adjusting the include path. Fixes: 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") Signed-off-by: Laura Abbott --- I'd still like to echo my sentiment that this should not be a def_bool. We hit this error on our internal testing and we couldn't even turn off the driver until we fixed this. --- drivers/vfio/pci/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index 9662c063a6b1..08d4676a8495 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -1,3 +1,4 @@ +ccflags-y += -I$(src) vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o -- 2.20.1
Re: [PATCH] staging: android: ion: add buffer flag update ioctl
On 1/3/19 5:42 PM, Zengtao (B) wrote: -Original Message- From: Laura Abbott [mailto:labb...@redhat.com] Sent: Thursday, January 03, 2019 6:31 AM To: Zengtao (B) ; sumit.sem...@linaro.org Cc: Greg Kroah-Hartman ; Arve Hjønnevåg ; Todd Kjos ; Martijn Coenen ; Joel Fernandes ; de...@driverdev.osuosl.org; dri-de...@lists.freedesktop.org; linaro-mm-...@lists.linaro.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] staging: android: ion: add buffer flag update ioctl On 12/23/18 6:47 PM, Zengtao (B) wrote: Hi laura: -Original Message- From: Laura Abbott [mailto:labb...@redhat.com] Sent: Friday, December 21, 2018 4:50 AM To: Zengtao (B) ; sumit.sem...@linaro.org Cc: Greg Kroah-Hartman ; Arve Hjønnevåg ; Todd Kjos ; Martijn Coenen ; Joel Fernandes ; de...@driverdev.osuosl.org; dri-de...@lists.freedesktop.org; linaro-mm-...@lists.linaro.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] staging: android: ion: add buffer flag update ioctl On 12/19/18 5:39 PM, Zengtao (B) wrote: Hi laura: -Original Message- From: Laura Abbott [mailto:labb...@redhat.com] Sent: Thursday, December 20, 2018 2:10 AM To: Zengtao (B) ; sumit.sem...@linaro.org Cc: Greg Kroah-Hartman ; Arve Hjønnevåg ; Todd Kjos ; Martijn Coenen ; Joel Fernandes ; de...@driverdev.osuosl.org; dri-de...@lists.freedesktop.org; linaro-mm-...@lists.linaro.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] staging: android: ion: add buffer flag update ioctl On 12/19/18 9:19 AM, Zeng Tao wrote: In some usecases, the buffer cached attribute is not determined at allocation time, it's determined just before the real cpu mapping. And from the memory view of point, a buffer should not have the cached attribute util is really mapped by the cpu. So in this patch, we introduced the new ioctl command to target the requirement. This is racy and error prone. Can you explain more what problem you are trying to solve? My use case is like this: 1. There are two process A and B, A takes case of ion buffer allocation, and pass the buffer fd to B, then B maps and uses it. 2. Process B need to map the buffer with different cached attribute for different use case, for example, if the buffer is used for pure software algorithm, then we need to map it as cached, otherwise non-cached, and B needs to deal with both cases. And unfortunately the mmap syscall takes no cached flags and we can't decide the cache attribute when we are doing the mmap, so I introduce new the ioctl even though I think the solution is not as good. Thanks for the explanation, this was about the use case I expected. I'm pretty sure I had this exact problem once upon a time and we didn't come up with a solution. I'd still like to get rid of uncached buffers in general and just use cached buffers (see http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/201 8-N ovember/128842.html) What's your usecase for uncached buffers? Some buffers are only used by HW, in this case, we tend to use uncached buffers. But sometimes the SW need to read few buffer contents for debug purpose and no synchronization is needed. If this is primarily for debug, that's not really a compelling reason to support uncached buffers. It's incredibly difficult to do uncached buffers correctly I'd rather spend effort on making the cached use cases work better. No, not only for debug, I meant if the buffers are uncached, the SW will only mmap them for debug or even don't need to mmap them. So for the two kinds of buffers: 1. uncached: only the HW need to access it, the SW don't need to mmap it or just for debug. 2. cached: both the HW and the SW need to access it. The ION is designed as a generalized memory manager, I think we should cover as many requirements as we can in the ION design, so I 'd rather that we keep the uncached support. And I don’t quite understand the difficulty you mentioned to support the uncached buffers, would you explain a little more about it. We end up with aliasing problems. Each kernel page still has a cached mapping so it's very difficult to keep the two mappings in sync. https://lore.kernel.org/lkml/20181120164636.jcw7li2uaa3cmwc3@DESKTOP-E1NTVVP.localdomain/ This thread does a better job of explaining the problem and the objections to uncached buffers. And if the software never touches the buffer, why does it matter if the buffer is uncached? Laura Thanks Zengtao Signed-off-by: Zeng Tao --- drivers/staging/android/ion/ion-ioctl.c | 4 drivers/staging/android/ion/ion.c | 17 + drivers/staging/android/ion/ion.h | 1 + drivers/staging/android/uapi/ion.h | 22 ++ 4 files changed, 44 insertions(+) diff --git a/drivers/staging/android/ion/ion-ioctl.c b/drivers/staging/android/ion/ion-ioctl.c index a8d3cc4..60bb702 100644 --- a/drivers/staging/android/ion/ion-ioctl.c +++ b/drivers/staging/android/ion/ion-ioctl.c
Re: compilation failure with CONFIG_VFIO_PCI_NVLINK2
On 1/3/19 5:49 AM, Alexey Kardashevskiy wrote: On 03/01/2019 03:37, Laura Abbott wrote: Hi, I got a compilation failure when building with CONFIG_VFIO_PCI_NVLINK2 enabled: + make -s 'HOSTCFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mcpu=power8 -mtune=power8 -fasynchronous-unwind-tables -fstack-clash-protection' 'HOSTLDFLAGS=-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,--build-id=uuid' ARCH=powerpc -j4 modules BUILDSTDERR: In file included from drivers/vfio/pci/trace.h:102, BUILDSTDERR: from drivers/vfio/pci/vfio_pci_nvlink2.c:29: BUILDSTDERR: ./include/trace/define_trace.h:89:42: fatal error: ./trace.h: No such file or directory BUILDSTDERR: #include TRACE_INCLUDE(TRACE_INCLUDE_FILE) BUILDSTDERR: ^ BUILDSTDERR: compilation terminated. BUILDSTDERR: make[3]: *** [scripts/Makefile.build:277: drivers/vfio/pci/vfio_pci_nvlink2.o] Error 1 BUILDSTDERR: make[2]: *** [scripts/Makefile.build:492: drivers/vfio/pci] Error 2 BUILDSTDERR: make[1]: *** [scripts/Makefile.build:492: drivers/vfio] Error 2 BUILDSTDERR: make: *** [Makefile:1053: drivers] Error 2 BUILDSTDERR: make: *** Waiting for unfinished jobs I don't know enough about ftrace building to make a guess here. Config is attacked. What gcc is this and what is the exact sha1 of the tree? gcc8 prints other error with your config in drivers/scsi/esas2r/esas2r_ioctl.c but not this one so I am curious. gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6) sha 8e143b90e4d45cca3dc53760d3cfab988bc74571 Also, would it be possible to switch this option from def_bool to bool? I can't turn it off directly when it's def_bool. Why? Honestly I'd rather fix the compile error. It's not just about this error, there may be other situations where it would be good to have this turned off. Thanks, Luara
Re: [PATCH] staging: android: ion: move map_kernel to ion_dma_buf_kmap
On 12/24/18 12:19 AM, Qing Xia wrote: Now, as Google's user guide, if userspace need clean ION buffer's cache, they should call ioctl(fd, DMA_BUF_IOCTL_SYNC, sync). Then we found that ion_dma_buf_begin_cpu_access/ion_dma_buf_end_cpu_access will do ION buffer's map_kernel, it's not necessary. And if usersapce only need clean cache, they will call ion_dma_buf_end_cpu_access by dmabuf's ioctl, ION buffer's kmap_cnt will be wrong value -1, then driver could not get right kernel vaddr by dma_buf_kmap. The problem is this subtly violates how dma_buf is supposed to work. All setup is supposed to happen in begin_cpu_access. I agree calling map kernel each time isn't great but I think this needs to be worked out with dma_buf. Thanks, Laura Signed-off-by: Qing Xia --- drivers/staging/android/ion/ion.c | 46 ++- 1 file changed, 21 insertions(+), 25 deletions(-) diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 9907332..f7e2812 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -303,45 +303,47 @@ static void ion_dma_buf_release(struct dma_buf *dmabuf) static void *ion_dma_buf_kmap(struct dma_buf *dmabuf, unsigned long offset) { struct ion_buffer *buffer = dmabuf->priv; + void *vaddr; - return buffer->vaddr + offset * PAGE_SIZE; + if (buffer->heap->ops->map_kernel) { + mutex_lock(>lock); + vaddr = ion_buffer_kmap_get(buffer); + mutex_unlock(>lock); + if (IS_ERR(vaddr)) + return vaddr; + + return vaddr + offset * PAGE_SIZE; + } + + return NULL; } static void ion_dma_buf_kunmap(struct dma_buf *dmabuf, unsigned long offset, void *ptr) { + struct ion_buffer *buffer = dmabuf->priv; + + if (buffer->heap->ops->map_kernel) { + mutex_lock(>lock); + ion_buffer_kmap_put(buffer); + mutex_unlock(>lock); + } } static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { struct ion_buffer *buffer = dmabuf->priv; - void *vaddr; struct ion_dma_buf_attachment *a; - int ret = 0; - - /* -* TODO: Move this elsewhere because we don't always need a vaddr -*/ - if (buffer->heap->ops->map_kernel) { - mutex_lock(>lock); - vaddr = ion_buffer_kmap_get(buffer); - if (IS_ERR(vaddr)) { - ret = PTR_ERR(vaddr); - goto unlock; - } - mutex_unlock(>lock); - } mutex_lock(>lock); list_for_each_entry(a, >attachments, list) { dma_sync_sg_for_cpu(a->dev, a->table->sgl, a->table->nents, direction); } - -unlock: mutex_unlock(>lock); - return ret; + + return 0; } static int ion_dma_buf_end_cpu_access(struct dma_buf *dmabuf, @@ -350,12 +352,6 @@ static int ion_dma_buf_end_cpu_access(struct dma_buf *dmabuf, struct ion_buffer *buffer = dmabuf->priv; struct ion_dma_buf_attachment *a; - if (buffer->heap->ops->map_kernel) { - mutex_lock(>lock); - ion_buffer_kmap_put(buffer); - mutex_unlock(>lock); - } - mutex_lock(>lock); list_for_each_entry(a, >attachments, list) { dma_sync_sg_for_device(a->dev, a->table->sgl, a->table->nents,
Re: [PATCH] staging: android: ion: add buffer flag update ioctl
On 12/23/18 6:47 PM, Zengtao (B) wrote: Hi laura: -Original Message- From: Laura Abbott [mailto:labb...@redhat.com] Sent: Friday, December 21, 2018 4:50 AM To: Zengtao (B) ; sumit.sem...@linaro.org Cc: Greg Kroah-Hartman ; Arve Hjønnevåg ; Todd Kjos ; Martijn Coenen ; Joel Fernandes ; de...@driverdev.osuosl.org; dri-de...@lists.freedesktop.org; linaro-mm-...@lists.linaro.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] staging: android: ion: add buffer flag update ioctl On 12/19/18 5:39 PM, Zengtao (B) wrote: Hi laura: -Original Message- From: Laura Abbott [mailto:labb...@redhat.com] Sent: Thursday, December 20, 2018 2:10 AM To: Zengtao (B) ; sumit.sem...@linaro.org Cc: Greg Kroah-Hartman ; Arve Hjønnevåg ; Todd Kjos ; Martijn Coenen ; Joel Fernandes ; de...@driverdev.osuosl.org; dri-de...@lists.freedesktop.org; linaro-mm-...@lists.linaro.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] staging: android: ion: add buffer flag update ioctl On 12/19/18 9:19 AM, Zeng Tao wrote: In some usecases, the buffer cached attribute is not determined at allocation time, it's determined just before the real cpu mapping. And from the memory view of point, a buffer should not have the cached attribute util is really mapped by the cpu. So in this patch, we introduced the new ioctl command to target the requirement. This is racy and error prone. Can you explain more what problem you are trying to solve? My use case is like this: 1. There are two process A and B, A takes case of ion buffer allocation, and pass the buffer fd to B, then B maps and uses it. 2. Process B need to map the buffer with different cached attribute for different use case, for example, if the buffer is used for pure software algorithm, then we need to map it as cached, otherwise non-cached, and B needs to deal with both cases. And unfortunately the mmap syscall takes no cached flags and we can't decide the cache attribute when we are doing the mmap, so I introduce new the ioctl even though I think the solution is not as good. Thanks for the explanation, this was about the use case I expected. I'm pretty sure I had this exact problem once upon a time and we didn't come up with a solution. I'd still like to get rid of uncached buffers in general and just use cached buffers (see http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-N ovember/128842.html) What's your usecase for uncached buffers? Some buffers are only used by HW, in this case, we tend to use uncached buffers. But sometimes the SW need to read few buffer contents for debug purpose and no synchronization is needed. If this is primarily for debug, that's not really a compelling reason to support uncached buffers. It's incredibly difficult to do uncached buffers correctly I'd rather spend effort on making the cached use cases work better. Thanks, Laura Signed-off-by: Zeng Tao --- drivers/staging/android/ion/ion-ioctl.c | 4 drivers/staging/android/ion/ion.c | 17 + drivers/staging/android/ion/ion.h | 1 + drivers/staging/android/uapi/ion.h | 22 ++ 4 files changed, 44 insertions(+) diff --git a/drivers/staging/android/ion/ion-ioctl.c b/drivers/staging/android/ion/ion-ioctl.c index a8d3cc4..60bb702 100644 --- a/drivers/staging/android/ion/ion-ioctl.c +++ b/drivers/staging/android/ion/ion-ioctl.c @@ -12,6 +12,7 @@ union ion_ioctl_arg { struct ion_allocation_data allocation; + struct ion_buffer_flag_data update; struct ion_heap_query query; }; @@ -83,6 +84,9 @@ long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) break; } + case ION_IOC_BUFFER_UPDATE: + ret = ion_buffer_update(data.update.fd, data.update.flags); + break; case ION_IOC_HEAP_QUERY: ret = ion_query_heaps(); break; diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 9907332..f1404dc 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -436,6 +436,23 @@ int ion_alloc(size_t len, unsigned int heap_id_mask, unsigned int flags) return fd; } +int ion_buffer_update(unsigned int fd, unsigned int flags) { + struct dma_buf *dmabuf; + struct ion_buffer *buffer; + + dmabuf = dma_buf_get(fd); + + if (!dmabuf) + return -EINVAL; + + buffer = dmabuf->priv; + buffer->flags = flags; + dma_buf_put(dmabuf); + + return 0; +} + int ion_query_heaps(struct ion_heap_query *query) { struct ion_device *dev = internal_dev; diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h index c006fc1..99bf9ab 100644 --- a/drivers/staging/android/ion/ion.h +++ b/drivers/staging/android/ion/ion.h @@ -199,6 +199,7 @@ int ion_heap_pages_zero(struct page *page,
Re: [PATCH] staging: android: ion: add buffer flag update ioctl
On 12/19/18 5:39 PM, Zengtao (B) wrote: Hi laura: -Original Message- From: Laura Abbott [mailto:labb...@redhat.com] Sent: Thursday, December 20, 2018 2:10 AM To: Zengtao (B) ; sumit.sem...@linaro.org Cc: Greg Kroah-Hartman ; Arve Hjønnevåg ; Todd Kjos ; Martijn Coenen ; Joel Fernandes ; de...@driverdev.osuosl.org; dri-de...@lists.freedesktop.org; linaro-mm-...@lists.linaro.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] staging: android: ion: add buffer flag update ioctl On 12/19/18 9:19 AM, Zeng Tao wrote: In some usecases, the buffer cached attribute is not determined at allocation time, it's determined just before the real cpu mapping. And from the memory view of point, a buffer should not have the cached attribute util is really mapped by the cpu. So in this patch, we introduced the new ioctl command to target the requirement. This is racy and error prone. Can you explain more what problem you are trying to solve? My use case is like this: 1. There are two process A and B, A takes case of ion buffer allocation, and pass the buffer fd to B, then B maps and uses it. 2. Process B need to map the buffer with different cached attribute for different use case, for example, if the buffer is used for pure software algorithm, then we need to map it as cached, otherwise non-cached, and B needs to deal with both cases. And unfortunately the mmap syscall takes no cached flags and we can't decide the cache attribute when we are doing the mmap, so I introduce new the ioctl even though I think the solution is not as good. Thanks for the explanation, this was about the use case I expected. I'm pretty sure I had this exact problem once upon a time and we didn't come up with a solution. I'd still like to get rid of uncached buffers in general and just use cached buffers (see http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-November/128842.html) What's your usecase for uncached buffers? Signed-off-by: Zeng Tao --- drivers/staging/android/ion/ion-ioctl.c | 4 drivers/staging/android/ion/ion.c | 17 + drivers/staging/android/ion/ion.h | 1 + drivers/staging/android/uapi/ion.h | 22 ++ 4 files changed, 44 insertions(+) diff --git a/drivers/staging/android/ion/ion-ioctl.c b/drivers/staging/android/ion/ion-ioctl.c index a8d3cc4..60bb702 100644 --- a/drivers/staging/android/ion/ion-ioctl.c +++ b/drivers/staging/android/ion/ion-ioctl.c @@ -12,6 +12,7 @@ union ion_ioctl_arg { struct ion_allocation_data allocation; + struct ion_buffer_flag_data update; struct ion_heap_query query; }; @@ -83,6 +84,9 @@ long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) break; } + case ION_IOC_BUFFER_UPDATE: + ret = ion_buffer_update(data.update.fd, data.update.flags); + break; case ION_IOC_HEAP_QUERY: ret = ion_query_heaps(); break; diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 9907332..f1404dc 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -436,6 +436,23 @@ int ion_alloc(size_t len, unsigned int heap_id_mask, unsigned int flags) return fd; } +int ion_buffer_update(unsigned int fd, unsigned int flags) { + struct dma_buf *dmabuf; + struct ion_buffer *buffer; + + dmabuf = dma_buf_get(fd); + + if (!dmabuf) + return -EINVAL; + + buffer = dmabuf->priv; + buffer->flags = flags; + dma_buf_put(dmabuf); + + return 0; +} + int ion_query_heaps(struct ion_heap_query *query) { struct ion_device *dev = internal_dev; diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h index c006fc1..99bf9ab 100644 --- a/drivers/staging/android/ion/ion.h +++ b/drivers/staging/android/ion/ion.h @@ -199,6 +199,7 @@ int ion_heap_pages_zero(struct page *page, size_t size, pgprot_t pgprot); int ion_alloc(size_t len, unsigned int heap_id_mask, unsigned int flags); +int ion_buffer_update(unsigned int fd, unsigned int flags); /** * ion_heap_init_shrinker diff --git a/drivers/staging/android/uapi/ion.h b/drivers/staging/android/uapi/ion.h index 5d70098..99753fc 100644 --- a/drivers/staging/android/uapi/ion.h +++ b/drivers/staging/android/uapi/ion.h @@ -74,6 +74,20 @@ struct ion_allocation_data { __u32 unused; }; +/** + * struct ion_buffer_flag_data - metadata passed from userspace for +update + * buffer flags + * @fd:file descriptor of the buffer + * @flags: flags passed to the buffer + * + * Provided by userspace as an argument to the ioctl */ + +struct ion_buffer_flag_data { + __u32 fd; + __u32 flags; +} + #define MAX_HEAP_NAME32 /** @@ -116,6 +130,14 @@
Re: [PATCH] staging: android: ion: add buffer flag update ioctl
On 12/19/18 9:19 AM, Zeng Tao wrote: In some usecases, the buffer cached attribute is not determined at allocation time, it's determined just before the real cpu mapping. And from the memory view of point, a buffer should not have the cached attribute util is really mapped by the cpu. So in this patch, we introduced the new ioctl command to target the requirement. This is racy and error prone. Can you explain more what problem you are trying to solve? Signed-off-by: Zeng Tao --- drivers/staging/android/ion/ion-ioctl.c | 4 drivers/staging/android/ion/ion.c | 17 + drivers/staging/android/ion/ion.h | 1 + drivers/staging/android/uapi/ion.h | 22 ++ 4 files changed, 44 insertions(+) diff --git a/drivers/staging/android/ion/ion-ioctl.c b/drivers/staging/android/ion/ion-ioctl.c index a8d3cc4..60bb702 100644 --- a/drivers/staging/android/ion/ion-ioctl.c +++ b/drivers/staging/android/ion/ion-ioctl.c @@ -12,6 +12,7 @@ union ion_ioctl_arg { struct ion_allocation_data allocation; + struct ion_buffer_flag_data update; struct ion_heap_query query; }; @@ -83,6 +84,9 @@ long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) break; } + case ION_IOC_BUFFER_UPDATE: + ret = ion_buffer_update(data.update.fd, data.update.flags); + break; case ION_IOC_HEAP_QUERY: ret = ion_query_heaps(); break; diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 9907332..f1404dc 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -436,6 +436,23 @@ int ion_alloc(size_t len, unsigned int heap_id_mask, unsigned int flags) return fd; } +int ion_buffer_update(unsigned int fd, unsigned int flags) +{ + struct dma_buf *dmabuf; + struct ion_buffer *buffer; + + dmabuf = dma_buf_get(fd); + + if (!dmabuf) + return -EINVAL; + + buffer = dmabuf->priv; + buffer->flags = flags; + dma_buf_put(dmabuf); + + return 0; +} + int ion_query_heaps(struct ion_heap_query *query) { struct ion_device *dev = internal_dev; diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h index c006fc1..99bf9ab 100644 --- a/drivers/staging/android/ion/ion.h +++ b/drivers/staging/android/ion/ion.h @@ -199,6 +199,7 @@ int ion_heap_pages_zero(struct page *page, size_t size, pgprot_t pgprot); int ion_alloc(size_t len, unsigned int heap_id_mask, unsigned int flags); +int ion_buffer_update(unsigned int fd, unsigned int flags); /** * ion_heap_init_shrinker diff --git a/drivers/staging/android/uapi/ion.h b/drivers/staging/android/uapi/ion.h index 5d70098..99753fc 100644 --- a/drivers/staging/android/uapi/ion.h +++ b/drivers/staging/android/uapi/ion.h @@ -74,6 +74,20 @@ struct ion_allocation_data { __u32 unused; }; +/** + * struct ion_buffer_flag_data - metadata passed from userspace for update + * buffer flags + * @fd:file descriptor of the buffer + * @flags: flags passed to the buffer + * + * Provided by userspace as an argument to the ioctl + */ + +struct ion_buffer_flag_data { + __u32 fd; + __u32 flags; +} + #define MAX_HEAP_NAME 32 /** @@ -116,6 +130,14 @@ struct ion_heap_query { struct ion_allocation_data) /** + * DOC: ION_IOC_BUFFER_UPDATE - update the specified ion buffer flags + * + * Takes an ion_buffer_flag_data structure and returns the result of the + * buffer flag update operation. + */ +#define ION_IOC_BUFFER_UPDATE _IOWR(ION_IOC_MAGIC, 1, \ + struct ion_buffer_flag_data) +/** * DOC: ION_IOC_HEAP_QUERY - information about available heaps * * Takes an ion_heap_query structure and populates information about
Re: [PATCH] x86: vdso: Pass --eh-frame-hdr to ld
On 12/14/18 2:27 PM, Alistair Strachan wrote: Commit 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link") accidentally broke unwinding from userspace, because ld would strip the .eh_frame sections when linking. Originally, the compiler would implicitly add --eh-frame-hdr when invoking the linker, but when this Makefile was converted from invoking ld via the compiler, to invoking it directly (like vmlinux does), the flag was missed. (The EH_FRAME section is important for the VDSO shared libraries, but not for vmlinux.) Fix the problem by explicitly specifying --eh-frame-hdr, which restores parity with the old method. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201741 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1659295 Reported-by: Laura Abbott You should probably use the bugzilla reporters instead of me given I just did the boring part of sending an e-mail. Instead you can use: Tested-by: Laura Abbott Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link") Cc: sta...@vger.kernel.org Cc: Andy Lutomirski Cc: Thomas Gleixner Cc: "H. Peter Anvin" Cc: X86 ML Cc: Florian Weimer , Cc: Carlos O'Donell , Cc: "H. J. Lu" Cc: Joel Fernandes Cc: kernel-t...@android.com Signed-off-by: Alistair Strachan --- arch/x86/entry/vdso/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile index 141d415a8c80..c3d7ccd25381 100644 --- a/arch/x86/entry/vdso/Makefile +++ b/arch/x86/entry/vdso/Makefile @@ -171,7 +171,8 @@ quiet_cmd_vdso = VDSO$@ sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@' VDSO_LDFLAGS = -shared $(call ld-option, --hash-style=both) \ - $(call ld-option, --build-id) -Bsymbolic + $(call ld-option, --build-id) $(call ld-option, --eh-frame-hdr) \ + -Bsymbolic GCOV_PROFILE := n #
Unwinding regression with 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link")
Hi, There are two reports of a regression with unwinding with 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link") https://bugzilla.kernel.org/show_bug.cgi?id=201741 https://bugzilla.redhat.com/show_bug.cgi?id=1659295 It looks like the unwinding information has been stripped out. Any ideas? Thanks, Laura
Re: [PATCH] acpi / apei: fix NULL deref during init
(adding some more people, please remember to run get_maintainer.pl to get the full list in the future) On 12/14/18 10:15 AM, Thomas Schoebel-Theuer wrote: Since commit commit d91525eb8ee6 ("ACPI, EINJ: Enhance error injection tolerance level"), starting with kernel 4.0, the following happens during boot of a specific old hardware: APEI: Can not request [mem 0x0009c2f2-0x0009c2fc] for APEI ERST registers BUG: unable to handle kernel NULL pointer dereference at (null) IP: [] __list_del_entry+0x5c/0x98 PGD 0 Oops: [#1] SMP Modules linked in: CPU: 0 PID: 1 UID: 0 Comm: swapper/0 Not tainted 4.4.0-ui18344.004-uiabi1-infong-amd64 #1 Hardware name: IBM IBM eServer BladeCenter HS12 -[8028Z5S]-/Server Blade, BIOS -[N1E150AUS-1.11]- 11/04/2010 task: 88021fe4e040 ti: 88021fe7c000 task.ti: 88021fe7c000 RIP: 0010:[] [] __list_del_entry+0x5c/0x98 RSP: :88021fe7fd18 EFLAGS: 00010207 RAX: RBX: 88021fe7fde0 RCX: 88021fe7fde0 RDX: 819bd040 RSI: dead0200 RDI: 88021fe7fde0 RBP: 88021fe7fd18 R08: R09: R10: 816ce240 R11: 0001 R12: 819bd040 R13: 88021fe7fda0 R14: 88021d2cd840 R15: FS: () GS:88022fc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 019b6000 CR4: 00040670 Stack: 88021fe7fd30 81343dd7 88021fe7fde0 88021fe7fd58 813931c0 88021fe7fda0 88021fe7fe00 88021d2cd840 88021fe7fd70 813931e5 ffea 88021fe7fdf0 Call Trace: [] list_del+0xd/0x25 [] apei_res_clean+0x1f/0x37 [] apei_resources_fini+0xd/0x19 [] apei_resources_request+0x24f/0x268 [] ? apei_exec_for_each_entry+0x77/0x8e [] ? setup_erst_disable+0x12/0x12 [] erst_init+0xed/0x2ca [] ? do_one_initcall+0x8c/0x174 [] ? setup_erst_disable+0x12/0x12 [] ? setup_erst_disable+0x12/0x12 [] do_one_initcall+0xe9/0x174 [] ? parse_args+0x161/0x296 [] kernel_init_freeable+0x169/0x1f6 [] ? do_early_param+0x88/0x88 [] ? rest_init+0x79/0x79 [] kernel_init+0x9/0xd5 [] ret_from_fork+0x55/0x80 [] ? rest_init+0x79/0x79 Code: 02 00 00 00 00 ad de 48 39 f0 75 1f 49 89 c0 48 c7 c2 38 de 8e 81 be 38 00 00 00 48 c7 c7 13 dd 8e 81 31 c0 e8 94 36 d0 ff eb 3a <48> 8b 30 48 39 fe 74 11 49 89 f0 48 c7 c2 6c de 8e 81 be 3b 00 RIP [] __list_del_entry+0x5c/0x98 RSP CR2: ---[ end trace 3610e544cef27e81 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 Reason is a conditional initialization of variable arch_res, which happens only under a specific precondition. When the condition is false, the variable remains uninitialized. This may later trigger a splat, e.g. when some error path is taken. Solution: do the initialisation unconditionally. Also as a safeguard. Fixes: d91525eb8ee6a622ce476955fe1a2530ade87c83 Signed-off-by: Thomas Schoebel-Theuer --- drivers/acpi/apei/apei-base.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c index da370e1d31f4..ef931b8a0b11 100644 --- a/drivers/acpi/apei/apei-base.c +++ b/drivers/acpi/apei/apei-base.c @@ -494,8 +494,8 @@ int apei_resources_request(struct apei_resources *resources, if (rc) goto nvs_res_fini; + apei_resources_init(_res); if (arch_apei_filter_addr) { - apei_resources_init(_res); rc = apei_get_arch_resources(_res); if (rc) goto arch_res_fini;
Process for severe early stable bugs?
The latest file system corruption issue (Nominally fixed by ffe81d45322c ("blk-mq: fix corruption with direct issue") later fixed by c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list")) brought a lot of rightfully concerned users asking about release schedules. 4.18 went EOL on Nov 21 and Fedora rebased to 4.19.3 on Nov 23. When the issue started getting visibility, users were left with the option of running known EOL 4.18.x kernels or running a 4.19 series that could corrupt their data. Admittedly, the risk of running the EOL kernel was pretty low given how recent it was, but it's still not a great look to tell people to run something marked EOL. I'm wondering if there's anything we can do to make things easier on kernel consumers. Bugs will certainly happen but it really makes it hard to push the "always run the latest stable" narrative if there isn't a good fallback when things go seriously wrong. I don't actually have a great proposal for a solution here other than retroactively bringing back 4.18 (which I don't think Greg would like) but I figured I should at least bring it up. Thanks, Laura
Process for severe early stable bugs?
The latest file system corruption issue (Nominally fixed by ffe81d45322c ("blk-mq: fix corruption with direct issue") later fixed by c616cbee97ae ("blk-mq: punt failed direct issue to dispatch list")) brought a lot of rightfully concerned users asking about release schedules. 4.18 went EOL on Nov 21 and Fedora rebased to 4.19.3 on Nov 23. When the issue started getting visibility, users were left with the option of running known EOL 4.18.x kernels or running a 4.19 series that could corrupt their data. Admittedly, the risk of running the EOL kernel was pretty low given how recent it was, but it's still not a great look to tell people to run something marked EOL. I'm wondering if there's anything we can do to make things easier on kernel consumers. Bugs will certainly happen but it really makes it hard to push the "always run the latest stable" narrative if there isn't a good fallback when things go seriously wrong. I don't actually have a great proposal for a solution here other than retroactively bringing back 4.18 (which I don't think Greg would like) but I figured I should at least bring it up. Thanks, Laura
Re: [PATCH] Revert "HID: uhid: use strlcpy() instead of strncpy()"
On 11/14/18 5:16 AM, David Herrmann wrote: This reverts commit 336fd4f5f25157e9e8bd50e898a1bbcd99eaea46. Please note that `strlcpy()` does *NOT* do what you think it does. strlcpy() *ALWAYS* reads the full input string, regardless of the 'length' parameter. That is, if the input is not zero-terminated, strlcpy() will *READ* beyond input boundaries. It does this, because it always returns the size it *would* copy if the target was big enough, not the truncated size it actually copied. The original code was perfectly fine. The hid device is zero-initialized and the strncpy() functions copied up to n-1 characters. The result is always zero-terminated this way. This is the third time someone tried to replace strncpy with strlcpy in this function, and gets it wrong. I now added a comment that should at least make people reconsider. Can we switch to strscpy instead? This will quiet gcc and avoid the issues with strlcpy. Signed-off-by: David Herrmann --- drivers/hid/uhid.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c index fefedc0b4dc6..0dfdd0ac7120 100644 --- a/drivers/hid/uhid.c +++ b/drivers/hid/uhid.c @@ -496,12 +496,13 @@ static int uhid_dev_create2(struct uhid_device *uhid, goto err_free; } - len = min(sizeof(hid->name), sizeof(ev->u.create2.name)); - strlcpy(hid->name, ev->u.create2.name, len); - len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)); - strlcpy(hid->phys, ev->u.create2.phys, len); - len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)); - strlcpy(hid->uniq, ev->u.create2.uniq, len); + /* @hid is zero-initialized, strncpy() is correct, strlcpy() not */ + len = min(sizeof(hid->name), sizeof(ev->u.create2.name)) - 1; + strncpy(hid->name, ev->u.create2.name, len); + len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)) - 1; + strncpy(hid->phys, ev->u.create2.phys, len); + len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)) - 1; + strncpy(hid->uniq, ev->u.create2.uniq, len); hid->ll_driver = _hid_driver; hid->bus = ev->u.create2.bus;
Re: [PATCH] Revert "HID: uhid: use strlcpy() instead of strncpy()"
On 11/14/18 5:16 AM, David Herrmann wrote: This reverts commit 336fd4f5f25157e9e8bd50e898a1bbcd99eaea46. Please note that `strlcpy()` does *NOT* do what you think it does. strlcpy() *ALWAYS* reads the full input string, regardless of the 'length' parameter. That is, if the input is not zero-terminated, strlcpy() will *READ* beyond input boundaries. It does this, because it always returns the size it *would* copy if the target was big enough, not the truncated size it actually copied. The original code was perfectly fine. The hid device is zero-initialized and the strncpy() functions copied up to n-1 characters. The result is always zero-terminated this way. This is the third time someone tried to replace strncpy with strlcpy in this function, and gets it wrong. I now added a comment that should at least make people reconsider. Can we switch to strscpy instead? This will quiet gcc and avoid the issues with strlcpy. Signed-off-by: David Herrmann --- drivers/hid/uhid.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c index fefedc0b4dc6..0dfdd0ac7120 100644 --- a/drivers/hid/uhid.c +++ b/drivers/hid/uhid.c @@ -496,12 +496,13 @@ static int uhid_dev_create2(struct uhid_device *uhid, goto err_free; } - len = min(sizeof(hid->name), sizeof(ev->u.create2.name)); - strlcpy(hid->name, ev->u.create2.name, len); - len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)); - strlcpy(hid->phys, ev->u.create2.phys, len); - len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)); - strlcpy(hid->uniq, ev->u.create2.uniq, len); + /* @hid is zero-initialized, strncpy() is correct, strlcpy() not */ + len = min(sizeof(hid->name), sizeof(ev->u.create2.name)) - 1; + strncpy(hid->name, ev->u.create2.name, len); + len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)) - 1; + strncpy(hid->phys, ev->u.create2.phys, len); + len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)) - 1; + strncpy(hid->uniq, ev->u.create2.uniq, len); hid->ll_driver = _hid_driver; hid->bus = ev->u.create2.bus;
Re: [PATCH] ARM: mm: dump: Change to use DEFINE_SHOW_ATTRIBUTE macro
On 11/5/18 6:32 AM, Yangtao Li wrote: Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Acked-by: Laura Abbott Signed-off-by: Yangtao Li --- arch/arm/mm/ptdump_debugfs.c | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/arch/arm/mm/ptdump_debugfs.c b/arch/arm/mm/ptdump_debugfs.c index be8d87be4b93..201cd467a739 100644 --- a/arch/arm/mm/ptdump_debugfs.c +++ b/arch/arm/mm/ptdump_debugfs.c @@ -12,17 +12,7 @@ static int ptdump_show(struct seq_file *m, void *v) return 0; } -static int ptdump_open(struct inode *inode, struct file *file) -{ - return single_open(file, ptdump_show, inode->i_private); -} - -static const struct file_operations ptdump_fops = { - .open = ptdump_open, - .read = seq_read, - .llseek = seq_lseek, - .release= single_release, -}; +DEFINE_SHOW_ATTRIBUTE(ptdump); int ptdump_debugfs_register(struct ptdump_info *info, const char *name) {
Re: [PATCH] ARM: mm: dump: Change to use DEFINE_SHOW_ATTRIBUTE macro
On 11/5/18 6:32 AM, Yangtao Li wrote: Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Acked-by: Laura Abbott Signed-off-by: Yangtao Li --- arch/arm/mm/ptdump_debugfs.c | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/arch/arm/mm/ptdump_debugfs.c b/arch/arm/mm/ptdump_debugfs.c index be8d87be4b93..201cd467a739 100644 --- a/arch/arm/mm/ptdump_debugfs.c +++ b/arch/arm/mm/ptdump_debugfs.c @@ -12,17 +12,7 @@ static int ptdump_show(struct seq_file *m, void *v) return 0; } -static int ptdump_open(struct inode *inode, struct file *file) -{ - return single_open(file, ptdump_show, inode->i_private); -} - -static const struct file_operations ptdump_fops = { - .open = ptdump_open, - .read = seq_read, - .llseek = seq_lseek, - .release= single_release, -}; +DEFINE_SHOW_ATTRIBUTE(ptdump); int ptdump_debugfs_register(struct ptdump_info *info, const char *name) {
Re: Call to Action Re: [PATCH 0/7] Code of Conduct: Fix some wording, and add an interpretation document
On 10/21/2018 02:20 PM, NeilBrown wrote: I call on the community to consider what *does* need to be said, about conduct, to people outside the community and who have recently joined. What is the document that you would have liked to have read as you were starting out? It is all too long ago for me to remember clearly, and so much has changed. I joined much more recently than many and what I would have wanted then is an interesting question. I probably would _not_ have wanted a code of conduct when I first started working in open source. I also said things in my younger years I regret and probably wouldn't have said if I was held to a higher standard of conduct. Younger me frequently put up with behavior I wouldn't tolerate today. Younger me also greatly benefited from the experience of other kernel developers giving me firm feedback in a helpful way and saying no to bad approaches. I don't believe I would have continued if I hadn't been given that feedback in a useful way. Today, I think the code of conduct is a very important addition to the community. It's a stronger assertion that the kernel community is committed to raising the bar for behavior. I have no concern about patch review or quality dropping because most maintainers demonstrate every day that they can give effective feedback. We're all going to screw that up sometimes and the Code of Conduct reminds us to do our best. Most issues that arise can be resolved with a private e-mail going "you might want to rethink your wording there." What the Code of Conduct also provides is confidence that more serious community issues such as harassment not related to patch review can be handled. It spells out certain behaviors that won't be tolerated and explains how those problems will be dealt with. Will those problems actually be handled appropriately when the time comes? I can't actually say for sure, but the kernel community works on trust so I guess I have to trust that it will. I really hope I never have to report harassment but I'm glad there's a process in place. Thanks, Laura
Re: Call to Action Re: [PATCH 0/7] Code of Conduct: Fix some wording, and add an interpretation document
On 10/21/2018 02:20 PM, NeilBrown wrote: I call on the community to consider what *does* need to be said, about conduct, to people outside the community and who have recently joined. What is the document that you would have liked to have read as you were starting out? It is all too long ago for me to remember clearly, and so much has changed. I joined much more recently than many and what I would have wanted then is an interesting question. I probably would _not_ have wanted a code of conduct when I first started working in open source. I also said things in my younger years I regret and probably wouldn't have said if I was held to a higher standard of conduct. Younger me frequently put up with behavior I wouldn't tolerate today. Younger me also greatly benefited from the experience of other kernel developers giving me firm feedback in a helpful way and saying no to bad approaches. I don't believe I would have continued if I hadn't been given that feedback in a useful way. Today, I think the code of conduct is a very important addition to the community. It's a stronger assertion that the kernel community is committed to raising the bar for behavior. I have no concern about patch review or quality dropping because most maintainers demonstrate every day that they can give effective feedback. We're all going to screw that up sometimes and the Code of Conduct reminds us to do our best. Most issues that arise can be resolved with a private e-mail going "you might want to rethink your wording there." What the Code of Conduct also provides is confidence that more serious community issues such as harassment not related to patch review can be handled. It spells out certain behaviors that won't be tolerated and explains how those problems will be dealt with. Will those problems actually be handled appropriately when the time comes? I can't actually say for sure, but the kernel community works on trust so I guess I have to trust that it will. I really hope I never have to report harassment but I'm glad there's a process in place. Thanks, Laura
Re: [PATCH] arm64: kprobe: make page to RO mode when allocate it
On 10/15/2018 04:16 AM, Anders Roxell wrote: Commit 1404d6f13e47 ("arm64: dump: Add checking for writable and exectuable pages") has successfully identified code that leaves a page with W+X permissions. [3.245140] arm64/mm: Found insecure W+X mapping at address (ptrval)/0x00d9 [3.245771] WARNING: CPU: 0 PID: 1 at ../arch/arm64/mm/dump.c:232 note_page+0x410/0x420 [3.246141] Modules linked in: [3.246653] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc5-next-20180928-1-ge70ae259b853-dirty #62 [3.247008] Hardware name: linux,dummy-virt (DT) [3.247347] pstate: 8005 (Nzcv daif -PAN -UAO) [3.247623] pc : note_page+0x410/0x420 [3.247898] lr : note_page+0x410/0x420 [3.248071] sp : 0804bcd0 [3.248254] x29: 0804bcd0 x28: 09274000 [3.248578] x27: 0921a000 x26: 80007dfff000 [3.248845] x25: 093f5000 x24: 09526f6a [3.249109] x23: 0004 x22: 00d91000 [3.249396] x21: 00d9 x20: [3.249661] x19: 0804bde8 x18: 0400 [3.249924] x17: x16: [3.250271] x15: x14: 295f5f5f5f6c6176 [3.250594] x13: 7274705f5f5f5f28 x12: 2073736572646461 [3.250941] x11: 20746120676e6970 x10: 70616d20582b5720 [3.251252] x9 : 6572756365736e69 x8 : 3039643030303030 [3.251519] x7 : 3078302f x6 : 095467b2 [3.251802] x5 : x4 : [3.252060] x3 : x2 : [3.252323] x1 : 4d151327adc50b00 x0 : [3.252664] Call trace: [3.252953] note_page+0x410/0x420 [3.253186] walk_pgd+0x12c/0x238 [3.253417] ptdump_check_wx+0x68/0xf8 [3.253637] mark_rodata_ro+0x68/0x98 [3.253847] kernel_init+0x38/0x160 [3.254103] ret_from_fork+0x10/0x18 Reworked to that when allocate a page it sets mode RO. Inspired by commit 63fef14fc98a ("kprobes/x86: Make insn buffer always ROX and use text_poke()"). Reviewed-by: Laura Abbott Cc: Arnd Bergmann Cc: Ard Biesheuvel Cc: Laura Abbott Cc: Catalin Marinas Co-developed-by: Arnd Bergmann Co-developed-by: Ard Biesheuvel Signed-off-by: Anders Roxell --- arch/arm64/kernel/probes/kprobes.c | 27 --- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c index 9b65132e789a..b842e908b423 100644 --- a/arch/arm64/kernel/probes/kprobes.c +++ b/arch/arm64/kernel/probes/kprobes.c @@ -23,7 +23,9 @@ #include #include #include +#include #include +#include #include #include #include @@ -42,10 +44,21 @@ DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk); static void __kprobes post_kprobe_handler(struct kprobe_ctlblk *, struct pt_regs *); +static int __kprobes patch_text(kprobe_opcode_t *addr, u32 opcode) +{ + void *addrs[1]; + u32 insns[1]; + + addrs[0] = (void *)addr; + insns[0] = (u32)opcode; + + return aarch64_insn_patch_text(addrs, insns, 1); +} + static void __kprobes arch_prepare_ss_slot(struct kprobe *p) { /* prepare insn slot */ - p->ainsn.api.insn[0] = cpu_to_le32(p->opcode); + patch_text(p->ainsn.api.insn, p->opcode); flush_icache_range((uintptr_t) (p->ainsn.api.insn), (uintptr_t) (p->ainsn.api.insn) + @@ -118,15 +131,15 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p) return 0; } -static int __kprobes patch_text(kprobe_opcode_t *addr, u32 opcode) +void *alloc_insn_page(void) { - void *addrs[1]; - u32 insns[1]; + void *page; - addrs[0] = (void *)addr; - insns[0] = (u32)opcode; + page = vmalloc_exec(PAGE_SIZE); + if (page) + set_memory_ro((unsigned long)page & PAGE_MASK, 1); - return aarch64_insn_patch_text(addrs, insns, 1); + return page; } /* arm kprobe: install breakpoint in text */