Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list
On Wed, Oct 26, 2016 at 01:50:37PM +0800, Xishi Qiu wrote:
> On 2016/10/26 12:37, Joonsoo Kim wrote:
>
> > On Mon, Oct 17, 2016 at 05:21:54PM +0800, Xishi Qiu wrote:
> >> On 2016/10/13 16:08, js1...@gmail.com wrote:
> >>
> >>> From: Joonsoo Kim
> >>>
> >>> Currently, freeing page can stay longer in the buddy list if next higher
> >>> order page is in the buddy list in order to help coalescence. However,
> >>> it doesn't work for the simplest sequential free case. For example, think
> >>> about the situation that 8 consecutive pages are freed in sequential
> >>> order.
> >>>
> >>> page 0: attached at the head of order 0 list
> >>> page 1: merged with page 0, attached at the head of order 1 list
> >>> page 2: attached at the tail of order 0 list
> >>> page 3: merged with page 2 and then merged with page 0, attached at
> >>>         the head of order 2 list
> >>> page 4: attached at the head of order 0 list
> >>> page 5: merged with page 4, attached at the tail of order 1 list
> >>> page 6: attached at the tail of order 0 list
> >>> page 7: merged with page 6 and then merged with page 4. Lastly, merged
> >>>         with page 0 and we get order 3 freepage.
> >>>
> >>> With excluding page 0 case, there are three cases that freeing page is
> >>> attached at the head of buddy list in this example and if just one
> >>> corresponding ordered allocation request comes at that moment, this page
> >>> in being a high order page will be allocated and we would fail to make
> >>> order-3 freepage.
> >>>
> >>> Allocation usually happens in sequential order and free also does. So, it
> >>> would be important to detect such a situation and to give some chance
> >>> to be coalesced.
> >>>
> >>> I think that simple and effective heuristic about this case is just
> >>> attaching freeing page at the tail of the buddy list unconditionally.
> >>> If freeing isn't merged during one rotation, it would be actual
> >>> fragmentation and we don't need to care about it for coalescence.
> >>>
> >>
> >> Hi Joonsoo,
> >>
> >> I find another two places to reduce fragmentation.
> >>
> >> 1)
> >> __rmqueue_fallback
> >>   steal_suitable_fallback
> >>     move_freepages_block
> >>       move_freepages
> >>         list_move
> >> If we steal some free pages, we will add these page at the head of
> >> start_migratetype list,
> >> this will cause more fixed migratetype, because this pages will be
> >> allocated more easily.
> >> So how about use list_move_tail instead of list_move?
> >
> > Yeah... I don't think deeply but, at a glance, it would be helpful.
> >
> >>
> >> 2)
> >> __rmqueue_fallback
> >>   expand
> >>     list_add
> >> How about use list_add_tail instead of list_add? If add the tail, then the
> >> rest of pages
> >> will be hard to be allocated and we can merge them again as soon as the
> >> page freed.
> >
> > I guess that it has no effect. When we do __rmqueue_fallback() and
> > expand(), we don't have any freepage on this or more order. So,
> > list_add or list_add_tail will show the same result.
> >
>
> Hi Joonsoo,
>
> Usually this list is empty, but in the following case, the list is not empty.
>
> __rmqueue_fallback
>   steal_suitable_fallback
>     move_freepages_block // move to the list of start_migratetype
>   expand // split the largest order first
>     list_add // add to the list of start_migratetype

In this case, the freepage stolen in steal_suitable_fallback() and the
split freepage would come from the same pageblock. So it doesn't
matter which list_add* function is used.

Thanks.
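
The sequential-free walk-through in the quoted patch description can be tried out with a small user-space model. This is only a sketch under stated assumptions, not kernel code: free_order[], buddy_init(), buddy_free_page() and buddy_take() are hypothetical names, the model covers just 8 pages, and it tracks block orders only (it does not model head or tail position in the free lists).

```c
#include <assert.h>

#define NPAGES    8    /* the 8 consecutive pages from the example */
#define MAX_ORDER 3    /* largest block in this model: 2^3 = 8 pages */
#define NOT_FREE  (-1)

/* free_order[pfn] is the order of the free block starting at pfn,
 * or NOT_FREE if no free block starts there (hypothetical model state). */
static int free_order[NPAGES];

static void buddy_init(void)
{
	for (int i = 0; i < NPAGES; i++)
		free_order[i] = NOT_FREE;
}

/* Free one page and merge upward for as long as the buddy block of the
 * same order is free. Returns the order of the resulting free block. */
static int buddy_free_page(unsigned int pfn)
{
	int order = 0;

	assert(pfn < NPAGES);
	while (order < MAX_ORDER) {
		unsigned int buddy = pfn ^ (1u << order);

		if (free_order[buddy] != order)
			break;                    /* buddy is busy or has another order */
		free_order[buddy] = NOT_FREE; /* buddy is absorbed into the merge */
		pfn &= ~(1u << order);        /* merged block starts at the lower pfn */
		order++;
	}
	free_order[pfn] = order;
	return order;
}

/* Model an allocation that takes the free block starting at pfn. */
static void buddy_take(unsigned int pfn)
{
	assert(pfn < NPAGES);
	free_order[pfn] = NOT_FREE;
}
```

Calling buddy_init() and then freeing pages 0 through 7 in order yields blocks of order 0, 1, 0, 2, 0, 1, 0 and finally 3, which matches the walk-through. If buddy_take(0) removes the order-1 block right after page 1 is freed, the remaining frees stop at order 2, which is the failure the patch is trying to make less likely.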
Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list
On 2016/10/26 12:37, Joonsoo Kim wrote:
> On Mon, Oct 17, 2016 at 05:21:54PM +0800, Xishi Qiu wrote:
>> On 2016/10/13 16:08, js1...@gmail.com wrote:
>>
>>> From: Joonsoo Kim
>>>
>>> Currently, freeing page can stay longer in the buddy list if next higher
>>> order page is in the buddy list in order to help coalescence. However,
>>> it doesn't work for the simplest sequential free case. For example, think
>>> about the situation that 8 consecutive pages are freed in sequential
>>> order.
>>>
>>> page 0: attached at the head of order 0 list
>>> page 1: merged with page 0, attached at the head of order 1 list
>>> page 2: attached at the tail of order 0 list
>>> page 3: merged with page 2 and then merged with page 0, attached at
>>>         the head of order 2 list
>>> page 4: attached at the head of order 0 list
>>> page 5: merged with page 4, attached at the tail of order 1 list
>>> page 6: attached at the tail of order 0 list
>>> page 7: merged with page 6 and then merged with page 4. Lastly, merged
>>>         with page 0 and we get order 3 freepage.
>>>
>>> With excluding page 0 case, there are three cases that freeing page is
>>> attached at the head of buddy list in this example and if just one
>>> corresponding ordered allocation request comes at that moment, this page
>>> in being a high order page will be allocated and we would fail to make
>>> order-3 freepage.
>>>
>>> Allocation usually happens in sequential order and free also does. So, it
>>> would be important to detect such a situation and to give some chance
>>> to be coalesced.
>>>
>>> I think that simple and effective heuristic about this case is just
>>> attaching freeing page at the tail of the buddy list unconditionally.
>>> If freeing isn't merged during one rotation, it would be actual
>>> fragmentation and we don't need to care about it for coalescence.
>>>
>>
>> Hi Joonsoo,
>>
>> I find another two places to reduce fragmentation.
>>
>> 1)
>> __rmqueue_fallback
>>   steal_suitable_fallback
>>     move_freepages_block
>>       move_freepages
>>         list_move
>> If we steal some free pages, we will add these page at the head of
>> start_migratetype list,
>> this will cause more fixed migratetype, because this pages will be allocated
>> more easily.
>> So how about use list_move_tail instead of list_move?
>
> Yeah... I don't think deeply but, at a glance, it would be helpful.
>
>>
>> 2)
>> __rmqueue_fallback
>>   expand
>>     list_add
>> How about use list_add_tail instead of list_add? If add the tail, then the
>> rest of pages
>> will be hard to be allocated and we can merge them again as soon as the page
>> freed.
>
> I guess that it has no effect. When we do __rmqueue_fallback() and
> expand(), we don't have any freepage on this or more order. So,
> list_add or list_add_tail will show the same result.
>

Hi Joonsoo,

Usually this list is empty, but in the following case, the list is not empty.

__rmqueue_fallback
  steal_suitable_fallback
    move_freepages_block // move to the list of start_migratetype
  expand // split the largest order first
    list_add // add to the list of start_migratetype

Thanks,
Xishi Qiu
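
The difference between head and tail insertion that this exchange turns on can be illustrated with a tiny circular list. The sketch below is a user-space stand-in written for illustration, not the kernel's list.h: struct list_node, add_head(), add_tail() and take_first() are hypothetical names, and it assumes, as the thread does, that an allocation takes the first entry of the free list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a free-list entry; id identifies the page block. */
struct list_node {
	struct list_node *prev, *next;
	int id;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void insert_between(struct list_node *n,
			   struct list_node *prev, struct list_node *next)
{
	assert(n != NULL && prev != NULL && next != NULL);
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Head insertion: the new entry becomes the first allocation candidate. */
static void add_head(struct list_node *n, struct list_node *head)
{
	insert_between(n, head, head->next);
}

/* Tail insertion: the new entry becomes the last allocation candidate. */
static void add_tail(struct list_node *n, struct list_node *head)
{
	insert_between(n, head->prev, head);
}

/* Model an allocation: unlink and return the id of the first entry,
 * or -1 if the list is empty. */
static int take_first(struct list_node *head)
{
	struct list_node *n = head->next;

	if (n == head)
		return -1;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	return n->id;
}
```

With entries 1 and 2 added at the head and entry 3 added at the tail, successive take_first() calls return 2, 1, 3. On a list that was empty beforehand, head and tail insertion of a single entry give the same result, which is the point made about expand() in the reply quoted above.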
Re: [PATCH 1/2] x86/io: add interface to reserve io memtype for a resource range. (v1.1)
On Tue, Oct 25, 2016 at 07:31:29PM +0200, Luis R. Rodriguez wrote:
> On Mon, Oct 24, 2016 at 04:31:45PM +1000, Dave Airlie wrote:
> > A recent change to the mm code in:
> > 87744ab3832b83ba71b931f86f9cfdb000d07da5
> > mm: fix cache mode tracking in vm_insert_mixed()
> >
> > started enforcing checking the memory type against the registered list for
> > amixed pfn insertion mappings. It happens that the drm drivers for a number
> > of gpus relied on this being broken. Currently the driver only inserted
> > VRAM mappings into the tracking table when they came from the kernel,
> > and userspace mappings never landed in the table. This led to a regression
> > where all the mapping end up as UC instead of WC now.
>
> Eek.
>
> > I've considered a number of solutions but since this needs to be fixed
> > in fixes and not next, and some of the solutions were going to introduce
> > overhead that hadn't been there before I didn't consider them viable at
> > this stage. These mainly concerned hooking into the TTM io reserve APIs,
> > but these API have a bunch of fast paths I didn't want to unwind to add
> > this to.
> >
> > The solution I've decided on is to add a new API like the arch_phys_wc
> > APIs (these would have worked but wc_del didn't take a range), and
> > use them from the drivers to add a WC compatible mapping to the table
> > for all VRAM on those GPUs. This means we can then create userspace
> > mapping that won't get degraded to UC.
>
> Is anything on a driver to be able to tell when this is actually needed ?
> How will driver developers know? Can you add a bit of documentation to
> the API? If its transitive towards a secondary solution indicating so
> would help driver developers.

I'll plug the io-mapping stuff again here, and more specifically the
userspace pte wrangling stuff we've added in 4.9 to i915_mm.c. Should
probably move that one to the core. That way io_mapping takes care of
the full reservation, and allows you to on-demand kmap (for kernel)
and write ptes. All nicely fast and all, and for bonus, also nicely
encapsulated.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
[tip:x86/asm] mm/page_alloc: Remove kernel address exposure in free_reserved_area()
Commit-ID:  adb1fe9ae2ee6ef6bc10f3d5a588020e7664dfa7
Gitweb:     http://git.kernel.org/tip/adb1fe9ae2ee6ef6bc10f3d5a588020e7664dfa7
Author:     Josh Poimboeuf
AuthorDate: Tue, 25 Oct 2016 09:51:14 -0500
Committer:  Ingo Molnar
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

mm/page_alloc: Remove kernel address exposure in free_reserved_area()

Linus suggested we try to remove some of the low-hanging fruit related
to kernel address exposure in dmesg. The only leaks I see on my local
system are:

  Freeing SMP alternatives memory: 32K (9e309000 - 9e311000)
  Freeing initrd memory: 10588K (a0b736b42000 - a0b737599000)
  Freeing unused kernel memory: 3592K (9df87000 - 9e309000)
  Freeing unused kernel memory: 1352K (a0b7288ae000 - a0b728a0)
  Freeing unused kernel memory: 632K (a0b728d62000 - a0b728e0)

Linus says:

  "I suspect we should just remove [the addresses in the 'Freeing'
  messages]. I'm sure they are useful in theory, but I suspect they
  were more useful back when the whole "free init memory" was
  originally done.

  These days, if we have a use-after-free, I suspect the init-mem
  situation is the easiest situation by far. Compared to all the
  dynamic allocations which are much more likely to show it anyway.
  So having debug output for that case is likely not all that
  productive."

With this patch the freeing messages now look like this:

  Freeing SMP alternatives memory: 32K
  Freeing initrd memory: 10588K
  Freeing unused kernel memory: 3592K
  Freeing unused kernel memory: 1352K
  Freeing unused kernel memory: 632K

Suggested-by: Linus Torvalds
Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux...@kvack.org
Link: http://lkml.kernel.org/r/6836ff90c45b71d38e5d4405aec56fa9e5d1d4b2.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2b3bf67..3f63973 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6508,8 +6508,8 @@ unsigned long free_reserved_area(void *start, void *end, int poison, char *s)
 	}
 
 	if (pages && s)
-		pr_info("Freeing %s memory: %ldK (%p - %p)\n",
-			s, pages << (PAGE_SHIFT - 10), start, end);
+		pr_info("Freeing %s memory: %ldK\n",
+			s, pages << (PAGE_SHIFT - 10));
 
 	return pages;
 }
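
The size printed by the remaining format string comes from the expression pages << (PAGE_SHIFT - 10), which converts a page count to KiB. A small user-space sketch of that arithmetic follows. It assumes 4 KiB pages (PAGE_SHIFT of 12); pages_to_kib() and format_freeing_msg() are hypothetical helpers, with snprintf() standing in for pr_info().

```c
#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Pages to KiB: each page is 2^PAGE_SHIFT bytes, i.e. 2^(PAGE_SHIFT - 10) KiB. */
static unsigned long pages_to_kib(unsigned long pages)
{
	return pages << (PAGE_SHIFT - 10);
}

/* Build the post-patch message (no addresses). Returns the snprintf() result. */
static int format_freeing_msg(char *buf, size_t len,
			      const char *what, unsigned long pages)
{
	assert(buf != NULL && what != NULL);
	return snprintf(buf, len, "Freeing %s memory: %luK",
			what, pages_to_kib(pages));
}
```

Under the 4 KiB assumption, 8 pages give 32K and 2647 pages give 10588K, which lines up with the first two sample messages in the commit log.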
[tip:x86/asm] x86/dumpstack: Remove raw stack dump
Commit-ID:  0ee1dd9f5e7eae4e55f95935b72d4beecb03de9c
Gitweb:     http://git.kernel.org/tip/0ee1dd9f5e7eae4e55f95935b72d4beecb03de9c
Author:     Josh Poimboeuf
AuthorDate: Tue, 25 Oct 2016 09:51:13 -0500
Committer:  Ingo Molnar
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

x86/dumpstack: Remove raw stack dump

For mostly historical reasons, the x86 oops dump shows the raw stack
values:

  ...
  [registers]
  Stack:
   880079af7350 880079905400 c98f3ae0 a0196610 0001 0001 87654321 0002
  Call Trace:
  ...

This seems to be an artifact from long ago, and probably isn't needed
anymore. It generally just adds noise to the dump, and it can be
actively harmful because it leaks kernel addresses.

Linus says:

  "The stack dump actually goes back to forever, and it used to be
  useful back in 1992 or so. But it used to be useful mainly because
  stacks were simpler and we didn't have very good call traces anyway. I
  definitely remember having used them - I just do not remember having
  used them in the last ten+ years.

  Of course, it's still true that if you can trigger an oops, you've
  likely already lost the security game, but since the stack dump is so
  useless, let's aim to just remove it and make games like the above
  harder."

This also removes the related 'kstack=' cmdline option and the
'kstack_depth_to_print' sysctl.

Suggested-by: Linus Torvalds
Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar
---
 Documentation/kernel-parameters.txt       |  3 --
 Documentation/sysctl/kernel.txt           |  8 -
 Documentation/x86/x86_64/boot-options.txt |  4 ---
 arch/x86/include/asm/stacktrace.h         |  5 ---
 arch/x86/kernel/dumpstack.c               | 21 ++--
 arch/x86/kernel/dumpstack_32.c            | 33 +--
 arch/x86/kernel/dumpstack_64.c            | 53 +--
 kernel/sysctl.c                           |  7
 8 files changed, 4 insertions(+), 130 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 37babf9..049a917 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1958,9 +1958,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			kmemcheck=2 (one-shot mode)
 			Default: 2 (one-shot mode)
 
-	kstack=N	[X86] Print N words from the kernel stack
-			in oops dumps.
-
 	kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
 			Default is 0 (don't ignore, but inject #GP)
 
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ffab8b5..065f184 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -40,7 +40,6 @@ show up in /proc/sys/kernel:
 - hung_task_warnings
 - kexec_load_disabled
 - kptr_restrict
-- kstack_depth_to_print [ X86 only ]
 - l2cr [ PPC only ]
 - modprobe ==> Documentation/debugging-modules.txt
 - modules_disabled
@@ -395,13 +394,6 @@ When kptr_restrict is set to (2), kernel pointers printed using
 
 ==
 
-kstack_depth_to_print: (X86 only)
-
-Controls the number of words to print when dumping the raw
-kernel stack.
-
-==
-
 l2cr: (PPC only)
 
 This flag controls the L2 cache of G3 processor boards. If
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 0965a71..61b611e 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -277,10 +277,6 @@ IOMMU (input/output memory management unit)
 space might stop working. Use this option if you have devices that
 are accessed from userspace directly on some PCI host bridge.
 
-Debugging
-
- kstack=N Print N words from the kernel stack in oops dumps.
-
 Miscellaneous
 
 nogbpages
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 37f2e0b..1e375b0 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -43,8 +43,6 @@ static inline bool on_stack(struct stack_info *info, void *addr, size_t len)
 		addr + len > begin && addr + len <= end);
 }
 
-extern int kstack_depth_to_print;
-
 #ifdef CONFIG_X86_32
 #define STACKSLOTS_PER_LINE 8
 #else
@@ -86,9 +84,6 @@ get_stack_pointer(struct task_struct *task, struct pt_regs *regs)
 void
[tip:x86/asm] x86/dumpstack: Remove kernel text addresses from stack dump
Commit-ID:  bb5e5ce545f2031c96f7901cd8d1698ea3ca4c9c
Gitweb:     http://git.kernel.org/tip/bb5e5ce545f2031c96f7901cd8d1698ea3ca4c9c
Author:     Josh Poimboeuf
AuthorDate: Tue, 25 Oct 2016 09:51:12 -0500
Committer:  Ingo Molnar
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

x86/dumpstack: Remove kernel text addresses from stack dump

Printing kernel text addresses in stack dumps is of questionable value,
especially now that address randomization is becoming common.

It can be a security issue because it leaks kernel addresses. It also
affects the usefulness of the stack dump. Linus says:

  "I actually spend time cleaning up commit messages in logs, because
  useless data that isn't actually information (random hex numbers) is
  actively detrimental.

  It makes commit logs less legible.

  It also makes it harder to parse dumps.

  It's not useful. That makes it actively bad.

  I probably look at more oops reports than most people. I have not
  found the hex numbers useful for the last five years, because they are
  just randomized crap.

  The stack content thing just makes code scroll off the screen etc, for
  example."

The only real downside to removing these addresses is that they can be
used to disambiguate duplicate symbol names. However such cases are
rare, and the context of the stack dump should be enough to be able to
figure it out.

There's now a 'faddr2line' script which can be used to convert a
function address to a file name and line:

  $ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
  write_sysrq_trigger+0x51/0x60:
  write_sysrq_trigger at drivers/tty/sysrq.c:1098

Or gdb can be used:

  $ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in"
  (gdb) 0x815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378).

(But note that when there are duplicate symbol names, gdb will only show
the first symbol it finds. faddr2line is recommended over gdb because
it handles duplicates and it also does function size checking.)

Here's an example of what a stack dump looks like after this change:

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: sysrq_handle_crash+0x45/0x80
  PGD 36bfa067 [ 29.650644] PUD 7aca3067
  Oops: 0002 [#1] PREEMPT SMP
  Modules linked in: ...
  CPU: 1 PID: 786 Comm: bash Tainted: GE 4.9.0-rc1+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
  task: 880078582a40 task.stack: c9ba8000
  RIP: 0010:sysrq_handle_crash+0x45/0x80
  RSP: 0018:c9babdc8 EFLAGS: 00010296
  RAX: 880078582a40 RBX: 0063 RCX: 0001
  RDX: 0001 RSI: RDI: 0292
  RBP: c9babdc8 R08: 000b31866061 R09:
  R10: 0001 R11: R12:
  R13: 0007 R14: 81ee8680 R15:
  FS: 7ffb43869700() GS:88007d40() knlGS:
  CS: 0010 DS: ES: CR0: 80050033
  CR2: CR3: 7a3e9000 CR4: 001406e0
  Stack:
   c9babe00 81572d08 81572bd5 0002 880079606600 7ffb4386e000 c9babe20 81573201 880036a3fd00 fffb c9babe40
  Call Trace:
   __handle_sysrq+0x138/0x220
   ? __handle_sysrq+0x5/0x220
   write_sysrq_trigger+0x51/0x60
   proc_reg_write+0x42/0x70
   __vfs_write+0x37/0x140
   ? preempt_count_sub+0xa1/0x100
   ? __sb_start_write+0xf5/0x210
   ? vfs_write+0x183/0x1a0
   vfs_write+0xb8/0x1a0
   SyS_write+0x58/0xc0
   entry_SYSCALL_64_fastpath+0x1f/0xc2
  RIP: 0033:0x7ffb42f55940
  RSP: 002b:7ffd33bb6b18 EFLAGS: 0246 ORIG_RAX: 0001
  RAX: ffda RBX: 0046 RCX: 7ffb42f55940
  RDX: 0002 RSI: 7ffb4386e000 RDI: 0001
  RBP: 0011 R08: 7ffb4321ea40 R09: 7ffb43869700
  R10: 7ffb43869700 R11: 0246 R12: 00778a10
  R13: 7ffd33bb5c00 R14: 0007 R15: 0010
  Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8 04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7
  RIP: sysrq_handle_crash+0x45/0x80 RSP: c9babdc8
  CR2:

Suggested-by: Linus Torvalds
Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar
[tip:x86/asm] x86/dumpstack: Remove kernel text addresses from stack dump
Commit-ID:  bb5e5ce545f2031c96f7901cd8d1698ea3ca4c9c
Gitweb:     http://git.kernel.org/tip/bb5e5ce545f2031c96f7901cd8d1698ea3ca4c9c
Author:     Josh Poimboeuf
AuthorDate: Tue, 25 Oct 2016 09:51:12 -0500
Committer:  Ingo Molnar
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

x86/dumpstack: Remove kernel text addresses from stack dump

Printing kernel text addresses in stack dumps is of questionable value,
especially now that address randomization is becoming common.

It can be a security issue because it leaks kernel addresses. It also
affects the usefulness of the stack dump. Linus says:

  "I actually spend time cleaning up commit messages in logs, because
   useless data that isn't actually information (random hex numbers) is
   actively detrimental. It makes commit logs less legible. It also makes
   it harder to parse dumps. It's not useful. That makes it actively bad.
   I probably look at more oops reports than most people. I have not
   found the hex numbers useful for the last five years, because they are
   just randomized crap. The stack content thing just makes code scroll
   off the screen etc, for example."

The only real downside to removing these addresses is that they can be
used to disambiguate duplicate symbol names. However such cases are rare,
and the context of the stack dump should be enough to be able to figure
it out.

There's now a 'faddr2line' script which can be used to convert a function
address to a file name and line:

  $ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
  write_sysrq_trigger+0x51/0x60:
  write_sysrq_trigger at drivers/tty/sysrq.c:1098

Or gdb can be used:

  $ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in"
  (gdb) 0x815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378).

(But note that when there are duplicate symbol names, gdb will only show
the first symbol it finds. faddr2line is recommended over gdb because it
handles duplicates and it also does function size checking.)

Here's an example of what a stack dump looks like after this change:

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: sysrq_handle_crash+0x45/0x80
  PGD 36bfa067 [ 29.650644] PUD 7aca3067
  Oops: 0002 [#1] PREEMPT SMP
  Modules linked in: ...
  CPU: 1 PID: 786 Comm: bash Tainted: GE 4.9.0-rc1+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
  task: 880078582a40 task.stack: c9ba8000
  RIP: 0010:sysrq_handle_crash+0x45/0x80
  RSP: 0018:c9babdc8 EFLAGS: 00010296
  RAX: 880078582a40 RBX: 0063 RCX: 0001
  RDX: 0001 RSI: RDI: 0292
  RBP: c9babdc8 R08: 000b31866061 R09:
  R10: 0001 R11: R12:
  R13: 0007 R14: 81ee8680 R15:
  FS: 7ffb43869700() GS:88007d40() knlGS:
  CS: 0010 DS: ES: CR0: 80050033
  CR2: CR3: 7a3e9000 CR4: 001406e0
  Stack:
   c9babe00 81572d08 81572bd5 0002 880079606600 7ffb4386e000
   c9babe20 81573201 880036a3fd00 fffb c9babe40
  Call Trace:
   __handle_sysrq+0x138/0x220
   ? __handle_sysrq+0x5/0x220
   write_sysrq_trigger+0x51/0x60
   proc_reg_write+0x42/0x70
   __vfs_write+0x37/0x140
   ? preempt_count_sub+0xa1/0x100
   ? __sb_start_write+0xf5/0x210
   ? vfs_write+0x183/0x1a0
   vfs_write+0xb8/0x1a0
   SyS_write+0x58/0xc0
   entry_SYSCALL_64_fastpath+0x1f/0xc2
  RIP: 0033:0x7ffb42f55940
  RSP: 002b:7ffd33bb6b18 EFLAGS: 0246 ORIG_RAX: 0001
  RAX: ffda RBX: 0046 RCX: 7ffb42f55940
  RDX: 0002 RSI: 7ffb4386e000 RDI: 0001
  RBP: 0011 R08: 7ffb4321ea40 R09: 7ffb43869700
  R10: 7ffb43869700 R11: 0246 R12: 00778a10
  R13: 7ffd33bb5c00 R14: 0007 R15: 0010
  Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8 04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7
  RIP: sysrq_handle_crash+0x45/0x80 RSP: c9babdc8
  CR2:

Suggested-by: Linus Torvalds
Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar
---
 arch/x86/include/asm/kdebug.h | 1 -
 arch/x86/kernel/dumpstack.c | 18 --
 arch/x86/kernel/process_32.c | 7 +++
 arch/x86/kernel/process_64.c | 6 +++---
 arch/x86/mm/fault.c | 3
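
With the raw addresses gone, each call trace entry has the shape "function+offset/size". As a rough illustration of how a tool might split such an entry, here is a minimal userspace sketch in C. The struct and function names are hypothetical and are not taken from the kernel or from faddr2line; an entry that carries the "? " marker would keep that marker in the name field unless it is stripped first.

```c
#include <stdio.h>

/* Hypothetical holder for one "function+offset/size" call trace entry. */
struct trace_entry {
	char func[64];
	unsigned long offset;
	unsigned long size;
};

/*
 * Parse a string such as "write_sysrq_trigger+0x51/0x60".
 * Returns 0 on success, -1 if the string does not have that shape
 * or if the offset lies beyond the reported function size.
 */
static int parse_trace_entry(const char *s, struct trace_entry *e)
{
	/* %lx accepts an optional "0x" prefix, like strtoul() with base 16. */
	if (sscanf(s, "%63[^+]+%lx/%lx", e->func, &e->offset, &e->size) != 3)
		return -1;
	if (e->offset > e->size)
		return -1;
	return 0;
}
```

The offset check mirrors the range check that faddr2line applies before it trusts a match.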
[tip:x86/asm] scripts/faddr2line: Fix "size mismatch" error
Commit-ID:  efdb4167e676aaba7505bec739785b76e206cb45
Gitweb:     http://git.kernel.org/tip/efdb4167e676aaba7505bec739785b76e206cb45
Author:     Josh Poimboeuf
AuthorDate: Tue, 25 Oct 2016 09:51:11 -0500
Committer:  Ingo Molnar
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

scripts/faddr2line: Fix "size mismatch" error

I'm not sure how we missed this problem before. When I take a function
address and size from an oops and give it to faddr2line, it usually
complains about a size mismatch:

  $ scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
  skipping write_sysrq_trigger address at 0x815731a1 due to size mismatch (0x60 != 83)
  no match for write_sysrq_trigger+0x51/0x60

The problem is caused by differences in how kallsyms and faddr2line
determine a function's size.

kallsyms calculates a function's size by parsing the output of 'nm -n'
and subtracting the next function's address from the current function's
address. This means that nop instructions after the end of the function
are included in the size.

In contrast, faddr2line reads the size from the symbol table, which does
*not* include the ending nops in the function's size.

Change faddr2line to calculate the size from the output of 'nm -n' to be
consistent with kallsyms and oops outputs.

Signed-off-by: Josh Poimboeuf
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/bd313ed7c4003f6b1fda63e825325c44a9d837de.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar
---
 scripts/faddr2line | 33 +
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/scripts/faddr2line b/scripts/faddr2line
index 450b332..29df825 100755
--- a/scripts/faddr2line
+++ b/scripts/faddr2line
@@ -105,9 +105,18 @@ __faddr2line() {
 	# In rare cases there might be duplicates.
 	while read symbol; do
 		local fields=($symbol)
-		local sym_base=0x${fields[1]}
-		local sym_size=${fields[2]}
-		local sym_type=${fields[3]}
+		local sym_base=0x${fields[0]}
+		local sym_type=${fields[1]}
+		local sym_end=0x${fields[3]}
+
+		# calculate the size
+		local sym_size=$(($sym_end - $sym_base))
+		if [[ -z $sym_size ]] || [[ $sym_size -le 0 ]]; then
+			warn "bad symbol size: base: $sym_base end: $sym_end"
+			DONE=1
+			return
+		fi
+		sym_size=0x$(printf %x $sym_size)
 
 		# calculate the address
 		local addr=$(($sym_base + $offset))
@@ -116,26 +125,26 @@ __faddr2line() {
 			DONE=1
 			return
 		fi
-		local hexaddr=0x$(printf %x $addr)
+		addr=0x$(printf %x $addr)
 
 		# weed out non-function symbols
-		if [[ $sym_type != "FUNC" ]]; then
+		if [[ $sym_type != t ]] && [[ $sym_type != T ]]; then
 			[[ $print_warnings = 1 ]] &&
-				echo "skipping $func address at $hexaddr due to non-function symbol"
+				echo "skipping $func address at $addr due to non-function symbol of type '$sym_type'"
 			continue
 		fi
 
 		# if the user provided a size, make sure it matches the symbol's size
 		if [[ -n $size ]] && [[ $size -ne $sym_size ]]; then
 			[[ $print_warnings = 1 ]] &&
-				echo "skipping $func address at $hexaddr due to size mismatch ($size != $sym_size)"
+				echo "skipping $func address at $addr due to size mismatch ($size != $sym_size)"
 			continue;
 		fi
 
 		# make sure the provided offset is within the symbol's range
 		if [[ $offset -gt $sym_size ]]; then
 			[[ $print_warnings = 1 ]] &&
-				echo "skipping $func address at $hexaddr due to size mismatch ($offset > $sym_size)"
+				echo "skipping $func address at $addr due to size mismatch ($offset > $sym_size)"
 			continue
 		fi
 
@@ -143,12 +152,12 @@ __faddr2line() {
 		[[ $FIRST = 0 ]] && echo
 		FIRST=0
 
-		local hexsize=0x$(printf %x $sym_size)
-		echo "$func+$offset/$hexsize:"
-		addr2line -fpie $objfile $hexaddr | sed "s; $dir_prefix\(\./\)*; ;"
+		# pass real address to addr2line
+		echo "$func+$offset/$sym_size:"
+		addr2line -fpie $objfile $addr | sed "s; $dir_prefix\(\./\)*; ;"
 		DONE=1
 
-	done < <(readelf -sW $objfile | awk -v f=$func '$8 == f
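
The size rule described in the commit message (next symbol's address minus this symbol's address, taken from address-sorted 'nm -n' output) can be sketched in C as follows. This is only an illustration of the arithmetic, not the script's implementation: the struct, the function name and the in-memory table are assumptions, and a real tool would parse the nm output first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One line of `nm -n` output: address, type letter, name (sorted by address). */
struct nm_sym {
	uint64_t addr;
	char type;
	const char *name;
};

/*
 * Return the size of the first text symbol called `name`, computed as the
 * next symbol's address minus this symbol's address, so that any padding
 * nops after the function are included. Returns 0 if no such text symbol
 * exists or if it is the last entry in the table.
 */
static uint64_t sym_size_from_next(const struct nm_sym *syms, size_t n,
				   const char *name)
{
	for (size_t i = 0; i + 1 < n; i++) {
		if (strcmp(syms[i].name, name) != 0)
			continue;
		/* weed out non-function symbols, as the script does */
		if (syms[i].type != 't' && syms[i].type != 'T')
			continue;
		return syms[i + 1].addr - syms[i].addr;
	}
	return 0;
}
```

With a symbol at 0x81573150 followed by one at 0x815731b0, the result is 0x60, which is the size the oops reports, rather than the 83 bytes recorded in the symbol table.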
[PATCH v2 3/3] kernel/smp: Tell the user we're bringing up secondary CPUs
Currently we don't print anything before starting to bring up secondary
CPUs. This can be confusing if it takes a long time to bring up the
secondaries, or if the kernel crashes while doing so and produces no
further output.

On x86 they work around this by detecting when the first secondary CPU
comes up and printing a message (see announce_cpu()). But doing it in
smp_init() is simpler and works for all arches.

Signed-off-by: Michael Ellerman
Reviewed-by: Borislav Petkov
---
 kernel/smp.c | 2 ++
 1 file changed, 2 insertions(+)

v2: Drop "smp:" from pr_info() now we have pr_fmt() defined.

diff --git a/kernel/smp.c b/kernel/smp.c
index 4323c5db7d26..77fcdb9f2775 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -555,6 +555,8 @@ void __init smp_init(void)
 	idle_threads_init();
 	cpuhp_threads_init();
 
+	pr_info("Bringing up secondary CPUs ...\n");
+
 	/* FIXME: This should be done in userspace --RR */
 	for_each_present_cpu(cpu) {
 		if (num_online_cpus() >= setup_max_cpus)
-- 
2.7.4
[PATCH v2 2/3] kernel/smp: Make the SMP boot message common on all arches
Currently after bringing up secondary CPUs all arches print "Brought up
%d CPUs". On x86 they also print the number of nodes that were brought
online.

It would be nice to also print the number of nodes on other arches.
Although we could override smp_announce() on the other ~10 NUMA aware
arches, it seems simpler to just always print the number of nodes. On
non-NUMA arches there is just always 1 node.

Having done that, smp_announce() is no longer weak, and seems small
enough to just pull directly into smp_init().

Also update the printing of "%d CPUs" to be smart when an SMP kernel is
booted on a single CPU system, or when only one CPU is available, eg:

  smp: Brought up 2 nodes, 1 CPU

Signed-off-by: Michael Ellerman
Reviewed-by: Borislav Petkov
---
 arch/x86/kernel/smpboot.c | 8
 kernel/smp.c | 13 +++--
 2 files changed, 7 insertions(+), 14 deletions(-)

v2: Print singular CPU when only 1 CPU is found.
    Drop "smp:" from pr_info() now we have pr_fmt() defined.

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 42f5eb7b4f6c..b9f02383f372 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -821,14 +821,6 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
 	return (send_status | accept_status);
 }
 
-void smp_announce(void)
-{
-	int num_nodes = num_online_nodes();
-
-	printk(KERN_INFO "x86: Booted up %d node%s, %d CPUs\n",
-	       num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus());
-}
-
 /* reduce the number of lines printed when booting a large cpu count system */
 static void announce_cpu(int cpu, int apicid)
 {
diff --git a/kernel/smp.c b/kernel/smp.c
index 2d1f15d43022..4323c5db7d26 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -546,14 +546,10 @@ void __init setup_nr_cpu_ids(void)
 	nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
 }
 
-void __weak smp_announce(void)
-{
-	printk(KERN_INFO "Brought up %d CPUs\n", num_online_cpus());
-}
-
 /* Called by boot processor to activate the rest. */
 void __init smp_init(void)
 {
+	int num_nodes, num_cpus;
 	unsigned int cpu;
 
 	idle_threads_init();
@@ -567,8 +563,13 @@ void __init smp_init(void)
 		cpu_up(cpu);
 	}
 
+	num_nodes = num_online_nodes();
+	num_cpus = num_online_cpus();
+	pr_info("Brought up %d node%s, %d CPU%s\n",
+		num_nodes, (num_nodes > 1 ? "s" : ""),
+		num_cpus, (num_cpus > 1 ? "s" : ""));
+
 	/* Any cleanup work */
-	smp_announce();
 	smp_cpus_done(setup_max_cpus);
 }
-- 
2.7.4
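
The pluralisation logic in the new pr_info() call can be tried outside the kernel. The sketch below reuses the format string and the two ternaries from the patch, but writes into a buffer with snprintf(); the function name is made up for this illustration, and the "smp: " prefix is left out because in the kernel it comes from pr_fmt().

```c
#include <stdio.h>

/*
 * Format the boot summary into buf, choosing singular or plural for both
 * nodes and CPUs. Returns whatever snprintf() returns.
 */
static int format_brought_up(char *buf, size_t len, int num_nodes, int num_cpus)
{
	return snprintf(buf, len, "Brought up %d node%s, %d CPU%s",
			num_nodes, (num_nodes > 1 ? "s" : ""),
			num_cpus, (num_cpus > 1 ? "s" : ""));
}
```

For 2 nodes and 1 CPU this yields the text quoted in the commit message, minus the prefix.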
[PATCH v2 1/3] kernel/smp: Define pr_fmt() for smp.c
This makes all our pr_xxx()'s start with "smp: ", which helps pin down
where they come from and generally looks nice. There is actually only
one pr_xxx() use in smp.c at the moment, but we will add some more in
the next commit.

Suggested-by: Borislav Petkov
Signed-off-by: Michael Ellerman
---
 kernel/smp.c | 3 +++
 1 file changed, 3 insertions(+)

v2: New in v2.

diff --git a/kernel/smp.c b/kernel/smp.c
index bba3b201668d..2d1f15d43022 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -3,6 +3,9 @@
  *
  * (C) Jens Axboe 2008
  */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include
 #include
 #include
-- 
2.7.4
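
The prefix works purely through string literal concatenation at compile time. A small userspace sketch of the same trick is shown below. KBUILD_MODNAME is normally supplied by the kernel build system, so it is defined by hand here, and pr_info_buf() is a made-up stand-in for pr_info() that writes into a buffer instead of the kernel log.

```c
#include <stdio.h>

/* Stand-in for the value the kernel build would supply for kernel/smp.c. */
#define KBUILD_MODNAME "smp"

/* Same definition as in the patch: adjacent string literals are joined. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
 * Hypothetical userspace replacement for pr_info(). "##__VA_ARGS__" is a
 * GNU extension accepted by GCC and Clang; the kernel relies on it too.
 */
#define pr_info_buf(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)
```

Because the macro only pastes literals together, the format argument has to be a string literal, which is also the case for the kernel's pr_xxx() helpers.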
Re: [PATCH 1/2] kernel/smp: Make the SMP boot message common on all arches
Ingo Molnar writes:
> * Michael Ellerman wrote:
>> @@ -564,8 +560,11 @@ void __init smp_init(void)
>>  		cpu_up(cpu);
>>  	}
>>
>> +	num_nodes = num_online_nodes();
>> +	pr_info("smp: Brought up %d node%s, %d CPUs\n",
>> +		num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus());
>
> No objections - but pedantry requires me to mention that while we are evolving
> this code and changing the strings I think we should make the CPU announcement
> CPU%s smart as well: an SMP kernel on a single CPU bootup will result in
> num_online_cpus() == 1, right?

Yeah that makes sense.

I don't often boot any single CPU systems, but I tested with maxcpus=1
and it does look nicer:

  smp: Brought up 2 nodes, 1 CPU

Will send a v2.

cheers
Re: [PATCH 1/2] kernel/smp: Make the SMP boot message common on all arches
Borislav Petkov writes:
> On Thu, Oct 13, 2016 at 07:55:19PM +1100, Michael Ellerman wrote:
>> @@ -564,8 +560,11 @@ void __init smp_init(void)
>>  		cpu_up(cpu);
>>  	}
>>
>> +	num_nodes = num_online_nodes();
>> +	pr_info("smp: Brought up %d node%s, %d CPUs\n",
>> +		num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus());
>
> Please define pr_fmt for this file so that pr_info adds the prefix
> automatically. I guess
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> at the top, before all the include directives should suffice.

Sure thing.

> Other than that, for both patches:
>
> Reviewed-by: Borislav Petkov

Thanks, v2 coming soon.

cheers
[PATCH] arm64: defconfig: Enable DRM DU and V4L2 FCP + VSP modules
From: Magnus Damm

Extend the ARM64 defconfig to enable the DU DRM device as module
together with required dependencies of V4L2 FCP and VSP modules.

This enables VGA output on the r8a7795 Salvator-X board.

Signed-off-by: Magnus Damm
---

 Written against next-20161026

 arch/arm64/configs/defconfig | 14 ++
 1 file changed, 14 insertions(+)

--- 0001/arch/arm64/configs/defconfig
+++ work/arch/arm64/configs/defconfig	2016-10-26 14:10:58.220607110 +0900
@@ -293,8 +293,22 @@ CONFIG_REGULATOR_PWM=y
 CONFIG_REGULATOR_QCOM_SMD_RPM=y
 CONFIG_REGULATOR_QCOM_SPMI=y
 CONFIG_REGULATOR_S2MPS11=y
+CONFIG_MEDIA_SUPPORT=m
+CONFIG_MEDIA_CAMERA_SUPPORT=y
+CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
+CONFIG_MEDIA_DIGITAL_TV_SUPPORT=y
+CONFIG_MEDIA_CONTROLLER=y
+CONFIG_VIDEO_V4L2_SUBDEV_API=y
+# CONFIG_DVB_NET is not set
+CONFIG_V4L_MEM2MEM_DRIVERS=y
+CONFIG_VIDEO_RENESAS_FCP=m
+CONFIG_VIDEO_RENESAS_VSP1=m
 CONFIG_DRM=m
 CONFIG_DRM_NOUVEAU=m
+CONFIG_DRM_RCAR_DU=m
+CONFIG_DRM_RCAR_HDMI=y
+CONFIG_DRM_RCAR_LVDS=y
+CONFIG_DRM_RCAR_VSP=y
 CONFIG_DRM_TEGRA=m
 CONFIG_DRM_PANEL_SIMPLE=m
 CONFIG_DRM_I2C_ADV7511=m
[PATCH 3/3] x86/vmware: Add paravirt sched clock
Set pv_time_ops.sched_clock to vmware_sched_clock(). It is a simplified
version of native_sched_clock() without the ring buffer of
mult/shift/offset triplets and without preempt toggling. Since the VMware
hypervisor provides a constant TSC, we can use a constant
mult/shift/offset triplet calculated at boot time.

The no-vmw-sched-clock kernel parameter is added to switch back to the
native_sched_clock() implementation.

Signed-off-by: Alexey Makhalov
Acked-by: Alok N Kataria
---
 Documentation/kernel-parameters.txt | 4
 arch/x86/kernel/cpu/vmware.c | 38 +
 2 files changed, 42 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 37babf9..b3b2ec0 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2754,6 +2754,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	no-kvmapf	[X86,KVM] Disable paravirtualized asynchronous page
 			fault handling.
 
+	no-vmw-sched-clock
+			[X86,PV_OPS] Disable paravirtualized VMware scheduler
+			clock and use the default one.
+
 	no-steal-acc	[X86,KVM] Disable paravirtualized steal time accounting.
 			steal time is computed, but won't influence scheduler
 			behaviour
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index e3fb320..6ef22c1 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -24,10 +24,12 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
+#include
 
 #define CPUID_VMWARE_INFO_LEAF 0x4000
 #define VMWARE_HYPERVISOR_MAGIC 0x564D5868
@@ -62,10 +64,46 @@ static unsigned long vmware_get_tsc_khz(void)
 }
 
 #ifdef CONFIG_PARAVIRT
+static struct cyc2ns_data vmware_cyc2ns __ro_after_init;
+
+static int vmw_sched_clock __initdata = 1;
+static __init int setup_vmw_sched_clock(char *s)
+{
+	vmw_sched_clock = 0;
+	return 0;
+}
+early_param("no-vmw-sched-clock", setup_vmw_sched_clock);
+
+static unsigned long long vmware_sched_clock(void)
+{
+	unsigned long long ns;
+
+	ns = mul_u64_u32_shr(rdtsc(), vmware_cyc2ns.cyc2ns_mul,
+			     vmware_cyc2ns.cyc2ns_shift);
+	ns -= vmware_cyc2ns.cyc2ns_offset;
+	return ns;
+}
+
 static void __init vmware_paravirt_ops_setup(void)
 {
 	pv_info.name = "VMware";
 	pv_cpu_ops.io_delay = paravirt_nop;
+
+	if (vmware_tsc_khz && vmw_sched_clock) {
+		unsigned long long tsc_now = rdtsc();
+
+		clocks_calc_mult_shift(&vmware_cyc2ns.cyc2ns_mul,
+				       &vmware_cyc2ns.cyc2ns_shift,
+				       vmware_tsc_khz,
+				       NSEC_PER_MSEC, 0);
+		vmware_cyc2ns.cyc2ns_offset =
+			mul_u64_u32_shr(tsc_now, vmware_cyc2ns.cyc2ns_mul,
+					vmware_cyc2ns.cyc2ns_shift);
+
+		pv_time_ops.sched_clock = vmware_sched_clock;
+		pr_info("vmware: using sched offset of %llu ns\n",
+			vmware_cyc2ns.cyc2ns_offset);
+	}
 }
 #else
 #define vmware_paravirt_ops_setup() do {} while (0)
-- 
2.10.1
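
The arithmetic behind the constant mult/shift/offset triplet can be worked through in userspace. The following is a sketch under stated assumptions, not the kernel code: the struct and the two wrapper functions are hypothetical, the shift is fixed at 22 for simplicity (the kernel's clocks_calc_mult_shift() chooses mult and shift itself), and mul_u64_u32_shr() is reimplemented here in a portable form that is only valid for shifts up to 32.

```c
#include <stdint.h>

/* Hypothetical userspace mirror of the mult/shift/offset triplet. */
struct cyc2ns {
	uint32_t mul;
	uint32_t shift;
	uint64_t offset;
};

/* (a * mul) >> shift without overflowing 64 bits; valid for shift <= 32. */
static uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul, unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t ret = ((uint64_t)al * mul) >> shift;

	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);
	return ret;
}

/*
 * Compute the triplet once: ns per cycle is 1e6 / tsc_khz, scaled by
 * 2^shift. With this fixed shift, mul only fits in 32 bits when tsc_khz
 * is at least roughly 1000. The offset makes the clock start at 0.
 */
static void cyc2ns_init(struct cyc2ns *c, uint32_t tsc_khz, uint64_t tsc_now)
{
	c->shift = 22;
	c->mul = (uint32_t)((UINT64_C(1000000) << c->shift) / tsc_khz);
	c->offset = mul_u64_u32_shr(tsc_now, c->mul, c->shift);
}

/* Convert a raw TSC reading to nanoseconds since cyc2ns_init(). */
static uint64_t sched_clock_ns(const struct cyc2ns *c, uint64_t tsc)
{
	return mul_u64_u32_shr(tsc, c->mul, c->shift) - c->offset;
}
```

For a 2 GHz TSC (tsc_khz = 2000000) the multiplier comes out as 2^21, so 2000 cycles after initialisation map to 1000 ns. Because the TSC frequency is assumed constant, the triplet never has to be recomputed, which is why the patch can drop the ring buffer used by native_sched_clock().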
[PATCH 2/3] x86/vmware: Add basic paravirt ops support
Add basic paravirt support:
 1. Set pv_info.name to "VMware" to have the proper boot log message
    "Booting paravirtualized kernel on VMware" instead of
    "... on bare hardware".
 2. Set pv_cpu_ops.io_delay() to an empty function, paravirt_nop(),
    to avoid vm-exits on IO delays.

Signed-off-by: Alexey Makhalov
Acked-by: Alok N Kataria
---
 arch/x86/kernel/cpu/vmware.c | 12
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 480790f..e3fb320 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -61,6 +61,16 @@ static unsigned long vmware_get_tsc_khz(void)
 	return vmware_tsc_khz;
 }
 
+#ifdef CONFIG_PARAVIRT
+static void __init vmware_paravirt_ops_setup(void)
+{
+	pv_info.name = "VMware";
+	pv_cpu_ops.io_delay = paravirt_nop;
+}
+#else
+#define vmware_paravirt_ops_setup() do {} while (0)
+#endif
+
 static void __init vmware_platform_setup(void)
 {
 	uint32_t eax, ebx, ecx, edx;
@@ -94,6 +104,8 @@ static void __init vmware_platform_setup(void)
 	} else {
 		pr_warn("Failed to get TSC freq from the hypervisor\n");
 	}
+
+	vmware_paravirt_ops_setup();
 }
 
 /*
-- 
2.10.1
[PATCH 0/3] x86/vmware guest improvements
This patchset includes several VMware guest improvements:

Alexey Makhalov (3):
  x86/vmware: Use tsc_khz value for calibrate_cpu()
  x86/vmware: Add basic paravirt ops support
  x86/vmware: Add paravirt sched clock

 Documentation/kernel-parameters.txt |  4 +++
 arch/x86/kernel/cpu/vmware.c        | 51 +
 2 files changed, 55 insertions(+)

-- 
2.10.1
[PATCH 1/3] x86/vmware: Use tsc_khz value for calibrate_cpu()
After aa297292d708, there are separate native calibrations for cpu_khz and
tsc_khz. The code sets x86_platform.calibrate_cpu to native_calibrate_cpu(),
which looks in CPUID leaf 0x16 or in MSRs for the CPU frequency. Since we
keep tsc_khz constant (even after vMotion), cpu_khz and tsc_khz may start
diverging.

tsc_init() now does:

	cpu_khz = x86_platform.calibrate_cpu();
	tsc_khz = x86_platform.calibrate_tsc();
	if (tsc_khz == 0)
		tsc_khz = cpu_khz;
	else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
		cpu_khz = tsc_khz;

We want cpu_khz and tsc_khz to be in sync even if they diverge by less
than 10%. This patch resolves the issue by setting
x86_platform.calibrate_cpu to vmware_get_tsc_khz().

Signed-off-by: Alexey Makhalov
Acked-by: Alok N Kataria
---
 arch/x86/kernel/cpu/vmware.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 4e34da4b..480790f 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -83,6 +83,7 @@ static void __init vmware_platform_setup(void)
 
 		vmware_tsc_khz = tsc_khz;
 		x86_platform.calibrate_tsc = vmware_get_tsc_khz;
+		x86_platform.calibrate_cpu = vmware_get_tsc_khz;
 
 #ifdef CONFIG_X86_LOCAL_APIC
 		/* Skip lapic calibration since we know the bus frequency. */
-- 
2.10.1
RE: [PATCH 0/9] staging: fsl-mc: move bus driver out of staging, add dpio
> -----Original Message-----
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, October 24, 2016 9:34 AM
> To: Stuart Yoder; gre...@linuxfoundation.org
> Cc: German Rivera ; de...@driverdev.osuosl.org; linux-kernel@vger.kernel.org;
> a...@arndb.de; Leo Li
> Subject: Re: [PATCH 0/9] staging: fsl-mc: move bus driver out of staging, add dpio
>
> Hi Stuart,
>
> On 10/21/2016 04:01 PM, Stuart Yoder wrote:
> > This patch series: A) addresses the final item in the staging
> > TODO list for the fsl-mc bus driver-- adding a functional driver
> > on top of the bus driver, and B) requests that the fsl-mc bus driver
> > be moved out of staging.
>
> Awesome, it's great to see progress again! :)
>
> > The proposed destination for the bus driver is drivers/bus.
> > Proposed location for global header files for fsl-mc and dpaa2
> > is include/linux/fsl.
> >
> > The functional driver added is for the DPIO object which provides
> > queuing services for other DPAA2 drivers. An overview of the
>
> I thought the idea of the TODO item was to have a full-fledged user of
> the bus, like a full network driver. The TODO item reads:
>
> > -* Add at least one device driver for a DPAA2 object (child device of the
> > -  fsl-mc bus). Most likely candidate for this is adding DPAA2 Ethernet
> > -  driver support, which depends on drivers for several objects: DPNI,
> > -  DPIO, DPMAC. Other pre-requisites include:

DPIO is a "full-fledged user" of the bus. But, yes, it does provide
infrastructure services and so does not have a standalone I/O function.

> which to me indicates that DPIO is only part of that goal. Of course I'm
> the last person blocking progress to move the driver out of staging. But
> are we at the right point yet?

I thought the goal was to demonstrate a driver on top of the fsl-mc bus
driver because without that it would have been difficult to validate/review
that the bus infrastructure was correct.
The DPIO driver demonstrates full use of the bus driver infrastructure--
getting probed, discovering and mapping mmio regions, initializing the
device, initializing interrupts.

> To me the topmost important bit of having this outside of staging is
> actually missing in the TODO list (probably since it's obvious): Have
> stable, reliable, responsible maintainership for the code.
>
> So far I've seen German do the initial push upstream, then there was
> silence for a while. Now some time passed and you push a few bits here
> and there again. All of the efforts are great and very appreciated, but
> I'm missing the "maintainer" figure. Some peer to German and you who
> oversees the whole thing, reviews your patches and devotes at least 2-3
> days a week to only upstream fsl-mc work. Someone like York for U-Boot
> or Scott for general Linux work.
>
> Without that, there's too much of a chance that the code will stay
> incomplete, bitrot, etc. And that'd be bad for everyone involved. I
> think the concept behind fsl-mc is great and exactly what people need,
> so we should make sure it succeeds.

I agree we need that. We are actively working on getting an additional
maintainer (or two), and until we can get the right person(s) I'm willing
to fill that role. We're not going to let this code bitrot.

I actually think getting the bus driver out of staging will help spur
broader involvement by NXP engineers in the fsl-mc bus support. There
are enhancements like a resource management interface for user space,
an interface to see the MC log buffer, SMMU-related hooks for the fsl-mc
bus, and vfio for the fsl-mc bus. All that stuff is on hold until we get
the bus driver out of staging. The directive we have is to add no new
features until the bus driver is out.
For example, the ARM SMMU driver has an include of , but I don't see the
SMMU maintainers accepting the following in arm-smmu.c:

   #include <../drivers/staging/fsl-mc/include/mc.h>

Given that the fsl-mc bus TODO list is done, there is not a whole lot
for a new maintainer to do to the bus driver itself until we get the
driver out of staging (aside from reviewing another DPAA2 object driver
that would also go into staging).

Once the bus driver + dpio is out of staging, it also opens the door for
other DPAA2 drivers-- network, crypto, DMA, L2 switch,
decompression/compression, and others to be upstreamed. I didn't think we
wanted all of those to go into staging, but we were waiting until one
driver was accepted first, proving the bus infrastructure is sound. I was
hoping DPIO could be that proof of concept.

So, in short, I think getting the bus driver and DPIO out of staging
will open up some parallel development and will also provide more
opportunities for some new maintainers to get involved, because there
will be more to review and do.

However, if you want things to stay in staging for now, I will resubmit
and put DPIO there.

Thanks,
Stuart
Re: [RFC 8/8] mm: Add N_COHERENT_DEVICE node type into node_states[]
On 10/25/2016 12:52 PM, Balbir Singh wrote:
>
>
> On 24/10/16 15:31, Anshuman Khandual wrote:
>> Add a new member N_COHERENT_DEVICE into node_states[] nodemask array to
>> enlist all those nodes which contain only coherent device memory. Also
>> creates a new sysfs interface /sys/devices/system/node/is_coherent_device
>> to list down all those nodes which has coherent device memory.
>>
>> Signed-off-by: Anshuman Khandual
>> ---
>>  Documentation/ABI/stable/sysfs-devices-node |  7 +++
>>  drivers/base/node.c                         |  6 ++
>>  include/linux/nodemask.h                    |  3 +++
>>  mm/memory_hotplug.c                         | 10 ++
>>  4 files changed, 26 insertions(+)
>>
>> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
>> index 5b2d0f0..5538791 100644
>> --- a/Documentation/ABI/stable/sysfs-devices-node
>> +++ b/Documentation/ABI/stable/sysfs-devices-node
>> @@ -29,6 +29,13 @@ Description:
>>  		Nodes that have regular or high memory.
>>  		Depends on CONFIG_HIGHMEM.
>>
>> +What:		/sys/devices/system/node/is_coherent_device
>> +Date:		October 2016
>> +Contact:	Linux Memory Management list
>> +Description:
>> +		Lists the nodemask of nodes that have coherent memory.
>> +		Depends on CONFIG_COHERENT_DEVICE.
>> +
>>  What:		/sys/devices/system/node/nodeX
>>  Date:		October 2002
>>  Contact:	Linux Memory Management list
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index 5548f96..5b5dd89 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -661,6 +661,9 @@ static struct node_attr node_state_attr[] = {
>>  	[N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
>>  #endif
>>  	[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +	[N_COHERENT_DEVICE] = _NODE_ATTR(is_coherent_device, N_COHERENT_DEVICE),
>> +#endif
>>  };
>>
>>  static struct attribute *node_state_attrs[] = {
>> @@ -674,6 +677,9 @@ static struct attribute *node_state_attrs[] = {
>>  	&node_state_attr[N_MEMORY].attr.attr,
>>  #endif
>>  	&node_state_attr[N_CPU].attr.attr,
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +	&node_state_attr[N_COHERENT_DEVICE].attr.attr,
>> +#endif
>>  	NULL
>>  };
>>
>> diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
>> index f746e44..605cb0d 100644
>> --- a/include/linux/nodemask.h
>> +++ b/include/linux/nodemask.h
>> @@ -393,6 +393,9 @@ enum node_states {
>>  	N_MEMORY = N_HIGH_MEMORY,
>>  #endif
>>  	N_CPU,		/* The node has one or more cpus */
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +	N_COHERENT_DEVICE,	/* The node has coherent device memory */
>> +#endif
>>  	NR_NODE_STATES
>>  };
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 9629273..8f03962 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1044,6 +1044,11 @@ static void node_states_set_node(int node, struct memory_notify *arg)
>>  	if (arg->status_change_nid_high >= 0)
>>  		node_set_state(node, N_HIGH_MEMORY);
>>
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +	if (isolated_cdm_node(node))
>> +		node_set_state(node, N_COHERENT_DEVICE);
>> +#endif
>> +
>
> #ifdef not required, see below
>

Right, will change.
>>  	node_set_state(node, N_MEMORY);
>>  }
>>
>> @@ -1858,6 +1863,11 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
>>  	if ((N_MEMORY != N_HIGH_MEMORY) &&
>>  	    (arg->status_change_nid >= 0))
>>  		node_clear_state(node, N_MEMORY);
>> +
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +	if (isolated_cdm_node(node))
>> +		node_clear_state(node, N_COHERENT_DEVICE);
>> +#endif
>>  }
>>
>
> I think the #ifdefs are not needed if isolated_cdm_node
> is defined for both with and without CONFIG_COHERENT_DEVICE.
>
> I think this patch needs to move up in the series so that
> node state can be examined by other core algorithms

Okay, will move up.
Re: [RFC PATCH 2/5] mm/page_alloc: use smallest fallback page first in movable allocation
On Fri, Oct 14, 2016 at 12:52:26PM +0200, Vlastimil Babka wrote:
> On 10/14/2016 03:26 AM, Joonsoo Kim wrote:
> >On Thu, Oct 13, 2016 at 11:12:10AM +0200, Vlastimil Babka wrote:
> >>On 10/13/2016 10:08 AM, js1...@gmail.com wrote:
> >>>From: Joonsoo Kim
> >>>
> >>>When we try to find freepage in fallback buddy list, we always search
> >>>the largest one. This would help for fragmentation if we process
> >>>unmovable/reclaimable allocation request because it could cause permanent
> >>>fragmentation on movable pageblock and spread out such allocations would
> >>>cause more fragmentation. But, movable allocation request is
> >>>rather different. It would be simply freed or migrated so it doesn't
> >>>contribute to fragmentation on the other pageblock. In this case, it would
> >>>be better not to break the precious highest order freepage so we need to
> >>>search the smallest freepage first.
> >>
> >>I've also pondered this, but then found a lower hanging fruit that
> >>should be hopefully clear win and mitigate most cases of breaking
> >>high-order pages unnecessarily:
> >>
> >>http://marc.info/?l=linux-mm&m=147582914330198&w=2
> >
> >Yes, I agree with that change. That's the similar patch what I tried
> >before.
> >
> >"mm/page_alloc: don't break highest order freepage if steal"
> >http://marc.info/?l=linux-mm&m=143011930520417&w=2
>
> Ah, indeed, I forgot about it and had to rediscover :)
>
> >
> >>
> >>So I would try that first, and then test your patch on top? In your
> >>patch there's a risk that we make it harder for
> >>unmovable/reclaimable pageblocks to become movable again (we start
> >>with the smallest page which means there's lower chance that
> >>move_freepages_block() will convert more than half of the block).
> >
> >Indeed, but, with your "count movable pages when stealing", risk would
> >disappear. :)
>
> Hmm, but that counting is only triggered when we attempt to steal
> whole pageblock. For movable allocation, can_steal_fallback() allows
> that only for
> (order >= pageblock_order / 2), and since your patch makes "order"
> as small as possible for movable allocations, the chances are lower?

The chances are lower than they are now, but we would eventually try to
steal such an (order >= pageblock_order / 2) freepage from an unmovable
pageblock, and your logic would then change the pageblock migratetype
from unmovable to movable.

Thanks.
Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list
On Mon, Oct 17, 2016 at 05:21:54PM +0800, Xishi Qiu wrote:
> On 2016/10/13 16:08, js1...@gmail.com wrote:
>
> > From: Joonsoo Kim
> >
> > Currently, freeing page can stay longer in the buddy list if next higher
> > order page is in the buddy list in order to help coalescence. However,
> > it doesn't work for the simplest sequential free case. For example, think
> > about the situation that 8 consecutive pages are freed in sequential
> > order.
> >
> > page 0: attached at the head of order 0 list
> > page 1: merged with page 0, attached at the head of order 1 list
> > page 2: attached at the tail of order 0 list
> > page 3: merged with page 2 and then merged with page 0, attached at
> >  the head of order 2 list
> > page 4: attached at the head of order 0 list
> > page 5: merged with page 4, attached at the tail of order 1 list
> > page 6: attached at the tail of order 0 list
> > page 7: merged with page 6 and then merged with page 4. Lastly, merged
> >  with page 0 and we get order 3 freepage.
> >
> > With excluding page 0 case, there are three cases that freeing page is
> > attached at the head of buddy list in this example and if just one
> > corresponding ordered allocation request comes at that moment, this page
> > in being a high order page will be allocated and we would fail to make
> > order-3 freepage.
> >
> > Allocation usually happens in sequential order and free also does. So, it
> > would be important to detect such a situation and to give some chance
> > to be coalesced.
> >
> > I think that simple and effective heuristic about this case is just
> > attaching freeing page at the tail of the buddy list unconditionally.
> > If freeing isn't merged during one rotation, it would be actual
> > fragmentation and we don't need to care about it for coalescence.
> >
>
> Hi Joonsoo,
>
> I find another two places to reduce fragmentation.
> > 1) > __rmqueue_fallback > steal_suitable_fallback > move_freepages_block > move_freepages > list_move > If we steal some free pages, we will add these page at the head of > start_migratetype list, > this will cause more fixed migratetype, because this pages will be allocated > more easily. > So how about use list_move_tail instead of list_move? Yeah... I don't think deeply but, at a glance, it would be helpful. > > 2) > __rmqueue_fallback > expand > list_add > How about use list_add_tail instead of list_add? If add the tail, then the > rest of pages > will be hard to be allocated and we can merge them again as soon as the page > freed. I guess that it has no effect. When we do __rmqueue_fallback() and expand(), we don't have any freepage on this or more order. So, list_add or list_add_tail will show the same result. Thanks.
Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list
On Mon, Oct 17, 2016 at 05:21:54PM +0800, Xishi Qiu wrote:
> On 2016/10/13 16:08, js1...@gmail.com wrote:
>
> > From: Joonsoo Kim
> >
> > Currently, freeing page can stay longer in the buddy list if next higher
> > order page is in the buddy list in order to help coalescence. However,
> > it doesn't work for the simplest sequential free case. For example, think
> > about the situation that 8 consecutive pages are freed in sequential
> > order.
> >
> > page 0: attached at the head of order 0 list
> > page 1: merged with page 0, attached at the head of order 1 list
> > page 2: attached at the tail of order 0 list
> > page 3: merged with page 2 and then merged with page 0, attached at
> >         the head of order 2 list
> > page 4: attached at the head of order 0 list
> > page 5: merged with page 4, attached at the tail of order 1 list
> > page 6: attached at the tail of order 0 list
> > page 7: merged with page 6 and then merged with page 4. Lastly, merged
> >         with page 0 and we get order 3 freepage.
> >
> > With excluding page 0 case, there are three cases that freeing page is
> > attached at the head of buddy list in this example and if just one
> > corresponding ordered allocation request comes at that moment, this page
> > in being a high order page will be allocated and we would fail to make
> > order-3 freepage.
> >
> > Allocation usually happens in sequential order and free also does. So, it
> > would be important to detect such a situation and to give some chance
> > to be coalesced.
> >
> > I think that simple and effective heuristic about this case is just
> > attaching freeing page at the tail of the buddy list unconditionally.
> > If freeing isn't merged during one rotation, it would be actual
> > fragmentation and we don't need to care about it for coalescence.
> >
>
> Hi Joonsoo,
>
> I find another two places to reduce fragmentation.
>
> 1)
> __rmqueue_fallback
>     steal_suitable_fallback
>         move_freepages_block
>             move_freepages
>                 list_move
> If we steal some free pages, we will add these page at the head of
> start_migratetype list, this will cause more fixed migratetype, because
> this pages will be allocated more easily.
> So how about use list_move_tail instead of list_move?

Yeah... I don't think deeply but, at a glance, it would be helpful.

>
> 2)
> __rmqueue_fallback
>     expand
>         list_add
> How about use list_add_tail instead of list_add? If add the tail, then
> the rest of pages will be hard to be allocated and we can merge them
> again as soon as the page freed.

I guess that it has no effect. When we do __rmqueue_fallback() and
expand(), we don't have any freepage on this or more order. So,
list_add or list_add_tail will show the same result.

Thanks.
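The last point in the reply (head and tail insertion give the same result on an empty free list) can be checked with a small userspace sketch. This is not kernel code: the list helpers below are minimal stand-ins written for illustration, and `page_stub` and `position_of_new_page()` are hypothetical names that do not appear in the thread.

```c
/* Minimal userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link "entry" between two known neighbours. */
static void list_insert(struct list_head *entry,
			struct list_head *prev, struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert right after the head, i.e. at the front of the list. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	list_insert(entry, head, head->next);
}

/* Insert right before the head, i.e. at the back of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	list_insert(entry, head->prev, head);
}

/* Hypothetical stand-in for a free page sitting on a free list. */
struct page_stub {
	struct list_head lru;
};

#define MAX_EXISTING 8

/*
 * Build a free list that already holds "existing" pages, add one more
 * page with list_add() or list_add_tail(), and return the 0-based
 * position of the new page counted from the head (-1 on bad input).
 */
static int position_of_new_page(int existing, int use_tail)
{
	struct list_head head, *pos;
	struct page_stub pages[MAX_EXISTING], fresh;
	int i, idx = 0;

	if (existing < 0 || existing > MAX_EXISTING)
		return -1;

	list_init(&head);
	for (i = 0; i < existing; i++)
		list_add_tail(&pages[i].lru, &head);

	if (use_tail)
		list_add_tail(&fresh.lru, &head);
	else
		list_add(&fresh.lru, &head);

	for (pos = head.next; pos != &head; pos = pos->next, idx++)
		if (pos == &fresh.lru)
			return idx;
	return -1;
}
```

With `existing == 0` both insertion modes put the new page at position 0, which is the situation described for expand(). With a non-empty list the two modes diverge, which is the case point 1) is concerned with.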
Re: [kernel-hardening] [PATCH] module: extend 'rodata=off' boot cmdline parameter to module mappings
Rusty, Jessica

On Wed, Oct 26, 2016 at 10:43:32AM +1030, Rusty Russell wrote:
> AKASHI Takahiro writes:
> > On Thu, Oct 20, 2016 at 01:48:15PM -0700, Kees Cook wrote:
> >> On Wed, Oct 19, 2016 at 11:24 PM, AKASHI Takahiro wrote:
> >> > The current "rodata=off" parameter disables read-only kernel mappings
> >> > under CONFIG_DEBUG_RODATA:
> >> >     commit d2aa1acad22f ("mm/init: Add 'rodata=off' boot cmdline
> >> >     parameter to disable read-only kernel mappings")
> >> >
> >> > This patch is a logical extension to module mappings ie. read-only
> >> > mappings at module loading can be disabled even if
> >> > CONFIG_DEBUG_SET_MODULE_RONX (mainly for debug use). Please note,
> >> > however, that it only affects RO/RW permissions, keeping NX set.
>
> This patch looks good (except the minor issues noted by Kees); please CC
> the followup version to Jessica as new module maintainer.

I think that the new version (v2)[1] addresses Kees' comments already.

[1] http://lkml.iu.edu//hypermail/linux/kernel/1610.2/04163.html

Thanks,
-Takahiro AKASHI

> Thanks!
> Rusty.
>
> >> >
> >> > This is the first step to make CONFIG_DEBUG_SET_MODULE_RONX mandatory
> >> > (always-on) in the future as CONFIG_DEBUG_RODATA on x86 and arm64.
> >> >
> >> > Suggested-by: Mark Rutland
> >> > Signed-off-by: AKASHI Takahiro
> >> > Cc: Rusty Russell
> >> > ---
> >> > v1:
> >> >   * remove RFC's "module_ronx=" and merge it with "rodata="
> >> >   * always keep NX set if CONFIG_SET_MODULE_RONX
> >> >
> >> >  include/linux/init.h |  3 ++-
> >> >  init/main.c          |  2 +-
> >> >  kernel/module.c      | 21 ++---
> >> >  3 files changed, 21 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/include/linux/init.h b/include/linux/init.h
> >> > index e30104c..20aa2eb 100644
> >> > --- a/include/linux/init.h
> >> > +++ b/include/linux/init.h
> >> > @@ -126,7 +126,8 @@ void prepare_namespace(void);
> >> >  void __init load_default_modules(void);
> >> >  int __init init_rootfs(void);
> >> >
> >> > -#ifdef CONFIG_DEBUG_RODATA
> >> > +#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_SET_MODULE_RONX)
> >> > +extern bool rodata_enabled;
> >> >  void mark_rodata_ro(void);
> >> >  #endif
> >> >
> >> > diff --git a/init/main.c b/init/main.c
> >> > index 2858be7..92db2f3 100644
> >> > --- a/init/main.c
> >> > +++ b/init/main.c
> >> > @@ -915,7 +915,7 @@ static int try_to_run_init_process(const char *init_filename)
> >> >  static noinline void __init kernel_init_freeable(void);
> >> >
> >> >  #ifdef CONFIG_DEBUG_RODATA
> >> > -static bool rodata_enabled = true;
> >> > +bool rodata_enabled = true;
> >>
> >> Is there a mismatch here between the extern ifdef and the bool ifdef?
> >> I.e. shouldn't the ifdef here be || DEBUG_SET_MODULE_RONX too?
> >
> > Yes.
> >
> >> Also, can you mark this as __ro_after_init, since nothing changes it
> >> after the kernel command line is parsed?
> >
> > Yes, yes.
> >
> > Thanks,
> > -Takahiro AKASHI
> >
> >> Otherwise, this looks fine to me.
> >>
> >> -Kees
> >>
> >>
> >> --
> >> Kees Cook
> >> Nexus Security
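The two review comments from Kees (use the same guard for the declaration and the definition, and mark the variable `__ro_after_init`) can be sketched outside the kernel tree as follows. This is only an illustration of the shape of the fix, not the v2 patch itself: the `#define` lines are userspace stubs so the fragment compiles, and `module_mappings_read_only()` is a hypothetical consumer that does not appear in the thread.

```c
#include <stdbool.h>

/* Userspace stubs (assumptions for this sketch, not kernel definitions). */
#define CONFIG_DEBUG_SET_MODULE_RONX 1	/* pretend only the module option is on */
#define __ro_after_init			/* kernel section attribute, stubbed out */

/* Declaration side, as in include/linux/init.h in the quoted diff. */
#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_SET_MODULE_RONX)
extern bool rodata_enabled;
#endif

/*
 * Definition side, as in init/main.c, now guarded by the same condition
 * as the declaration. With the original "#ifdef CONFIG_DEBUG_RODATA"
 * guard, a build with only CONFIG_DEBUG_SET_MODULE_RONX set would see
 * the extern declaration but no definition.
 */
#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_SET_MODULE_RONX)
bool rodata_enabled __ro_after_init = true;
#endif

/* Hypothetical consumer standing in for the module-loading code. */
static bool module_mappings_read_only(void)
{
	return rodata_enabled;
}
```

In the kernel, `__ro_after_init` places the variable in a section that becomes read-only once init has finished, which fits a flag that is only written while the command line is parsed.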
Re: [PATCH v6 3/6] mm/cma: populate ZONE_CMA
On Tue, Oct 18, 2016 at 05:27:30PM +0900, Joonsoo Kim wrote:
> On Tue, Oct 18, 2016 at 09:42:57AM +0200, Vlastimil Babka wrote:
> > On 10/14/2016 05:03 AM, js1...@gmail.com wrote:
> > >@@ -145,6 +145,35 @@ static int __init cma_activate_area(struct cma *cma)
> > > static int __init cma_init_reserved_areas(void)
> > > {
> > > 	int i;
> > >+	struct zone *zone;
> > >+	pg_data_t *pgdat;
> > >+
> > >+	if (!cma_area_count)
> > >+		return 0;
> > >+
> > >+	for_each_online_pgdat(pgdat) {
> > >+		unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> > >+
> > >+		for (i = 0; i < cma_area_count; i++) {
> > >+			if (pfn_to_nid(cma_areas[i].base_pfn) !=
> > >+				pgdat->node_id)
> > >+				continue;
> > >+
> > >+			start_pfn = min(start_pfn, cma_areas[i].base_pfn);
> > >+			end_pfn = max(end_pfn, cma_areas[i].base_pfn +
> > >+						cma_areas[i].count);
> > >+		}
> > >+
> > >+		if (!end_pfn)
> > >+			continue;
> > >+
> > >+		zone = &pgdat->node_zones[ZONE_CMA];
> > >+
> > >+		/* ZONE_CMA doesn't need to exceed CMA region */
> > >+		zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> > >+		zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> > >+					zone->zone_start_pfn;
> >
> > Hmm, do the max/min here work as intended? IIUC the initial
>
> Yeap.
>
> > zone_start_pfn is UINT_MAX and zone->spanned_pages is 1? So at least
> > the max/min should be swapped?
>
> No. CMA zone's start/end pfn are updated as node's start/end pfn.
>
> > Also the zone_end_pfn(zone) on the second line already sees the
> > changes to zone->zone_start_pfn in the first line, so it's kind of a
> > mess. You should probably cache zone_end_pfn() to a temporary
> > variable before changing zone_start_pfn.
>
> You're right although it doesn't cause any problem. I look at the code
> again and find that max/min isn't needed. Calculated start/end pfn
> should be inbetween node's start/end pfn so max(zone->zone_start_pfn,
> start_pfn) will return start_pfn and messed up min(zone_end_pfn(zone),
> end_pfn) will return end_pfn in all the cases.
>
> Anyway, I will fix it as following.
>
> zone->zone_start_pfn = start_pfn
> zone->spanned_pages = end_pfn - start_pfn

Hello,

Here comes fixed one.

--->8
From 93fb05a83d74f9e2c8caebc2fa6d1a8807c9ffb6 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim
Date: Thu, 24 Mar 2016 22:29:10 +0900
Subject: [PATCH] mm/cma: populate ZONE_CMA

Until now, reserved pages for CMA are managed in the ordinary zones
where page's pfn are belong to. This approach has numerous problems
and fixing them isn't easy. (It is mentioned on previous patch.)
To fix this situation, ZONE_CMA is introduced in previous patch, but,
not yet populated. This patch implements population of ZONE_CMA
by stealing reserved pages from the ordinary zones.

Unlike previous implementation that kernel allocation request with
__GFP_MOVABLE could be serviced from CMA region, allocation request only
with GFP_HIGHUSER_MOVABLE can be serviced from CMA region in the new
approach. This is an inevitable design decision to use the zone
implementation because ZONE_CMA could contain highmem. Due to this
decision, ZONE_CMA will work like ZONE_HIGHMEM or ZONE_MOVABLE.

I don't think it would be a problem because most of file cache pages
and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. It could
be proved by the fact that there are many systems with ZONE_HIGHMEM and
they work fine. Notable disadvantage is that we cannot use these pages
for blockdev file cache page, because it usually has __GFP_MOVABLE but
not __GFP_HIGHMEM and __GFP_USER. But, in this case, there are pros and
cons. In my experience, blockdev file cache pages are one of the top
reasons that cause cma_alloc() to fail temporarily. So, we can get more
guarantee of cma_alloc() success by discarding that case.

Implementation itself is very easy to understand. Steal when cma area is
initialized and recalculate various per zone stat/threshold.

Reviewed-by: Aneesh Kumar K.V
Signed-off-by: Joonsoo Kim
---
 include/linux/memory_hotplug.h |  3 --
 include/linux/mm.h             |  1 +
 mm/cma.c                       | 62 ++
 mm/internal.h                  |  3 ++
 mm/page_alloc.c                | 29 +---
 5 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 01033fa..ea5af47 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -198,9 +198,6 @@ extern void get_page_bootmem(unsigned long ingo, struct page *page,
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct zone
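The simplified span calculation proposed in the reply (`zone_start_pfn = start_pfn`, `spanned_pages = end_pfn - start_pfn`) can be sketched in userspace like this. It is an illustration under assumptions, not the patch: `cma_area_stub`, `zone_span` and `cma_zone_span()` are hypothetical names, the `nid` field replaces the kernel's `pfn_to_nid()` lookup, and `ULONG_MAX` is used as the initial minimum here instead of the `UINT_MAX` seen in the quoted hunk so that the sentinel matches the `unsigned long` type.

```c
#include <limits.h>

/* Hypothetical stand-in for one reserved CMA area. */
struct cma_area_stub {
	unsigned long base_pfn;	/* first page frame of the area */
	unsigned long count;	/* number of pages in the area */
	int nid;		/* owning node; the kernel derives this via pfn_to_nid() */
};

/* Hypothetical result type: the two zone fields the patch updates. */
struct zone_span {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/*
 * Compute the span covering every CMA area on node "nid".
 * Returns 1 and fills "out" if the node has at least one area,
 * 0 if it has none (the "if (!end_pfn) continue;" case).
 */
static int cma_zone_span(const struct cma_area_stub *areas, int nr_areas,
			 int nid, struct zone_span *out)
{
	unsigned long start_pfn = ULONG_MAX, end_pfn = 0;
	int i;

	for (i = 0; i < nr_areas; i++) {
		unsigned long area_end = areas[i].base_pfn + areas[i].count;

		if (areas[i].nid != nid)
			continue;
		if (areas[i].base_pfn < start_pfn)
			start_pfn = areas[i].base_pfn;
		if (area_end > end_pfn)
			end_pfn = area_end;
	}

	if (!end_pfn)
		return 0;

	/* No min()/max() against the old zone bounds, as in the proposed fix. */
	out->zone_start_pfn = start_pfn;
	out->spanned_pages = end_pfn - start_pfn;
	return 1;
}

/* Small fixed data set used to exercise the helper. */
static const struct cma_area_stub example_areas[] = {
	{ .base_pfn = 100,  .count = 50,  .nid = 0 },
	{ .base_pfn = 400,  .count = 100, .nid = 0 },
	{ .base_pfn = 1000, .count = 10,  .nid = 1 },
};
```

For node 0 in `example_areas` the span starts at pfn 100 and covers 400 pages (up to pfn 500), including the hole between the two areas; spanned pages count the whole range, not only the reserved pages.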
Re: [PATCH v3] x86/msr: Add write msr notrace to avoid the debug codes splash
2016-10-25 19:15 GMT+08:00 Paolo Bonzini:
>
>
> On 25/10/2016 04:58, Wanpeng Li wrote:
>> @@ -319,7 +319,7 @@ static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
>
> This needs to be notrace too.

Ok, I just sent out a new version for this.

Regards,
Wanpeng Li
Re: [RFC PATCH] xhci: do not halt the secondary HCD
On Tue, Sep 20, 2016 at 5:56 PM, Mathias Nyman wrote:

> Quick Googling shows that that TI TUSB 73x0 USB3.0 xHCI host has an issue
> with halting.
>
> Errata says host needs 125us to 1ms between the last control transfer and
> clearing the run/stop bit. (halting the host)
>
> Suggested workaround is to wait at least 2ms before halting the host.
>
> See issue #10 in:
> http://www.ti.com/lit/er/sllz076/sllz076.pdf
>
> It might just be that the patch works because it forces halting the host to
> be done later (secondary hcd -> primary hcd), giving it enough time after
> the last control transfer.

Well spotted. I gave this a go, adding a quirk and performing a msleep:

+++ b/drivers/usb/host/xhci.c
@@ -109,6 +109,10 @@ int xhci_halt(struct xhci_hcd *xhci)
 {
 	int ret;
 	xhci_dbg_trace(xhci, trace_xhci_dbg_init, "// Halt the HC");
+
+	if (xhci->quirks & XHCI_HALT_DELAY_QUIRK)
+		msleep(2);
+
 	xhci_quiesce(xhci);

However it didn't help. Are we guaranteed that transfers are not in
flight at that point?

>
>>> a first step.
>>>
>>> load primary
>>> load secondary (starts the xhci controller
>>> ...
>>> unload secondary (halts the controller)
>>> unload primary (free memory)
>
>
> Now thinking about it, it doesn't really make sense to halt the host
> controller hardware before removing the primary hcd. It will just cause
> devices under the primary (USB2) to be removed uncleanly. So basically
> the idea of the workaround makes sense, it just needs to be cleaned up
> from a workaround to intended behavior.

Great. When you say clean up, do you just mean tidying the comments?

Cheers,

Joel

>
> We might also need an additional quirk for TI TUSB 73x0 that adds a
> msleep() before the xhci_halt, even if it's moved to the last hcd
> removed.
>
> -Mathias
[PATCH v4] x86/msr: Add write msr notrace to avoid the debug codes splash
From: Wanpeng Li

As Peterz pointed out:

| The thing is, many many smp_reschedule_interrupt() invocations don't
| actually execute anything much at all and are only send to tickle the
| return to user path (which does the actual preemption).

This patch adds write msr notrace to avoid the debug codes splash.

Suggested-by: Peter Zijlstra
Suggested-by: Paolo Bonzini
Cc: Ingo Molnar
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Paolo Bonzini
Signed-off-by: Wanpeng Li
---
 arch/x86/include/asm/apic.h |  3 ++-
 arch/x86/include/asm/msr.h  | 15 +++
 arch/x86/kernel/apic/apic.c |  1 +
 arch/x86/kernel/kvm.c       |  6 +++---
 arch/x86/kernel/smp.c       |  2 --
 5 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index f5aaf6c..a5a0bcf 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -196,7 +196,7 @@ static inline void native_apic_msr_write(u32 reg, u32 v)
 
 static inline void native_apic_msr_eoi_write(u32 reg, u32 v)
 {
-	wrmsr(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK, 0);
+	wrmsr_notrace(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK, 0);
 }
 
 static inline u32 native_apic_msr_read(u32 reg)
@@ -332,6 +332,7 @@ struct apic {
 	 * on write for EOI.
 	 */
 	void (*eoi_write)(u32 reg, u32 v);
+	void (*native_eoi_write)(u32 reg, u32 v);
 	u64 (*icr_read)(void);
 	void (*icr_write)(u32 low, u32 high);
 	void (*wait_icr_idle)(void);
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index b5fee97..afbb221 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -127,6 +127,21 @@ notrace static inline void native_write_msr(unsigned int msr,
 }
 
 /* Can be uninlined because referenced by paravirt */
+notrace static inline void native_write_msr_notrace(unsigned int msr,
+					unsigned low, unsigned high)
+{
+	asm volatile("1: wrmsr\n"
+		     "2:\n"
+		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe)
+		     : : "c" (msr), "a"(low), "d" (high) : "memory");
+}
+
+static inline void wrmsr_notrace(unsigned msr, unsigned low, unsigned high)
+{
+	native_write_msr_notrace(msr, low, high);
+}
+
+/* Can be uninlined because referenced by paravirt */
 notrace static inline int native_write_msr_safe(unsigned int msr,
 					unsigned low, unsigned high)
 {
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 88c657b..2686894 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2263,6 +2263,7 @@ void __init apic_set_eoi_write(void (*eoi_write)(u32 reg, u32 v))
 	for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
 		/* Should happen once for each apic */
 		WARN_ON((*drv)->eoi_write == eoi_write);
+		(*drv)->native_eoi_write = (*drv)->eoi_write;
 		(*drv)->eoi_write = eoi_write;
 	}
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index edbbfc8..a4627ed 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -308,7 +308,7 @@ static void kvm_register_steal_time(void)
 
 static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
 
-static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
+static void kvm_guest_apic_eoi_write_notrace(u32 reg, u32 val)
 {
 	/**
 	 * This relies on __test_and_clear_bit to modify the memory
@@ -319,7 +319,7 @@ static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 	 */
 	if (__test_and_clear_bit(KVM_PV_EOI_BIT, this_cpu_ptr(&kvm_apic_eoi)))
 		return;
-	apic_write(APIC_EOI, APIC_EOI_ACK);
+	apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
 }
 
 static void kvm_guest_cpu_init(void)
@@ -474,7 +474,7 @@ void __init kvm_guest_init(void)
 	}
 
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
-		apic_set_eoi_write(kvm_guest_apic_eoi_write);
+		apic_set_eoi_write(kvm_guest_apic_eoi_write_notrace);
 
 	if (kvmclock_vsyscall)
 		kvm_setup_vsyscall_timeinfo();
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index c00cb64..68f8cc2 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -261,10 +261,8 @@ static inline void __smp_reschedule_interrupt(void)
 
 __visible void smp_reschedule_interrupt(struct pt_regs *regs)
 {
-	irq_enter();
 	ack_APIC_irq();
 	__smp_reschedule_interrupt();
-	irq_exit();
 	/*
 	 * KVM uses this interrupt to force a cpu out of guest mode
 	 */
-- 
1.9.1
Re: [PATCHv4 18/43] block: define BIO_MAX_PAGES to HPAGE_PMD_NR if huge page cache enabled
On Oct 25, 2016, at 6:54 AM, Kirill A. Shutemov wrote:
>
> On Tue, Oct 25, 2016 at 12:21:22AM -0700, Christoph Hellwig wrote:
>> On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov wrote:
>>> We are going to do IO a huge page at a time. So we need BIO_MAX_PAGES to be
>>> at least HPAGE_PMD_NR. For x86-64, it's 512 pages.
>>
>> NAK. The maximum bio size should not depend on an obscure vm config,
>> please send a standalone patch increasing the size to the block list,
>> with a much longer explanation. Also you can't simply increase the size
>> of the largest pool, we'll probably need more pools instead, or maybe
>> even implement a similar chaining scheme as we do for struct
>> scatterlist.
>
> The size of the required pool depends on architecture: different architectures
> have different (huge page size)/(base page size).
>
> Would it be okay if I add one more pool with size equal to HPAGE_PMD_NR,
> if it's bigger than BIO_MAX_PAGES and huge pages are enabled?

Why wouldn't you have all the pool sizes in between? Definitely 1MB has
been too small already for high-bandwidth IO. I wouldn't mind BIOs up to
4MB or larger since most high-end RAID hardware does best with 4MB IOs.

Cheers, Andreas
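[Editor's note] The sizes quoted in this thread can be checked with a little arithmetic. The sketch below is only an illustration: it assumes 4 KiB base pages, 2 MiB PMD-sized huge pages (x86-64), and a BIO_MAX_PAGES of 256 at the time of the thread. The helper names are made up for this example and are not kernel functions.

```c
/* Assumed x86-64 values; none of these constants come from the patch itself. */
#define PAGE_SIZE_BYTES 4096UL                  /* 4 KiB base page */
#define HUGE_PAGE_BYTES (2UL * 1024 * 1024)     /* 2 MiB PMD-sized huge page */
#define BIO_MAX_PAGES_ASSUMED 256UL             /* assumed limit at the time */

/* Number of base pages in one PMD-sized huge page (HPAGE_PMD_NR). */
static unsigned long hpage_pmd_nr(void)
{
    return HUGE_PAGE_BYTES / PAGE_SIZE_BYTES;
}

/* Largest payload in bytes of a bio holding max_pages full pages. */
static unsigned long bio_max_bytes(unsigned long max_pages)
{
    return max_pages * PAGE_SIZE_BYTES;
}
```

Under these assumptions hpage_pmd_nr() gives 512, matching the figure in the patch description, and bio_max_bytes(BIO_MAX_PAGES_ASSUMED) gives 1 MiB, the size Andreas calls too small. A 4 MiB bio would need 1024 page slots.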
RE: [PATCH 1/3] clk: qcom: gdsc: Add support for gdscs with HW control
Hi Stan,

>Hi Sricharan,
>
>On 10/24/2016 01:18 PM, Sricharan R wrote:
>> From: Rajendra Nayak
>>
>> Some GDSCs might support a HW control mode, wherein the power
>> domain (gdsc) is brought in and out of low power state (while
>> unused) without any SW assistance, saving power.
>> Such GDSCs can be configured in a HW control mode when powered on
>> until they are explicitly requested to be powered off by software.
>>
>> Signed-off-by: Rajendra Nayak
>> Signed-off-by: Sricharan R
>> ---
>> drivers/clk/qcom/gdsc.c | 15 +++
>> drivers/clk/qcom/gdsc.h | 1 +
>> 2 files changed, 16 insertions(+)
>>
>> diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
>> index f12d7b2..a5e1c8c 100644
>> --- a/drivers/clk/qcom/gdsc.c
>> +++ b/drivers/clk/qcom/gdsc.c
>> @@ -55,6 +55,13 @@ static int gdsc_is_enabled(struct gdsc *sc, unsigned int reg)
>> 	return !!(val & PWR_ON_MASK);
>> }
>>
>> +static int gdsc_hwctrl(struct gdsc *sc, bool en)
>> +{
>> +	u32 val = en ? HW_CONTROL_MASK : 0;
>> +
>> +	return regmap_update_bits(sc->regmap, sc->gdscr, HW_CONTROL_MASK, val);
>> +}
>> +
>> static int gdsc_toggle_logic(struct gdsc *sc, bool en)
>> {
>> 	int ret;
>> @@ -164,6 +171,10 @@ static int gdsc_enable(struct generic_pm_domain *domain)
>> 	 */
>> 	udelay(1);
>>
>> +	/* Turn on HW trigger mode if supported */
>> +	if (sc->flags & HW_CTRL)
>> +		gdsc_hwctrl(sc, true);
>

Sure, will add the check.

Regards,
Sricharan
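[Editor's note] The reviewer's comment that "the check" answers is not present in the quoted text, so the following is a guess at one plausible reading: propagating the return value of gdsc_hwctrl() instead of ignoring it. The sketch is a user-space model, not driver code. The fake_* names, the bit positions and the simulated bus error are all hypothetical. It also shows the masked read-modify-write that a regmap_update_bits()-style helper performs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define HW_CONTROL_MASK (1u << 1)   /* bit position is an assumption */
#define HW_CTRL         (1u << 0)   /* hypothetical flag value */

struct fake_gdsc {
    uint32_t gdscr;       /* stand-in for the GDSC control register */
    unsigned int flags;   /* capability flags, e.g. HW_CTRL */
    bool bus_error;       /* simulate a failing register access */
};

/* Masked read-modify-write: only the bits in mask are changed. */
static int fake_update_bits(struct fake_gdsc *sc, uint32_t mask, uint32_t val)
{
    if (sc->bus_error)
        return -EIO;
    sc->gdscr = (sc->gdscr & ~mask) | (val & mask);
    return 0;
}

static int fake_gdsc_hwctrl(struct fake_gdsc *sc, bool en)
{
    return fake_update_bits(sc, HW_CONTROL_MASK, en ? HW_CONTROL_MASK : 0);
}

/* Enable path that reports a failed mode switch to the caller. */
static int fake_gdsc_enable(struct fake_gdsc *sc)
{
    if (sc->flags & HW_CTRL)
        return fake_gdsc_hwctrl(sc, true);
    return 0;
}
```

With this shape a failed register write surfaces as a negative errno from the enable path, and the other bits in the register are left untouched on success.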
[PATCH] drm: rcar-du: Fix R-Car Gen3 crash when VSP is disabled
From: Magnus Damm

For the DU to operate on R-Car Gen3 hardware a combination of DU and VSP
devices is required. Since the DU driver also supports earlier generations
of hardware, the VSP portion is enabled via Kconfig.

As of v4.9-rc1 the arm64 defconfig has the DU driver enabled as a module,
however this is not enough to support R-Car Gen3. In the current case of
CONFIG_DRM_RCAR_VSP=n the kernel crashes when loading the module.
This patch fixes that particular case.

In more detail, the crash triggers in drm_atomic_get_plane_state() when
__drm_atomic_helper_set_config() passes NULL as crtc->primary.

This patch corrects this issue by failing to load the DU driver on R-Car
Gen3 when VSP is not available.

Signed-off-by: Magnus Damm
---

 drivers/gpu/drm/rcar-du/rcar_du_vsp.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 0001/drivers/gpu/drm/rcar-du/rcar_du_vsp.h
+++ work/drivers/gpu/drm/rcar-du/rcar_du_vsp.h	2016-10-26 00:01:12.920607110 +0900
@@ -70,7 +70,7 @@ void rcar_du_vsp_disable(struct rcar_du_
 void rcar_du_vsp_atomic_begin(struct rcar_du_crtc *crtc);
 void rcar_du_vsp_atomic_flush(struct rcar_du_crtc *crtc);
 #else
-static inline int rcar_du_vsp_init(struct rcar_du_vsp *vsp) { return 0; };
+static inline int rcar_du_vsp_init(struct rcar_du_vsp *vsp) { return -ENXIO; };
 static inline void rcar_du_vsp_enable(struct rcar_du_crtc *crtc) { };
 static inline void rcar_du_vsp_disable(struct rcar_du_crtc *crtc) { };
 static inline void rcar_du_vsp_atomic_begin(struct rcar_du_crtc *crtc) { };
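[Editor's note] The pattern behind this one-line fix can be modelled outside the kernel. In the sketch below, HAVE_VSP stands in for CONFIG_DRM_RCAR_VSP, and fake_vsp_init() and fake_du_probe() are hypothetical names. It is a sketch of the idea, not the driver's actual probe path. When the feature is compiled out, the stub returns an error, so a caller that requires the feature fails early instead of running on with a missing primary plane.

```c
#include <errno.h>

/* Hypothetical stand-in for CONFIG_DRM_RCAR_VSP; left undefined here,
 * which models a kernel built with the VSP portion disabled. */
/* #define HAVE_VSP 1 */

struct fake_vsp {
    int initialized;
};

#ifdef HAVE_VSP
static int fake_vsp_init(struct fake_vsp *vsp)
{
    vsp->initialized = 1;
    return 0;
}
#else
/* Stub: report failure rather than pretending the init succeeded. */
static inline int fake_vsp_init(struct fake_vsp *vsp)
{
    (void)vsp;
    return -ENXIO;
}
#endif

/* Hypothetical probe: devices that need the VSP must initialize it. */
static int fake_du_probe(struct fake_vsp *vsp, int needs_vsp)
{
    if (needs_vsp) {
        int ret = fake_vsp_init(vsp);

        if (ret < 0)
            return ret;   /* refuse to load instead of crashing later */
    }
    return 0;
}
```

A stub that returned 0 would let the Gen3-like case continue with nothing initialized, which corresponds to the crash described in the commit message. Devices that do not need the VSP are unaffected either way.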
Re: [PATCH V2 4/8] PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()
On 25-10-16, 13:26, Stephen Boyd wrote:
> For things like AVS we'll probably want to do that, although it's
> sort of funny because replacing RCU with rw-locks is the opposite
> direction most people go.

Yes, that would be very funny :)

> With AVS we would be updating the
> voltage(s) in use for the current OPP, and we would want that
> update to block any OPP transition until the voltage is adjusted.
> I don't know how we would do that with RCU very well. Plus, RCU
> is for reader heavy things, but we mostly have one or two
> readers.

Not just that, think of the opp_disable() function. What currently
guarantees that an OPP being disabled isn't already in use right now?
Or isn't on the way to being used?

I strongly feel RCU is not the best fit for the OPP core at least.

> I guess it's ok for now to do all this copying, but it feels like
> we'll need to undo a large portion of it later with things like
> AVS.

Yes.

> Or at least we'll be doing copies for almost no reason
> because we'll want to hold the read lock across the whole OPP
> transition. I was going to suggest we pass around information
> about what we want to grab from the RCU protected data
> structures, think index of regulator, etc. and then have small
> RCU read-side critical sections to grab that info during the OPP
> transition but I'm not sure that's any better. It might be worse
> because the OPP could change during the OPP transition and we
> could be using half of the old and half of the new data.

The problem is that this code is getting harder to read for everybody.
If we are finding it difficult to understand, what about newbies..

--
viresh
Re: [PATCH 1/3] usb: dwc3: host: inherit dma configuration from parent dev
On Tue, Oct 25, 2016 at 04:26:26PM +0530, Sriram Dash wrote: > For xhci-hcd platform device, all the DMA parameters are not configured > properly, notably dma ops for dwc3 devices. > > The idea here is that you pass in the parent of_node along with the child > device pointer, so it would behave exactly like the parent already does. > The difference is that it also handles all the other attributes besides > the mask. > Splitting the usb_bus->controller field into the Linux-internal device > (used for the sysfs hierarchy, for printks and for power management) > and a new pointer (used for DMA, DT enumeration and phy lookup) probably > covers all that we really need. > > Signed-off-by: Arnd Bergmann > Signed-off-by: Sriram Dash > Cc: Felipe Balbi > Cc: Grygorii Strashko > Cc: Sinjan Kumar > Cc: David Fisher > Cc: Catalin Marinas > Cc: "Thang Q. Nguyen" > Cc: Yoshihiro Shimoda > Cc: Stephen Boyd > Cc: Bjorn Andersson > Cc: Ming Lei > Cc: Jon Masters > Cc: Dann Frazier > Cc: Peter Chen > Cc: Leo Li > --- > drivers/usb/chipidea/host.c | 3 ++- > drivers/usb/chipidea/udc.c | 10 + > drivers/usb/core/buffer.c| 12 +-- > drivers/usb/core/hcd.c | 48 > ++-- > drivers/usb/core/usb.c | 18 - > drivers/usb/dwc3/core.c | 22 +--- > drivers/usb/dwc3/core.h | 1 + > drivers/usb/dwc3/ep0.c | 8 > drivers/usb/dwc3/gadget.c| 37 +- > drivers/usb/dwc3/host.c | 8 > drivers/usb/host/ehci-fsl.c | 4 ++-- > drivers/usb/host/xhci-mem.c | 12 +-- > drivers/usb/host/xhci-plat.c | 33 +++--- > drivers/usb/host/xhci.c | 15 ++ > include/linux/usb.h | 1 + > include/linux/usb/hcd.h | 3 +++ > 16 files changed, 144 insertions(+), 91 deletions(-) > > diff --git a/drivers/usb/chipidea/host.c b/drivers/usb/chipidea/host.c > index 96ae695..ca27893 100644 > --- a/drivers/usb/chipidea/host.c > +++ b/drivers/usb/chipidea/host.c > @@ -116,7 +116,8 @@ static int host_start(struct ci_hdrc *ci) > if (usb_disabled()) > return -ENODEV; > > - hcd = usb_create_hcd(_ehci_hc_driver, ci->dev, dev_name(ci->dev)); > + hcd = 
__usb_create_hcd(_ehci_hc_driver, ci->dev->parent, > +ci->dev, dev_name(ci->dev), NULL); > if (!hcd) > return -ENOMEM; > > diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c > index 661f43f..bc55922 100644 > --- a/drivers/usb/chipidea/udc.c > +++ b/drivers/usb/chipidea/udc.c > @@ -423,7 +423,8 @@ static int _hardware_enqueue(struct ci_hw_ep *hwep, > struct ci_hw_req *hwreq) > > hwreq->req.status = -EALREADY; > > - ret = usb_gadget_map_request(>gadget, >req, hwep->dir); > + ret = usb_gadget_map_request_by_dev(ci->dev->parent, > + >req, hwep->dir); > if (ret) > return ret; > > @@ -603,7 +604,8 @@ static int _hardware_dequeue(struct ci_hw_ep *hwep, > struct ci_hw_req *hwreq) > list_del_init(>td); > } > > - usb_gadget_unmap_request(>ci->gadget, >req, hwep->dir); > + usb_gadget_unmap_request_by_dev(hwep->ci->dev->parent, > + >req, hwep->dir); > > hwreq->req.actual += actual; > > @@ -1904,13 +1906,13 @@ static int udc_start(struct ci_hdrc *ci) > INIT_LIST_HEAD(>gadget.ep_list); > > /* alloc resources */ > - ci->qh_pool = dma_pool_create("ci_hw_qh", dev, > + ci->qh_pool = dma_pool_create("ci_hw_qh", dev->parent, > sizeof(struct ci_hw_qh), > 64, CI_HDRC_PAGE_SIZE); > if (ci->qh_pool == NULL) > return -ENOMEM; > > - ci->td_pool = dma_pool_create("ci_hw_td", dev, > + ci->td_pool = dma_pool_create("ci_hw_td", dev->parent, > sizeof(struct ci_hw_td), > 64, CI_HDRC_PAGE_SIZE); The chipidea part is ok for me, but just follow Arnd's suggestion for patch split, subject, and commit log. 
Peter > if (ci->td_pool == NULL) { > diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c > index 98e39f9..1e41ef7 100644 > --- a/drivers/usb/core/buffer.c > +++ b/drivers/usb/core/buffer.c > @@ -63,7 +63,7 @@ int hcd_buffer_create(struct usb_hcd *hcd) > int i, size; > > if (!IS_ENABLED(CONFIG_HAS_DMA) || > - (!hcd->self.controller->dma_mask && > + (!hcd->self.sysdev->dma_mask && >!(hcd->driver->flags & HCD_LOCAL_MEM))) > return 0; > > @@ -72,7 +72,7 @@ int hcd_buffer_create(struct usb_hcd *hcd) > if (!size) > continue; > snprintf(name, sizeof(name), "buffer-%d", size); > - hcd->pool[i] =
Re: [PATCH V2 0/8] PM / OPP: Multiple regulator support
On 25-10-16, 16:13, Dave Gerlach wrote:
> I think what you have shared below is a good safety check but if I rename
> the regulator properties in the DT for the cpu (to vdd and vbb, meaning
> cpufreq detects no regulator) and do *not* call dev_pm_opp_set_regulators
> before cpufreq-dt probes we fail before we even get to that point:
>
> [16.946] cpu cpu0: opp_parse_supplies: Invalid number of elements in opp-microvolt property (6) with supplies (1)
> [16.967] cpu cpu0: _of_add_opp_table_v2: Failed to add OPP, -22
> [16.982] cpu cpu0: dev_pm_opp_get_opp_count: OPP table not found (-19)
> [16.982] cpu cpu0: OPP table is not ready, deferring probe
>
> This failure is because opp_parse_supplies assumes a count of 1 regulator if
> no regulators at all are present and then hard fails if too many voltages
> have been passed for each OPP.

Exactly. And yes this is intentional.

> It seems we need a check much earlier similar
> to what you suggested below to allow us to defer if an OPP has supplied
> voltages but no regulator has been registered with the system. I think this
> is reasonable even for the 1 regulator case, no?

No. The OPP core needs to know about regulators only if the user drivers
want it to manage DVFS. It is still possible for cpufreq drivers to use
the OPP framework for managing the tables, but do the real DVFS stuff
themselves.

That's why it is not compulsory in the code to set regulator names. And
it's only wrong if dev_pm_opp_set_rate() is called without first setting
the regulators.

> cpufreq-dt won't handle this properly as is, but now that the opp core is
> evolving perhaps it makes sense to modify the resources_available check
> slightly to rely on the OPP core rather than just a dummy
> regulator_get_optional to see if the regulator is ready.

I am not sure yet on what to change there. You mean regarding multiple
regulators?

--
viresh
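[Editor's note] The failure in the log above can be reproduced with a small model of the element-count check. This is a sketch, not the OPP core's code: check_microvolt_count() is a hypothetical name. It assumes that each supply is described in opp-microvolt by either one value (target) or three values (target, min, max), and that a table with no registered regulators is treated as having one supply, as Dave describes.

```c
#include <errno.h>

/*
 * Validate the number of values in a voltage property against the
 * number of supplies. Assumption: every supply contributes either one
 * value (target) or three values (target, min, max).
 */
static int check_microvolt_count(int vcount, int supplies)
{
    if (supplies < 1)
        supplies = 1;   /* no regulators registered: assume one supply */

    if (vcount != supplies && vcount != supplies * 3)
        return -EINVAL;

    return 0;
}
```

With two regulators registered, six values form two valid triplets. With none registered, the same six values are checked against a single supply and rejected with -EINVAL, which is the -22 shown in the log on Linux.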
Re: [PATCH V2 0/6] ARM64: Uprobe support added
Hi Catalin, Please let me know if everything else other than is_trap_insn() looks fine to you. May be I can work well in time. It would be great if we can make it into v4.9. ~Pratyush On Tue, Sep 27, 2016 at 1:17 PM, Pratyush Anand wrote: > Changes since v1: > * Exposed sync_icache_aliases() and used that in stead of > flush_uprobe_xol_access() > * Assigned 0x0005 to BRK64_ESR_UPROBES in stead of 0x0008 > * moved uprobe_opcode_t from probes.h to uprobes.h > * Assigned 4 to TIF_UPROBE instead of 5 > * Assigned AARCH64_INSN_SIZE to UPROBE_SWBP_INSN_SIZE instead of hard code 4. > * Removed saved_fault_code from struct arch_uprobe_task > * Removed preempt_dis(en)able() from arch_uprobe_copy_ixol() > * Removed case INSN_GOOD from arch_uprobe_analyze_insn() > * Now we do check that probe point is not for a 32 bit task. > * Return a false positive from is_tarp_insn() > * Changes for rebase conflict resolution > > V1 was here: https://lkml.org/lkml/2016/8/2/29 > Patches have been rebased on next-20160927, so that there would be no > conflicts with other arm64/for-next/core patches. > > Patches have been tested for following: > 1. Step-able instructions, like sub, ldr, add etc. > 2. Simulation-able like ret, cbnz, cbz etc. > 3. uretprobe > 4. Reject-able instructions like sev, wfe etc. > 5. trapped and abort xol path > 6. probe at unaligned user address. > 7. longjump test cases > > aarch32 task probing is not yet supported. 
> > Pratyush Anand (6): > arm64: kprobe: protect/rename few definitions to be reused by uprobe > arm64: kgdb_step_brk_fn: ignore other's exception > arm64: Handle TRAP_TRACE for user mode as well > arm64: Handle TRAP_BRKPT for user mode as well > arm64: introduce mm context flag to keep 32 bit task information > arm64: Add uprobe support > > arch/arm64/Kconfig | 3 + > arch/arm64/include/asm/cacheflush.h | 1 + > arch/arm64/include/asm/debug-monitors.h | 3 + > arch/arm64/include/asm/elf.h| 12 +- > arch/arm64/include/asm/mmu.h| 1 + > arch/arm64/include/asm/probes.h | 19 +-- > arch/arm64/include/asm/ptrace.h | 8 ++ > arch/arm64/include/asm/thread_info.h| 5 +- > arch/arm64/include/asm/uprobes.h| 36 ++ > arch/arm64/kernel/debug-monitors.c | 40 +++--- > arch/arm64/kernel/kgdb.c| 3 + > arch/arm64/kernel/probes/Makefile | 2 + > arch/arm64/kernel/probes/decode-insn.c | 32 ++--- > arch/arm64/kernel/probes/decode-insn.h | 8 +- > arch/arm64/kernel/probes/kprobes.c | 36 +++--- > arch/arm64/kernel/probes/uprobes.c | 221 > > arch/arm64/kernel/signal.c | 3 + > arch/arm64/mm/flush.c | 2 +- > 18 files changed, 371 insertions(+), 64 deletions(-) > create mode 100644 arch/arm64/include/asm/uprobes.h > create mode 100644 arch/arm64/kernel/probes/uprobes.c > > -- > 2.7.4 >
Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc
On 2016/10/25 21:23, Michal Hocko wrote:
> On Tue 25-10-16 10:59:17, Zhen Lei wrote:
>> If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes
>> actually exist, the percpu variable areas and NUMA control blocks of those
>> memoryless nodes need to be allocated from the nearest available
>> node to improve performance.
>>
>> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
>> specified nid first, if that allocation fails they fall back
>> directly to NUMA_NO_NODE. This means any node may be used the
>> second time.
>>
>> To stay compatible with the old behavior above, I use a macro
>> node_distance_ready to control it. By default, the macro
>> node_distance_ready is not defined on any platform, and the functions
>> mentioned above work as before. Otherwise, they try the nearest node first.
>
> I am sorry but it is absolutely unclear to me _what_ is the motivation
> of the patch. Is this a performance optimization, correctness issue or
> something else? Could you please restate what is the problem, why do you
> think it has to be fixed at memblock layer and describe what the actual
> fix is please?

This is a performance optimization. The problem arises when memoryless NUMA
nodes actually exist. For example, suppose there are 4 nodes in total
(0, 1, 2, 3), node1 has no memory, and the node distances are as below:

        ---------board---------
        |                     |
        |                     |
     socket0               socket1
      /   \                 /   \
     /     \               /     \
  node0   node1         node2   node3

distance[1][0] is smaller than distance[1][2] and distance[1][3], so CPUs on
node1 access the memory of node0 faster than that of node2 or node3.

Linux defines a lot of percpu variables. Each CPU has its own copy and, most
of the time, accesses only its own percpu area. In this example, we want the
percpu area of the CPUs on node1 to be allocated from node0, but without
these patches that is not guaranteed.

If each node has its own memory, we can directly use the functions below to
allocate memory from the local node:
1. memblock_alloc_nid
2. memblock_alloc_try_nid
3. memblock_virt_alloc_try_nid_nopanic
4. memblock_virt_alloc_try_nid

So these patches only matter for the NUMA memoryless scenario.

Another use case is the control block "extern pg_data_t *node_data[]".
Here is an example from x86 NUMA in arch/x86/mm/numa.c:

static void __init alloc_node_data(int nid)
{
	... ...
	/*
	 * Allocate node data.  Try node-local memory and then any node.   //==>But the nearest node is the best
	 * Never allocate in DMA zone.
	 */
	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
	if (!nd_pa) {
		nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
					      MEMBLOCK_ALLOC_ACCESSIBLE);
		if (!nd_pa) {
			pr_err("Cannot find %zu bytes in node %d\n",
			       nd_size, nid);
			return;
		}
	}
	nd = __va(nd_pa);
	... ...
	node_data[nid] = nd;

>
> From a quick glance you are trying to bend over the memblock API for
> something that should be handled on a different layer.
>
>>
>> Signed-off-by: Zhen Lei
>> ---
>>  mm/memblock.c | 76 ++-
>>  1 file changed, 65 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 7608bc3..556bbd2 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1213,9 +1213,71 @@ phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
>>  	return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
>>  }
>>
>> +#ifndef node_distance_ready
>> +#define node_distance_ready()		0
>> +#endif
>> +
>> +static phys_addr_t __init memblock_alloc_near_nid(phys_addr_t size,
>> +					phys_addr_t align, phys_addr_t start,
>> +					phys_addr_t end, int nid, ulong flags,
>> +					int alloc_func_type)
>> +{
>> +	int nnid, round = 0;
>> +	u64 pa;
>> +	DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>> +
>> +	bitmap_zero(nodes_map, MAX_NUMNODES);
>> +
>> +again:
>> +	/*
>> +	 * There are total 4 cases:
>> +	 *
>> +	 *   1)2) node_distance_ready || !node_distance_ready
>> +	 *	Round 1, nnid = nid = NUMA_NO_NODE;
>> +	 *
>> +	 *   3) !node_distance_ready
>> +	 *	Round 1, nnid = nid;
>> +	 *    ::Round 2, currently only applicable for alloc_func_type = <0>
>> +	 *	Round 2, nnid = NUMA_NO_NODE;
>> +	 *   4) node_distance_ready
>> +	 *	Round 1, LOCAL_DISTANCE, nnid = nid;
>> +	 *	Round ?, nnid = nearest nid;
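To make the "nearest node" idea in the reply above concrete, here is a minimal user-space sketch. It is not the code from the patch: NR_NODES, NO_NODE, the distance table and the has_memory array are hypothetical stand-ins chosen for illustration, and the function only shows one plausible way to pick the closest node that actually has memory.

```c
#include <assert.h>
#include <limits.h>

#define NR_NODES 4	/* hypothetical node count for this sketch */
#define NO_NODE  (-1)	/* stand-in for NUMA_NO_NODE */

/*
 * Return the node with memory that is closest to @nid, or NO_NODE when
 * no node has memory.  @distance and @has_memory are illustrative inputs,
 * not kernel data structures.  Ties go to the lowest node number.
 */
static int nearest_node_with_memory(int nid,
				    const int distance[NR_NODES][NR_NODES],
				    const int has_memory[NR_NODES])
{
	int best = NO_NODE;
	int best_dist = INT_MAX;

	for (int n = 0; n < NR_NODES; n++) {
		if (!has_memory[n])
			continue;	/* memoryless node, cannot serve the allocation */
		if (distance[nid][n] < best_dist) {
			best_dist = distance[nid][n];
			best = n;
		}
	}
	return best;
}
```

With a made-up table in which nodes on the same socket are at distance 15 and nodes on different sockets at distance 20, and with node1 marked memoryless, a request for node1 would resolve to node0, which is the outcome the reply asks for.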
[PATCH v6 4/5] ARM: DTS: da850: Add cfgchip syscon node
Add a syscon node for the SoC CFGCHIPn registers. This is needed for
the new usb phy driver.

Signed-off-by: David Lechner
---
 arch/arm/boot/dts/da850.dtsi | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index f79e1b9..6bbf20d 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -188,6 +188,10 @@
 			};

 		};
+		cfgchip: cfgchip@1417c {
+			compatible = "ti,da830-cfgchip", "syscon";
+			reg = <0x1417c 0x14>;
+		};
 		edma0: edma@0 {
 			compatible = "ti,edma3-tpcc";
 			/* eDMA3 CC0: 0x01c0 - 0x01c0 7fff */
--
2.7.4
[PATCH v6 0/5] da8xx USB PHY platform devices and clocks
It has been almost 6 months since the v5 submission, so here is a recap:

* There were a number of phy and usb dependencies that were submitted
  separately.
* The last of the usb dependencies has finally made its way into linux-next
  today.
* This series was recently included in "[PATCH/RFT v2 00/17] Add DT support
  for ohci-da8xx". I am breaking it back out again as a standalone series.

v6 changes:

* Combine "ARM: davinci: da8xx: Enable the usb20 "per" clk on phy_clk_enable"
  from the "[PATCH/RFT v2 00/17] Add DT support for ohci-da8xx" series with
  the "ARM: davinci: da8xx: add usb phy clocks" patch in this series.
* Change the syscon and da8xx-usb-phy device ids to -1.

v5 changes: renamed "usbphy" to "usb_phy" or "usb-phy" as appropriate

v4 changes: fix strict checkpatch complaint

v3 changes:

* Fixed the davinci device tree declarations to use the preferred DT address
  convention so that the items I have added can be correct too.
* Moved that davinci clock init so that we don't have to call ioremap in
  the clock mux functions.
* Added a new "syscon" device for the CFGCHIP registers. This is used by
  the USB PHY driver and will be used in the future in common clock
  framework drivers.
* USB clocks are moved to a common file instead of having duplicated code.
* PHY driver uses syscon for CFGCHIP registers instead of using them
  directly.

David Lechner (5):
  ARM: davinci: da8xx: add usb phy clocks
  ARM: davinci: da8xx: Add CFGCHIP syscon platform declaration.
  ARM: davinci: da8xx: Add USB PHY platform declaration
  ARM: DTS: da850: Add cfgchip syscon node
  ARM: DTS: da850: Add usb phy node

 arch/arm/boot/dts/da850.dtsi                |   9 ++
 arch/arm/mach-davinci/board-da830-evm.c     |  52 +++---
 arch/arm/mach-davinci/board-da850-evm.c     |   4 +
 arch/arm/mach-davinci/board-mityomapl138.c  |   4 +
 arch/arm/mach-davinci/board-omapl138-hawk.c |  23 ++-
 arch/arm/mach-davinci/devices-da8xx.c       |  28
 arch/arm/mach-davinci/include/mach/da8xx.h  |   6 +
 arch/arm/mach-davinci/usb-da8xx.c           | 243 +++-
 8 files changed, 327 insertions(+), 42 deletions(-)

--
2.7.4
[PATCH v6 1/5] ARM: davinci: da8xx: add usb phy clocks
Up to this point, the USB phy clock configuration was handled manually in
the board files and in the usb drivers. This adds proper clocks so that
the usb drivers can use clk_get and clk_enable and not have to worry about
the details. Also, the related code is removed from the board files and
replaced with the new clock registration functions.

Signed-off-by: David Lechner
Signed-off-by: Axel Haslam
---

I have added "ARM: davinci: da8xx: Enable the usb20 "per" clk on
phy_clk_enable" from Axel Haslam to this patch.

In the review of Axel's patch, Sekhar said:

> We should not be using a NULL device pointer here. Can you pass the musb
> device pointer available in the same file? Also, da850_clks[] in da850.c
> needs to be fixed to add the matching device name.

However, the musb device may not be registered. The usb20_clk can be used to
supply a 48MHz clock to USB 1.1 (ohci) without using the musb device. So, I
am inclined to leave this as NULL.

 arch/arm/mach-davinci/board-da830-evm.c     |  22 ++-
 arch/arm/mach-davinci/board-omapl138-hawk.c |  16 +-
 arch/arm/mach-davinci/include/mach/da8xx.h  |   3 +
 arch/arm/mach-davinci/usb-da8xx.c           | 232 +++-
 4 files changed, 252 insertions(+), 21 deletions(-)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c b/arch/arm/mach-davinci/board-da830-evm.c
index 3d8cf8c..605d444 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -115,18 +115,6 @@ static __init void da830_evm_usb_init(void)
 	 */
 	cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));

-	/* USB2.0 PHY reference clock is 24 MHz */
-	cfgchip2 &= ~CFGCHIP2_REFFREQ;
-	cfgchip2 |=  CFGCHIP2_REFFREQ_24MHZ;
-
-	/*
-	 * Select internal reference clock for USB 2.0 PHY
-	 * and use it as a clock source for USB 1.1 PHY
-	 * (this is the default setting anyway).
-	 */
-	cfgchip2 &= ~CFGCHIP2_USB1PHYCLKMUX;
-	cfgchip2 |=  CFGCHIP2_USB2PHYCLKMUX;
-
 	/*
 	 * We have to override VBUS/ID signals when MUSB is configured into the
 	 * host-only mode -- ID pin will float if no cable is connected, so the
@@ -143,6 +131,16 @@ static __init void da830_evm_usb_init(void)
 	__raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));

 	/* USB_REFCLKIN is not used. */
+	ret = da8xx_register_usb20_phy_clk(false);
+	if (ret)
+		pr_warn("%s: USB 2.0 PHY CLK registration failed: %d\n",
+			__func__, ret);
+
+	ret = da8xx_register_usb11_phy_clk(false);
+	if (ret)
+		pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
+			__func__, ret);
+
 	ret = davinci_cfg_reg(DA830_USB0_DRVVBUS);
 	if (ret)
 		pr_warn("%s: USB 2.0 PinMux setup failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-omapl138-hawk.c b/arch/arm/mach-davinci/board-omapl138-hawk.c
index ee62486..d4930b6 100644
--- a/arch/arm/mach-davinci/board-omapl138-hawk.c
+++ b/arch/arm/mach-davinci/board-omapl138-hawk.c
@@ -243,7 +243,6 @@ static irqreturn_t omapl138_hawk_usb_ocic_irq(int irq, void *dev_id)
 static __init void omapl138_hawk_usb_init(void)
 {
 	int ret;
-	u32 cfgchip2;

 	ret = davinci_cfg_reg_list(da850_hawk_usb11_pins);
 	if (ret) {
@@ -251,12 +250,15 @@ static __init void omapl138_hawk_usb_init(void)
 		return;
 	}

-	/* Setup the Ref. clock frequency for the HAWK at 24 MHz. */
-
-	cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-	cfgchip2 &= ~CFGCHIP2_REFFREQ;
-	cfgchip2 |=  CFGCHIP2_REFFREQ_24MHZ;
-	__raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
+	/* USB_REFCLKIN is not used. */
+	ret = da8xx_register_usb20_phy_clk(false);
+	if (ret)
+		pr_warn("%s: USB 2.0 PHY CLK registration failed: %d\n",
+			__func__, ret);
+	ret = da8xx_register_usb11_phy_clk(false);
+	if (ret)
+		pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
+			__func__, ret);

 	ret = gpio_request_one(DA850_USB1_VBUS_PIN,
 			GPIOF_DIR_OUT, "USB1 VBUS");
diff --git a/arch/arm/mach-davinci/include/mach/da8xx.h b/arch/arm/mach-davinci/include/mach/da8xx.h
index f9f9713..c367530 100644
--- a/arch/arm/mach-davinci/include/mach/da8xx.h
+++ b/arch/arm/mach-davinci/include/mach/da8xx.h
@@ -88,6 +88,9 @@ int da850_register_edma(struct edma_rsv_info *rsv[2]);
 int da8xx_register_i2c(int instance, struct davinci_i2c_platform_data *pdata);
 int da8xx_register_spi_bus(int instance, unsigned num_chipselect);
 int da8xx_register_watchdog(void);
+int da8xx_register_usb_refclkin(int rate);
+int da8xx_register_usb20_phy_clk(bool use_usb_refclkin);
+int da8xx_register_usb11_phy_clk(bool
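The board code removed by this patch follows the usual read-modify-write pattern for a register field: clear the field with its mask, then OR in the new value. A minimal user-space sketch of that pattern is below. The mask width and the 24 MHz encoding are assumptions made for illustration only; the real CFGCHIP2_REFFREQ definitions live in the kernel headers and are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative field layout, NOT taken from the kernel headers. */
#define REFFREQ_MASK	0xfu	/* assumed width of the reference-frequency field */
#define REFFREQ_24MHZ	0x2u	/* assumed encoding for a 24 MHz reference clock */

/*
 * Clear the reference-frequency field and select 24 MHz, leaving every
 * other bit of the register value untouched.
 */
static uint32_t set_reffreq_24mhz(uint32_t cfgchip2)
{
	cfgchip2 &= ~REFFREQ_MASK;	/* drop whatever frequency was selected */
	cfgchip2 |= REFFREQ_24MHZ;	/* select the new one */
	return cfgchip2;
}
```

Clearing before setting is what makes the operation safe to repeat: applying it to a value that already selects 24 MHz returns the same value.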
[PATCH v6 3/5] ARM: davinci: da8xx: Add USB PHY platform declaration
There is now a proper phy driver for the DA8xx SoC USB PHY. This adds the
platform device declarations needed to use it.

Signed-off-by: David Lechner
---

da8xx-usb-phy device id is changed to -1 since there is only one
da8xx-usb-phy device.

 arch/arm/mach-davinci/board-da830-evm.c     | 28 +++++-----------------------
 arch/arm/mach-davinci/board-omapl138-hawk.c |  5 +++++
 arch/arm/mach-davinci/include/mach/da8xx.h  |  1 +
 arch/arm/mach-davinci/usb-da8xx.c           | 11 +++++++++++
 4 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c b/arch/arm/mach-davinci/board-da830-evm.c
index 3051cb6..c62766e 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -26,7 +26,6 @@
 #include
 #include
 #include
-#include
 #include

 #include
@@ -106,30 +105,8 @@ static irqreturn_t da830_evm_usb_ocic_irq(int irq, void *dev_id)

 static __init void da830_evm_usb_init(void)
 {
-	u32 cfgchip2;
 	int ret;

-	/*
-	 * Set up USB clock/mode in the CFGCHIP2 register.
-	 * FYI:  CFGCHIP2 is 0xef00 initially.
-	 */
-	cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-
-	/*
-	 * We have to override VBUS/ID signals when MUSB is configured into the
-	 * host-only mode -- ID pin will float if no cable is connected, so the
-	 * controller won't be able to drive VBUS thinking that it's a B-device.
-	 * Otherwise, we want to use the OTG mode and enable VBUS comparators.
-	 */
-	cfgchip2 &= ~CFGCHIP2_OTGMODE;
-#ifdef	CONFIG_USB_MUSB_HOST
-	cfgchip2 |=  CFGCHIP2_FORCE_HOST;
-#else
-	cfgchip2 |=  CFGCHIP2_SESENDEN | CFGCHIP2_VBDTCTEN;
-#endif
-
-	__raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-
 	/* USB_REFCLKIN is not used. */
 	ret = da8xx_register_usb20_phy_clk(false);
 	if (ret)
@@ -141,6 +118,11 @@ static __init void da830_evm_usb_init(void)
 		pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
 			__func__, ret);

+	ret = da8xx_register_usb_phy();
+	if (ret)
+		pr_warn("%s: USB PHY registration failed: %d\n",
+			__func__, ret);
+
 	ret = davinci_cfg_reg(DA830_USB0_DRVVBUS);
 	if (ret)
 		pr_warn("%s: USB 2.0 PinMux setup failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-omapl138-hawk.c b/arch/arm/mach-davinci/board-omapl138-hawk.c
index 8691a25..c5cb8d9 100644
--- a/arch/arm/mach-davinci/board-omapl138-hawk.c
+++ b/arch/arm/mach-davinci/board-omapl138-hawk.c
@@ -260,6 +260,11 @@ static __init void omapl138_hawk_usb_init(void)
 		pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
 			__func__, ret);

+	ret = da8xx_register_usb_phy();
+	if (ret)
+		pr_warn("%s: USB PHY registration failed: %d\n",
+			__func__, ret);
+
 	ret = gpio_request_one(DA850_USB1_VBUS_PIN,
 			GPIOF_DIR_OUT, "USB1 VBUS");
 	if (ret < 0) {
diff --git a/arch/arm/mach-davinci/include/mach/da8xx.h b/arch/arm/mach-davinci/include/mach/da8xx.h
index c32444b..38d932e 100644
--- a/arch/arm/mach-davinci/include/mach/da8xx.h
+++ b/arch/arm/mach-davinci/include/mach/da8xx.h
@@ -92,6 +92,7 @@ int da8xx_register_watchdog(void);
 int da8xx_register_usb_refclkin(int rate);
 int da8xx_register_usb20_phy_clk(bool use_usb_refclkin);
 int da8xx_register_usb11_phy_clk(bool use_usb_refclkin);
+int da8xx_register_usb_phy(void);
 int da8xx_register_usb20(unsigned mA, unsigned potpgt);
 int da8xx_register_usb11(struct da8xx_ohci_root_hub *pdata);
 int da8xx_register_emac(void);
diff --git a/arch/arm/mach-davinci/usb-da8xx.c b/arch/arm/mach-davinci/usb-da8xx.c
index 71a6d85..9c30bff 100644
--- a/arch/arm/mach-davinci/usb-da8xx.c
+++ b/arch/arm/mach-davinci/usb-da8xx.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -243,6 +244,16 @@ int __init da8xx_register_usb11_phy_clk(bool use_usb_refclkin)
 	return ret;
 }

+static struct platform_device da8xx_usb_phy = {
+	.name		= "da8xx-usb-phy",
+	.id		= -1,
+};
+
+int __init da8xx_register_usb_phy(void)
+{
+	return platform_device_register(&da8xx_usb_phy);
+}
+
 #if IS_ENABLED(CONFIG_USB_MUSB_HDRC)

 static struct musb_hdrc_config musb_config = {
--
2.7.4
[PATCH v6 2/5] ARM: davinci: da8xx: Add CFGCHIP syscon platform declaration.
The CFGCHIP registers are used by a number of devices, so use a syscon
device to share them. The first consumer of this will be the phy-da8xx-usb
driver.

Signed-off-by: David Lechner
---

syscon device id is changed to -1 since there is only one syscon device.

 arch/arm/mach-davinci/board-da830-evm.c     |  4 ++++
 arch/arm/mach-davinci/board-da850-evm.c     |  4 ++++
 arch/arm/mach-davinci/board-mityomapl138.c  |  4 ++++
 arch/arm/mach-davinci/board-omapl138-hawk.c |  4 ++++
 arch/arm/mach-davinci/devices-da8xx.c       | 28 ++++++++++++++++++++++++++++
 arch/arm/mach-davinci/include/mach/da8xx.h  |  2 ++
 6 files changed, 46 insertions(+)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c b/arch/arm/mach-davinci/board-da830-evm.c
index 605d444..3051cb6 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -586,6 +586,10 @@ static __init void da830_evm_init(void)
 	struct davinci_soc_info *soc_info = &davinci_soc_info;
 	int ret;

+	ret = da8xx_register_cfgchip();
+	if (ret)
+		pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
 	ret = da830_register_gpio();
 	if (ret)
 		pr_warn("%s: GPIO init failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-da850-evm.c b/arch/arm/mach-davinci/board-da850-evm.c
index 8e4539f..ec5cb10 100644
--- a/arch/arm/mach-davinci/board-da850-evm.c
+++ b/arch/arm/mach-davinci/board-da850-evm.c
@@ -1345,6 +1345,10 @@ static __init void da850_evm_init(void)
 {
 	int ret;

+	ret = da8xx_register_cfgchip();
+	if (ret)
+		pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
 	ret = da850_register_gpio();
 	if (ret)
 		pr_warn("%s: GPIO init failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-mityomapl138.c b/arch/arm/mach-davinci/board-mityomapl138.c
index bc4e63f..1a6d430 100644
--- a/arch/arm/mach-davinci/board-mityomapl138.c
+++ b/arch/arm/mach-davinci/board-mityomapl138.c
@@ -514,6 +514,10 @@ static void __init mityomapl138_init(void)
 {
 	int ret;

+	ret = da8xx_register_cfgchip();
+	if (ret)
+		pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
 	/* for now, no special EDMA channels are reserved */
 	ret = da850_register_edma(NULL);
 	if (ret)
diff --git a/arch/arm/mach-davinci/board-omapl138-hawk.c b/arch/arm/mach-davinci/board-omapl138-hawk.c
index d4930b6..8691a25 100644
--- a/arch/arm/mach-davinci/board-omapl138-hawk.c
+++ b/arch/arm/mach-davinci/board-omapl138-hawk.c
@@ -294,6 +294,10 @@ static __init void omapl138_hawk_init(void)
 {
 	int ret;

+	ret = da8xx_register_cfgchip();
+	if (ret)
+		pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
 	ret = da850_register_gpio();
 	if (ret)
 		pr_warn("%s: GPIO init failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/devices-da8xx.c b/arch/arm/mach-davinci/devices-da8xx.c
index add3771..31a99db 100644
--- a/arch/arm/mach-davinci/devices-da8xx.c
+++ b/arch/arm/mach-davinci/devices-da8xx.c
@@ -11,6 +11,7 @@
  * (at your option) any later version.
  */
 #include
+#include
 #include
 #include
 #include
@@ -1089,3 +1090,30 @@ int __init da850_register_sata(unsigned long refclkpn)
 	return platform_device_register(&da850_sata_device);
 }
 #endif
+
+static struct syscon_platform_data da8xx_cfgchip_platform_data = {
+	.label	= "cfgchip",
+};
+
+static struct resource da8xx_cfgchip_resources[] = {
+	{
+		.start	= DA8XX_SYSCFG0_BASE + DA8XX_CFGCHIP0_REG,
+		.end	= DA8XX_SYSCFG0_BASE + DA8XX_CFGCHIP4_REG + 3,
+		.flags	= IORESOURCE_MEM,
+	},
+};
+
+static struct platform_device da8xx_cfgchip_device = {
+	.name	= "syscon",
+	.id	= -1,
+	.dev	= {
+		.platform_data	= &da8xx_cfgchip_platform_data,
+	},
+	.num_resources	= ARRAY_SIZE(da8xx_cfgchip_resources),
+	.resource	= da8xx_cfgchip_resources,
+};
+
+int __init da8xx_register_cfgchip(void)
+{
+	return platform_device_register(&da8xx_cfgchip_device);
+}
diff --git a/arch/arm/mach-davinci/include/mach/da8xx.h b/arch/arm/mach-davinci/include/mach/da8xx.h
index c367530..c32444b 100644
--- a/arch/arm/mach-davinci/include/mach/da8xx.h
+++ b/arch/arm/mach-davinci/include/mach/da8xx.h
@@ -61,6 +61,7 @@ extern unsigned int da850_max_speed;
 #define DA8XX_CFGCHIP1_REG	0x180
 #define DA8XX_CFGCHIP2_REG	0x184
 #define DA8XX_CFGCHIP3_REG	0x188
+#define DA8XX_CFGCHIP4_REG	0x18c

 #define DA8XX_SYSCFG1_BASE	(IO_PHYS + 0x22C000)
 #define DA8XX_SYSCFG1_VIRT(x)	(da8xx_syscfg1_base + (x))
@@ -116,6 +117,7 @@ void da8xx_rproc_reserve_cma(void);
 int da8xx_register_rproc(void);
 int da850_register_gpio(void);
 int da830_register_gpio(void);
+int
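As a cross-check of the resource range in this patch against the device tree node in patch 4/5 (reg = <0x1417c 0x14>), the arithmetic can be sketched in user space as follows. The CFGCHIP0 offset of 0x17c is an assumption: it does not appear in the hunk above and is inferred here from CFGCHIP1 sitting at 0x180 and from the 1417c unit address in the DT node. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets within SYSCFG0. */
#define CFGCHIP0_REG	0x17c	/* assumed, inferred from CFGCHIP1 at 0x180 */
#define CFGCHIP4_REG	0x18c	/* added by this patch */

/* Size of an inclusive [start, end] range, as a struct resource is counted. */
static uint32_t resource_size_inclusive(uint32_t start, uint32_t end)
{
	return end - start + 1;
}

/* Size of the CFGCHIP0..CFGCHIP4 block described by the platform resource. */
static uint32_t cfgchip_region_size(void)
{
	/* .end names the last byte of the 32-bit CFGCHIP4 register, hence "+ 3" */
	return resource_size_inclusive(CFGCHIP0_REG, CFGCHIP4_REG + 3);
}
```

Under that assumption the block covers five 32-bit registers, i.e. 0x14 bytes, which agrees with the length cell in the DT node.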
[PATCH v6 5/5] ARM: DTS: da850: Add usb phy node
Add a node for the new usb phy driver.

Signed-off-by: David Lechner
---
 arch/arm/boot/dts/da850.dtsi | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index 6bbf20d..33fcdce 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -376,6 +376,11 @@
 					>;
 			status = "disabled";
 		};
+		usb_phy: usb-phy {
+			compatible = "ti,da830-usb-phy";
+			#phy-cells = <1>;
+			status = "disabled";
+		};
 		gpio: gpio@226000 {
 			compatible = "ti,dm6441-gpio";
 			gpio-controller;
--
2.7.4
int da850_max_speed; #define DA8XX_CFGCHIP1_REG 0x180 #define DA8XX_CFGCHIP2_REG 0x184 #define DA8XX_CFGCHIP3_REG 0x188 +#define DA8XX_CFGCHIP4_REG 0x18c #define DA8XX_SYSCFG1_BASE (IO_PHYS + 0x22C000) #define DA8XX_SYSCFG1_VIRT(x) (da8xx_syscfg1_base + (x)) @@ -116,6 +117,7 @@ void da8xx_rproc_reserve_cma(void); int da8xx_register_rproc(void); int da850_register_gpio(void); int da830_register_gpio(void); +int da8xx_register_cfgchip(void);
[PATCH v6 5/5] ARM: DTS: da850: Add usb phy node
Add a node for the new usb phy driver. Signed-off-by: David Lechner --- arch/arm/boot/dts/da850.dtsi | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi index 6bbf20d..33fcdce 100644 --- a/arch/arm/boot/dts/da850.dtsi +++ b/arch/arm/boot/dts/da850.dtsi @@ -376,6 +376,11 @@ >; status = "disabled"; }; + usb_phy: usb-phy { + compatible = "ti,da830-usb-phy"; + #phy-cells = <1>; + status = "disabled"; + }; gpio: gpio@226000 { compatible = "ti,dm6441-gpio"; gpio-controller; -- 2.7.4
linux-next: no releases next week
Hi all, There will probably be no linux-next releases next week while I am attending Kernel Summit. -- Cheers, Stephen Rothwell
linux-next: Tree for Oct 26
Hi all, There will probably be no linux-next releases next week while I attend the Kernel Summit. Changes since 20161025: The sunxi tree lost its build failure. The akpm-current tree still had its build failures for which I applied 2 patches. Non-merge commits (relative to Linus' tree): 2628 3334 files changed, 210166 insertions(+), 49968 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig (with CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig (this fails its final link) and pseries_le_defconfig and i386, sparc and sparc64 defconfig. Below is a summary of the state of the merge. I am currently merging 245 trees (counting Linus' and 35 trees of patches pending for Linus' tree). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. 
-- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (9fe68cad6e74 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6) Merging fixes/master (30066ce675d3 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6) Merging kbuild-current/rc-fixes (989cea5c14be kbuild: prevent lib-ksyms.o rebuilds) Merging arc-current/for-curr (9868c77a82f7 ARC: build: retire old toggles) Merging arm-current/fixes (6127d124ee4e ARM: wire up new pkey syscalls) Merging m68k-current/for-linus (6736e65effc3 m68k: Migrate exception table users off module.h and onto extable.h) Merging metag-fixes/fixes (35d04077ad96 metag: Only define atomic_dec_if_positive conditionally) Merging powerpc-fixes/fixes (09b7e37b18ee powerpc/64: Fix race condition in setting lock bit in idle/wakeup code) Merging sparc/master (ee9e83973d54 sparc32: Fix old style declaration GCC warnings) Merging net/master (44060abe1dd6 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth) CONFLICT (content): Merge conflict in drivers/net/ethernet/qlogic/Kconfig Applying: qed*: merge fix for CONFIG_INFINIBAND_QEDR Kconfig move Merging ipsec/master (7f92083eb58f vti6: flush x-netns xfrm cache when vti interface is removed) Merging netfilter/master (7034b566a4e7 netfilter: fix nf_queue handling) Merging ipvs/master (ea43f860d984 Merge branch 'ethoc-fixes') Merging wireless-drivers/master (1ea2643961b0 ath6kl: add Dell OEM SDIO I/O for the Venue 8 Pro) Merging mac80211/master (b4f0fd4baa90 qed: Use list_move_tail instead of list_del/list_add_tail) Merging sound-current/for-linus (9b50898ad96c ALSA: seq: Fix time account regression) Merging pci-current/for-linus (02a1b8f4167e PCI: designware-plat: Update author email address) Merging driver-core.current/driver-core-linus (07d9a380680d Linux 4.9-rc2) Merging tty.current/tty-linus (1001354ca341 Linux 4.9-rc1) Merging 
usb.current/usb-linus (b76032396d79 usb: renesas_usbhs: add wait after initialization for R-Car Gen3) Merging usb-gadget-fixes/fixes (a1aa8cf6471b Revert "Documentation: devicetree: dwc2: Deprecate g-tx-fifo-size") Merging usb-serial-fixes/usb-linus (07d9a380680d Linux 4.9-rc2) Merging usb-chipidea-fixes/ci-for-usb-stable (6b7f456e67a1 usb: chipidea: host: fix NULL ptr dereference during shutdown) Merging phy/fixes (1001354ca341 Linux 4.9-rc1) Merging staging.current/staging-linus (e866dd8aab76 greybus: fix a leak on error in gb_module_create()) Merging char-misc.current/char-misc-linus (407a3aee6ee2 hv: do not lose pending heartbeat vmbus packets) Merging input-current/for-linus (324ae0958cab Input: psmouse - cleanup Focaltech code) Merging crypto-current/master (6d4952d9d9d4 hwrng: core - Don't use a stack buffer in add_early_randomness()) Merging ide/master (797cee982eef Merge branch 'stable-4.8' of git://g
Re: [RFC PATCH 0/6] UART slave devices using serio
Hi,

On Tue, Oct 25, 2016 at 05:02:23PM -0500, Rob Herring wrote:
> On Tue, Oct 25, 2016 at 4:55 PM, Sebastian Reichel wrote:
> > On Wed, Aug 24, 2016 at 06:24:30PM -0500, Rob Herring wrote:
> >> [...]
> > I had a more detailed look at the series during the last two weeks.
> > For me the approach looks ok and it should work for the nokia bluetooth
> > use case. Actually my work on that driver is more or less stalled until
> > this is solved, so it would be nice to get this forward. Whose feedback
> > is this waiting from? I guess
>
> I think it is mainly waiting for me to spend more time on it and get
> the tty port part done. The general approach could already be commented on.
> I could use help especially for converting the BT part properly.

Ok, I will have a look at that.

> > * Alan & Greg for the serial parts
> > * Marcel for the bluetooth parts
> > * Dmitry for the serio parts
> >
> > Maybe you can try to find some minutes at the Kernel Summit to talk
> > about this?
>
> Still waiting for my invite...
> But I will be at Plumbers if folks want to discuss this.

Ok. I obviously assumed invites have already been sent and that
you would be invited.

-- Sebastian
[PATCH v2 5/5] posix-timers: make it configurable
Some embedded systems have no use for POSIX timers. This removes about 22KB from the kernel binary size when configured out. Corresponding syscalls are routed to a stub logging the attempt to use those syscalls which should be enough of a clue if they were disabled without proper consideration. They are: timer_create, timer_gettime, timer_getoverrun, timer_settime, timer_delete, clock_adjtime.

The clock_settime, clock_gettime, clock_getres and clock_nanosleep syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME, CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast majority of use cases with very little code.

Signed-off-by: Nicolas Pitre
Reviewed-by: Josh Triplett
---
 drivers/ptp/Kconfig | 2 +-
 include/linux/posix-timers.h | 28 +-
 include/linux/sched.h | 10
 init/Kconfig | 17 +++
 kernel/signal.c | 4 ++
 kernel/time/Makefile | 10 +++-
 kernel/time/posix-stubs.c | 118 +++
 7 files changed, 184 insertions(+), 5 deletions(-)
 create mode 100644 kernel/time/posix-stubs.c

diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig index 0f7492f8ea..bdce332911 100644 --- a/drivers/ptp/Kconfig +++ b/drivers/ptp/Kconfig @@ -6,7 +6,7 @@ menu "PTP clock support" config PTP_1588_CLOCK tristate "PTP clock support" - depends on NET + depends on NET && POSIX_TIMERS select PPS select NET_PTP_CLASSIFY help diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index 62d44c1760..2288c5c557 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -118,6 +118,8 @@ struct k_clock { extern struct k_clock clock_posix_cpu; extern struct k_clock clock_posix_dynamic; +#ifdef CONFIG_POSIX_TIMERS + void posix_timers_register_clock(const clockid_t clock_id, struct k_clock *new_clock); /* function to call to trigger timer event */ @@ -131,8 +133,30 @@ void posix_cpu_timers_exit_group(struct task_struct *task); void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx, cputime_t *newval, cputime_t *oldval); -long 
clock_nanosleep_restart(struct restart_block *restart_block); - void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new); +#else + +#include + +static inline void posix_timers_register_clock(const clockid_t clock_id, + struct k_clock *new_clock) {} +static inline int posix_timer_event(struct k_itimer *timr, int si_private) +{ return 0; } +static inline void run_posix_cpu_timers(struct task_struct *task) {} +static inline void posix_cpu_timers_exit(struct task_struct *task) +{ + add_device_randomness((const void*) >se.sum_exec_runtime, + sizeof(unsigned long long)); +} +static inline void posix_cpu_timers_exit_group(struct task_struct *task) {} +static inline void set_process_cpu_timer(struct task_struct *task, + unsigned int clock_idx, cputime_t *newval, cputime_t *oldval) {} +static inline void update_rlimit_cpu(struct task_struct *task, +unsigned long rlim_new) {} + +#endif + +long clock_nanosleep_restart(struct restart_block *restart_block); + #endif diff --git a/include/linux/sched.h b/include/linux/sched.h index 348f51b0ec..ad716d5559 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2946,8 +2946,13 @@ static inline void exit_thread(struct task_struct *tsk) extern void exit_files(struct task_struct *); extern void __cleanup_sighand(struct sighand_struct *); +#ifdef CONFIG_POSIX_TIMERS extern void exit_itimers(struct signal_struct *); extern void flush_itimer_signals(void); +#else +static inline void exit_itimers(struct signal_struct *s) {} +static inline void flush_itimer_signals(void) {} +#endif extern void do_group_exit(int); @@ -3450,7 +3455,12 @@ static __always_inline bool need_resched(void) * Thread group CPU time accounting. 
*/ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times); +#ifdef CONFIG_POSIX_TIMERS void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times); +#else +static inline void thread_group_cputimer(struct task_struct *tsk, +struct task_cputime *times) {} +#endif /* * Reevaluate whether the task has signals pending delivery. diff --git a/init/Kconfig b/init/Kconfig index 34407f15e6..351d422252 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1445,6 +1445,23 @@ config SYSCTL_SYSCALL If unsure say N here. +config POSIX_TIMERS + bool "Posix Clocks & timers" if EXPERT + default y + help + This includes native support for POSIX timers to the kernel. + Most embedded systems may have no use for them and therefore they + can be configured out to reduce the size of the kernel image. + +
[PATCH v2 3/5] kconfig: regenerate *.c_shipped files after previous changes
Signed-off-by: Nicolas Pitre --- scripts/kconfig/zconf.hash.c_shipped | 228 ++--- scripts/kconfig/zconf.tab.c_shipped | 1631 -- 2 files changed, 888 insertions(+), 971 deletions(-) diff --git a/scripts/kconfig/zconf.hash.c_shipped b/scripts/kconfig/zconf.hash.c_shipped index 360a62df2b..bf7f1378b3 100644 --- a/scripts/kconfig/zconf.hash.c_shipped +++ b/scripts/kconfig/zconf.hash.c_shipped @@ -32,7 +32,7 @@ struct kconf_id; static const struct kconf_id *kconf_id_lookup(register const char *str, register unsigned int len); -/* maximum key range = 71, duplicates = 0 */ +/* maximum key range = 72, duplicates = 0 */ #ifdef __GNUC__ __inline @@ -46,32 +46,32 @@ kconf_id_hash (register const char *str, register unsigned int len) { static const unsigned char asso_values[] = { - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 0, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 5, 25, 25, - 0, 0, 0, 5, 0, 0, 73, 73, 5, 0, - 10, 5, 45, 73, 20, 20, 0, 15, 15, 73, - 20, 5, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73, 73, 73, 73, 73, - 73, 73, 73, 73, 73, 73 + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 
74, 74, 74, + 74, 74, 74, 74, 74, 0, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 0, 20, 10, + 0, 0, 0, 30, 0, 0, 74, 74, 5, 15, + 0, 25, 40, 74, 15, 0, 0, 10, 35, 74, + 10, 0, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74, 74, 74, 74, 74, + 74, 74, 74, 74, 74, 74 }; register int hval = len; @@ -97,33 +97,35 @@ struct kconf_id_strings_t char kconf_id_strings_str8[sizeof("tristate")]; char kconf_id_strings_str9[sizeof("endchoice")]; char kconf_id_strings_str10[sizeof("---help---")]; +char kconf_id_strings_str11[sizeof("select")]; char kconf_id_strings_str12[sizeof("def_tristate")]; char kconf_id_strings_str13[sizeof("def_bool")]; char kconf_id_strings_str14[sizeof("defconfig_list")]; -char kconf_id_strings_str17[sizeof("on")]; -char kconf_id_strings_str18[sizeof("optional")]; -char kconf_id_strings_str21[sizeof("option")]; -char kconf_id_strings_str22[sizeof("endmenu")]; -char kconf_id_strings_str23[sizeof("mainmenu")]; -char kconf_id_strings_str25[sizeof("menuconfig")]; -char kconf_id_strings_str27[sizeof("modules")]; -char kconf_id_strings_str28[sizeof("allnoconfig_y")]; +char kconf_id_strings_str16[sizeof("source")]; +char kconf_id_strings_str17[sizeof("endmenu")]; +char kconf_id_strings_str18[sizeof("allnoconfig_y")]; +char kconf_id_strings_str20[sizeof("range")]; +char kconf_id_strings_str22[sizeof("modules")]; +char 
kconf_id_strings_str23[sizeof("hex")]; +char kconf_id_strings_str27[sizeof("on")]; char kconf_id_strings_str29[sizeof("menu")]; -char kconf_id_strings_str31[sizeof("select")]; +char kconf_id_strings_str31[sizeof("option")]; char kconf_id_strings_str32[sizeof("comment")]; -char kconf_id_strings_str33[sizeof("env")]; -char kconf_id_strings_str35[sizeof("range")]; -char kconf_id_strings_str36[sizeof("choice")]; -char kconf_id_strings_str39[sizeof("bool")]; -char kconf_id_strings_str41[sizeof("source")]; +char
[PATCH v2 1/5] kconfig: introduce the "imply" keyword
The "imply" keyword is a weak version of "select" where the target
config symbol can still be turned off, avoiding those pitfalls that come
with the "select" keyword.

This is useful e.g. with multiple drivers that want to indicate their
ability to hook into a given subsystem while still being able to
configure that subsystem out and keep those drivers selected.

Currently, the same effect can almost be achieved with:

	config DRIVER_A
		tristate

	config DRIVER_B
		tristate

	config DRIVER_C
		tristate

	config DRIVER_D
		tristate

	[...]

	config SUBSYSTEM_X
		tristate
		default DRIVER_A || DRIVER_B || DRIVER_C || DRIVER_D || [...]

This is unwieldy to maintain especially with a large number of drivers.
Furthermore, there is no easy way to restrict the choice for SUBSYSTEM_X
to y or n, excluding m, when some drivers are built-in. The "select"
keyword allows for excluding m, but it excludes n as well. Hence
this "imply" keyword. The above becomes:

	config DRIVER_A
		tristate
		imply SUBSYSTEM_X

	config DRIVER_B
		tristate
		imply SUBSYSTEM_X

	[...]

	config SUBSYSTEM_X
		tristate

This is much cleaner, and way more flexible than "select". SUBSYSTEM_X
can still be configured out, and it can be set as a module when none of
the drivers are selected or all of them are also modular.

Signed-off-by: Nicolas Pitre
Reviewed-by: Josh Triplett
---
 Documentation/kbuild/kconfig-language.txt | 28
 scripts/kconfig/expr.h                    |  2 ++
 scripts/kconfig/menu.c                    | 55 ++-
 scripts/kconfig/symbol.c                  | 24 +-
 scripts/kconfig/zconf.gperf               |  1 +
 scripts/kconfig/zconf.y                   | 16 +++--
 6 files changed, 107 insertions(+), 19 deletions(-)

diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt
index 069fcb3eef..5ee0dd3c85 100644
--- a/Documentation/kbuild/kconfig-language.txt
+++ b/Documentation/kbuild/kconfig-language.txt
@@ -113,6 +113,33 @@ applicable everywhere (see syntax).
 	That will limit the usefulness but on the other hand avoid
 	the illegal configurations all over.
 
+- weak reverse dependencies: "imply" <symbol> ["if" <expr>]
+  This is similar to "select" as it enforces a lower limit on another
+  symbol except that the "implied" config symbol's value may still be
+  set to n from a direct dependency or with a visible prompt.
+  Given the following example:
+
+  config FOO
+	tristate
+	imply BAZ
+
+  config BAZ
+	tristate
+	depends on BAR
+
+  The following values are possible:
+
+	FOO		BAR		BAZ's default	choice for BAZ
+	---		---		-------------	--------------
+	n		y		n		N/m/y
+	m		y		m		M/y/n
+	y		y		y		Y/n
+	y		n		*		N
+
+  This is useful e.g. with multiple drivers that want to indicate their
+  ability to hook into a given subsystem while still being able to
+  configure that subsystem out and keep those drivers selected.
+
 - limiting menu display: "visible if"
   This attribute is only applicable to menu blocks, if the condition is
   false, the menu block is not displayed to the user (the symbols
@@ -481,6 +508,7 @@ historical issues resolved through these different solutions.
   b) Match dependency semantics:
 	b1) Swap all "select FOO" to "depends on FOO" or,
 	b2) Swap all "depends on FOO" to "select FOO"
+  c) Consider the use of "imply" instead of "select"
 
 The resolution to a) can be tested with the sample Kconfig file
 Documentation/kbuild/Kconfig.recursion-issue-01 through the removal
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 973b6f7333..a73f762c48 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -85,6 +85,7 @@ struct symbol {
 	struct property *prop;
 	struct expr_value dir_dep;
 	struct expr_value rev_dep;
+	struct expr_value implied;
 };
 
 #define for_all_symbols(i, sym) for (i = 0; i < SYMBOL_HASHSIZE; i++) for (sym = symbol_hash[i]; sym; sym = sym->next) if (sym->type != S_OTHER)
@@ -136,6 +137,7 @@ enum prop_type {
 	P_DEFAULT,  /* default y */
 	P_CHOICE,   /* choice value */
 	P_SELECT,   /* select BAR */
+	P_IMPLY,    /* imply BAR */
 	P_RANGE,    /* range 7..100 (for a symbol) */
 	P_ENV,      /* value from environment variable */
 	P_SYMBOL,   /* where a symbol is defined */
diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c
index aed678e8a7..e9357931b4 100644
--- a/scripts/kconfig/menu.c
+++ b/scripts/kconfig/menu.c
@@ -233,6 +233,8 @@ static void sym_check_prop(struct symbol *sym)
 {
 	struct property *prop;
 	struct symbol *sym2;
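Illustration only, not part of the patch: the "BAZ's default" column of the documentation table above can be read as the smaller of FOO's value and BAR's value, with n < m < y. The sketch below models just that column under this assumption. The names `enum tristate` and `implied_default()` are hypothetical and do not exist in scripts/kconfig, and the sketch says nothing about which other values the user may still choose.

```c
#include <assert.h>

/* Kconfig tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/*
 * Hypothetical helper (not a scripts/kconfig function): default value of a
 * symbol that is implied by `implier` and depends on `dir_dep`, modelled as
 * the smaller of the two values. This reproduces only the four rows of the
 * table in the patch; it is an assumption, not the real evaluation code.
 */
enum tristate implied_default(enum tristate implier, enum tristate dir_dep)
{
	assert(implier <= TRI_Y && dir_dep <= TRI_Y);
	return implier < dir_dep ? implier : dir_dep;
}
```

With FOO=m and BAR=y this yields m, and with FOO=y and BAR=n it yields n, matching the second and fourth rows of the table.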
Re: [PATCH -next 1/2] Input: synaptics-rmi4 - add support for F55 sensor tuning
On 10/25/2016 11:26 AM, Andrew Duggan wrote:

On 10/24/2016 08:13 PM, Guenter Roeck wrote:

Hi Andrew,

On 10/24/2016 05:59 PM, Andrew Duggan wrote:

Hi Guenter,

I have a couple of comments below.

Thanks a lot for the feedback.

On 09/30/2016 08:22 PM, Guenter Roeck wrote:

Sensor tuning support is needed to determine the number of enabled
tx and rx electrodes for use in F54 functions.

The number of enabled electrodes is not identical to the total number
of electrodes as reported with F55:Query0 and F55:Query1. It has to be
calculated by analyzing F55:Ctrl1 (sensor receiver assignment) and
F55:Ctrl2 (sensor transmitter assignment).

Support for additional sensor tuning functions may be added later.

Signed-off-by: Guenter Roeck
---
This patch applies to next-20160930.

 drivers/input/rmi4/Kconfig      |   9 +++
 drivers/input/rmi4/Makefile     |   1 +
 drivers/input/rmi4/rmi_bus.c    |   3 +
 drivers/input/rmi4/rmi_driver.h |   1 +
 drivers/input/rmi4/rmi_f55.c    | 127
 5 files changed, 141 insertions(+)
 create mode 100644 drivers/input/rmi4/rmi_f55.c

diff --git a/drivers/input/rmi4/Kconfig b/drivers/input/rmi4/Kconfig
index 4c8a55857e00..11ede43c9936 100644
--- a/drivers/input/rmi4/Kconfig
+++ b/drivers/input/rmi4/Kconfig
@@ -72,3 +72,12 @@ config RMI4_F54
 
 	  Function 54 provides access to various diagnostic features in certain
 	  RMI4 touch sensors.
+
+config RMI4_F55
+	bool "RMI4 Function 55 (Sensor tuning)"
+	depends on RMI4_CORE
+	help
+	  Say Y here if you want to add support for RMI4 function 55
+
+	  Function 55 provides access to the RMI4 touch sensor tuning
+	  mechanism.
diff --git a/drivers/input/rmi4/Makefile b/drivers/input/rmi4/Makefile
index 0bafc8502c4b..96f8e0c21e3b 100644
--- a/drivers/input/rmi4/Makefile
+++ b/drivers/input/rmi4/Makefile
@@ -8,6 +8,7 @@ rmi_core-$(CONFIG_RMI4_F11) += rmi_f11.o
 rmi_core-$(CONFIG_RMI4_F12) += rmi_f12.o
 rmi_core-$(CONFIG_RMI4_F30) += rmi_f30.o
 rmi_core-$(CONFIG_RMI4_F54) += rmi_f54.o
+rmi_core-$(CONFIG_RMI4_F55) += rmi_f55.o
 
 # Transports
 obj-$(CONFIG_RMI4_I2C) += rmi_i2c.o
diff --git a/drivers/input/rmi4/rmi_bus.c b/drivers/input/rmi4/rmi_bus.c
index ef8c747c35e7..82b7d4960858 100644
--- a/drivers/input/rmi4/rmi_bus.c
+++ b/drivers/input/rmi4/rmi_bus.c
@@ -314,6 +314,9 @@ static struct rmi_function_handler *fn_handlers[] = {
 #ifdef CONFIG_RMI4_F54
 	&rmi_f54_handler,
 #endif
+#ifdef CONFIG_RMI4_F55
+	&rmi_f55_handler,
+#endif
 };
 
 static void __rmi_unregister_function_handlers(int start_idx)
diff --git a/drivers/input/rmi4/rmi_driver.h b/drivers/input/rmi4/rmi_driver.h
index 8dfbebe9bf86..a65cf70f61e2 100644
--- a/drivers/input/rmi4/rmi_driver.h
+++ b/drivers/input/rmi4/rmi_driver.h
@@ -103,4 +103,5 @@ extern struct rmi_function_handler rmi_f11_handler;
 extern struct rmi_function_handler rmi_f12_handler;
 extern struct rmi_function_handler rmi_f30_handler;
 extern struct rmi_function_handler rmi_f54_handler;
+extern struct rmi_function_handler rmi_f55_handler;
 #endif
diff --git a/drivers/input/rmi4/rmi_f55.c b/drivers/input/rmi4/rmi_f55.c
new file mode 100644
index ..268fa904205a
--- /dev/null
+++ b/drivers/input/rmi4/rmi_f55.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright (c) 2012-2015 Synaptics Incorporated
+ * Copyright (C) 2016 Zodiac Inflight Innovations
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include
+#include
+#include

This is incidental, but I don't think i2c.h needs to be included here
since this file shouldn't contain anything i2c specific. It's not that
big a deal, but I noticed it so I thought I would mention it.

Makes sense. delay.h and input.h seem to be unnecessary too. I'll remove
those if/when I resubmit.

+#include
+#include
+#include
+#include
+#include "rmi_driver.h"
+
+#define F55_NAME		"rmi4_f55"
+
+/* F55 data offsets */
+#define F55_NUM_RX_OFFSET	0
+#define F55_NUM_TX_OFFSET	1
+#define F55_PHYS_CHAR_OFFSET	2
+
+/* Fixed sizes of reports */
+#define F55_QUERY_LEN		17

How did you choose the number 17? The number of F55 query registers
present will depend on how the firmware is configured so the total
length of query registers can change. Right now this driver is only
using the first three F55 query registers which will always be present
so that is not an issue. But, beyond query 2 not all query registers are
guaranteed to be present.

According to the information I have, the maximum size is 17. Do you have
a better idea on how to handle the dynamic length? Or a better number?
Should I only read the minimum? Or the number we actually need (3)
at this point? Or just name the define F55_QUERY_MAXLEN and change the
comment to "maximum size of report"?

I would just read the three registers which you are using.
[PATCH] Change the document about iowait
The iowait value read from /proc/stat is not reliable, so this method of
getting iowait is not recommended. Note this in the documentation.

Signed-off-by: Cao Jin
Signed-off-by: Chao Fan
---
 Documentation/filesystems/proc.txt | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 74329fd..71f5096 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1305,7 +1305,16 @@ second). The meanings of the columns are as follows, from left to right:
 - nice: niced processes executing in user mode
 - system: processes executing in kernel mode
 - idle: twiddling thumbs
-- iowait: waiting for I/O to complete
+- iowait: In a word, iowait stands for waiting for I/O to complete. But there
+  are several problems:
+  1. Cpu will not wait for I/O to complete, iowait is the time that a task is
+     waiting for I/O to complete. When cpu goes into idle state for
+     outstanding task io, another task will be scheduled on this CPU.
+  2. In a multi-core CPU, the task waiting for I/O to complete is not running
+     on any CPU, so the iowait of each CPU is difficult to calculate.
+  3. The value of iowait field in /proc/stat will decrease in certain
+     conditions.
+  So, the iowait is not reliable by reading from /proc/stat.
 - irq: servicing interrupts
 - softirq: servicing softirqs
 - steal: involuntary wait
-- 
2.7.4
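Illustration only, not part of the patch: a minimal sketch of what the caveat means for a program that samples the field anyway. It assumes the column order listed in proc.txt (user, nice, system, idle, iowait) on the aggregate "cpu" line. The function names `parse_cpu_iowait()` and `iowait_delta()` are hypothetical. Because the value may decrease between two reads (point 3 above), the delta is clamped to zero instead of being allowed to wrap around.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the iowait column from the aggregate "cpu" line of /proc/stat.
 * Returns 0 on success and -1 if the line is not the aggregate line or
 * has fewer than five numeric columns.
 */
int parse_cpu_iowait(const char *line, unsigned long long *iowait)
{
	unsigned long long user, nice, system, idle;

	assert(line != NULL && iowait != NULL);

	/* Reject per-CPU lines such as "cpu0 ...". */
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;

	/* Column order assumed from proc.txt: user nice system idle iowait */
	if (sscanf(line, "cpu %llu %llu %llu %llu %llu",
		   &user, &nice, &system, &idle, iowait) != 5)
		return -1;

	return 0;
}

/* Difference between two samples, clamped because iowait may go backwards. */
unsigned long long iowait_delta(unsigned long long prev, unsigned long long cur)
{
	return cur >= prev ? cur - prev : 0;
}
```

A caller would read the first line of /proc/stat twice, some interval apart, and pass both parsed values to `iowait_delta()`. Even then the result is only a rough indication, for the reasons the patch lists.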
Re: [PATCH] ARM: imx6: Fix GPC probe error path
On 10/25/2016 10:34 AM, Guenter Roeck wrote:

GPC may fail to instantiate with

imx-gpc: probe of 20dc000.gpc failed with error -22

which is returned from of_genpd_add_provider_onecell(). The error path
does not call pm_genpd_remove(). This results in the following crash
later on.

Unhandled fault: page domain fault (0x01b) at 0x0040
pgd = c0204000
[0040] *pgd=
Internal error: : 1b [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 108 Comm: kworker/0:3 Not tainted 4.9.0-rc2 #8
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: pm genpd_power_off_work_fn
task: c759ea00 task.stack: c766a000
PC is at mutex_lock+0xc/0x4c
LR is at regulator_disable+0x28/0x64
...
[] (mutex_lock) from [] (regulator_disable+0x28/0x64)
[] (regulator_disable) from [] (imx6q_pm_pu_power_off+0x90/0x98)
[] (imx6q_pm_pu_power_off) from [] (genpd_poweroff+0x114/0x1d4)
[] (genpd_poweroff) from [] (genpd_power_off_work_fn+0x20/0x2c)
[] (genpd_power_off_work_fn) from [] (process_one_work+0x138/0x34c)
[] (process_one_work) from [] (worker_thread+0x38/0x510)
[] (worker_thread) from [] (kthread+0xdc/0xf4)
[] (kthread) from [] (ret_from_fork+0x14/0x3c)

This is seen with multi_v7_defconfig and imx6dl-sabrelite.dtb running in
qemu (v2.7 patched to fix a qemu related problem).

The error return from of_genpd_add_provider_onecell() is not seen in v4.8
and may be caused by a devicetree change (this is a wild guess only),
but that is a different problem.

Fixes: 00eb60a8b4f7 ("ARM: imx6: gpc: Add PU power domain for GPU/VPU")
Cc: Philipp Zabel
Cc: Arnd Bergmann
Signed-off-by: Guenter Roeck
---
Several bisect attempts trying to track down "imx-gpc: probe ... failed with
error -22" point to commit 00e729c93395 ("Merge tag 'armsoc-dt' of
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc"). I have not been
able to track down the real culprit. Part of the problem is that
CONFIG_REGULATOR_ANATOP must be enabled for the problem to be seen, and
CONFIG_ARCH_AT91 causes compile errors for some sequence of commits between
v4.8 and v4.9-rc1. But even after taking this into account, the bisect results
always point to 00e729c93395. If anyone has an idea how to track down that
problem, or what might be causing it, please let me know.

Looking into this some more, it turns out that of_genpd_add_provider_onecell()
now returns an error if one of the provided power domains does not exist.
In this case, the "ARM" power domain does not exist. I don't see where it is
created, so it may well be that this now fails for all imx6 boards with
multi_v7_defconfig. Looking into kernelci.org test results, this is confirmed
for at least imx6dl-riotboard. Overall I think it is quite safe to assume
that all imx6 boards crash with mainline kernels and multi_v7_defconfig.

The change can be tracked down to commit 0159ec67076 ("PM / Domains: Verify
the PM domain is present when adding a provider"). Adding everyone in the
commit log for feedback.

Guenter

 arch/arm/mach-imx/gpc.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-imx/gpc.c b/arch/arm/mach-imx/gpc.c
index 0df062d8b2c9..f3f40045b4c9 100644
--- a/arch/arm/mach-imx/gpc.c
+++ b/arch/arm/mach-imx/gpc.c
@@ -409,6 +409,7 @@ static int imx_gpc_genpd_init(struct device *dev, struct regulator *pu_reg)
 {
 	struct clk *clk;
 	int i;
+	int ret;
 
 	imx6q_pu_domain.reg = pu_reg;
 
@@ -431,9 +432,14 @@ static int imx_gpc_genpd_init(struct device *dev, struct regulator *pu_reg)
 		return 0;
 
 	pm_genpd_init(&imx6q_pu_domain.base, NULL, false);
-	return of_genpd_add_provider_onecell(dev->of_node,
-					     &imx_gpc_onecell_data);
+	ret = of_genpd_add_provider_onecell(dev->of_node,
+					    &imx_gpc_onecell_data);
+	if (ret)
+		goto genpd_remove;
 
+	return 0;
+genpd_remove:
+	pm_genpd_remove(&imx6q_pu_domain.base);
 clk_err:
 	while (i--)
 		clk_put(imx6q_pu_domain.clk[i]);