Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list

2016-10-25 Thread Joonsoo Kim
On Wed, Oct 26, 2016 at 01:50:37PM +0800, Xishi Qiu wrote:
> On 2016/10/26 12:37, Joonsoo Kim wrote:
> 
> > On Mon, Oct 17, 2016 at 05:21:54PM +0800, Xishi Qiu wrote:
> >> On 2016/10/13 16:08, js1...@gmail.com wrote:
> >>
> >>> From: Joonsoo Kim 
> >>>
> >>> Currently, freeing page can stay longer in the buddy list if next higher
> >>> order page is in the buddy list in order to help coalescence. However,
> >>> it doesn't work for the simplest sequential free case. For example, think
> >>> about the situation that 8 consecutive pages are freed in sequential
> >>> order.
> >>>
> >>> page 0: attached at the head of order 0 list
> >>> page 1: merged with page 0, attached at the head of order 1 list
> >>> page 2: attached at the tail of order 0 list
> >>> page 3: merged with page 2 and then merged with page 0, attached at
> >>>  the head of order 2 list
> >>> page 4: attached at the head of order 0 list
> >>> page 5: merged with page 4, attached at the tail of order 1 list
> >>> page 6: attached at the tail of order 0 list
> >>> page 7: merged with page 6 and then merged with page 4. Lastly, merged
> >>>  with page 0 and we get order 3 freepage.
> >>>
> >>> With excluding page 0 case, there are three cases that freeing page is
> >>> attached at the head of buddy list in this example and if just one
> >>> corresponding ordered allocation request comes at that moment, this page
> >>> in being a high order page will be allocated and we would fail to make
> >>> order-3 freepage.
> >>>
> >>> Allocation usually happens in sequential order and free also does. So, it
> >>> would be important to detect such a situation and to give some chance
> >>> to be coalesced.
> >>>
> >>> I think that simple and effective heuristic about this case is just
> >>> attaching freeing page at the tail of the buddy list unconditionally.
> >>> If freeing isn't merged during one rotation, it would be actual
> >>> fragmentation and we don't need to care about it for coalescence.
> >>>
> >>
> >> Hi Joonsoo,
> >>
> >> I find another two places to reduce fragmentation.
> >>
> >> 1)
> >> __rmqueue_fallback
> >>  steal_suitable_fallback
> >>  move_freepages_block
> >>  move_freepages
> >>  list_move
> >> If we steal some free pages, we add these pages at the head of the
> >> start_migratetype list. This will cause more fixed migratetype, because
> >> these pages will be allocated more easily.
> >> So how about using list_move_tail instead of list_move?
> > 
> > Yeah... I haven't thought about it deeply but, at a glance, it would be helpful.
> > 
> >>
> >> 2)
> >> __rmqueue_fallback
> >>  expand
> >>  list_add
> >> How about using list_add_tail instead of list_add? If we add at the tail,
> >> the rest of the pages will be harder to allocate and we can merge them
> >> again as soon as the page is freed.
> > 
> > I guess that it has no effect. When we do __rmqueue_fallback() and
> > expand(), we don't have any freepage of this or higher order. So,
> > list_add or list_add_tail will show the same result.
> > 
> 
> Hi Joonsoo,
> 
> Usually this list is empty, but in the following case, the list is not empty.
> 
> __rmqueue_fallback
>   steal_suitable_fallback
>   move_freepages_block  // move to the list of start_migratetype
>   expand  // split the largest order first
>   list_add  // add to the list of start_migratetype

In this case, the freepage stolen in steal_suitable_fallback() and the
split freepage would come from the same pageblock. So, it doesn't
matter which list_add* function is used.

Thanks.
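
For readers following the 8-page example in the quoted patch description, the head/tail decisions can be reproduced with a small userspace model. This is only a sketch, not the kernel code: `free_order[]`, `sim_free_page()`, `sim_sequential_free()` and `SIM_MAX_ORDER` are made-up names, and the tail rule is modeled on the behaviour the description states (a freed block goes to the tail only if the buddy of the next-higher-order block is already free).

```c
#include <stdbool.h>

#define SIM_MAX_ORDER 5                        /* hypothetical: orders 0..4 */
#define SIM_NPAGES (1 << (SIM_MAX_ORDER - 1))  /* 16 pages in the model */

/* free_order[pfn] is the order of the free block starting at pfn, or -1. */
static int free_order[SIM_NPAGES];

static void sim_reset(void)
{
	for (int i = 0; i < SIM_NPAGES; i++)
		free_order[i] = -1;
}

static bool block_is_free(unsigned int pfn, int order)
{
	return pfn < SIM_NPAGES && free_order[pfn] == order;
}

/*
 * Free one order-0 page, merging with free buddies as far as possible.
 * Returns the order of the resulting free block and reports whether the
 * pre-patch heuristic would queue it at the tail of its free list.
 */
static int sim_free_page(unsigned int pfn, bool *tail)
{
	int order = 0;

	while (order < SIM_MAX_ORDER - 1) {
		unsigned int buddy = pfn ^ (1u << order);

		if (!block_is_free(buddy, order))
			break;
		free_order[buddy] = -1;  /* take the buddy off its list */
		pfn &= buddy;            /* start of the merged block */
		order++;
	}

	*tail = false;
	if (order < SIM_MAX_ORDER - 2) {
		/* Block of order+1 that contains us, and that block's buddy. */
		unsigned int combined = pfn & ~(1u << order);
		unsigned int higher_buddy = combined ^ (1u << (order + 1));

		if (block_is_free(higher_buddy, order + 1))
			*tail = true;
	}

	free_order[pfn] = order;
	return order;
}

/* Free pages 0..7 in order; record 'H' (head) or 'T' (tail) per page. */
static int sim_sequential_free(char placement[9])
{
	int order = 0;

	sim_reset();
	for (unsigned int pfn = 0; pfn < 8; pfn++) {
		bool tail;

		order = sim_free_page(pfn, &tail);
		placement[pfn] = tail ? 'T' : 'H';
	}
	placement[8] = '\0';
	return order;
}
```

Calling `sim_sequential_free()` fills the buffer with one letter per freed page. In this model pages 1, 3 and 4 end up at the head, which are the three cases (excluding page 0) that the patch description worries about.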


Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list

2016-10-25 Thread Xishi Qiu
On 2016/10/26 12:37, Joonsoo Kim wrote:

> On Mon, Oct 17, 2016 at 05:21:54PM +0800, Xishi Qiu wrote:
>> On 2016/10/13 16:08, js1...@gmail.com wrote:
>>
>>> From: Joonsoo Kim 
>>>
>>> Currently, freeing page can stay longer in the buddy list if next higher
>>> order page is in the buddy list in order to help coalescence. However,
>>> it doesn't work for the simplest sequential free case. For example, think
>>> about the situation that 8 consecutive pages are freed in sequential
>>> order.
>>>
>>> page 0: attached at the head of order 0 list
>>> page 1: merged with page 0, attached at the head of order 1 list
>>> page 2: attached at the tail of order 0 list
>>> page 3: merged with page 2 and then merged with page 0, attached at
>>>  the head of order 2 list
>>> page 4: attached at the head of order 0 list
>>> page 5: merged with page 4, attached at the tail of order 1 list
>>> page 6: attached at the tail of order 0 list
>>> page 7: merged with page 6 and then merged with page 4. Lastly, merged
>>>  with page 0 and we get order 3 freepage.
>>>
>>> With excluding page 0 case, there are three cases that freeing page is
>>> attached at the head of buddy list in this example and if just one
>>> corresponding ordered allocation request comes at that moment, this page
>>> in being a high order page will be allocated and we would fail to make
>>> order-3 freepage.
>>>
>>> Allocation usually happens in sequential order and free also does. So, it
>>> would be important to detect such a situation and to give some chance
>>> to be coalesced.
>>>
>>> I think that simple and effective heuristic about this case is just
>>> attaching freeing page at the tail of the buddy list unconditionally.
>>> If freeing isn't merged during one rotation, it would be actual
>>> fragmentation and we don't need to care about it for coalescence.
>>>
>>
>> Hi Joonsoo,
>>
>> I find another two places to reduce fragmentation.
>>
>> 1)
>> __rmqueue_fallback
>>  steal_suitable_fallback
>>  move_freepages_block
>>  move_freepages
>>  list_move
>> If we steal some free pages, we add these pages at the head of the
>> start_migratetype list. This will cause more fixed migratetype, because
>> these pages will be allocated more easily.
>> So how about using list_move_tail instead of list_move?
> 
> Yeah... I haven't thought about it deeply but, at a glance, it would be helpful.
> 
>>
>> 2)
>> __rmqueue_fallback
>>  expand
>>  list_add
>> How about using list_add_tail instead of list_add? If we add at the tail,
>> the rest of the pages will be harder to allocate and we can merge them
>> again as soon as the page is freed.
> 
> I guess that it has no effect. When we do __rmqueue_fallback() and
> expand(), we don't have any freepage of this or higher order. So,
> list_add or list_add_tail will show the same result.
> 

Hi Joonsoo,

Usually this list is empty, but in the following case, the list is not empty.

__rmqueue_fallback
  steal_suitable_fallback
  move_freepages_block  // move to the list of start_migratetype
  expand  // split the largest order first
  list_add  // add to the list of start_migratetype

Thanks,
Xishi Qiu
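
The head-versus-tail question discussed in this thread can be illustrated with a tiny userspace list. This is a sketch under assumptions: the `sketch_*` helpers only imitate the shape of the kernel's circular doubly linked list, `struct free_block` and `next_alloc_after_steal()` are hypothetical, and the model assumes the allocator always takes the first entry of the free list.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void sketch_init_list(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void sketch_insert(struct list_head *entry,
			  struct list_head *prev, struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Insert right after the list head: the entry becomes the first one. */
static void sketch_list_add(struct list_head *entry, struct list_head *head)
{
	sketch_insert(entry, head, head->next);
}

/* Insert right before the list head: the entry becomes the last one. */
static void sketch_list_add_tail(struct list_head *entry,
				 struct list_head *head)
{
	sketch_insert(entry, head->prev, head);
}

/* Hypothetical stand-in for a free block linked into a free list. */
struct free_block {
	int id;
	struct list_head lru;
};

/* Id of the block an allocator taking the first entry would pick. */
static int first_block_id(struct list_head *head)
{
	struct free_block *blk;

	if (head->next == head)
		return -1;  /* empty list */
	blk = (struct free_block *)((char *)head->next -
				    offsetof(struct free_block, lru));
	return blk->id;
}

/*
 * Queue a stolen block (id 2) on a list that already holds a resident
 * block (id 1), at the head or at the tail, and report which block
 * would be handed out next.
 */
static int next_alloc_after_steal(int at_tail)
{
	struct list_head freelist;
	struct free_block resident = { .id = 1 };
	struct free_block stolen = { .id = 2 };

	sketch_init_list(&freelist);
	sketch_list_add(&resident.lru, &freelist);
	if (at_tail)
		sketch_list_add_tail(&stolen.lru, &freelist);
	else
		sketch_list_add(&stolen.lru, &freelist);
	return first_block_id(&freelist);
}
```

With head insertion the stolen block is handed out first; with tail insertion the resident block is, so the stolen block stays on the list longer. When the list is empty beforehand, both variants give the same result.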





Re: [PATCH 1/2] x86/io: add interface to reserve io memtype for a resource range. (v1.1)

2016-10-25 Thread Daniel Vetter
On Tue, Oct 25, 2016 at 07:31:29PM +0200, Luis R. Rodriguez wrote:
> On Mon, Oct 24, 2016 at 04:31:45PM +1000, Dave Airlie wrote:
> > A recent change to the mm code in:
> > 87744ab3832b83ba71b931f86f9cfdb000d07da5
> > mm: fix cache mode tracking in vm_insert_mixed()
> > 
> > started enforcing checking the memory type against the registered list for
> > mixed pfn insertion mappings. It happens that the drm drivers for a number
> > of gpus relied on this being broken. Currently the driver only inserted
> > VRAM mappings into the tracking table when they came from the kernel,
> > and userspace mappings never landed in the table. This led to a regression
> > where all the mapping end up as UC instead of WC now.
> 
> Eek.
> 
> > I've considered a number of solutions but since this needs to be fixed
> > in fixes and not next, and some of the solutions were going to introduce
> > overhead that hadn't been there before I didn't consider them viable at
> > this stage. These mainly concerned hooking into the TTM io reserve APIs,
> > but these API have a bunch of fast paths I didn't want to unwind to add
> > this to.
> > 
> > The solution I've decided on is to add a new API like the arch_phys_wc
> > APIs (these would have worked but wc_del didn't take a range), and
> > use them from the drivers to add a WC compatible mapping to the table
> > for all VRAM on those GPUs. This means we can then create userspace
> > mapping that won't get degraded to UC.
> 
> Is anything on a driver to be able to tell when this is actually needed ?
> How will driver developers know? Can you add a bit of documentation to
> the API? If its transitive towards a secondary solution indicating so
> would help driver developers.

I'll plug the io-mapping stuff again here, and more specifically the
userspace pte wrangling stuff we've added in 4.9 to i915_mm.c. Should
probably move that one to the core. That way io_mapping takes care of the
full reservation, and allows you to on-demand kmap (for kernel) and write
ptes. All nicely fast and all, and for bonus, also nicely encapsulated.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[tip:x86/asm] mm/page_alloc: Remove kernel address exposure in free_reserved_area()

2016-10-25 Thread tip-bot for Josh Poimboeuf
Commit-ID:  adb1fe9ae2ee6ef6bc10f3d5a588020e7664dfa7
Gitweb: http://git.kernel.org/tip/adb1fe9ae2ee6ef6bc10f3d5a588020e7664dfa7
Author: Josh Poimboeuf 
AuthorDate: Tue, 25 Oct 2016 09:51:14 -0500
Committer:  Ingo Molnar 
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

mm/page_alloc: Remove kernel address exposure in free_reserved_area()

Linus suggested we try to remove some of the low-hanging fruit related
to kernel address exposure in dmesg.  The only leaks I see on my local
system are:

  Freeing SMP alternatives memory: 32K (9e309000 - 9e311000)
  Freeing initrd memory: 10588K (a0b736b42000 - a0b737599000)
  Freeing unused kernel memory: 3592K (9df87000 - 9e309000)
  Freeing unused kernel memory: 1352K (a0b7288ae000 - a0b728a0)
  Freeing unused kernel memory: 632K (a0b728d62000 - a0b728e0)

Linus says:

  "I suspect we should just remove [the addresses in the 'Freeing'
   messages]. I'm sure they are useful in theory, but I suspect they
   were more useful back when the whole "free init memory" was
   originally done.

   These days, if we have a use-after-free, I suspect the init-mem
   situation is the easiest situation by far. Compared to all the dynamic
   allocations which are much more likely to show it anyway. So having
   debug output for that case is likely not all that productive."

With this patch the freeing messages now look like this:

  Freeing SMP alternatives memory: 32K
  Freeing initrd memory: 10588K
  Freeing unused kernel memory: 3592K
  Freeing unused kernel memory: 1352K
  Freeing unused kernel memory: 632K

Suggested-by: Linus Torvalds 
Signed-off-by: Josh Poimboeuf 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux...@kvack.org
Link: http://lkml.kernel.org/r/6836ff90c45b71d38e5d4405aec56fa9e5d1d4b2.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2b3bf67..3f63973 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6508,8 +6508,8 @@ unsigned long free_reserved_area(void *start, void *end, int poison, char *s)
 	}
 
 	if (pages && s)
-		pr_info("Freeing %s memory: %ldK (%p - %p)\n",
-			s, pages << (PAGE_SHIFT - 10), start, end);
+		pr_info("Freeing %s memory: %ldK\n",
+			s, pages << (PAGE_SHIFT - 10));
 
 	return pages;
 }
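
The size in the message comes from shifting the page count by `PAGE_SHIFT - 10`, which converts pages to kilobytes. A minimal userspace sketch of that arithmetic follows; `pages_to_kib()` and `format_freeing_msg()` are hypothetical helpers, and the page shift of 12 used in the usage note (4 KiB pages) is an assumption, not something stated in the patch.

```c
#include <stdio.h>

/* Convert a page count to KiB: each page is 2^(page_shift - 10) KiB. */
static long pages_to_kib(long pages, int page_shift)
{
	return pages << (page_shift - 10);
}

/* Build the post-patch message text, without the address range. */
static int format_freeing_msg(char *buf, size_t len, const char *what,
			      long pages, int page_shift)
{
	return snprintf(buf, len, "Freeing %s memory: %ldK",
			what, pages_to_kib(pages, page_shift));
}
```

Assuming 4 KiB pages, 8 freed pages give 32K and 2647 freed pages give 10588K, which lines up with the sample messages in the commit log.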


[tip:x86/asm] x86/dumpstack: Remove raw stack dump

2016-10-25 Thread tip-bot for Josh Poimboeuf
Commit-ID:  0ee1dd9f5e7eae4e55f95935b72d4beecb03de9c
Gitweb: http://git.kernel.org/tip/0ee1dd9f5e7eae4e55f95935b72d4beecb03de9c
Author: Josh Poimboeuf 
AuthorDate: Tue, 25 Oct 2016 09:51:13 -0500
Committer:  Ingo Molnar 
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

x86/dumpstack: Remove raw stack dump

For mostly historical reasons, the x86 oops dump shows the raw stack
values:

  ...
  [registers]
  Stack:
   880079af7350 880079905400  c98f3ae0
   a0196610 0001 0001 87654321
   0002   
  Call Trace:
  ...

This seems to be an artifact from long ago, and probably isn't needed
anymore.  It generally just adds noise to the dump, and it can be
actively harmful because it leaks kernel addresses.

Linus says:

  "The stack dump actually goes back to forever, and it used to be
   useful back in 1992 or so. But it used to be useful mainly because
   stacks were simpler and we didn't have very good call traces anyway. I
   definitely remember having used them - I just do not remember having
   used them in the last ten+ years.

   Of course, it's still true that if you can trigger an oops, you've
   likely already lost the security game, but since the stack dump is so
   useless, let's aim to just remove it and make games like the above
   harder."

This also removes the related 'kstack=' cmdline option and the
'kstack_depth_to_print' sysctl.

Suggested-by: Linus Torvalds 
Signed-off-by: Josh Poimboeuf 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 Documentation/kernel-parameters.txt   |  3 --
 Documentation/sysctl/kernel.txt   |  8 -
 Documentation/x86/x86_64/boot-options.txt |  4 ---
 arch/x86/include/asm/stacktrace.h |  5 ---
 arch/x86/kernel/dumpstack.c   | 21 ++--
 arch/x86/kernel/dumpstack_32.c| 33 +--
 arch/x86/kernel/dumpstack_64.c| 53 +--
 kernel/sysctl.c   |  7 
 8 files changed, 4 insertions(+), 130 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 37babf9..049a917 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1958,9 +1958,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
kmemcheck=2 (one-shot mode)
Default: 2 (one-shot mode)
 
-   kstack=N[X86] Print N words from the kernel stack
-   in oops dumps.
-
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
Default is 0 (don't ignore, but inject #GP)
 
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ffab8b5..065f184 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -40,7 +40,6 @@ show up in /proc/sys/kernel:
 - hung_task_warnings
 - kexec_load_disabled
 - kptr_restrict
-- kstack_depth_to_print   [ X86 only ]
 - l2cr[ PPC only ]
 - modprobe==> Documentation/debugging-modules.txt
 - modules_disabled
@@ -395,13 +394,6 @@ When kptr_restrict is set to (2), kernel pointers printed using
 
 ==
 
-kstack_depth_to_print: (X86 only)
-
-Controls the number of words to print when dumping the raw
-kernel stack.
-
-==
-
 l2cr: (PPC only)
 
 This flag controls the L2 cache of G3 processor boards. If
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 0965a71..61b611e 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -277,10 +277,6 @@ IOMMU (input/output memory management unit)
 space might stop working. Use this option if you have devices that
 are accessed from userspace directly on some PCI host bridge.
 
-Debugging
-
-  kstack=N Print N words from the kernel stack in oops dumps.
-
 Miscellaneous
 
nogbpages
diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h
index 37f2e0b..1e375b0 100644
--- a/arch/x86/include/asm/stacktrace.h
+++ b/arch/x86/include/asm/stacktrace.h
@@ -43,8 +43,6 @@ static inline bool on_stack(struct stack_info *info, void *addr, size_t len)
addr + len > begin && addr + len <= end);
 }
 
-extern int kstack_depth_to_print;
-
 #ifdef CONFIG_X86_32
 #define STACKSLOTS_PER_LINE 8
 #else
@@ -86,9 +84,6 @@ get_stack_pointer(struct task_struct *task, struct pt_regs 
*regs)
 void 

[tip:x86/asm] x86/dumpstack: Remove kernel text addresses from stack dump

2016-10-25 Thread tip-bot for Josh Poimboeuf
Commit-ID:  bb5e5ce545f2031c96f7901cd8d1698ea3ca4c9c
Gitweb: http://git.kernel.org/tip/bb5e5ce545f2031c96f7901cd8d1698ea3ca4c9c
Author: Josh Poimboeuf 
AuthorDate: Tue, 25 Oct 2016 09:51:12 -0500
Committer:  Ingo Molnar 
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

x86/dumpstack: Remove kernel text addresses from stack dump

Printing kernel text addresses in stack dumps is of questionable value,
especially now that address randomization is becoming common.

It can be a security issue because it leaks kernel addresses.  It also
affects the usefulness of the stack dump.  Linus says:

  "I actually spend time cleaning up commit messages in logs, because
  useless data that isn't actually information (random hex numbers) is
  actively detrimental.

  It makes commit logs less legible.

  It also makes it harder to parse dumps.

  It's not useful. That makes it actively bad.

  I probably look at more oops reports than most people. I have not
  found the hex numbers useful for the last five years, because they are
  just randomized crap.

  The stack content thing just makes code scroll off the screen etc, for
  example."

The only real downside to removing these addresses is that they can be
used to disambiguate duplicate symbol names.  However such cases are
rare, and the context of the stack dump should be enough to be able to
figure it out.

There's now a 'faddr2line' script which can be used to convert a
function address to a file name and line:

  $ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
  write_sysrq_trigger+0x51/0x60:
  write_sysrq_trigger at drivers/tty/sysrq.c:1098

Or gdb can be used:

  $ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in"
  (gdb) 0x815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378).

(But note that when there are duplicate symbol names, gdb will only show
the first symbol it finds.  faddr2line is recommended over gdb because
it handles duplicates and it also does function size checking.)
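
As an illustration of the `func+offset/size` notation that faddr2line takes as input, here is a minimal C sketch that splits such a string into its parts. The struct and function names (`trace_entry`, `parse_trace_entry`) are hypothetical and exist only for this example; this is not how faddr2line itself is implemented.

```c
#include <stdio.h>

/* Hypothetical container for one parsed call-trace entry. */
struct trace_entry {
	char func[64];         /* symbol name, e.g. "write_sysrq_trigger" */
	unsigned long offset;  /* byte offset into the function */
	unsigned long size;    /* function size as printed in the oops */
};

/*
 * Split "func+0xOFF/0xSIZE" into its parts.
 * Returns 0 on success, -1 if the string does not have that shape.
 */
static int parse_trace_entry(const char *s, struct trace_entry *e)
{
	/* %lx accepts an optional "0x" prefix, like strtoul() with base 16. */
	if (sscanf(s, "%63[^+]+%lx/%lx", e->func, &e->offset, &e->size) != 3)
		return -1;

	/* An offset beyond the function size cannot be valid. */
	if (e->offset > e->size)
		return -1;

	return 0;
}
```

Given "write_sysrq_trigger+0x51/0x60", the sketch yields the symbol name, offset 0x51 and size 0x60. The size part is what allows a tool to tell duplicate symbol names apart.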

Here's an example of what a stack dump looks like after this change:

  BUG: unable to handle kernel NULL pointer dereference at   (null)
  IP: sysrq_handle_crash+0x45/0x80
  PGD 36bfa067 [   29.650644] PUD 7aca3067
  Oops: 0002 [#1] PREEMPT SMP
  Modules linked in: ...
  CPU: 1 PID: 786 Comm: bash Tainted: GE   4.9.0-rc1+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
  task: 880078582a40 task.stack: c9ba8000
  RIP: 0010:sysrq_handle_crash+0x45/0x80
  RSP: 0018:c9babdc8 EFLAGS: 00010296
  RAX: 880078582a40 RBX: 0063 RCX: 0001
  RDX: 0001 RSI:  RDI: 0292
  RBP: c9babdc8 R08: 000b31866061 R09: 
  R10: 0001 R11:  R12: 
  R13: 0007 R14: 81ee8680 R15: 
  FS:  7ffb43869700() GS:88007d40() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2:  CR3: 7a3e9000 CR4: 001406e0
  Stack:
   c9babe00 81572d08 81572bd5 0002
    880079606600 7ffb4386e000 c9babe20
   81573201 880036a3fd00 fffb c9babe40
  Call Trace:
   __handle_sysrq+0x138/0x220
   ? __handle_sysrq+0x5/0x220
   write_sysrq_trigger+0x51/0x60
   proc_reg_write+0x42/0x70
   __vfs_write+0x37/0x140
   ? preempt_count_sub+0xa1/0x100
   ? __sb_start_write+0xf5/0x210
   ? vfs_write+0x183/0x1a0
   vfs_write+0xb8/0x1a0
   SyS_write+0x58/0xc0
   entry_SYSCALL_64_fastpath+0x1f/0xc2
  RIP: 0033:0x7ffb42f55940
  RSP: 002b:7ffd33bb6b18 EFLAGS: 0246 ORIG_RAX: 0001
  RAX: ffda RBX: 0046 RCX: 7ffb42f55940
  RDX: 0002 RSI: 7ffb4386e000 RDI: 0001
  RBP: 0011 R08: 7ffb4321ea40 R09: 7ffb43869700
  R10: 7ffb43869700 R11: 0246 R12: 00778a10
  R13: 7ffd33bb5c00 R14: 0007 R15: 0010
  Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8  04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7
  RIP: sysrq_handle_crash+0x45/0x80 RSP: c9babdc8
  CR2: 

Suggested-by: Linus Torvalds 
Signed-off-by: Josh Poimboeuf 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/kdebug.h |  1 -
 arch/x86/kernel/dumpstack.c   | 18 --
 arch/x86/kernel/process_32.c  |  7 +++
 arch/x86/kernel/process_64.c  |  6 +++---
 arch/x86/mm/fault.c   |  3 

[tip:x86/asm] scripts/faddr2line: Fix "size mismatch" error

2016-10-25 Thread tip-bot for Josh Poimboeuf
Commit-ID:  efdb4167e676aaba7505bec739785b76e206cb45
Gitweb: http://git.kernel.org/tip/efdb4167e676aaba7505bec739785b76e206cb45
Author: Josh Poimboeuf 
AuthorDate: Tue, 25 Oct 2016 09:51:11 -0500
Committer:  Ingo Molnar 
CommitDate: Tue, 25 Oct 2016 18:40:37 +0200

scripts/faddr2line: Fix "size mismatch" error

I'm not sure how we missed this problem before.  When I take a function
address and size from an oops and give it to faddr2line, it usually
complains about a size mismatch:

  $ scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60
  skipping write_sysrq_trigger address at 0x815731a1 due to size mismatch (0x60 != 83)
  no match for write_sysrq_trigger+0x51/0x60

The problem is caused by differences in how kallsyms and faddr2line
determine a function's size.

kallsyms calculates a function's size by parsing the output of 'nm -n'
and subtracting the next function's address from the current function's
address.  This means that nop instructions after the end of the function
are included in the size.

In contrast, faddr2line reads the size from the symbol table, which does
*not* include the ending nops in the function's size.

Change faddr2line to calculate the size from the output of 'nm -n' to be
consistent with kallsyms and oops outputs.
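
To make the difference concrete, here is a small C sketch of the "distance to the next symbol" calculation described above. The `sym` record and `size_from_next()` helper are hypothetical names, and the addresses in the note below are made up. This only illustrates the arithmetic; it is neither the kallsyms code nor the faddr2line script.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one line of address-sorted symbol output. */
struct sym {
	uint64_t addr;
	const char *name;
};

/*
 * Size of syms[i] measured as the distance to the next symbol in
 * address order. Any padding after the function's last instruction
 * is therefore counted as part of the function.
 * Returns 0 for the last symbol, whose end is unknown in this sketch.
 */
static uint64_t size_from_next(const struct sym *syms, size_t n, size_t i)
{
	if (i + 1 >= n)
		return 0;
	return syms[i + 1].addr - syms[i].addr;
}
```

For example, a function at 0x1000 whose symbol-table size is 83 (0x53) bytes, followed by the next function at 0x1060, gets a size of 0x60 here. That is the same kind of gap as the "0x60 != 83" warning quoted above.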

Signed-off-by: Josh Poimboeuf 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/bd313ed7c4003f6b1fda63e825325c44a9d837de.1477405374.git.jpoim...@redhat.com
Signed-off-by: Ingo Molnar 
---
 scripts/faddr2line | 33 +
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/scripts/faddr2line b/scripts/faddr2line
index 450b332..29df825 100755
--- a/scripts/faddr2line
+++ b/scripts/faddr2line
@@ -105,9 +105,18 @@ __faddr2line() {
# In rare cases there might be duplicates.
while read symbol; do
local fields=($symbol)
-   local sym_base=0x${fields[1]}
-   local sym_size=${fields[2]}
-   local sym_type=${fields[3]}
+   local sym_base=0x${fields[0]}
+   local sym_type=${fields[1]}
+   local sym_end=0x${fields[3]}
+
+   # calculate the size
+   local sym_size=$(($sym_end - $sym_base))
+   if [[ -z $sym_size ]] || [[ $sym_size -le 0 ]]; then
+   warn "bad symbol size: base: $sym_base end: $sym_end"
+   DONE=1
+   return
+   fi
+   sym_size=0x$(printf %x $sym_size)
 
# calculate the address
local addr=$(($sym_base + $offset))
@@ -116,26 +125,26 @@ __faddr2line() {
DONE=1
return
fi
-   local hexaddr=0x$(printf %x $addr)
+   addr=0x$(printf %x $addr)
 
# weed out non-function symbols
-   if [[ $sym_type != "FUNC" ]]; then
+   if [[ $sym_type != t ]] && [[ $sym_type != T ]]; then
[[ $print_warnings = 1 ]] &&
-   echo "skipping $func address at $hexaddr due to non-function symbol"
+   echo "skipping $func address at $addr due to non-function symbol of type '$sym_type'"
continue
fi
 
# if the user provided a size, make sure it matches the symbol's size
if [[ -n $size ]] && [[ $size -ne $sym_size ]]; then
[[ $print_warnings = 1 ]] &&
-   echo "skipping $func address at $hexaddr due to size mismatch ($size != $sym_size)"
+   echo "skipping $func address at $addr due to size mismatch ($size != $sym_size)"
continue;
fi
 
# make sure the provided offset is within the symbol's range
if [[ $offset -gt $sym_size ]]; then
[[ $print_warnings = 1 ]] &&
-   echo "skipping $func address at $hexaddr due to size mismatch ($offset > $sym_size)"
+   echo "skipping $func address at $addr due to size mismatch ($offset > $sym_size)"
continue
fi
 
@@ -143,12 +152,12 @@ __faddr2line() {
[[ $FIRST = 0 ]] && echo
FIRST=0
 
-   local hexsize=0x$(printf %x $sym_size)
-   echo "$func+$offset/$hexsize:"
-   addr2line -fpie $objfile $hexaddr | sed "s; $dir_prefix\(\./\)*; ;"
+   # pass real address to addr2line
+   echo "$func+$offset/$sym_size:"
+   addr2line -fpie $objfile $addr | sed "s; $dir_prefix\(\./\)*; ;"
DONE=1
 
-   done < <(readelf -sW $objfile | awk -v f=$func '$8 == f 

[PATCH v2 3/3] kernel/smp: Tell the user we're bringing up secondary CPUs

2016-10-25 Thread Michael Ellerman
Currently we don't print anything before starting to bring up secondary
CPUs. This can be confusing if it takes a long time to bring up the
secondaries, or if the kernel crashes while doing so and produces no
further output.

On x86 they work around this by detecting when the first secondary CPU
comes up and printing a message (see announce_cpu()). But doing it in
smp_init() is simpler and works for all arches.

Signed-off-by: Michael Ellerman 
Reviewed-by: Borislav Petkov 
---
 kernel/smp.c | 2 ++
 1 file changed, 2 insertions(+)

v2: Drop "smp:" from pr_info() now we have pr_fmt() defined.

diff --git a/kernel/smp.c b/kernel/smp.c
index 4323c5db7d26..77fcdb9f2775 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -555,6 +555,8 @@ void __init smp_init(void)
idle_threads_init();
cpuhp_threads_init();
 
+   pr_info("Bringing up secondary CPUs ...\n");
+
/* FIXME: This should be done in userspace --RR */
for_each_present_cpu(cpu) {
if (num_online_cpus() >= setup_max_cpus)
-- 
2.7.4



[PATCH v2 2/3] kernel/smp: Make the SMP boot message common on all arches

2016-10-25 Thread Michael Ellerman
Currently after bringing up secondary CPUs all arches print "Brought up
%d CPUs". On x86 they also print the number of nodes that were brought
online.

It would be nice to also print the number of nodes on other arches.
Although we could override smp_announce() on the other ~10 NUMA aware
arches, it seems simpler to just always print the number of nodes. On
non-NUMA arches there is just always 1 node.

Having done that, smp_announce() is no longer weak, and seems small
enough to just pull directly into smp_init().

Also make the "%d CPUs" output use the singular when an SMP kernel is
booted on a single-CPU system, or when only one CPU is available, e.g.:

   smp: Brought up 2 nodes, 1 CPU
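
The pluralisation can be tried out in userspace with a short C sketch. `format_brought_up()` is a hypothetical helper written for this illustration, and the "smp: " prefix is written out by hand here, whereas in the kernel it comes from pr_fmt().

```c
#include <stdio.h>

/*
 * Userspace stand-in for the message built in smp_init().
 * format_brought_up() is a hypothetical helper, not a kernel function.
 */
static int format_brought_up(char *buf, size_t len, int num_nodes, int num_cpus)
{
	/* Append "s" only when the count is greater than one. */
	return snprintf(buf, len, "smp: Brought up %d node%s, %d CPU%s",
			num_nodes, (num_nodes > 1 ? "s" : ""),
			num_cpus,  (num_cpus  > 1 ? "s" : ""));
}
```

Calling it with 2 nodes and 1 CPU produces the line shown above.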

Signed-off-by: Michael Ellerman 
Reviewed-by: Borislav Petkov 
---
 arch/x86/kernel/smpboot.c |  8 
 kernel/smp.c  | 13 +++--
 2 files changed, 7 insertions(+), 14 deletions(-)

v2: Print singular CPU when only 1 CPU is found.
Drop "smp:" from pr_info() now we have pr_fmt() defined.

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 42f5eb7b4f6c..b9f02383f372 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -821,14 +821,6 @@ wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
return (send_status | accept_status);
 }
 
-void smp_announce(void)
-{
-   int num_nodes = num_online_nodes();
-
-   printk(KERN_INFO "x86: Booted up %d node%s, %d CPUs\n",
-  num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus());
-}
-
 /* reduce the number of lines printed when booting a large cpu count system */
 static void announce_cpu(int cpu, int apicid)
 {
diff --git a/kernel/smp.c b/kernel/smp.c
index 2d1f15d43022..4323c5db7d26 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -546,14 +546,10 @@ void __init setup_nr_cpu_ids(void)
nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
 }
 
-void __weak smp_announce(void)
-{
-   printk(KERN_INFO "Brought up %d CPUs\n", num_online_cpus());
-}
-
 /* Called by boot processor to activate the rest. */
 void __init smp_init(void)
 {
+   int num_nodes, num_cpus;
unsigned int cpu;
 
idle_threads_init();
@@ -567,8 +563,13 @@ void __init smp_init(void)
cpu_up(cpu);
}
 
+   num_nodes = num_online_nodes();
+   num_cpus  = num_online_cpus();
+   pr_info("Brought up %d node%s, %d CPU%s\n",
+   num_nodes, (num_nodes > 1 ? "s" : ""),
+   num_cpus,  (num_cpus  > 1 ? "s" : ""));
+
/* Any cleanup work */
-   smp_announce();
smp_cpus_done(setup_max_cpus);
 }
 
-- 
2.7.4



[PATCH v2 1/3] kernel/smp: Define pr_fmt() for smp.c

2016-10-25 Thread Michael Ellerman
This makes all our pr_xxx()'s start with "smp: ", which helps pin down
where they come from and generally looks nice. There is actually only
one pr_xxx() use in smp.c at the moment, but we will add some more in
the next commit.
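
For readers unfamiliar with the mechanism, here is a userspace C sketch of how the prefix ends up in the format string through compile-time string literal concatenation. Two assumptions: KBUILD_MODNAME is defined by hand (kbuild supplies it in a real build), and `pr_info_buf()` is a hypothetical stand-in for pr_info() that formats into a buffer instead of the kernel log.

```c
#include <stdio.h>

/* Assumption: defined by hand here; kbuild provides it in the kernel. */
#define KBUILD_MODNAME "smp"

/* Adjacent string literals are concatenated by the compiler. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
 * Hypothetical stand-in for pr_info(): formats into a buffer.
 * Needs at least one argument after the format string.
 */
#define pr_info_buf(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), __VA_ARGS__)
```

With these definitions, `pr_info_buf(buf, sizeof(buf), "Brought up %d CPUs", 4)` writes "smp: Brought up 4 CPUs". In the kernel the define has to come before the includes because the printk header only supplies a default pr_fmt() when none is defined yet.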

Suggested-by: Borislav Petkov 
Signed-off-by: Michael Ellerman 
---
 kernel/smp.c | 3 +++
 1 file changed, 3 insertions(+)

v2: New in v2.

diff --git a/kernel/smp.c b/kernel/smp.c
index bba3b201668d..2d1f15d43022 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -3,6 +3,9 @@
  *
  * (C) Jens Axboe  2008
  */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
-- 
2.7.4



Re: [PATCH 1/2] kernel/smp: Make the SMP boot message common on all arches

2016-10-25 Thread Michael Ellerman
Ingo Molnar  writes:
> * Michael Ellerman  wrote:
>> @@ -564,8 +560,11 @@ void __init smp_init(void)
>>  cpu_up(cpu);
>>  }
>>  
>> +num_nodes = num_online_nodes();
>> +pr_info("smp: Brought up %d node%s, %d CPUs\n",
>> +num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus());
>
> No objections - but pedantry requires me to mention that while we are evolving
> this code and changing the strings I think we should make the CPU announcement
> CPU%s smart as well: an SMP kernel on a single CPU bootup will result in
> num_online_cpus() == 1, right?

Yeah that makes sense. I don't often boot any single CPU systems, but I
tested with maxcpus=1 and it does look nicer:

smp: Brought up 2 nodes, 1 CPU


Will send a v2.

cheers


Re: [PATCH 1/2] kernel/smp: Make the SMP boot message common on all arches

2016-10-25 Thread Michael Ellerman
Borislav Petkov  writes:
> On Thu, Oct 13, 2016 at 07:55:19PM +1100, Michael Ellerman wrote:
>> @@ -564,8 +560,11 @@ void __init smp_init(void)
>>  cpu_up(cpu);
>>  }
>>  
>> +num_nodes = num_online_nodes();
>> +pr_info("smp: Brought up %d node%s, %d CPUs\n",
>> +num_nodes, (num_nodes > 1 ? "s" : ""), num_online_cpus());
>
> Please define pr_fmt for this file so that pr_info adds the prefix
> automatically. I guess
>
>   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> at the top, before all the include directives should suffice.

Sure thing.

> Other than that, for both patches:
>
> Reviewed-by: Borislav Petkov 

Thanks, v2 coming soon.

cheers


[PATCH] arm64: defconfig: Enable DRM DU and V4L2 FCP + VSP modules

2016-10-25 Thread Magnus Damm
From: Magnus Damm 

Extend the ARM64 defconfig to enable the DU DRM device as a module,
together with its required V4L2 FCP and VSP module dependencies.

This enables VGA output on the r8a7795 Salvator-X board.

Signed-off-by: Magnus Damm 
---

 Written against next-20161026

 arch/arm64/configs/defconfig |   14 ++
 1 file changed, 14 insertions(+)

--- 0001/arch/arm64/configs/defconfig
+++ work/arch/arm64/configs/defconfig   2016-10-26 14:10:58.220607110 +0900
@@ -293,8 +293,22 @@ CONFIG_REGULATOR_PWM=y
 CONFIG_REGULATOR_QCOM_SMD_RPM=y
 CONFIG_REGULATOR_QCOM_SPMI=y
 CONFIG_REGULATOR_S2MPS11=y
+CONFIG_MEDIA_SUPPORT=m
+CONFIG_MEDIA_CAMERA_SUPPORT=y
+CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
+CONFIG_MEDIA_DIGITAL_TV_SUPPORT=y
+CONFIG_MEDIA_CONTROLLER=y
+CONFIG_VIDEO_V4L2_SUBDEV_API=y
+# CONFIG_DVB_NET is not set
+CONFIG_V4L_MEM2MEM_DRIVERS=y
+CONFIG_VIDEO_RENESAS_FCP=m
+CONFIG_VIDEO_RENESAS_VSP1=m
 CONFIG_DRM=m
 CONFIG_DRM_NOUVEAU=m
+CONFIG_DRM_RCAR_DU=m
+CONFIG_DRM_RCAR_HDMI=y
+CONFIG_DRM_RCAR_LVDS=y
+CONFIG_DRM_RCAR_VSP=y
 CONFIG_DRM_TEGRA=m
 CONFIG_DRM_PANEL_SIMPLE=m
 CONFIG_DRM_I2C_ADV7511=m


[PATCH 3/3] x86/vmware: Add paravirt sched clock

2016-10-25 Thread Alexey Makhalov
Set pv_time_ops.sched_clock to vmware_sched_clock(). It is a simplified
version of native_sched_clock() without the ring buffer of mult/shift/offset
triplets and without preempt toggling.
Since the VMware hypervisor provides a constant TSC, we can use a constant
mult/shift/offset triplet calculated at boot time.

The no-vmw-sched-clock kernel parameter is added to switch back to the
native_sched_clock() implementation.
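
The conversion with a fixed triplet can be sketched in portable userspace C as follows. This is an illustration under stated assumptions, not the patch code: `struct cyc2ns` and `cycles_to_ns()` are hypothetical names, and `mul_u64_u32_shr()` is re-implemented here in a form that needs no 128-bit intermediate and is only valid for shift <= 32.

```c
#include <stdint.h>

/* Hypothetical fixed conversion triplet, filled in once at boot. */
struct cyc2ns {
	uint32_t mul;
	uint32_t shift;   /* must be <= 32 in this sketch */
	uint64_t offset;  /* ns value of the TSC reading taken at boot */
};

/* (a * mul) >> shift without a 128-bit intermediate, for shift <= 32. */
static uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul, unsigned int shift)
{
	uint32_t ah = a >> 32, al = (uint32_t)a;
	uint64_t ret = ((uint64_t)al * mul) >> shift;

	/* The high half contributes (ah * mul) << 32, then shifted right. */
	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);
	return ret;
}

/* ns = ((cycles * mul) >> shift) - offset */
static uint64_t cycles_to_ns(const struct cyc2ns *c, uint64_t cycles)
{
	return mul_u64_u32_shr(cycles, c->mul, c->shift) - c->offset;
}
```

As a worked example with made-up numbers: for a 2 GHz TSC, mul = 512 and shift = 10 give 0.5 ns per cycle, so 2,000,000,000 cycles convert to 1,000,000,000 ns when the offset is 0. Because the triplet never changes after boot, no locking or preempt toggling is needed around the read.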

Signed-off-by: Alexey Makhalov 
Acked-by: Alok N Kataria 
---
 Documentation/kernel-parameters.txt |  4 
 arch/x86/kernel/cpu/vmware.c| 38 +
 2 files changed, 42 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 37babf9..b3b2ec0 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2754,6 +2754,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
    no-kvmapf   [X86,KVM] Disable paravirtualized asynchronous page
                fault handling.
 
+   no-vmw-sched-clock
+               [X86,PV_OPS] Disable paravirtualized VMware scheduler
+               clock and use the default one.
+
    no-steal-acc    [X86,KVM] Disable paravirtualized steal time accounting.
                steal time is computed, but won't influence scheduler
                behaviour
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index e3fb320..6ef22c1 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -24,10 +24,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #define CPUID_VMWARE_INFO_LEAF 0x4000
 #define VMWARE_HYPERVISOR_MAGIC 0x564D5868
@@ -62,10 +64,46 @@ static unsigned long vmware_get_tsc_khz(void)
 }
 
 #ifdef CONFIG_PARAVIRT
+static struct cyc2ns_data vmware_cyc2ns __ro_after_init;
+
+static int vmw_sched_clock __initdata = 1;
+static __init int setup_vmw_sched_clock(char *s)
+{
+   vmw_sched_clock = 0;
+   return 0;
+}
+early_param("no-vmw-sched-clock", setup_vmw_sched_clock);
+
+static unsigned long long vmware_sched_clock(void)
+{
+   unsigned long long ns;
+
+   ns = mul_u64_u32_shr(rdtsc(), vmware_cyc2ns.cyc2ns_mul,
+vmware_cyc2ns.cyc2ns_shift);
+   ns -= vmware_cyc2ns.cyc2ns_offset;
+   return ns;
+}
+
 static void __init vmware_paravirt_ops_setup(void)
 {
pv_info.name = "VMware";
pv_cpu_ops.io_delay = paravirt_nop;
+
+   if (vmware_tsc_khz && vmw_sched_clock) {
+   unsigned long long tsc_now = rdtsc();
+
+   clocks_calc_mult_shift(&vmware_cyc2ns.cyc2ns_mul,
+  &vmware_cyc2ns.cyc2ns_shift,
+  vmware_tsc_khz,
+  NSEC_PER_MSEC, 0);
+   vmware_cyc2ns.cyc2ns_offset =
+   mul_u64_u32_shr(tsc_now, vmware_cyc2ns.cyc2ns_mul,
+   vmware_cyc2ns.cyc2ns_shift);
+
+   pv_time_ops.sched_clock = vmware_sched_clock;
+   pr_info("vmware: using sched offset of %llu ns\n",
+   vmware_cyc2ns.cyc2ns_offset);
+   }
 }
 #else
 #define vmware_paravirt_ops_setup() do {} while (0)
-- 
2.10.1
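
The conversion described in this patch (scale the TSC by a fixed mult/shift pair, then subtract an offset captured at setup) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: every name ending in `_sketch` is hypothetical, the fixed shift of 22 is an arbitrary choice rather than what clocks_calc_mult_shift() would pick, and `unsigned __int128` is a GCC/Clang extension standing in for the kernel's mul_u64_u32_shr().

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct cyc2ns_data. */
struct cyc2ns_sketch {
	uint32_t mul;
	uint32_t shift;
	uint64_t offset;	/* scaled TSC value captured at setup time, in ns */
};

/* (a * mul) >> shift with a 128-bit intermediate, so the product cannot overflow. */
static uint64_t mul_u64_u32_shr_sketch(uint64_t a, uint32_t mul, unsigned int shift)
{
	return (uint64_t)(((unsigned __int128)a * mul) >> shift);
}

/*
 * Derive mul for a fixed shift so that
 *   ns = cycles * 10^6 / tsc_khz  ~=  (cycles * mul) >> shift.
 * Assumption: tsc_khz is at least about 1000, otherwise mul overflows 32 bits.
 */
static void calc_mult_shift_sketch(struct cyc2ns_sketch *c, uint32_t tsc_khz)
{
	c->shift = 22;
	c->mul = (uint32_t)((((uint64_t)1000000 << c->shift) + tsc_khz / 2) / tsc_khz);
}

/* One-time setup: compute the triplet so that the clock reads 0 ns "now". */
static void sched_clock_setup_sketch(struct cyc2ns_sketch *c, uint32_t tsc_khz,
				     uint64_t tsc_now)
{
	calc_mult_shift_sketch(c, tsc_khz);
	c->offset = mul_u64_u32_shr_sketch(tsc_now, c->mul, c->shift);
}

/* Fast path: scale the current TSC value and subtract the setup-time offset. */
static uint64_t sched_clock_sketch(const struct cyc2ns_sketch *c, uint64_t tsc)
{
	return mul_u64_u32_shr_sketch(tsc, c->mul, c->shift) - c->offset;
}
```

With a 2 GHz TSC (tsc_khz = 2000000) the sketch yields mul = 2^21 and shift = 22, i.e. exactly half a nanosecond per cycle, so 2,000,000 cycles after setup the clock reads 1,000,000 ns. Because the triplet never changes after setup, the fast path needs no locking or preempt toggling, which is the simplification the commit message refers to.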



[PATCH 2/3] x86/vmware: Add basic paravirt ops support

2016-10-25 Thread Alexey Makhalov
Add basic paravirt support:
 1. Set pv_info.name to "VMware" so that the boot log message reads
        Booting paravirtualized kernel on VMware
    instead of "... on bare hardware".
 2. Set pv_cpu_ops.io_delay() to an empty function, paravirt_nop(), to
    avoid VM exits on I/O delays.

Signed-off-by: Alexey Makhalov 
Acked-by: Alok N Kataria 
---
 arch/x86/kernel/cpu/vmware.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 480790f..e3fb320 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -61,6 +61,16 @@ static unsigned long vmware_get_tsc_khz(void)
return vmware_tsc_khz;
 }
 
+#ifdef CONFIG_PARAVIRT
+static void __init vmware_paravirt_ops_setup(void)
+{
+   pv_info.name = "VMware";
+   pv_cpu_ops.io_delay = paravirt_nop;
+}
+#else
+#define vmware_paravirt_ops_setup() do {} while (0)
+#endif
+
 static void __init vmware_platform_setup(void)
 {
uint32_t eax, ebx, ecx, edx;
@@ -94,6 +104,8 @@ static void __init vmware_platform_setup(void)
} else {
pr_warn("Failed to get TSC freq from the hypervisor\n");
}
+
+   vmware_paravirt_ops_setup();
 }
 
 /*
-- 
2.10.1
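
The mechanism this patch relies on is a table of function pointers that platform setup code may overwrite at boot. A minimal user-space sketch, assuming nothing about the real pv_info / pv_cpu_ops layout (all `_sketch` and `_demo` names are hypothetical, and the counter merely stands in for the cost of a native I/O delay), might look like this:

```c
/* Hypothetical miniatures of the kernel's pv_info and pv_cpu_ops tables. */
struct pv_info_sketch {
	const char *name;
};

struct pv_cpu_ops_sketch {
	void (*io_delay)(void);
};

/* Counts how often the "expensive" native delay was actually executed. */
static int native_io_delay_calls;

static void native_io_delay_sketch(void)
{
	native_io_delay_calls++;
}

/* Empty hook, playing the role of paravirt_nop(). */
static void paravirt_nop_sketch(void)
{
}

/* Defaults describe a kernel running on bare hardware. */
static struct pv_info_sketch pv_info_demo = { .name = "bare hardware" };
static struct pv_cpu_ops_sketch pv_cpu_ops_demo = { .io_delay = native_io_delay_sketch };

/* Platform setup overrides the defaults once, early at boot. */
static void vmware_paravirt_ops_setup_sketch(void)
{
	pv_info_demo.name = "VMware";
	pv_cpu_ops_demo.io_delay = paravirt_nop_sketch;
}

/* Callers always go through the table and never see which variant runs. */
static void io_delay_sketch(void)
{
	pv_cpu_ops_demo.io_delay();
}
```

After vmware_paravirt_ops_setup_sketch() has run, io_delay_sketch() no longer reaches the native routine, which mirrors how the patch avoids the VM exits caused by I/O delays.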



[PATCH 0/3] x86/vmware guest improvements

2016-10-25 Thread Alexey Makhalov
This patchset includes several VMware guest improvements:

Alexey Makhalov (3):
  x86/vmware: Use tsc_khz value for calibrate_cpu()
  x86/vmware: Add basic paravirt ops support
  x86/vmware: Add paravirt sched clock

 Documentation/kernel-parameters.txt |  4 +++
 arch/x86/kernel/cpu/vmware.c| 51 +
 2 files changed, 55 insertions(+)

-- 
2.10.1



[PATCH 1/3] x86/vmware: Use tsc_khz value for calibrate_cpu()

2016-10-25 Thread Alexey Makhalov
After aa297292d708, there are separate native calibrations for cpu_khz and
tsc_khz. The code sets x86_platform.calibrate_cpu to native_calibrate_cpu(),
which looks in CPUID leaf 0x16 or in MSRs for the CPU frequency. Since we keep
tsc_khz constant (even after vMotion), cpu_khz and tsc_khz may start
diverging.

tsc_init() now does

	cpu_khz = x86_platform.calibrate_cpu();
	tsc_khz = x86_platform.calibrate_tsc();
	if (tsc_khz == 0)
		tsc_khz = cpu_khz;
	else if (abs(cpu_khz - tsc_khz) * 10 > tsc_khz)
		cpu_khz = tsc_khz;

We want cpu_khz and tsc_khz to stay in sync even if they diverge by less
than 10%.
This patch resolves the issue by setting x86_platform.calibrate_cpu to
vmware_get_tsc_khz().

Signed-off-by: Alexey Makhalov 
Acked-by: Alok N Kataria 
---
 arch/x86/kernel/cpu/vmware.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 4e34da4b..480790f 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -83,6 +83,7 @@ static void __init vmware_platform_setup(void)
 
vmware_tsc_khz = tsc_khz;
x86_platform.calibrate_tsc = vmware_get_tsc_khz;
+   x86_platform.calibrate_cpu = vmware_get_tsc_khz;
 
 #ifdef CONFIG_X86_LOCAL_APIC
/* Skip lapic calibration since we know the bus frequency. */
-- 
2.10.1
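
To make the 10% threshold concrete, the reconciliation step quoted in the commit message can be modelled as a small pure function. This is a sketch only: `struct khz_pair` and `reconcile_khz()` are hypothetical names, and the two inputs stand for whatever the calibrate_cpu and calibrate_tsc callbacks return.

```c
#include <stdlib.h>

/* Hypothetical container for the two frequencies, in kHz. */
struct khz_pair {
	unsigned long cpu_khz;
	unsigned long tsc_khz;
};

/*
 * Model of the tsc_init() reconciliation quoted above:
 *  - no TSC calibration result: fall back to the CPU frequency;
 *  - divergence above 10% of tsc_khz: force cpu_khz to tsc_khz;
 *  - divergence of 10% or less: both values are left as they are.
 */
static struct khz_pair reconcile_khz(unsigned long cpu_khz, unsigned long tsc_khz)
{
	struct khz_pair r = { cpu_khz, tsc_khz };

	if (r.tsc_khz == 0)
		r.tsc_khz = r.cpu_khz;
	else if (labs((long)r.cpu_khz - (long)r.tsc_khz) * 10 > (long)r.tsc_khz)
		r.cpu_khz = r.tsc_khz;

	return r;
}
```

For example, reconcile_khz(2100000, 2000000) leaves the 5% mismatch in place, which is the case the patch is concerned about. When both callbacks return the same vmware_tsc_khz, the two inputs are equal by construction and no mismatch can arise.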



RE: [PATCH 0/9] staging: fsl-mc: move bus driver out of staging, add dpio

2016-10-25 Thread Stuart Yoder


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, October 24, 2016 9:34 AM
> To: Stuart Yoder ; gre...@linuxfoundation.org
> Cc: German Rivera ; de...@driverdev.osuosl.org; 
> linux-kernel@vger.kernel.org;
> a...@arndb.de; Leo Li 
> Subject: Re: [PATCH 0/9] staging: fsl-mc: move bus driver out of staging, add 
> dpio
> 
> Hi Stuart,
> 
> On 10/21/2016 04:01 PM, Stuart Yoder wrote:
> > This patch series: A) addresses the final item in the staging
> > TODO list for the fsl-mc bus driver-- adding a functional driver
> > on top of the bus driver, and B) requests that the fsl-mc bus driver
> > be moved out of staging.
> 
> Awesome, it's great to see progress again! :)
> 
> > The proposed destination for the bus driver is drivers/bus.
> > Proposed location for global header files for fsl-mc and dpaa2
> > is include/linux/fsl.
> >
> > The functional driver added is for the DPIO object which provides
> > queuing services for other DPAA2 drivers.  An overview of the
> 
> I thought the idea of the TODO item was to have a full-fledged user of
> the bus, like a full network driver. The TODO item reads:
> 
> > -* Add at least one device driver for a DPAA2 object (child device of the
> > -  fsl-mc bus).  Most likely candidate for this is adding DPAA2 Ethernet
> > -  driver support, which depends on drivers for several objects: DPNI,
> > -  DPIO, DPMAC.  Other pre-requisites include:

DPIO is a "full-fledged user" of the bus.  But, yes, it does provide
infrastructure services and so does not have a standalone I/O function.

> which to me indicates that DPIO is only part of that goal. Of course I'm
> the last person blocking progress to move the driver out of staging. But
> are we at the right point yet?

I thought the goal was to demonstrate a driver on top of the fsl-mc
bus driver because without that it would have been difficult to validate/review
that the bus infrastructure was correct.

The DPIO driver demonstrates full use of the bus driver infrastructure--
getting probed, discovering and mapping mmio regions, initializing the
device, initializing interrupts.

> To me the topmost important bit of having this outside of staging is
> actually missing in the TODO list (probably since it's obvious): Have
> stable, reliable, responsible maintainership for the code.
> 
> So far I've seen German do the initial push upstream, then there was
> silence for a while. Now some time passed and you push a few bits here
> and there again. All of the efforts are great and very appreciated, but
> I'm missing the "maintainer" figure. Some peer to German and you who
> oversees the whole thing, reviews your patches and devotes at least 2-3
> days a week to only upstream fsl-mc work. Someone like York for U-Boot
> or Scott for general Linux work.
> 
> Without that, there's too much of a chance that the code will stay
> incomplete, bitrot, etc. And that'd be bad for everyone involved. I
> think the concept behind fsl-mc is great and exactly what people need,
> so we should make sure it succeeds.

I agree we need that.  We are actively working on getting an additional
maintainer (or two), and until we can get the right person(s) I'm willing
to fill that role.  We're not going to let this code bitrot.

I actually think getting the bus driver out of staging will help spur
broader involvement by NXP engineers in the fsl-mc bus support.  There
are enhancements like a resource management interface for user space,
an interface to see the MC log buffer, SMMU-related hooks for the fsl-mc
bus, and vfio for the fsl-mc bus.  All that stuff is on hold until we
get the bus driver out of staging. The directive we have is to add no
new features until the bus driver is out.

For example, the ARM SMMU driver has an include of ,
but I don't see the SMMU maintainers accepting the following in
arm-smmu.c:
   #include <../drivers/staging/fsl-mc/include/mc.h>

Given that the fsl-mc bus TODO list is done, there is not a whole lot
for a new maintainer to do to the bus driver itself until we get the
driver out of staging (aside from reviewing another DPAA2 object driver
that would also go into staging).

Once the bus driver + DPIO is out of staging, it also opens the door
for other DPAA2 drivers (network, crypto, DMA, L2 switch,
decompression/compression, and others) to be upstreamed.  I didn't think
we wanted all of those to go into staging, but we were waiting until
at least one driver was accepted first, proving the bus infrastructure is
sound.  I was hoping DPIO could be that proof of concept.

So, in short, I think getting the bus driver and DPIO out of staging
will open up some parallel development and will also provide more
opportunities for some new maintainers to get involved, because there
will be more to review and do.

However, if you want things to stay in staging for now, I will resubmit
and put DPIO there.

Thanks,
Stuart


Re: [RFC 8/8] mm: Add N_COHERENT_DEVICE node type into node_states[]

2016-10-25 Thread Anshuman Khandual
On 10/25/2016 12:52 PM, Balbir Singh wrote:
> 
> 
> On 24/10/16 15:31, Anshuman Khandual wrote:
>> Add a new member N_COHERENT_DEVICE into node_states[] nodemask array to
>> enlist all those nodes which contain only coherent device memory. Also
>> creates a new sysfs interface /sys/devices/system/node/is_coherent_device
>> to list down all those nodes which has coherent device memory.
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>>  Documentation/ABI/stable/sysfs-devices-node |  7 +++
>>  drivers/base/node.c |  6 ++
>>  include/linux/nodemask.h|  3 +++
>>  mm/memory_hotplug.c | 10 ++
>>  4 files changed, 26 insertions(+)
>>
>> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
>> index 5b2d0f0..5538791 100644
>> --- a/Documentation/ABI/stable/sysfs-devices-node
>> +++ b/Documentation/ABI/stable/sysfs-devices-node
>> @@ -29,6 +29,13 @@ Description:
>>  Nodes that have regular or high memory.
>>  Depends on CONFIG_HIGHMEM.
>>  
>> +What:   /sys/devices/system/node/is_coherent_device
>> +Date:   October 2016
>> +Contact:Linux Memory Management list 
>> +Description:
>> +Lists the nodemask of nodes that have coherent memory.
>> +Depends on CONFIG_COHERENT_DEVICE.
>> +
>>  What:   /sys/devices/system/node/nodeX
>>  Date:   October 2002
>>  Contact:Linux Memory Management list 
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index 5548f96..5b5dd89 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -661,6 +661,9 @@ static struct node_attr node_state_attr[] = {
>>  [N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
>>  #endif
>>  [N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +[N_COHERENT_DEVICE] = _NODE_ATTR(is_coherent_device, N_COHERENT_DEVICE),
>> +#endif
>>  };
>>  
>>  static struct attribute *node_state_attrs[] = {
>> @@ -674,6 +677,9 @@ static struct attribute *node_state_attrs[] = {
>>  &node_state_attr[N_MEMORY].attr.attr,
>>  #endif
>>  &node_state_attr[N_CPU].attr.attr,
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +&node_state_attr[N_COHERENT_DEVICE].attr.attr,
>> +#endif
>>  NULL
>>  };
>>  
>> diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
>> index f746e44..605cb0d 100644
>> --- a/include/linux/nodemask.h
>> +++ b/include/linux/nodemask.h
>> @@ -393,6 +393,9 @@ enum node_states {
>>  N_MEMORY = N_HIGH_MEMORY,
>>  #endif
>>  N_CPU,  /* The node has one or more cpus */
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +N_COHERENT_DEVICE,  /* The node has coherent device memory */
>> +#endif
>>  NR_NODE_STATES
>>  };
>>  
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 9629273..8f03962 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1044,6 +1044,11 @@ static void node_states_set_node(int node, struct memory_notify *arg)
>>  if (arg->status_change_nid_high >= 0)
>>  node_set_state(node, N_HIGH_MEMORY);
>>  
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +if (isolated_cdm_node(node))
>> +node_set_state(node, N_COHERENT_DEVICE);
>> +#endif
>> +
> 
> #ifdef not required, see below
> 

Right, will change.

>>  node_set_state(node, N_MEMORY);
>>  }
>>  
>> @@ -1858,6 +1863,11 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
>>  if ((N_MEMORY != N_HIGH_MEMORY) &&
>>  (arg->status_change_nid >= 0))
>>  node_clear_state(node, N_MEMORY);
>> +
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +if (isolated_cdm_node(node))
>> +node_clear_state(node, N_COHERENT_DEVICE);
>> +#endif
>>  }
>>  
> 
> I think the #ifdefs are not needed if isolated_cdm_node
> is defined for both with and without CONFIG_COHERENT_DEVICE.
> 
> I think this patch needs to move up in the series so that
> node state can be examined by other core algorithms

Okay, will move up.
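
The review comment above (no #ifdef at the call site if the helper exists in both configurations) refers to a common kernel idiom: provide a real helper when the option is enabled and a trivial stub otherwise. A self-contained sketch of that idiom follows. It is illustrative only: the `_sketch` names are hypothetical, the body of isolated_cdm_node() here is an invented placeholder policy, and the sketch assumes the node-state enum value is available in both configurations, which the quoted patch does not currently do.

```c
#include <stdbool.h>

/* Comment this out to model a kernel built without the option. */
#define CONFIG_COHERENT_DEVICE 1

enum node_states_sketch {
	N_MEMORY_SKETCH,
	N_COHERENT_DEVICE_SKETCH,
	NR_NODE_STATES_SKETCH
};

#define MAX_NODES_SKETCH 4

/* Stand-in for the node_states[] nodemask array. */
static bool node_state_sketch[NR_NODE_STATES_SKETCH][MAX_NODES_SKETCH];

#ifdef CONFIG_COHERENT_DEVICE
/* Placeholder policy (an assumption): only node 1 holds coherent device memory. */
static inline bool isolated_cdm_node(int node)
{
	return node == 1;
}
#else
/* Stub: the compiler can drop the guarded branch in callers entirely. */
static inline bool isolated_cdm_node(int node)
{
	(void)node;
	return false;
}
#endif

/* The call site needs no #ifdef because the helper is always defined. */
static void node_states_set_node_sketch(int node)
{
	if (isolated_cdm_node(node))
		node_state_sketch[N_COHERENT_DEVICE_SKETCH][node] = true;

	node_state_sketch[N_MEMORY_SKETCH][node] = true;
}
```

The conditional compilation is thereby confined to one place next to the helper's definition, and the hotplug paths read the same in both configurations.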



Re: [RFC PATCH 2/5] mm/page_alloc: use smallest fallback page first in movable allocation

2016-10-25 Thread Joonsoo Kim
On Fri, Oct 14, 2016 at 12:52:26PM +0200, Vlastimil Babka wrote:
> On 10/14/2016 03:26 AM, Joonsoo Kim wrote:
> >On Thu, Oct 13, 2016 at 11:12:10AM +0200, Vlastimil Babka wrote:
> >>On 10/13/2016 10:08 AM, js1...@gmail.com wrote:
> >>>From: Joonsoo Kim 
> >>>
> >>>When we try to find freepage in fallback buddy list, we always search
> >>>the largest one. This would help for fragmentation if we process
> >>>unmovable/reclaimable allocation request because it could cause permanent
> >>>fragmentation on movable pageblock and spread out such allocations would
> >>>cause more fragmentation. But, movable allocation request is
> >>>rather different. It would be simply freed or migrated so it doesn't
> >>>contribute to fragmentation on the other pageblock. In this case, it would
> >>>be better not to break the precious highest order freepage so we need to
> >>>search the smallest freepage first.
> >>
> >>I've also pondered this, but then found a lower hanging fruit that
> >>should be hopefully clear win and mitigate most cases of breaking
> >>high-order pages unnecessarily:
> >>
> >>http://marc.info/?l=linux-mm&m=147582914330198&w=2
> >
> >Yes, I agree with that change. That's the similar patch what I tried
> >before.
> >
> >"mm/page_alloc: don't break highest order freepage if steal"
> >http://marc.info/?l=linux-mm&m=143011930520417&w=2
> 
> Ah, indeed, I forgot about it and had to rediscover :)
> 
> >
> >>
> >>So I would try that first, and then test your patch on top? In your
> >>patch there's a risk that we make it harder for
> >>unmovable/reclaimable pageblocks to become movable again (we start
> >>with the smallest page which means there's lower chance that
> >>move_freepages_block() will convert more than half of the block).
> >
> >Indeed, but, with your "count movable pages when stealing", risk would
> >disappear. :)
> 
> Hmm, but that counting is only triggered when we attempt to steal
> whole pageblock. For movable allocation, can_steal_fallback() allows
> that only for
> (order >= pageblock_order / 2), and since your patch makes "order"
> as small as possible for movable allocations, the chances are lower?

Chances are lower than current but we eventually try to steal that
(order >= pageblock_order / 2) freepage from unmovable pageblock and
your logic will result in changing pageblock migratetype from
unmovable to movable.

Thanks.
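
The trade-off discussed above can be modelled in a few lines of user-space C. This is a simplified sketch based only on the description in this thread, not the kernel's actual code: PAGEBLOCK_ORDER is an assumed value, the availability bitmask is a made-up representation of the free lists, and both function names are hypothetical.

```c
#include <stdbool.h>

#define PAGEBLOCK_ORDER 9	/* assumed value for illustration */
#define MAX_ORDER 11

enum migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

/* Simplified model of the whole-pageblock steal check described above. */
static bool can_steal_whole_block(unsigned int order, enum migratetype start_mt)
{
	if (order >= PAGEBLOCK_ORDER)
		return true;
	if (start_mt != MT_MOVABLE)
		return true;
	/* Movable requests may steal only reasonably large free pages. */
	return order >= PAGEBLOCK_ORDER / 2;
}

/*
 * Pick a fallback order from a bitmask of orders that currently have
 * free pages (bit n set means order n is available).
 * Returns -1 if no order >= min_order is available.
 */
static int pick_fallback_order(unsigned int avail_mask, unsigned int min_order,
			       bool smallest_first)
{
	int order;

	if (smallest_first) {
		for (order = (int)min_order; order < MAX_ORDER; order++)
			if (avail_mask & (1u << order))
				return order;
	} else {
		for (order = MAX_ORDER - 1; order >= (int)min_order; order--)
			if (avail_mask & (1u << order))
				return order;
	}
	return -1;
}
```

With free pages at orders 2 and 8, the largest-first search picks order 8, which passes the steal check for a movable request, while the smallest-first search picks order 2, which does not. That is the reduced chance raised in the quoted review.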



Re: [RFC PATCH 1/5] mm/page_alloc: always add freeing page at the tail of the buddy list

2016-10-25 Thread Joonsoo Kim
On Mon, Oct 17, 2016 at 05:21:54PM +0800, Xishi Qiu wrote:
> On 2016/10/13 16:08, js1...@gmail.com wrote:
> 
> > From: Joonsoo Kim 
> > 
> > Currently, freeing page can stay longer in the buddy list if next higher
> > order page is in the buddy list in order to help coalescence. However,
> > it doesn't work for the simplest sequential free case. For example, think
> > about the situation that 8 consecutive pages are freed in sequential
> > order.
> > 
> > page 0: attached at the head of order 0 list
> > page 1: merged with page 0, attached at the head of order 1 list
> > page 2: attached at the tail of order 0 list
> > page 3: merged with page 2 and then merged with page 0, attached at
> >  the head of order 2 list
> > page 4: attached at the head of order 0 list
> > page 5: merged with page 4, attached at the tail of order 1 list
> > page 6: attached at the tail of order 0 list
> > page 7: merged with page 6 and then merged with page 4. Lastly, merged
> >  with page 0 and we get order 3 freepage.
> > 
> > With excluding page 0 case, there are three cases that freeing page is
> > attached at the head of buddy list in this example and if just one
> > corresponding ordered allocation request comes at that moment, this page
> > in being a high order page will be allocated and we would fail to make
> > order-3 freepage.
> > 
> > Allocation usually happens in sequential order and free also does. So, it
> > would be important to detect such a situation and to give some chance
> > to be coalesced.
> > 
> > I think that simple and effective heuristic about this case is just
> > attaching freeing page at the tail of the buddy list unconditionally.
> > If freeing isn't merged during one rotation, it would be actual
> > fragmentation and we don't need to care about it for coalescence.
> > 
> 
> Hi Joonsoo,
> 
> I find another two places to reduce fragmentation.
> 
> 1)
> __rmqueue_fallback
>   steal_suitable_fallback
>   move_freepages_block
>   move_freepages
>   list_move
> If we steal some free pages, we will add these pages at the head of the
> start_migratetype list. This will cause more fixed migratetype, because
> these pages will be allocated more easily.
> So how about using list_move_tail instead of list_move?

Yeah... I haven't thought about it deeply but, at a glance, it would be helpful.

> 
> 2)
> __rmqueue_fallback
>   expand
>   list_add
> How about using list_add_tail instead of list_add? If we add to the tail,
> the rest of the pages will be harder to allocate and we can merge them
> again as soon as the page is freed.

I guess that it has no effect. When we do __rmqueue_fallback() and
expand(), we don't have any free page of this order or higher. So,
list_add and list_add_tail will give the same result.
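
A minimal sketch of why the two calls are equivalent on an empty list, using a hand-rolled circular doubly linked list. The names list_node, add_head, add_tail and first_entry are made up for this illustration; this is not the kernel's list.h.

```c
#include <stddef.h>

struct list_node {
	struct list_node *prev, *next;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void insert_between(struct list_node *n, struct list_node *prev,
			   struct list_node *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the head (front of the list). */
static void add_head(struct list_node *n, struct list_node *head)
{
	insert_between(n, head, head->next);
}

/* Insert right before the head (back of the list). */
static void add_tail(struct list_node *n, struct list_node *head)
{
	insert_between(n, head->prev, head);
}

/* First element, or NULL when the list is empty. */
static struct list_node *first_entry(struct list_node *head)
{
	return head->next == head ? NULL : head->next;
}
```

On an empty list both insertions leave the new node as the first entry. The two only diverge once the list already holds something.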

Thanks.
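
The sequential-free walk-through quoted at the top of this message can be reproduced with a toy buddy model. It is a sketch under assumptions: an 8-page region, a boolean map per order instead of real free lists, and hypothetical names (free_map, buddy_free). It tracks only merging, not head/tail placement.

```c
#include <stdbool.h>
#include <string.h>

#define NR_ORDERS 4	/* orders 0..3 */
#define NR_PAGES  8

/* free_map[order][pfn] is true when a free block of that order starts at pfn. */
static bool free_map[NR_ORDERS][NR_PAGES];

static void buddy_reset(void)
{
	memset(free_map, 0, sizeof(free_map));
}

/* Free one page and merge upward; returns the order the block ends up at. */
static unsigned int buddy_free(unsigned int pfn)
{
	unsigned int order = 0;

	while (order < NR_ORDERS - 1) {
		unsigned int buddy = pfn ^ (1u << order);

		if (!free_map[order][buddy])
			break;
		/* Buddy is free: take it off its list and merge. */
		free_map[order][buddy] = false;
		pfn &= ~(1u << order);
		order++;
	}
	free_map[order][pfn] = true;
	return order;
}
```

Freeing pages 0 through 7 in order ends at orders 0, 1, 0, 2, 0, 1, 0, 3, matching the quoted example. The order-3 page appears only if none of the intermediate blocks is allocated in between.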


Re: [kernel-hardening] [PATCH] module: extend 'rodata=off' boot cmdline parameter to module mappings

2016-10-25 Thread AKASHI Takahiro
Rusty, Jessica

On Wed, Oct 26, 2016 at 10:43:32AM +1030, Rusty Russell wrote:
> AKASHI Takahiro  writes:
> > On Thu, Oct 20, 2016 at 01:48:15PM -0700, Kees Cook wrote:
> >> On Wed, Oct 19, 2016 at 11:24 PM, AKASHI Takahiro
> >>  wrote:
> >> > The current "rodata=off" parameter disables read-only kernel mappings
> >> > under CONFIG_DEBUG_RODATA:
> >> > commit d2aa1acad22f ("mm/init: Add 'rodata=off' boot cmdline parameter
> >> > to disable read-only kernel mappings")
> >> >
> >> > This patch is a logical extension to module mappings, i.e. read-only
> >> > mappings at module loading can be disabled even if
> >> > CONFIG_DEBUG_SET_MODULE_RONX is enabled (mainly for debug use). Please
> >> > note, however, that it only affects RO/RW permissions, keeping NX set.
> 
> This patch looks good (except the minor issues noted by Kees); please CC
> the followup version to Jessica as new module maintainer.

I think that the new version (v2)[1] addresses Kees' comments already.

[1] http://lkml.iu.edu//hypermail/linux/kernel/1610.2/04163.html

Thanks,
-Takahiro AKASHI
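
For readers following the review comments quoted in this message, here is a user-space sketch of the pattern being asked for: the declaration and the definition of rodata_enabled guarded by the same condition. The parse_rodata_param() helper is hypothetical and only approximates how a "rodata=" value could be interpreted. The __ro_after_init annotation that Kees suggests is a kernel attribute and is mentioned only in a comment.

```c
#include <stdbool.h>
#include <string.h>

/* Model both options as enabled; either one alone would be enough. */
#define CONFIG_DEBUG_RODATA 1
#define CONFIG_DEBUG_SET_MODULE_RONX 1

/* The same condition guards both the declaration and the definition. */
#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_SET_MODULE_RONX)
extern bool rodata_enabled;	/* would live in the header */
bool rodata_enabled = true;	/* in the kernel this could be __ro_after_init */

/*
 * Hypothetical parser for the value after "rodata=".
 * Returns 0 on success and -1 on unrecognised input, in which case
 * the flag is left unchanged.
 */
static int parse_rodata_param(const char *str)
{
	if (!str)
		return -1;
	if (!strcmp(str, "on") || !strcmp(str, "1")) {
		rodata_enabled = true;
		return 0;
	}
	if (!strcmp(str, "off") || !strcmp(str, "0")) {
		rodata_enabled = false;
		return 0;
	}
	return -1;
}
#endif
```

If the guard on the definition covered only CONFIG_DEBUG_RODATA, a build with only the module option set would see the extern declaration but no definition, which is the mismatch raised in the review.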

> Thanks!
> Rusty.
> 
> >> >
> >> > This is the first step to make CONFIG_DEBUG_SET_MODULE_RONX mandatory
> >> > (always-on) in the future as CONFIG_DEBUG_RODATA on x86 and arm64.
> >> >
> >> > Suggested-by: Mark Rutland 
> >> > Signed-off-by: AKASHI Takahiro 
> >> > Cc: Rusty Russell 
> >> > ---
> >> > v1:
> >> >   * remove RFC's "module_ronx=" and merge it with "rodata="
> >> >   * always keep NX set if CONFIG_SET_MODULE_RONX
> >> >
> >> >  include/linux/init.h |  3 ++-
> >> >  init/main.c  |  2 +-
> >> >  kernel/module.c  | 21 ++---
> >> >  3 files changed, 21 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/include/linux/init.h b/include/linux/init.h
> >> > index e30104c..20aa2eb 100644
> >> > --- a/include/linux/init.h
> >> > +++ b/include/linux/init.h
> >> > @@ -126,7 +126,8 @@ void prepare_namespace(void);
> >> >  void __init load_default_modules(void);
> >> >  int __init init_rootfs(void);
> >> >
> >> > -#ifdef CONFIG_DEBUG_RODATA
> >> > +#if defined(CONFIG_DEBUG_RODATA) || defined(CONFIG_DEBUG_SET_MODULE_RONX)
> >> > +extern bool rodata_enabled;
> >> >  void mark_rodata_ro(void);
> >> >  #endif
> >> >
> >> > diff --git a/init/main.c b/init/main.c
> >> > index 2858be7..92db2f3 100644
> >> > --- a/init/main.c
> >> > +++ b/init/main.c
> >> > @@ -915,7 +915,7 @@ static int try_to_run_init_process(const char *init_filename)
> >> >  static noinline void __init kernel_init_freeable(void);
> >> >
> >> >  #ifdef CONFIG_DEBUG_RODATA
> >> > -static bool rodata_enabled = true;
> >> > +bool rodata_enabled = true;
> >> 
> >> Is there a mismatch here between the extern ifdef and the bool ifdef?
> >> I.e. shouldn't the ifdef here be || DEBUG_SET_MODULE_RONX too?
> >
> > Yes.
> >
> >> Also, can you mark this as __ro_after_init, since nothing changes it
> >> after the kernel command line is parsed?
> >
> > Yes, yes.
> >
> > Thanks,
> > -Takahiro AKASHI
> >
> >> Otherwise, this looks fine to me.
> >> 
> >> -Kees
> >> 
> >> 
> >> -- 
> >> Kees Cook
> >> Nexus Security


Re: [PATCH v6 3/6] mm/cma: populate ZONE_CMA

2016-10-25 Thread Joonsoo Kim
On Tue, Oct 18, 2016 at 05:27:30PM +0900, Joonsoo Kim wrote:
> On Tue, Oct 18, 2016 at 09:42:57AM +0200, Vlastimil Babka wrote:
> > On 10/14/2016 05:03 AM, js1...@gmail.com wrote:
> > >@@ -145,6 +145,35 @@ static int __init cma_activate_area(struct cma *cma)
> > > static int __init cma_init_reserved_areas(void)
> > > {
> > >   int i;
> > >+  struct zone *zone;
> > >+  pg_data_t *pgdat;
> > >+
> > >+  if (!cma_area_count)
> > >+  return 0;
> > >+
> > >+  for_each_online_pgdat(pgdat) {
> > >+  unsigned long start_pfn = UINT_MAX, end_pfn = 0;
> > >+
> > >+  for (i = 0; i < cma_area_count; i++) {
> > >+  if (pfn_to_nid(cma_areas[i].base_pfn) !=
> > >+  pgdat->node_id)
> > >+  continue;
> > >+
> > >+  start_pfn = min(start_pfn, cma_areas[i].base_pfn);
> > >+  end_pfn = max(end_pfn, cma_areas[i].base_pfn +
> > >+  cma_areas[i].count);
> > >+  }
> > >+
> > >+  if (!end_pfn)
> > >+  continue;
> > >+
> > >+  zone = &pgdat->node_zones[ZONE_CMA];
> > >+
> > >+  /* ZONE_CMA doesn't need to exceed CMA region */
> > >+  zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn);
> > >+  zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) -
> > >+  zone->zone_start_pfn;
> > 
> > Hmm, do the max/min here work as intended? IIUC the initial
> 
> Yeap.
> 
> > zone_start_pfn is UINT_MAX and zone->spanned_pages is 1? So at least
> > the max/min should be swapped?
> 
> No. CMA zone's start/end pfn are updated as node's start/end pfn.
> 
> > Also the zone_end_pfn(zone) on the second line already sees the
> > changes to zone->zone_start_pfn in the first line, so it's kind of a
> > mess. You should probably cache zone_end_pfn() to a temporary
> > variable before changing zone_start_pfn.
> 
> You're right, although it doesn't cause any problem. I looked at the code
> again and found that max/min isn't needed. The calculated start/end pfn
> should be in between the node's start/end pfn, so max(zone->zone_start_pfn,
> start_pfn) will return start_pfn and the messed-up min(zone_end_pfn(zone),
> end_pfn) will return end_pfn in all cases.
> 
> Anyway, I will fix it as follows.
> 
> zone->zone_start_pfn = start_pfn
> zone->spanned_pages = end_pfn - start_pfn
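
The simplified span calculation proposed above can be sketched in user-space C as follows. This is an illustration under assumptions: struct cma_area, struct pfn_span and cma_span_for_node() are made-up names, and the nid field stands in for the pfn_to_nid() lookup in the patch. The sketch starts from ULONG_MAX rather than UINT_MAX because the variables are unsigned long.

```c
#include <stdbool.h>
#include <limits.h>

struct cma_area {
	unsigned long base_pfn;
	unsigned long count;
	int nid;		/* stand-in for pfn_to_nid(base_pfn) */
};

struct pfn_span {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/*
 * Compute the pfn range covering every CMA area on one node.
 * Returns false when the node has no CMA area at all.
 */
static bool cma_span_for_node(const struct cma_area *areas, int nr, int nid,
			      struct pfn_span *span)
{
	unsigned long start_pfn = ULONG_MAX, end_pfn = 0;
	int i;

	for (i = 0; i < nr; i++) {
		unsigned long area_end;

		if (areas[i].nid != nid)
			continue;

		area_end = areas[i].base_pfn + areas[i].count;
		if (areas[i].base_pfn < start_pfn)
			start_pfn = areas[i].base_pfn;
		if (area_end > end_pfn)
			end_pfn = area_end;
	}

	if (!end_pfn)
		return false;

	/* No min/max against the old zone range is needed. */
	span->start_pfn = start_pfn;
	span->spanned_pages = end_pfn - start_pfn;
	return true;
}
```

Because nothing reads the zone's old range after the start has been written, the ordering problem pointed out in the review does not arise in this form.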

Hello,

Here is the fixed one.

--->8
>From 93fb05a83d74f9e2c8caebc2fa6d1a8807c9ffb6 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim 
Date: Thu, 24 Mar 2016 22:29:10 +0900
Subject: [PATCH] mm/cma: populate ZONE_CMA

Until now, reserved pages for CMA have been managed in the ordinary zones
to which their pfns belong. This approach has numerous problems
and fixing them isn't easy. (They are described in the previous patch.)
To fix this situation, ZONE_CMA was introduced in the previous patch, but
it is not yet populated. This patch implements population of ZONE_CMA
by stealing reserved pages from the ordinary zones.

Unlike the previous implementation, in which a kernel allocation request
with __GFP_MOVABLE could be serviced from the CMA region, in the new
approach only allocation requests with GFP_HIGHUSER_MOVABLE can be serviced
from the CMA region. This is an inevitable design decision when using the
zone implementation because ZONE_CMA could contain highmem. Due to this
decision, ZONE_CMA will work like ZONE_HIGHMEM or ZONE_MOVABLE.

I don't think it would be a problem because most file cache pages
and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. This is
supported by the fact that there are many systems with ZONE_HIGHMEM and
they work fine. A notable disadvantage is that we cannot use these pages
for blockdev file cache pages, because they usually have __GFP_MOVABLE but
not __GFP_HIGHMEM and __GFP_USER. But, in this case, there are pros and
cons. In my experience, blockdev file cache pages are one of the top
reasons that cma_alloc() fails temporarily. So, we can get a stronger
guarantee of cma_alloc() success by discarding that case.

The implementation itself is easy to understand: steal the pages when the
CMA area is initialized and recalculate the various per-zone stats/thresholds.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Joonsoo Kim 
---
 include/linux/memory_hotplug.h |  3 --
 include/linux/mm.h |  1 +
 mm/cma.c   | 62 ++
 mm/internal.h  |  3 ++
 mm/page_alloc.c| 29 +---
 5 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 01033fa..ea5af47 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -198,9 +198,6 @@ extern void get_page_bootmem(unsigned long ingo, struct page *page,
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
-extern void set_zone_contiguous(struct zone *zone);
-extern void clear_zone_contiguous(struct zone 

Re: [PATCH v3] x86/msr: Add write msr notrace to avoid the debug codes splash

2016-10-25 Thread Wanpeng Li
2016-10-25 19:15 GMT+08:00 Paolo Bonzini :
>
>
> On 25/10/2016 04:58, Wanpeng Li wrote:
>> @@ -319,7 +319,7 @@ static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
>
> This needs to be notrace too.

Ok, I just sent out a new version for this.

Regards,
Wanpeng Li


Re: [RFC PATCH] xhci: do not halt the secondary HCD

2016-10-25 Thread Joel Stanley
On Tue, Sep 20, 2016 at 5:56 PM, Mathias Nyman
 wrote:
> Quick Googling shows that that TI TUSB 73x0 USB3.0 xHCI host has an issue
> with halting.
>
> Errata says host needs 125us to 1ms between the last control transfer and
> clearing the run/stop bit. (halting the host)
>
> Suggested workaround is to wait at least 2ms before halting the host.
>
> See issue #10 in:
> http://www.ti.com/lit/er/sllz076/sllz076.pdf
>
> It might just be that the patch works because it forces halting the host to
> be done later (secondary hcd -> primary hcd),  giving it enough time after
> the last control transfer.

Well spotted.

I gave this a go, adding a quirk and performing a msleep:

+++ b/drivers/usb/host/xhci.c
@@ -109,6 +109,10 @@ int xhci_halt(struct xhci_hcd *xhci)
 {
int ret;
xhci_dbg_trace(xhci, trace_xhci_dbg_init, "// Halt the HC");
+
+   if (xhci->quirks & XHCI_HALT_DELAY_QUIRK)
+   msleep(2);
+
xhci_quiesce(xhci);

However it didn't help.

Are we guaranteed that transfers are not in flight at that point?

>
>>> a first step.
>>>
>>> load primary
>>> load secondary  (starts the xhci controller
>>> ...
>>> unload secondary (halts the controller)
>>> unload primary   (free memory)
>
>
> Now thinking about it, it doesn't really make sense to halt the host
> controller hardware
> before removing the primary hcd. It will just cause devices under the
> primary (USB2) to
> be removed uncleanly.  So basically the idea of the workaround makes sense,
> it just needs
> to be cleaned up from a workaround to intended behavior.

Great. When you say clean up, do you just mean tidying the comments?

Cheers,

Joel


>
> We might also need an additional quirk for TI TUSB 73x0 that adds a msleep()
> before the
> xhci_halt, even if it's moved to the last hcd removed.
>
> -Mathias
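
The ordering Mathias describes (halt the controller only when the last hcd is removed, with an optional delay for the TI part) can be modelled as below. This is a user-space sketch with hypothetical names: struct xhci_model and the model_* helpers do not exist in the driver, the quirk bit value is a placeholder, and the delay is only recorded, not actually slept.

```c
#include <stdbool.h>

/* Placeholder bit; the name comes from the experiment quoted above. */
#define XHCI_HALT_DELAY_QUIRK (1ULL << 0)

struct xhci_model {
	unsigned long long quirks;
	int hcds_registered;		/* primary + secondary */
	bool halted;
	unsigned int delay_ms_before_halt;
};

static void model_add_hcd(struct xhci_model *x)
{
	x->hcds_registered++;
	x->halted = false;
}

/*
 * Remove one hcd. The controller is halted only when the last one
 * goes away. Returns true if this call halted the controller.
 */
static bool model_remove_hcd(struct xhci_model *x)
{
	if (x->hcds_registered == 0)
		return false;
	if (--x->hcds_registered > 0)
		return false;

	/* Errata workaround: wait at least 2ms before halting. */
	x->delay_ms_before_halt =
		(x->quirks & XHCI_HALT_DELAY_QUIRK) ? 2 : 0;
	x->halted = true;
	return true;
}
```

In this model, removing the secondary hcd leaves the controller running, so devices under the primary can still be torn down cleanly before the halt.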


[PATCH v4] x86/msr: Add write msr notrace to avoid the debug codes splash

2016-10-25 Thread Wanpeng Li
From: Wanpeng Li 

As Peterz pointed out:

| The thing is, many many smp_reschedule_interrupt() invocations don't
| actually execute anything much at all and are only send to tickle the
| return to user path (which does the actual preemption).

This patch adds a notrace variant of the MSR write to avoid the debug code splash.

Suggested-by: Peter Zijlstra 
Suggested-by: Paolo Bonzini 
Cc: Ingo Molnar 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Paolo Bonzini 
Signed-off-by: Wanpeng Li 
---
 arch/x86/include/asm/apic.h |  3 ++-
 arch/x86/include/asm/msr.h  | 15 +++
 arch/x86/kernel/apic/apic.c |  1 +
 arch/x86/kernel/kvm.c   |  6 +++---
 arch/x86/kernel/smp.c   |  2 --
 5 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index f5aaf6c..a5a0bcf 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -196,7 +196,7 @@ static inline void native_apic_msr_write(u32 reg, u32 v)
 
 static inline void native_apic_msr_eoi_write(u32 reg, u32 v)
 {
-   wrmsr(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK, 0);
+   wrmsr_notrace(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK, 0);
 }
 
 static inline u32 native_apic_msr_read(u32 reg)
@@ -332,6 +332,7 @@ struct apic {
 * on write for EOI.
 */
void (*eoi_write)(u32 reg, u32 v);
+   void (*native_eoi_write)(u32 reg, u32 v);
u64 (*icr_read)(void);
void (*icr_write)(u32 low, u32 high);
void (*wait_icr_idle)(void);
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index b5fee97..afbb221 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -127,6 +127,21 @@ notrace static inline void native_write_msr(unsigned int msr,
 }
 
 /* Can be uninlined because referenced by paravirt */
+notrace static inline void native_write_msr_notrace(unsigned int msr,
+   unsigned low, unsigned high)
+{
+   asm volatile("1: wrmsr\n"
+"2:\n"
+_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe)
+: : "c" (msr), "a"(low), "d" (high) : "memory");
+}
+
+static inline void wrmsr_notrace(unsigned msr, unsigned low, unsigned high)
+{
+   native_write_msr_notrace(msr, low, high);
+}
+
+/* Can be uninlined because referenced by paravirt */
 notrace static inline int native_write_msr_safe(unsigned int msr,
unsigned low, unsigned high)
 {
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 88c657b..2686894 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2263,6 +2263,7 @@ void __init apic_set_eoi_write(void (*eoi_write)(u32 reg, u32 v))
for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
/* Should happen once for each apic */
WARN_ON((*drv)->eoi_write == eoi_write);
+   (*drv)->native_eoi_write = (*drv)->eoi_write;
(*drv)->eoi_write = eoi_write;
}
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index edbbfc8..a4627ed 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -308,7 +308,7 @@ static void kvm_register_steal_time(void)
 
 static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
 
-static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
+static void kvm_guest_apic_eoi_write_notrace(u32 reg, u32 val)
 {
/**
 * This relies on __test_and_clear_bit to modify the memory
@@ -319,7 +319,7 @@ static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 */
if (__test_and_clear_bit(KVM_PV_EOI_BIT, this_cpu_ptr(&kvm_apic_eoi)))
return;
-   apic_write(APIC_EOI, APIC_EOI_ACK);
+   apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
 }
 
 static void kvm_guest_cpu_init(void)
@@ -474,7 +474,7 @@ void __init kvm_guest_init(void)
}
 
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
-   apic_set_eoi_write(kvm_guest_apic_eoi_write);
+   apic_set_eoi_write(kvm_guest_apic_eoi_write_notrace);
 
if (kvmclock_vsyscall)
kvm_setup_vsyscall_timeinfo();
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index c00cb64..68f8cc2 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -261,10 +261,8 @@ static inline void __smp_reschedule_interrupt(void)
 
 __visible void smp_reschedule_interrupt(struct pt_regs *regs)
 {
-   irq_enter();
ack_APIC_irq();
__smp_reschedule_interrupt();
-   irq_exit();
/*
 * KVM uses this interrupt to force a cpu out of guest mode
 */
-- 
1.9.1
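
The apic_set_eoi_write() hunk in the patch above keeps the previous eoi_write callback in native_eoi_write before overriding it, so the KVM handler can fall back to the native write directly. Below is a minimal userspace sketch of that save-then-override pattern. All names here (struct fake_apic, pv_eoi, set_eoi_write and so on) are hypothetical stand-ins, not kernel symbols, and the "hardware" write is only a counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct apic; only the two callbacks matter here. */
struct fake_apic {
	void (*eoi_write)(uint32_t reg, uint32_t v);
	void (*native_eoi_write)(uint32_t reg, uint32_t v);
};

static int native_writes;	/* counts calls that reach the "hardware" */
static int pv_eoi_pending;	/* stands in for the per-cpu PV EOI flag */

static void native_eoi(uint32_t reg, uint32_t v)
{
	(void)reg;
	(void)v;
	native_writes++;
}

static struct fake_apic apic_dev = { .eoi_write = native_eoi };

/* Paravirt handler: consume the pending flag if set, else fall back. */
static void pv_eoi(uint32_t reg, uint32_t v)
{
	if (pv_eoi_pending) {
		pv_eoi_pending = 0;
		return;
	}
	apic_dev.native_eoi_write(reg, v);
}

/* Mirrors the override step: remember the old callback, then replace it. */
static void set_eoi_write(struct fake_apic *a,
			  void (*fn)(uint32_t reg, uint32_t v))
{
	assert(a->eoi_write != fn);	/* should happen only once */
	a->native_eoi_write = a->eoi_write;
	a->eoi_write = fn;
}
```

After set_eoi_write(&apic_dev, pv_eoi), a call through apic_dev.eoi_write() either clears the pending flag and returns, or reaches native_eoi() through the saved pointer.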



Re: [PATCHv4 18/43] block: define BIO_MAX_PAGES to HPAGE_PMD_NR if huge page cache enabled

2016-10-25 Thread Andreas Dilger
On Oct 25, 2016, at 6:54 AM, Kirill A. Shutemov  wrote:
> 
> On Tue, Oct 25, 2016 at 12:21:22AM -0700, Christoph Hellwig wrote:
>> On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov wrote:
>>> We are going to do IO a huge page at a time. So we need BIO_MAX_PAGES to be
>>> at least HPAGE_PMD_NR. For x86-64, it's 512 pages.
>> 
>> NAK.  The maximum bio size should not depend on an obscure vm config,
>> please send a standalone patch increasing the size to the block list,
>> with a much longer explanation.  Also you can't simply increase the size
>> of the largest pool, we'll probably need more pools instead, or maybe
>> even implement a similar chaining scheme as we do for struct
>> scatterlist.
> 
> The size of the required pool depends on architecture: different architectures
> have different (huge page size)/(base page size).
> 
> Would it be okay if I add one more pool with size equal to HPAGE_PMD_NR,
> if it's bigger than BIO_MAX_PAGES and huge pages are enabled?

Why wouldn't you have all the pool sizes in between?  Definitely 1MB has
been too small already for high-bandwidth IO.  I wouldn't mind BIOs up to
4MB or larger since most high-end RAID hardware does best with 4MB IOs.

Cheers, Andreas
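
The sizes discussed in this thread can be checked with a little arithmetic. The sketch below assumes 4 KiB base pages and a 2 MiB PMD-sized huge page (the usual x86-64 values, consistent with the "512 pages" figure quoted above). The macro and function names are made up for illustration.

```c
#include <assert.h>

/* Assumed x86-64 values; other architectures use different sizes. */
#define BASE_PAGE_SIZE	4096UL
#define HUGE_PAGE_SIZE	(2UL * 1024 * 1024)	/* PMD-sized huge page */

/* Number of base pages needed to cover `bytes`, rounded up. */
static unsigned long pages_for(unsigned long bytes)
{
	return (bytes + BASE_PAGE_SIZE - 1) / BASE_PAGE_SIZE;
}
```

Under these assumptions pages_for(HUGE_PAGE_SIZE) gives 512, a 1MB bio corresponds to 256 pages, and the 4MB IO size mentioned for high-end RAID hardware would need 1024 pages per bio.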









RE: [PATCH 1/3] clk: qcom: gdsc: Add support for gdscs with HW control

2016-10-25 Thread Sricharan
Hi Stan,

>Hi Sricharan,
>
>On 10/24/2016 01:18 PM, Sricharan R wrote:
>> From: Rajendra Nayak 
>>
>> Some GDSCs might support a HW control mode, wherein the power
>> domain (gdsc) is brought in and out of low power state (while
>> unused) without any SW assistance, saving power.
>> Such GDSCs can be configured in a HW control mode when powered on
>> until they are explicitly requested to be powered off by software.
>>
>> Signed-off-by: Rajendra Nayak 
>> Signed-off-by: Sricharan R 
>> ---
>>  drivers/clk/qcom/gdsc.c | 15 +++
>>  drivers/clk/qcom/gdsc.h |  1 +
>>  2 files changed, 16 insertions(+)
>>
>> diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
>> index f12d7b2..a5e1c8c 100644
>> --- a/drivers/clk/qcom/gdsc.c
>> +++ b/drivers/clk/qcom/gdsc.c
>> @@ -55,6 +55,13 @@ static int gdsc_is_enabled(struct gdsc *sc, unsigned int reg)
>>  return !!(val & PWR_ON_MASK);
>>  }
>>
>> +static int gdsc_hwctrl(struct gdsc *sc, bool en)
>> +{
>> +u32 val = en ? HW_CONTROL_MASK : 0;
>> +
>> +return regmap_update_bits(sc->regmap, sc->gdscr, HW_CONTROL_MASK, val);
>> +}
>> +
>>  static int gdsc_toggle_logic(struct gdsc *sc, bool en)
>>  {
>>  int ret;
>> @@ -164,6 +171,10 @@ static int gdsc_enable(struct generic_pm_domain *domain)
>>   */
>>  udelay(1);
>>
>> +/* Turn on HW trigger mode if supported */
>> +if (sc->flags & HW_CTRL)
>> +gdsc_hwctrl(sc, true);
>
   Sure, will add the check.

Regards,
 Sricharan
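
The quoted gdsc_hwctrl() helper is a masked read-modify-write that returns an error code the caller can check. The sketch below models that in userspace, with a plain variable standing in for the GDSCR register. The bit position chosen for HW_CONTROL_MASK and the fake_* names are assumptions for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the driver defines the real HW_CONTROL_MASK. */
#define HW_CONTROL_MASK	(1u << 1)

/* Stand-in for the GDSCR register normally reached through regmap. */
static uint32_t fake_gdscr;

/* Read-modify-write: only the bits selected by `mask` are changed. */
static int fake_update_bits(uint32_t *reg, uint32_t mask, uint32_t val)
{
	*reg = (*reg & ~mask) | (val & mask);
	return 0;	/* a real regmap access could return a negative errno */
}

/* Same shape as the quoted helper: set the bit to enable, clear to disable. */
static int gdsc_hwctrl_sketch(bool en)
{
	uint32_t val = en ? HW_CONTROL_MASK : 0;

	return fake_update_bits(&fake_gdscr, HW_CONTROL_MASK, val);
}
```

Bits outside the mask are left alone, so toggling HW control does not disturb the other fields in the register.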




[PATCH] drm: rcar-du: Fix R-Car Gen3 crash when VSP is disabled

2016-10-25 Thread Magnus Damm
From: Magnus Damm 

For the DU to operate on R-Car Gen3 hardware a combination of DU
and VSP devices is required. Since the DU driver also supports
earlier generations of hardware, the VSP portion is enabled via Kconfig.

As of v4.9-rc1 the arm64 defconfig enables the DU driver as a
module; however, this is not enough to support R-Car Gen3. With
CONFIG_DRM_RCAR_VSP=n, as is currently the case, the kernel crashes
when loading the module. This patch fixes that particular case.

In more detail, the crash triggers in drm_atomic_get_plane_state()
when __drm_atomic_helper_set_config() passes NULL as crtc->primary.

This patch corrects this issue by failing to load the DU driver on
R-Car Gen3 when VSP is not available.

Signed-off-by: Magnus Damm 
---

 drivers/gpu/drm/rcar-du/rcar_du_vsp.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 0001/drivers/gpu/drm/rcar-du/rcar_du_vsp.h
+++ work/drivers/gpu/drm/rcar-du/rcar_du_vsp.h  2016-10-26 00:01:12.920607110 +0900
@@ -70,7 +70,7 @@ void rcar_du_vsp_disable(struct rcar_du_
 void rcar_du_vsp_atomic_begin(struct rcar_du_crtc *crtc);
 void rcar_du_vsp_atomic_flush(struct rcar_du_crtc *crtc);
 #else
-static inline int rcar_du_vsp_init(struct rcar_du_vsp *vsp) { return 0; };
+static inline int rcar_du_vsp_init(struct rcar_du_vsp *vsp) { return -ENXIO; };
 static inline void rcar_du_vsp_enable(struct rcar_du_crtc *crtc) { };
 static inline void rcar_du_vsp_disable(struct rcar_du_crtc *crtc) { };
 static inline void rcar_du_vsp_atomic_begin(struct rcar_du_crtc *crtc) { };
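
The one-line change above makes the compiled-out stub report an error instead of success, so a caller that depends on the VSP stops early. A minimal userspace sketch of that idea follows. SKETCH_HAVE_VSP, struct sketch_vsp and sketch_probe() are hypothetical names standing in for the Kconfig option, the driver structure and the probe path.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical switch standing in for CONFIG_DRM_RCAR_VSP. */
#ifndef SKETCH_HAVE_VSP
#define SKETCH_HAVE_VSP 0
#endif

struct sketch_vsp {
	int ready;
};

#if SKETCH_HAVE_VSP
static int sketch_vsp_init(struct sketch_vsp *vsp)
{
	vsp->ready = 1;
	return 0;
}
#else
/* Stub: report "no such device" so the caller can refuse to continue. */
static int sketch_vsp_init(struct sketch_vsp *vsp)
{
	(void)vsp;
	return -ENXIO;
}
#endif

/*
 * A caller that needs the VSP propagates the error instead of carrying on
 * with a half-initialised device.
 */
static int sketch_probe(int needs_vsp)
{
	struct sketch_vsp vsp = { 0 };

	if (needs_vsp) {
		int ret = sketch_vsp_init(&vsp);

		if (ret < 0)
			return ret;
	}
	return 0;
}
```

With the stub returning 0, as before the patch, sketch_probe(1) would report success even though nothing was initialised, which is the kind of silent failure the patch description attributes the crash to.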


Re: [PATCH V2 4/8] PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()

2016-10-25 Thread Viresh Kumar
On 25-10-16, 13:26, Stephen Boyd wrote:
> For things like AVS we'll probably want to do that, although it's
> sort of funny because replacing RCU with rw-locks is the opposite
> direction most people go.

Yes, that would be very funny :)

> With AVS we would be updating the
> voltage(s) in use for the current OPP, and we would want that
> update to block any OPP transition until the voltage is adjusted.
> I don't know how we would do that with RCU very well. Plus, RCU
> is for reader heavy things, but we mostly have one or two
> readers.

Not just that, think of the opp_disable() function. What currently guarantees
that an OPP being disabled isn't already in use, or about to be used?

I strongly feel RCU is not the best fit for OPP core at least.

> I guess it's ok for now to do all this copying, but it feels like
> we'll need to undo a large portion of it later with things like
> AVS.

Yes.

> Or at least we'll be doing copies for almost no reason
> because we'll want to hold the read lock across the whole OPP
> transition. I was going to suggest we pass around information
> about what we want to grab from the RCU protected data
> structures, think index of regulator, etc. and then have small
> RCU read-side critical sections to grab that info during the OPP
> transition but I'm not sure that's any better. It might be worse
> because the OPP could change during the OPP transition and we
> could be using half of the old and half of the new data.

The problem is that this code is getting harder to read for everybody. If we are
finding it difficult to understand, what about newbies..

-- 
viresh


Re: [PATCH 1/3] usb: dwc3: host: inherit dma configuration from parent dev

2016-10-25 Thread Peter Chen
On Tue, Oct 25, 2016 at 04:26:26PM +0530, Sriram Dash wrote:
> For xhci-hcd platform device, all the DMA parameters are not configured
> properly, notably dma ops for dwc3 devices.
> 
> The idea here is that you pass in the parent of_node along with the child
> device pointer, so it would behave exactly like the parent already does.
> The difference is that it also handles all the other attributes besides
> the mask.
> Splitting the usb_bus->controller field into the Linux-internal device
> (used for the sysfs hierarchy, for printks and for power management)
> and a new pointer (used for DMA, DT enumeration and phy lookup) probably
> covers all that we really need.
> 
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Sriram Dash 
> Cc: Felipe Balbi 
> Cc: Grygorii Strashko 
> Cc: Sinjan Kumar 
> Cc: David Fisher 
> Cc: Catalin Marinas 
> Cc: "Thang Q. Nguyen" 
> Cc: Yoshihiro Shimoda 
> Cc: Stephen Boyd 
> Cc: Bjorn Andersson 
> Cc: Ming Lei 
> Cc: Jon Masters 
> Cc: Dann Frazier 
> Cc: Peter Chen 
> Cc: Leo Li 
> ---
>  drivers/usb/chipidea/host.c  |  3 ++-
>  drivers/usb/chipidea/udc.c   | 10 +
>  drivers/usb/core/buffer.c| 12 +--
>  drivers/usb/core/hcd.c   | 48 ++--
>  drivers/usb/core/usb.c   | 18 -
>  drivers/usb/dwc3/core.c  | 22 +---
>  drivers/usb/dwc3/core.h  |  1 +
>  drivers/usb/dwc3/ep0.c   |  8 
>  drivers/usb/dwc3/gadget.c| 37 +-
>  drivers/usb/dwc3/host.c  |  8 
>  drivers/usb/host/ehci-fsl.c  |  4 ++--
>  drivers/usb/host/xhci-mem.c  | 12 +--
>  drivers/usb/host/xhci-plat.c | 33 +++---
>  drivers/usb/host/xhci.c  | 15 ++
>  include/linux/usb.h  |  1 +
>  include/linux/usb/hcd.h  |  3 +++
>  16 files changed, 144 insertions(+), 91 deletions(-)
> 
> diff --git a/drivers/usb/chipidea/host.c b/drivers/usb/chipidea/host.c
> index 96ae695..ca27893 100644
> --- a/drivers/usb/chipidea/host.c
> +++ b/drivers/usb/chipidea/host.c
> @@ -116,7 +116,8 @@ static int host_start(struct ci_hdrc *ci)
>   if (usb_disabled())
>   return -ENODEV;
>  
> - hcd = usb_create_hcd(&ci_ehci_hc_driver, ci->dev, dev_name(ci->dev));
> + hcd = __usb_create_hcd(&ci_ehci_hc_driver, ci->dev->parent,
> +ci->dev, dev_name(ci->dev), NULL);
>   if (!hcd)
>   return -ENOMEM;
>  
> diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
> index 661f43f..bc55922 100644
> --- a/drivers/usb/chipidea/udc.c
> +++ b/drivers/usb/chipidea/udc.c
> @@ -423,7 +423,8 @@ static int _hardware_enqueue(struct ci_hw_ep *hwep, struct ci_hw_req *hwreq)
>  
>   hwreq->req.status = -EALREADY;
>  
> - ret = usb_gadget_map_request(&ci->gadget, &hwreq->req, hwep->dir);
> + ret = usb_gadget_map_request_by_dev(ci->dev->parent,
> + &hwreq->req, hwep->dir);
>   if (ret)
>   return ret;
>  
> @@ -603,7 +604,8 @@ static int _hardware_dequeue(struct ci_hw_ep *hwep, struct ci_hw_req *hwreq)
>   list_del_init(&node->td);
>   }
>  
> - usb_gadget_unmap_request(&hwep->ci->gadget, &hwreq->req, hwep->dir);
> + usb_gadget_unmap_request_by_dev(hwep->ci->dev->parent,
> + &hwreq->req, hwep->dir);
>  
>   hwreq->req.actual += actual;
>  
> @@ -1904,13 +1906,13 @@ static int udc_start(struct ci_hdrc *ci)
>   INIT_LIST_HEAD(&ci->gadget.ep_list);
>  
>   /* alloc resources */
> - ci->qh_pool = dma_pool_create("ci_hw_qh", dev,
> + ci->qh_pool = dma_pool_create("ci_hw_qh", dev->parent,
>  sizeof(struct ci_hw_qh),
>  64, CI_HDRC_PAGE_SIZE);
>   if (ci->qh_pool == NULL)
>   return -ENOMEM;
>  
> - ci->td_pool = dma_pool_create("ci_hw_td", dev,
> + ci->td_pool = dma_pool_create("ci_hw_td", dev->parent,
>  sizeof(struct ci_hw_td),
>  64, CI_HDRC_PAGE_SIZE);

The chipidea part is ok for me, but just follow Arnd's suggestion
for patch split, subject, and commit log.

Peter

>   if (ci->td_pool == NULL) {
> diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
> index 98e39f9..1e41ef7 100644
> --- a/drivers/usb/core/buffer.c
> +++ b/drivers/usb/core/buffer.c
> @@ -63,7 +63,7 @@ int hcd_buffer_create(struct usb_hcd *hcd)
>   int i, size;
>  
>   if (!IS_ENABLED(CONFIG_HAS_DMA) ||
> - (!hcd->self.controller->dma_mask &&
> + (!hcd->self.sysdev->dma_mask &&
>!(hcd->driver->flags & HCD_LOCAL_MEM)))
>   return 0;
>  
> @@ -72,7 +72,7 @@ int hcd_buffer_create(struct usb_hcd *hcd)
>   if (!size)
>   continue;
>   snprintf(name, sizeof(name), "buffer-%d", size);
> - hcd->pool[i] = 

Re: [PATCH V2 0/8] PM / OPP: Multiple regulator support

2016-10-25 Thread Viresh Kumar
On 25-10-16, 16:13, Dave Gerlach wrote:
> I think what you have shared below is a good safety check but if I rename
> the regulator properties in the DT for the cpu (to vdd and vbb, meaning
> cpufreq detects no regulator) and do *not* call dev_pm_opp_set_regulators
> before cpufreq-dt probes we fail before we even get to that point:
> 
> [16.946] cpu cpu0: opp_parse_supplies: Invalid number of elements in
> opp-microvolt property (6) with supplies (1)
> [16.967] cpu cpu0: _of_add_opp_table_v2: Failed to add OPP, -22
> [16.982] cpu cpu0: dev_pm_opp_get_opp_count: OPP table not found (-19)
> [16.982] cpu cpu0: OPP table is not ready, deferring probe
> 
> This failure is because opp_parse_supplies assumes a count of 1 regulator if
> no regulators at all are present and then hard fails if too many voltages
> have been passed for each OPP.

Exactly. And yes this is intentional.

> It seems we need a check much earlier similar
> to what you suggested below to allow us to defer if an OPP has supplied
> voltages but no regulator has been registered with the system. I think this
> is reasonable even for the 1 regulator case, no?

No.

OPP core needs to know about regulators only if the user drivers want it to
manage DVFS. It is still possible for cpufreq drivers to use OPP framework for
managing the tables, but do the real DVFS stuff themselves. That's why it is not
compulsory in the code to set regulator names.

And it's only wrong if dev_pm_opp_set_rate() is called without first setting the
regulators.

> cpufreq-dt won't handle this properly as is, but now that the opp core is
> evolving perhaps it makes sense to modify the resources_available check
> slightly to rely on the OPP core rather than just a dummy
> regulator_get_optional to see if the regulator is ready.

I am not sure yet on what to change there. You mean regarding multiple
regulators?

-- 
viresh
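
The failure in the quoted log can be reproduced with a small model of the element-count check. The sketch below is a simplification under two assumptions: that each supply takes either one opp-microvolt value (target) or three (target, min, max), and that the core falls back to a single supply when no regulators have been set, as described in the thread. The function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/*
 * Model of the count check: `vcount` is the number of opp-microvolt
 * elements for one OPP, `supplies` the number of registered regulators.
 */
static int check_microvolt_count(int vcount, int supplies)
{
	/* With no regulators registered, assume a single supply. */
	if (supplies == 0)
		supplies = 1;

	/* One value (target) or three (target, min, max) per supply. */
	if (vcount != supplies && vcount != supplies * 3)
		return -EINVAL;

	return 0;
}
```

Six elements with one assumed supply is rejected with -EINVAL, which matches the -22 in the log, while the same six elements pass once two supplies (such as vdd and vbb) have been registered before the table is parsed.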


Re: [PATCH V2 0/6] ARM64: Uprobe support added

2016-10-25 Thread Pratyush Anand
Hi Catalin,

Please let me know whether everything other than is_trap_insn() looks
fine to you. Maybe I can rework that part in time. It would be great if
we could get this into v4.9.


~Pratyush


On Tue, Sep 27, 2016 at 1:17 PM, Pratyush Anand  wrote:
> Changes since v1:
> * Exposed sync_icache_aliases() and used that instead of
>   flush_uprobe_xol_access()
> * Assigned 0x0005 to BRK64_ESR_UPROBES instead of 0x0008
> * moved uprobe_opcode_t from probes.h to uprobes.h
> * Assigned 4 to TIF_UPROBE instead of 5
> * Assigned AARCH64_INSN_SIZE to UPROBE_SWBP_INSN_SIZE instead of hard code 4.
> * Removed saved_fault_code from struct arch_uprobe_task
> * Removed preempt_dis(en)able() from arch_uprobe_copy_ixol()
> * Removed case INSN_GOOD from arch_uprobe_analyze_insn()
> * Now we do check that probe point is not for a 32 bit task.
> * Return a false positive from is_trap_insn()
> * Changes for rebase conflict resolution
>
> V1 was here: https://lkml.org/lkml/2016/8/2/29
> Patches have been rebased on next-20160927, so that there would be no
> conflicts with other arm64/for-next/core patches.
>
> Patches have been tested for the following:
> 1. Step-able instructions, like sub, ldr, add etc.
> 2. Simulation-able instructions, like ret, cbnz, cbz etc.
> 3. uretprobe
> 4. Reject-able instructions like sev, wfe etc.
> 5. trapped and abort xol path
> 6. probe at unaligned user address.
> 7. longjump test cases
>
> aarch32 task probing is not yet supported.
>
> Pratyush Anand (6):
>   arm64: kprobe: protect/rename few definitions to be reused by uprobe
>   arm64: kgdb_step_brk_fn: ignore other's exception
>   arm64: Handle TRAP_TRACE for user mode as well
>   arm64: Handle TRAP_BRKPT for user mode as well
>   arm64: introduce mm context flag to keep 32 bit task information
>   arm64: Add uprobe support
>
>  arch/arm64/Kconfig                      |   3 +
>  arch/arm64/include/asm/cacheflush.h     |   1 +
>  arch/arm64/include/asm/debug-monitors.h |   3 +
>  arch/arm64/include/asm/elf.h            |  12 +-
>  arch/arm64/include/asm/mmu.h            |   1 +
>  arch/arm64/include/asm/probes.h         |  19 +--
>  arch/arm64/include/asm/ptrace.h         |   8 ++
>  arch/arm64/include/asm/thread_info.h    |   5 +-
>  arch/arm64/include/asm/uprobes.h        |  36 ++
>  arch/arm64/kernel/debug-monitors.c      |  40 +++---
>  arch/arm64/kernel/kgdb.c                |   3 +
>  arch/arm64/kernel/probes/Makefile       |   2 +
>  arch/arm64/kernel/probes/decode-insn.c  |  32 ++---
>  arch/arm64/kernel/probes/decode-insn.h  |   8 +-
>  arch/arm64/kernel/probes/kprobes.c      |  36 +++---
>  arch/arm64/kernel/probes/uprobes.c      | 221
>  arch/arm64/kernel/signal.c              |   3 +
>  arch/arm64/mm/flush.c                   |   2 +-
>  18 files changed, 371 insertions(+), 64 deletions(-)
>  create mode 100644 arch/arm64/include/asm/uprobes.h
>  create mode 100644 arch/arm64/kernel/probes/uprobes.c
>
> --
> 2.7.4
>


Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc

2016-10-25 Thread Leizhen (ThunderTown)


On 2016/10/25 21:23, Michal Hocko wrote:
> On Tue 25-10-16 10:59:17, Zhen Lei wrote:
>> If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes
>> actually exist, the percpu variable areas and NUMA control blocks of
>> those memoryless nodes need to be allocated from the nearest available
>> node to improve performance.
>>
>> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
>> specified nid first, if that allocation fails they fall back directly
>> to NUMA_NO_NODE. This means any node may be chosen on the second
>> attempt.
>>
>> To stay compatible with the old behaviour above, I use a macro,
>> node_distance_ready, to control it. By default, the macro
>> node_distance_ready is not defined on any platform, and the
>> above-mentioned functions work as before. Otherwise, they try the
>> nearest node first.
> 
> I am sorry but it is absolutely unclear to me _what_ is the motivation
> of the patch. Is this a performance optimization, correctness issue or
> something else? Could you please restate what is the problem, why do you
> think it has to be fixed at memblock layer and describe what the actual
> fix is please?
This is a performance optimization. The problem arises when some memoryless
NUMA nodes actually exist. For example, suppose there are 4 nodes in total
(0, 1, 2, 3), node 1 has no memory, and the node distances are as below:
        -----board-----
        |             |
        |             |
     socket0       socket1
      /   \         /   \
     /     \       /     \
  node0   node1 node2   node3
distance[1][0] is smaller than distance[1][2] and distance[1][3], so CPUs on
node1 can access the memory of node0 faster than that of node2 or node3.

Linux defines a lot of percpu variables. Each CPU has its own copy and, most
of the time, accesses only its own percpu area. In this example, we want the
percpu area of the CPUs on node1 to be allocated from node0, but without
these patches that is not guaranteed.

If each node has its own memory, we can directly use the functions below to
allocate memory from the local node:
1. memblock_alloc_nid
2. memblock_alloc_try_nid
3. memblock_virt_alloc_try_nid_nopanic
4. memblock_virt_alloc_try_nid

So these patches only matter for the NUMA memoryless-node scenario.
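
To make the intended selection concrete, here is a minimal user-space sketch of
"pick the nearest node that has memory" for the 4-node example above. It is not
the code from the patch: the distance values (10 local, 15 same socket, 20 other
socket), the has_memory[] table and the helper name nearest_node_with_memory()
are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_NODES 4	/* hypothetical: the 4-node example above */
#define NO_NODE  (-1)

/*
 * Hypothetical distance table: 10 = local node, 15 = other node on the
 * same socket, 20 = node on the other socket.
 */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 15, 20, 20 },
	{ 15, 10, 20, 20 },
	{ 20, 20, 10, 15 },
	{ 20, 20, 15, 10 },
};

/* In the example, node 1 is the memoryless node. */
static const bool has_memory[NR_NODES] = { true, false, true, true };

/*
 * Return the node with memory that is closest to @nid, or NO_NODE if no
 * node has memory. On equal distance the lowest node id wins.
 */
static int nearest_node_with_memory(int nid)
{
	int best = NO_NODE;
	int best_dist = INT_MAX;

	assert(nid >= 0 && nid < NR_NODES);

	for (int n = 0; n < NR_NODES; n++) {
		if (!has_memory[n])
			continue;
		if (distance[nid][n] < best_dist) {
			best = n;
			best_dist = distance[nid][n];
		}
	}
	return best;
}
```

With these assumed tables, a request on behalf of node1 resolves to node0,
while nodes that have their own memory resolve to themselves.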

Another use case is the control block "extern pg_data_t *node_data[]".
Here is an example from x86 NUMA in arch/x86/mm/numa.c:
static void __init alloc_node_data(int nid)
{
... ...
	/*
	 * Allocate node data.  Try node-local memory and then any node.
//==>But the nearest node is the best
	 * Never allocate in DMA zone.
	 */
	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
	if (!nd_pa) {
		nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
					      MEMBLOCK_ALLOC_ACCESSIBLE);
		if (!nd_pa) {
			pr_err("Cannot find %zu bytes in node %d\n",
			       nd_size, nid);
			return;
		}
	}
	nd = __va(nd_pa);
... ...
	node_data[nid] = nd;

> 
> From a quick glance you are trying to bend over the memblock API for
> something that should be handled on a different layer.
> 
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  mm/memblock.c | 76 ++-
>>  1 file changed, 65 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 7608bc3..556bbd2 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1213,9 +1213,71 @@ phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
>>  return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
>>  }
>>
>> +#ifndef node_distance_ready
>> +#define node_distance_ready()   0
>> +#endif
>> +
>> +static phys_addr_t __init memblock_alloc_near_nid(phys_addr_t size,
>> +phys_addr_t align, phys_addr_t start,
>> +phys_addr_t end, int nid, ulong flags,
>> +int alloc_func_type)
>> +{
>> +int nnid, round = 0;
>> +u64 pa;
>> +DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>> +
>> +bitmap_zero(nodes_map, MAX_NUMNODES);
>> +
>> +again:
>> +/*
>> + * There are total 4 cases:
>> + * 
>> + *   1)2) node_distance_ready || !node_distance_ready
>> + *  Round 1, nnid = nid = NUMA_NO_NODE;
>> + * 
>> + *   3) !node_distance_ready
>> + *  Round 1, nnid = nid;
>> + *::Round 2, currently only applicable for alloc_func_type = <0>
>> + *  Round 2, nnid = NUMA_NO_NODE;
>> + *   4) node_distance_ready
>> + *  Round 1, LOCAL_DISTANCE, nnid = nid;
>> + *  Round ?, nnid = nearest nid;
>> + */


[PATCH v6 4/5] ARM: DTS: da850: Add cfgchip syscon node

2016-10-25 Thread David Lechner
Add a syscon node for the SoC CFGCHIPn registers. This is needed for
the new usb phy driver.

Signed-off-by: David Lechner 
---
 arch/arm/boot/dts/da850.dtsi | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index f79e1b9..6bbf20d 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -188,6 +188,10 @@
 			};
 
 		};
+		cfgchip: cfgchip@1417c {
+			compatible = "ti,da830-cfgchip", "syscon";
+			reg = <0x1417c 0x14>;
+		};
 		edma0: edma@0 {
 			compatible = "ti,edma3-tpcc";
 			/* eDMA3 CC0: 0x01c0 0000 - 0x01c0 7fff */
-- 
2.7.4
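
As a worked check of the reg property in this patch: <0x1417c 0x14> describes a
window of 0x14 = 20 bytes starting at offset 0x1417c on the parent bus. Assuming
the CFGCHIPn registers are contiguous 32-bit registers (an assumption of this
sketch, not something stated in the patch), that window holds five of them. The
macro and helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CFGCHIP_BASE		0x1417cu	/* first cell of reg */
#define CFGCHIP_SIZE		0x14u		/* second cell of reg */
#define CFGCHIP_REG_WIDTH	4u		/* assumed: 32-bit registers */

/* Number of registers that fit in the window described by reg. */
static unsigned int cfgchip_reg_count(void)
{
	return CFGCHIP_SIZE / CFGCHIP_REG_WIDTH;
}

/*
 * Bus-relative address of CFGCHIPn, or 0 if n lies outside the window.
 */
static uint32_t cfgchip_reg_addr(unsigned int n)
{
	if (n >= cfgchip_reg_count())
		return 0;
	return CFGCHIP_BASE + n * CFGCHIP_REG_WIDTH;
}
```

Under that assumption the window covers CFGCHIP0 through CFGCHIP4, with
CFGCHIP2 at offset 0x14184.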



[PATCH v6 0/5] da8xx USB PHY platform devices and clocks

2016-10-25 Thread David Lechner
It has been almost 6 months since the v5 submission, so here is a recap:

* There were a number of phy and usb dependencies that were submitted
  separately.
* The last of the usb dependencies has finally made its way into linux-next
  today.
* This series was recently included in "[PATCH/RFT v2 00/17] Add DT support for
  ohci-da8xx". I am breaking it back out again as a standalone series.


v6 changes:

* Combine "ARM: davinci: da8xx: Enable the usb20 "per" clk on phy_clk_enable"
  from the "[PATCH/RFT v2 00/17] Add DT support for ohci-da8xx" series with
  the "ARM: davinci: da8xx: add usb phy clocks" patch in this series.
* Change the syscon and da8xx-usb-phy device ids to -1.

v5 changes: renamed "usbphy" to "usb_phy" or "usb-phy" as appropriate

v4 changes: fix strict checkpatch complaint

v3 changes:

* Fixed the davinci device tree declarations to use the preferred DT address
  convention so that the items I have added can be correct too.
* Moved the davinci clock init so that we don't have to call ioremap in the
  clock mux functions.
* Added a new "syscon" device for the CFGCHIP registers. This is used by the
  USB PHY driver and will be used in the future in common clock framework
  drivers.
* USB clocks are moved to a common file instead of having duplicated code.
* PHY driver uses syscon for CFGCHIP registers instead of using them directly.

David Lechner (5):
  ARM: davinci: da8xx: add usb phy clocks
  ARM: davinci: da8xx: Add CFGCHIP syscon platform declaration.
  ARM: davinci: da8xx: Add USB PHY platform declaration
  ARM: DTS: da850: Add cfgchip syscon node
  ARM: DTS: da850: Add usb phy node

 arch/arm/boot/dts/da850.dtsi                |   9 ++
 arch/arm/mach-davinci/board-da830-evm.c     |  52 +++---
 arch/arm/mach-davinci/board-da850-evm.c     |   4 +
 arch/arm/mach-davinci/board-mityomapl138.c  |   4 +
 arch/arm/mach-davinci/board-omapl138-hawk.c |  23 ++-
 arch/arm/mach-davinci/devices-da8xx.c       |  28
 arch/arm/mach-davinci/include/mach/da8xx.h  |   6 +
 arch/arm/mach-davinci/usb-da8xx.c           | 243 +++-
 8 files changed, 327 insertions(+), 42 deletions(-)

-- 
2.7.4



[PATCH v6 1/5] ARM: davinci: da8xx: add usb phy clocks

2016-10-25 Thread David Lechner
Up to this point, the USB phy clock configuration was handled manually in
the board files and in the usb drivers. This adds proper clocks so that
the usb drivers can use clk_get and clk_enable and not have to worry about
the details. Also, the related code is removed from the board files and
replaced with the new clock registration functions.

Signed-off-by: David Lechner 
Signed-off-by: Axel Haslam 
---

I have added "ARM: davinci: da8xx: Enable the usb20 "per" clk on phy_clk_enable"
from Axel Haslam to this patch.

In the review of Axel's patch, Sekhar said:

> We should not be using a NULL device pointer here. Can you pass the musb
> device pointer available in the same file? Also, da850_clks[] in da850.c
> needs to be fixed to add the matching device name.

However, the musb device may not be registered. The usb20_clk can be used to
supply a 48MHz clock to USB 1.1 (ohci) without using the musb device. So, I am
inclined to leave this as NULL.
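
For readers unfamiliar with the board-file code that this patch removes, the
manual setup is a read-modify-write of CFGCHIP2: clear a field, then set the
wanted value. Below is a minimal user-space sketch of that sequence operating
on a plain integer instead of the real register. The bit positions are
illustrative assumptions for this sketch, and cfgchip2_select_internal_24mhz()
is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layout, assumed for this sketch only. */
#define CFGCHIP2_REFFREQ	(0xfu << 0)	/* reference frequency field */
#define CFGCHIP2_REFFREQ_24MHZ	(0x2u << 0)	/* field value for 24 MHz */
#define CFGCHIP2_USB2PHYCLKMUX	(1u << 11)	/* USB 2.0 PHY: internal ref */
#define CFGCHIP2_USB1PHYCLKMUX	(1u << 12)	/* USB 1.1 PHY clock source */

/*
 * Mirror the removed board-file sequence: select a 24 MHz reference,
 * use the internal reference clock for the USB 2.0 PHY, and feed the
 * USB 1.1 PHY from the USB 2.0 PHY. All other bits are preserved.
 */
static uint32_t cfgchip2_select_internal_24mhz(uint32_t cfgchip2)
{
	cfgchip2 &= ~CFGCHIP2_REFFREQ;
	cfgchip2 |= CFGCHIP2_REFFREQ_24MHZ;

	cfgchip2 &= ~CFGCHIP2_USB1PHYCLKMUX;
	cfgchip2 |= CFGCHIP2_USB2PHYCLKMUX;

	return cfgchip2;
}
```

With the assumed layout, the helper is idempotent and touches only the
reference frequency field and the two mux bits, which is the property the new
clock registration functions have to preserve when they take over this setup.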


 arch/arm/mach-davinci/board-da830-evm.c     |  22 ++-
 arch/arm/mach-davinci/board-omapl138-hawk.c |  16 +-
 arch/arm/mach-davinci/include/mach/da8xx.h  |   3 +
 arch/arm/mach-davinci/usb-da8xx.c           | 232 +++-
 4 files changed, 252 insertions(+), 21 deletions(-)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c b/arch/arm/mach-davinci/board-da830-evm.c
index 3d8cf8c..605d444 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -115,18 +115,6 @@ static __init void da830_evm_usb_init(void)
 */
cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
 
-   /* USB2.0 PHY reference clock is 24 MHz */
-   cfgchip2 &= ~CFGCHIP2_REFFREQ;
-   cfgchip2 |=  CFGCHIP2_REFFREQ_24MHZ;
-
-   /*
-* Select internal reference clock for USB 2.0 PHY
-* and use it as a clock source for USB 1.1 PHY
-* (this is the default setting anyway).
-*/
-   cfgchip2 &= ~CFGCHIP2_USB1PHYCLKMUX;
-   cfgchip2 |=  CFGCHIP2_USB2PHYCLKMUX;
-
/*
 * We have to override VBUS/ID signals when MUSB is configured into the
 * host-only mode -- ID pin will float if no cable is connected, so the
@@ -143,6 +131,16 @@ static __init void da830_evm_usb_init(void)
__raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
 
/* USB_REFCLKIN is not used. */
+   ret = da8xx_register_usb20_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 2.0 PHY CLK registration failed: %d\n",
+   __func__, ret);
+
+   ret = da8xx_register_usb11_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
+   __func__, ret);
+
ret = davinci_cfg_reg(DA830_USB0_DRVVBUS);
if (ret)
pr_warn("%s: USB 2.0 PinMux setup failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-omapl138-hawk.c b/arch/arm/mach-davinci/board-omapl138-hawk.c
index ee62486..d4930b6 100644
--- a/arch/arm/mach-davinci/board-omapl138-hawk.c
+++ b/arch/arm/mach-davinci/board-omapl138-hawk.c
@@ -243,7 +243,6 @@ static irqreturn_t omapl138_hawk_usb_ocic_irq(int irq, void *dev_id)
 static __init void omapl138_hawk_usb_init(void)
 {
int ret;
-   u32 cfgchip2;
 
ret = davinci_cfg_reg_list(da850_hawk_usb11_pins);
if (ret) {
@@ -251,12 +250,15 @@ static __init void omapl138_hawk_usb_init(void)
return;
}
 
-   /* Setup the Ref. clock frequency for the HAWK at 24 MHz. */
-
-   cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-   cfgchip2 &= ~CFGCHIP2_REFFREQ;
-   cfgchip2 |=  CFGCHIP2_REFFREQ_24MHZ;
-   __raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
+   /* USB_REFCLKIN is not used. */
+   ret = da8xx_register_usb20_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 2.0 PHY CLK registration failed: %d\n",
+   __func__, ret);
+   ret = da8xx_register_usb11_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
+   __func__, ret);
 
ret = gpio_request_one(DA850_USB1_VBUS_PIN,
GPIOF_DIR_OUT, "USB1 VBUS");
diff --git a/arch/arm/mach-davinci/include/mach/da8xx.h b/arch/arm/mach-davinci/include/mach/da8xx.h
index f9f9713..c367530 100644
--- a/arch/arm/mach-davinci/include/mach/da8xx.h
+++ b/arch/arm/mach-davinci/include/mach/da8xx.h
@@ -88,6 +88,9 @@ int da850_register_edma(struct edma_rsv_info *rsv[2]);
 int da8xx_register_i2c(int instance, struct davinci_i2c_platform_data *pdata);
 int da8xx_register_spi_bus(int instance, unsigned num_chipselect);
 int da8xx_register_watchdog(void);
+int da8xx_register_usb_refclkin(int rate);
+int da8xx_register_usb20_phy_clk(bool use_usb_refclkin);
+int da8xx_register_usb11_phy_clk(bool use_usb_refclkin);

[PATCH v6 3/5] ARM: davinci: da8xx: Add USB PHY platform declaration

2016-10-25 Thread David Lechner
There is now a proper phy driver for the DA8xx SoC USB PHY. This adds the
platform device declarations needed to use it.

Signed-off-by: David Lechner 
---

da8xx-usb-phy device id is changed to -1 since there is only one da8xx-usb-phy
device.

 arch/arm/mach-davinci/board-da830-evm.c     | 28 +---
 arch/arm/mach-davinci/board-omapl138-hawk.c |  5 +
 arch/arm/mach-davinci/include/mach/da8xx.h  |  1 +
 arch/arm/mach-davinci/usb-da8xx.c           | 11 +++
 4 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c b/arch/arm/mach-davinci/board-da830-evm.c
index 3051cb6..c62766e 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -26,7 +26,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -106,30 +105,8 @@ static irqreturn_t da830_evm_usb_ocic_irq(int irq, void *dev_id)
 
 static __init void da830_evm_usb_init(void)
 {
-   u32 cfgchip2;
int ret;
 
-   /*
-* Set up USB clock/mode in the CFGCHIP2 register.
-* FYI:  CFGCHIP2 is 0xef00 initially.
-*/
-   cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-
-   /*
-* We have to override VBUS/ID signals when MUSB is configured into the
-* host-only mode -- ID pin will float if no cable is connected, so the
-* controller won't be able to drive VBUS thinking that it's a B-device.
-* Otherwise, we want to use the OTG mode and enable VBUS comparators.
-*/
-   cfgchip2 &= ~CFGCHIP2_OTGMODE;
-#ifdef CONFIG_USB_MUSB_HOST
-   cfgchip2 |=  CFGCHIP2_FORCE_HOST;
-#else
-   cfgchip2 |=  CFGCHIP2_SESENDEN | CFGCHIP2_VBDTCTEN;
-#endif
-
-   __raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-
/* USB_REFCLKIN is not used. */
ret = da8xx_register_usb20_phy_clk(false);
if (ret)
@@ -141,6 +118,11 @@ static __init void da830_evm_usb_init(void)
pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
__func__, ret);
 
+   ret = da8xx_register_usb_phy();
+   if (ret)
+   pr_warn("%s: USB PHY registration failed: %d\n",
+   __func__, ret);
+
ret = davinci_cfg_reg(DA830_USB0_DRVVBUS);
if (ret)
pr_warn("%s: USB 2.0 PinMux setup failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-omapl138-hawk.c b/arch/arm/mach-davinci/board-omapl138-hawk.c
index 8691a25..c5cb8d9 100644
--- a/arch/arm/mach-davinci/board-omapl138-hawk.c
+++ b/arch/arm/mach-davinci/board-omapl138-hawk.c
@@ -260,6 +260,11 @@ static __init void omapl138_hawk_usb_init(void)
pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
__func__, ret);
 
+   ret = da8xx_register_usb_phy();
+   if (ret)
+   pr_warn("%s: USB PHY registration failed: %d\n",
+   __func__, ret);
+
ret = gpio_request_one(DA850_USB1_VBUS_PIN,
GPIOF_DIR_OUT, "USB1 VBUS");
if (ret < 0) {
diff --git a/arch/arm/mach-davinci/include/mach/da8xx.h b/arch/arm/mach-davinci/include/mach/da8xx.h
index c32444b..38d932e 100644
--- a/arch/arm/mach-davinci/include/mach/da8xx.h
+++ b/arch/arm/mach-davinci/include/mach/da8xx.h
@@ -92,6 +92,7 @@ int da8xx_register_watchdog(void);
 int da8xx_register_usb_refclkin(int rate);
 int da8xx_register_usb20_phy_clk(bool use_usb_refclkin);
 int da8xx_register_usb11_phy_clk(bool use_usb_refclkin);
+int da8xx_register_usb_phy(void);
 int da8xx_register_usb20(unsigned mA, unsigned potpgt);
 int da8xx_register_usb11(struct da8xx_ohci_root_hub *pdata);
 int da8xx_register_emac(void);
diff --git a/arch/arm/mach-davinci/usb-da8xx.c b/arch/arm/mach-davinci/usb-da8xx.c
index 71a6d85..9c30bff 100644
--- a/arch/arm/mach-davinci/usb-da8xx.c
+++ b/arch/arm/mach-davinci/usb-da8xx.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -243,6 +244,16 @@ int __init da8xx_register_usb11_phy_clk(bool use_usb_refclkin)
return ret;
 }
 
+static struct platform_device da8xx_usb_phy = {
+   .name   = "da8xx-usb-phy",
+   .id = -1,
+};
+
+int __init da8xx_register_usb_phy(void)
+{
+   return platform_device_register(&da8xx_usb_phy);
+}
+
 #if IS_ENABLED(CONFIG_USB_MUSB_HDRC)
 
 static struct musb_hdrc_config musb_config = {
-- 
2.7.4



[PATCH v6 4/5] ARM: DTS: da850: Add cfgchip syscon node

2016-10-25 Thread David Lechner
Add a syscon node for the SoC CFGCHIPn registers. This is needed for
the new usb phy driver.

Signed-off-by: David Lechner 
---
 arch/arm/boot/dts/da850.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index f79e1b9..6bbf20d 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -188,6 +188,10 @@
};
 
};
+   cfgchip: cfgchip@1417c {
+   compatible = "ti,da830-cfgchip", "syscon";
+   reg = <0x1417c 0x14>;
+   };
edma0: edma@0 {
compatible = "ti,edma3-tpcc";
/* eDMA3 CC0: 0x01c0  - 0x01c0 7fff */
-- 
2.7.4



[PATCH v6 0/5] da8xx USB PHY platform devices and clocks

2016-10-25 Thread David Lechner
It has been almost 6 months since the v5 submission, so here is a recap:

* There were a number of phy and usb dependencies that were submitted
  separately.
* The last of the usb dependencies has finally made its way into linux-next
  today.
* This series was recently included in "[PATCH/RFT v2 00/17] Add DT support for
  ohci-da8xx". I am breaking it back out again as a standalone series.


v6 changes:

* Combine "ARM: davinci: da8xx: Enable the usb20 "per" clk on phy_clk_enable"
  from the "[PATCH/RFT v2 00/17] Add DT support for ohci-da8xx" series with
  the "ARM: davinci: da8xx: add usb phy clocks" patch in this series.
* Change the syscon and da8xx-usb-phy device ids to -1.

v5 changes: renamed "usbphy" to "usb_phy" or "usb-phy" as appropriate

v4 changes: fix strict checkpatch complaint

v3 changes:

* Fixed the davinci device tree declarations to use the preferred DT address
  convention so that the items I have added can be correct too.
* Moved that davinci clock init so that we don't have to call ioremap in the
  clock mux functions.
* Added a new "syscon" device for the CFGCHIP registers. This is used by the
  USB PHY driver and will be used in the future in common clock framework
  drivers.
* USB clocks are moved to a common file instead of having duplicated code.
* PHY driver uses syscon for CFGCHIP registers instead of using them directly.

David Lechner (5):
  ARM: davinci: da8xx: add usb phy clocks
  ARM: davinci: da8xx: Add CFGCHIP syscon platform declaration.
  ARM: davinci: da8xx: Add USB PHY platform declaration
  ARM: DTS: da850: Add cfgchip syscon node
  ARM: DTS: da850: Add usb phy node

 arch/arm/boot/dts/da850.dtsi|   9 ++
 arch/arm/mach-davinci/board-da830-evm.c |  52 +++---
 arch/arm/mach-davinci/board-da850-evm.c |   4 +
 arch/arm/mach-davinci/board-mityomapl138.c  |   4 +
 arch/arm/mach-davinci/board-omapl138-hawk.c |  23 ++-
 arch/arm/mach-davinci/devices-da8xx.c   |  28 
 arch/arm/mach-davinci/include/mach/da8xx.h  |   6 +
 arch/arm/mach-davinci/usb-da8xx.c   | 243 +++-
 8 files changed, 327 insertions(+), 42 deletions(-)

-- 
2.7.4



[PATCH v6 1/5] ARM: davinci: da8xx: add usb phy clocks

2016-10-25 Thread David Lechner
Up to this point, the USB phy clock configuration was handled manually in
the board files and in the usb drivers. This adds proper clocks so that
the usb drivers can use clk_get and clk_enable and not have to worry about
the details. Also, the related code is removed from the board files and
replaced with the new clock registration functions.

Signed-off-by: David Lechner 
Signed-off-by: Axel Haslam 
---

I have added "ARM: davinci: da8xx: Enable the usb20 "per" clk on phy_clk_enable"
from Axel Haslam to this patch.

In the review of Axel's patch, Sekhar said:

> We should not be using a NULL device pointer here. Can you pass the musb
> device pointer available in the same file? Also, da850_clks[] in da850.c
> needs to be fixed to add the matching device name.

However, the musb device may not be registered. The usb20_clk can be used to
supply a 48MHz clock to USB 1.1 (ohci) without using the musb device. So, I am
inclined to leave this as NULL.


 arch/arm/mach-davinci/board-da830-evm.c |  22 ++-
 arch/arm/mach-davinci/board-omapl138-hawk.c |  16 +-
 arch/arm/mach-davinci/include/mach/da8xx.h  |   3 +
 arch/arm/mach-davinci/usb-da8xx.c   | 232 +++-
 4 files changed, 252 insertions(+), 21 deletions(-)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c 
b/arch/arm/mach-davinci/board-da830-evm.c
index 3d8cf8c..605d444 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -115,18 +115,6 @@ static __init void da830_evm_usb_init(void)
 */
cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
 
-   /* USB2.0 PHY reference clock is 24 MHz */
-   cfgchip2 &= ~CFGCHIP2_REFFREQ;
-   cfgchip2 |=  CFGCHIP2_REFFREQ_24MHZ;
-
-   /*
-* Select internal reference clock for USB 2.0 PHY
-* and use it as a clock source for USB 1.1 PHY
-* (this is the default setting anyway).
-*/
-   cfgchip2 &= ~CFGCHIP2_USB1PHYCLKMUX;
-   cfgchip2 |=  CFGCHIP2_USB2PHYCLKMUX;
-
/*
 * We have to override VBUS/ID signals when MUSB is configured into the
 * host-only mode -- ID pin will float if no cable is connected, so the
@@ -143,6 +131,16 @@ static __init void da830_evm_usb_init(void)
__raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
 
/* USB_REFCLKIN is not used. */
+   ret = da8xx_register_usb20_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 2.0 PHY CLK registration failed: %d\n",
+   __func__, ret);
+
+   ret = da8xx_register_usb11_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
+   __func__, ret);
+
ret = davinci_cfg_reg(DA830_USB0_DRVVBUS);
if (ret)
pr_warn("%s: USB 2.0 PinMux setup failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-omapl138-hawk.c 
b/arch/arm/mach-davinci/board-omapl138-hawk.c
index ee62486..d4930b6 100644
--- a/arch/arm/mach-davinci/board-omapl138-hawk.c
+++ b/arch/arm/mach-davinci/board-omapl138-hawk.c
@@ -243,7 +243,6 @@ static irqreturn_t omapl138_hawk_usb_ocic_irq(int irq, void *dev_id)
 static __init void omapl138_hawk_usb_init(void)
 {
int ret;
-   u32 cfgchip2;
 
ret = davinci_cfg_reg_list(da850_hawk_usb11_pins);
if (ret) {
@@ -251,12 +250,15 @@ static __init void omapl138_hawk_usb_init(void)
return;
}
 
-   /* Setup the Ref. clock frequency for the HAWK at 24 MHz. */
-
-   cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-   cfgchip2 &= ~CFGCHIP2_REFFREQ;
-   cfgchip2 |=  CFGCHIP2_REFFREQ_24MHZ;
-   __raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
+   /* USB_REFCLKIN is not used. */
+   ret = da8xx_register_usb20_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 2.0 PHY CLK registration failed: %d\n",
+   __func__, ret);
+   ret = da8xx_register_usb11_phy_clk(false);
+   if (ret)
+   pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
+   __func__, ret);
 
ret = gpio_request_one(DA850_USB1_VBUS_PIN,
GPIOF_DIR_OUT, "USB1 VBUS");
diff --git a/arch/arm/mach-davinci/include/mach/da8xx.h b/arch/arm/mach-davinci/include/mach/da8xx.h
index f9f9713..c367530 100644
--- a/arch/arm/mach-davinci/include/mach/da8xx.h
+++ b/arch/arm/mach-davinci/include/mach/da8xx.h
@@ -88,6 +88,9 @@ int da850_register_edma(struct edma_rsv_info *rsv[2]);
 int da8xx_register_i2c(int instance, struct davinci_i2c_platform_data *pdata);
 int da8xx_register_spi_bus(int instance, unsigned num_chipselect);
 int da8xx_register_watchdog(void);
+int da8xx_register_usb_refclkin(int rate);
+int da8xx_register_usb20_phy_clk(bool use_usb_refclkin);
+int da8xx_register_usb11_phy_clk(bool use_usb_refclkin);
 int da8xx_register_usb20(unsigned 

[PATCH v6 3/5] ARM: davinci: da8xx: Add USB PHY platform declaration

2016-10-25 Thread David Lechner
There is now a proper phy driver for the DA8xx SoC USB PHY. This adds the
platform device declarations needed to use it.

Signed-off-by: David Lechner 
---

da8xx-usb-phy device id is changed to -1 since there is only one da8xx-usb-phy
device.

 arch/arm/mach-davinci/board-da830-evm.c | 28 +---
 arch/arm/mach-davinci/board-omapl138-hawk.c |  5 +
 arch/arm/mach-davinci/include/mach/da8xx.h  |  1 +
 arch/arm/mach-davinci/usb-da8xx.c   | 11 +++
 4 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c b/arch/arm/mach-davinci/board-da830-evm.c
index 3051cb6..c62766e 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -26,7 +26,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -106,30 +105,8 @@ static irqreturn_t da830_evm_usb_ocic_irq(int irq, void *dev_id)
 
 static __init void da830_evm_usb_init(void)
 {
-   u32 cfgchip2;
int ret;
 
-   /*
-* Set up USB clock/mode in the CFGCHIP2 register.
-* FYI:  CFGCHIP2 is 0xef00 initially.
-*/
-   cfgchip2 = __raw_readl(DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-
-   /*
-* We have to override VBUS/ID signals when MUSB is configured into the
-* host-only mode -- ID pin will float if no cable is connected, so the
-* controller won't be able to drive VBUS thinking that it's a B-device.
-* Otherwise, we want to use the OTG mode and enable VBUS comparators.
-*/
-   cfgchip2 &= ~CFGCHIP2_OTGMODE;
-#ifdef CONFIG_USB_MUSB_HOST
-   cfgchip2 |=  CFGCHIP2_FORCE_HOST;
-#else
-   cfgchip2 |=  CFGCHIP2_SESENDEN | CFGCHIP2_VBDTCTEN;
-#endif
-
-   __raw_writel(cfgchip2, DA8XX_SYSCFG0_VIRT(DA8XX_CFGCHIP2_REG));
-
/* USB_REFCLKIN is not used. */
ret = da8xx_register_usb20_phy_clk(false);
if (ret)
@@ -141,6 +118,11 @@ static __init void da830_evm_usb_init(void)
pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
__func__, ret);
 
+   ret = da8xx_register_usb_phy();
+   if (ret)
+   pr_warn("%s: USB PHY registration failed: %d\n",
+   __func__, ret);
+
ret = davinci_cfg_reg(DA830_USB0_DRVVBUS);
if (ret)
pr_warn("%s: USB 2.0 PinMux setup failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-omapl138-hawk.c b/arch/arm/mach-davinci/board-omapl138-hawk.c
index 8691a25..c5cb8d9 100644
--- a/arch/arm/mach-davinci/board-omapl138-hawk.c
+++ b/arch/arm/mach-davinci/board-omapl138-hawk.c
@@ -260,6 +260,11 @@ static __init void omapl138_hawk_usb_init(void)
pr_warn("%s: USB 1.1 PHY CLK registration failed: %d\n",
__func__, ret);
 
+   ret = da8xx_register_usb_phy();
+   if (ret)
+   pr_warn("%s: USB PHY registration failed: %d\n",
+   __func__, ret);
+
ret = gpio_request_one(DA850_USB1_VBUS_PIN,
GPIOF_DIR_OUT, "USB1 VBUS");
if (ret < 0) {
diff --git a/arch/arm/mach-davinci/include/mach/da8xx.h b/arch/arm/mach-davinci/include/mach/da8xx.h
index c32444b..38d932e 100644
--- a/arch/arm/mach-davinci/include/mach/da8xx.h
+++ b/arch/arm/mach-davinci/include/mach/da8xx.h
@@ -92,6 +92,7 @@ int da8xx_register_watchdog(void);
 int da8xx_register_usb_refclkin(int rate);
 int da8xx_register_usb20_phy_clk(bool use_usb_refclkin);
 int da8xx_register_usb11_phy_clk(bool use_usb_refclkin);
+int da8xx_register_usb_phy(void);
 int da8xx_register_usb20(unsigned mA, unsigned potpgt);
 int da8xx_register_usb11(struct da8xx_ohci_root_hub *pdata);
 int da8xx_register_emac(void);
diff --git a/arch/arm/mach-davinci/usb-da8xx.c b/arch/arm/mach-davinci/usb-da8xx.c
index 71a6d85..9c30bff 100644
--- a/arch/arm/mach-davinci/usb-da8xx.c
+++ b/arch/arm/mach-davinci/usb-da8xx.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -243,6 +244,16 @@ int __init da8xx_register_usb11_phy_clk(bool use_usb_refclkin)
return ret;
 }
 
+static struct platform_device da8xx_usb_phy = {
+   .name   = "da8xx-usb-phy",
+   .id = -1,
+};
+
+int __init da8xx_register_usb_phy(void)
+{
+   return platform_device_register(&da8xx_usb_phy);
+}
+
 #if IS_ENABLED(CONFIG_USB_MUSB_HDRC)
 
 static struct musb_hdrc_config musb_config = {
-- 
2.7.4
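
The note about changing the device id to -1 can be illustrated with a small
user-space sketch. It is not code from this patch: struct fake_platform_device
and fake_device_name() are hypothetical stand-ins, and the naming rule shown
("name" for id -1, "name.id" otherwise) is assumed to match what the platform
bus does for a device with no instance number.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the kernel's "no instance number" id (-1). */
#define DEVID_NONE (-1)

/* Hypothetical user-space stand-in for struct platform_device. */
struct fake_platform_device {
	const char *name;
	int id;
};

/*
 * Build the name the device would get on the platform bus:
 * "name" for a singleton (id == -1), "name.id" otherwise.
 * Returns the number of characters written, excluding the NUL.
 */
static int fake_device_name(const struct fake_platform_device *pdev,
			    char *buf, size_t len)
{
	assert(pdev && pdev->name && buf && len > 0);

	if (pdev->id == DEVID_NONE)
		return snprintf(buf, len, "%s", pdev->name);

	return snprintf(buf, len, "%s.%d", pdev->name, pdev->id);
}
```

With id -1 the PHY would show up as plain "da8xx-usb-phy"; with id 0 it would
be "da8xx-usb-phy.0", which matters to anything that looks the device up by
name.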



[PATCH v6 2/5] ARM: davinci: da8xx: Add CFGCHIP syscon platform declaration.

2016-10-25 Thread David Lechner
The CFGCHIP registers are used by a number of devices, so use a syscon
device to share them. The first consumer of this will be the phy-da8xx-usb
driver.

Signed-off-by: David Lechner 
---

syscon device id is changed to -1 since there is only one syscon device.

 arch/arm/mach-davinci/board-da830-evm.c |  4 
 arch/arm/mach-davinci/board-da850-evm.c |  4 
 arch/arm/mach-davinci/board-mityomapl138.c  |  4 
 arch/arm/mach-davinci/board-omapl138-hawk.c |  4 
 arch/arm/mach-davinci/devices-da8xx.c   | 28 
 arch/arm/mach-davinci/include/mach/da8xx.h  |  2 ++
 6 files changed, 46 insertions(+)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c b/arch/arm/mach-davinci/board-da830-evm.c
index 605d444..3051cb6 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -586,6 +586,10 @@ static __init void da830_evm_init(void)
struct davinci_soc_info *soc_info = &davinci_soc_info;
int ret;
 
+   ret = da8xx_register_cfgchip();
+   if (ret)
+   pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
ret = da830_register_gpio();
if (ret)
pr_warn("%s: GPIO init failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-da850-evm.c b/arch/arm/mach-davinci/board-da850-evm.c
index 8e4539f..ec5cb10 100644
--- a/arch/arm/mach-davinci/board-da850-evm.c
+++ b/arch/arm/mach-davinci/board-da850-evm.c
@@ -1345,6 +1345,10 @@ static __init void da850_evm_init(void)
 {
int ret;
 
+   ret = da8xx_register_cfgchip();
+   if (ret)
+   pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
ret = da850_register_gpio();
if (ret)
pr_warn("%s: GPIO init failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/board-mityomapl138.c b/arch/arm/mach-davinci/board-mityomapl138.c
index bc4e63f..1a6d430 100644
--- a/arch/arm/mach-davinci/board-mityomapl138.c
+++ b/arch/arm/mach-davinci/board-mityomapl138.c
@@ -514,6 +514,10 @@ static void __init mityomapl138_init(void)
 {
int ret;
 
+   ret = da8xx_register_cfgchip();
+   if (ret)
+   pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
/* for now, no special EDMA channels are reserved */
ret = da850_register_edma(NULL);
if (ret)
diff --git a/arch/arm/mach-davinci/board-omapl138-hawk.c b/arch/arm/mach-davinci/board-omapl138-hawk.c
index d4930b6..8691a25 100644
--- a/arch/arm/mach-davinci/board-omapl138-hawk.c
+++ b/arch/arm/mach-davinci/board-omapl138-hawk.c
@@ -294,6 +294,10 @@ static __init void omapl138_hawk_init(void)
 {
int ret;
 
+   ret = da8xx_register_cfgchip();
+   if (ret)
+   pr_warn("%s: CFGCHIP registration failed: %d\n", __func__, ret);
+
ret = da850_register_gpio();
if (ret)
pr_warn("%s: GPIO init failed: %d\n", __func__, ret);
diff --git a/arch/arm/mach-davinci/devices-da8xx.c b/arch/arm/mach-davinci/devices-da8xx.c
index add3771..31a99db 100644
--- a/arch/arm/mach-davinci/devices-da8xx.c
+++ b/arch/arm/mach-davinci/devices-da8xx.c
@@ -11,6 +11,7 @@
  * (at your option) any later version.
  */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1089,3 +1090,30 @@ int __init da850_register_sata(unsigned long refclkpn)
return platform_device_register(&da850_sata_device);
 }
 #endif
+
+static struct syscon_platform_data da8xx_cfgchip_platform_data = {
+   .label  = "cfgchip",
+};
+
+static struct resource da8xx_cfgchip_resources[] = {
+   {
+   .start  = DA8XX_SYSCFG0_BASE + DA8XX_CFGCHIP0_REG,
+   .end= DA8XX_SYSCFG0_BASE + DA8XX_CFGCHIP4_REG + 3,
+   .flags  = IORESOURCE_MEM,
+   },
+};
+
+static struct platform_device da8xx_cfgchip_device = {
+   .name   = "syscon",
+   .id = -1,
+   .dev= {
+   .platform_data  = &da8xx_cfgchip_platform_data,
+   },
+   .num_resources  = ARRAY_SIZE(da8xx_cfgchip_resources),
+   .resource   = da8xx_cfgchip_resources,
+};
+
+int __init da8xx_register_cfgchip(void)
+{
+   return platform_device_register(&da8xx_cfgchip_device);
+}
diff --git a/arch/arm/mach-davinci/include/mach/da8xx.h b/arch/arm/mach-davinci/include/mach/da8xx.h
index c367530..c32444b 100644
--- a/arch/arm/mach-davinci/include/mach/da8xx.h
+++ b/arch/arm/mach-davinci/include/mach/da8xx.h
@@ -61,6 +61,7 @@ extern unsigned int da850_max_speed;
 #define DA8XX_CFGCHIP1_REG 0x180
 #define DA8XX_CFGCHIP2_REG 0x184
 #define DA8XX_CFGCHIP3_REG 0x188
+#define DA8XX_CFGCHIP4_REG 0x18c
 
 #define DA8XX_SYSCFG1_BASE (IO_PHYS + 0x22C000)
 #define DA8XX_SYSCFG1_VIRT(x)  (da8xx_syscfg1_base + (x))
@@ -116,6 +117,7 @@ void da8xx_rproc_reserve_cma(void);
 int da8xx_register_rproc(void);
 int da850_register_gpio(void);
 int da830_register_gpio(void);
+int da8xx_register_cfgchip(void);
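
The resource bounds in da8xx_cfgchip_resources can be checked with a bit of
arithmetic. The sketch below is user-space C, not part of the patch. The
CFGCHIP1 and CFGCHIP4 offsets are taken from the da8xx.h hunk above; the
CFGCHIP0 offset (0x17c) is an assumption based on the registers being
consecutive 32-bit words, and struct mem_range, cfgchip_range() and
range_size() are hypothetical helpers. The "+ 3" makes .end the last byte of
CFGCHIP4, since resource ranges are inclusive.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets within SYSCFG0. CFGCHIP0 is assumed, the others are from the patch. */
#define CFGCHIP0_REG 0x17c	/* assumption: one word below CFGCHIP1 */
#define CFGCHIP1_REG 0x180
#define CFGCHIP4_REG 0x18c

/* Hypothetical stand-in for the start/end pair of a memory resource. */
struct mem_range {
	uint32_t start;
	uint32_t end;	/* inclusive, like a kernel resource */
};

/* Same bounds as the patch: first byte of CFGCHIP0 to last byte of CFGCHIP4. */
static struct mem_range cfgchip_range(uint32_t syscfg0_base)
{
	struct mem_range r = {
		.start = syscfg0_base + CFGCHIP0_REG,
		.end   = syscfg0_base + CFGCHIP4_REG + 3,
	};

	return r;
}

/* Size of an inclusive range in bytes. */
static uint32_t range_size(struct mem_range r)
{
	assert(r.end >= r.start);
	return r.end - r.start + 1;
}
```

Under these assumptions the range is 20 bytes long, i.e. five 32-bit
registers (CFGCHIP0 through CFGCHIP4), whatever the base address is.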
 
 

[PATCH v6 5/5] ARM: DTS: da850: Add usb phy node

2016-10-25 Thread David Lechner
Add a node for the new usb phy driver.

Signed-off-by: David Lechner 
---
 arch/arm/boot/dts/da850.dtsi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/da850.dtsi b/arch/arm/boot/dts/da850.dtsi
index 6bbf20d..33fcdce 100644
--- a/arch/arm/boot/dts/da850.dtsi
+++ b/arch/arm/boot/dts/da850.dtsi
@@ -376,6 +376,11 @@
>;
status = "disabled";
};
+   usb_phy: usb-phy {
+   compatible = "ti,da830-usb-phy";
+   #phy-cells = <1>;
+   status = "disabled";
+   };
gpio: gpio@226000 {
compatible = "ti,dm6441-gpio";
gpio-controller;
-- 
2.7.4



linux-next: no releases next week

2016-10-25 Thread Stephen Rothwell
Hi all,

There will probably be no linux-next releases next week while I am
attending Kernel Summit.

-- 
Cheers,
Stephen Rothwell


linux-next: Tree for Oct 26

2016-10-25 Thread Stephen Rothwell
Hi all,

There will probably be no linux-next releases next week while I attend
the Kernel Summit.

Changes since 20161025:

The sunxi tree lost its build failure.

The akpm-current tree still had its build failures for which I applied
2 patches.

Non-merge commits (relative to Linus' tree): 2628
 3334 files changed, 210166 insertions(+), 49968 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 245 trees (counting Linus' and 35 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (9fe68cad6e74 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging fixes/master (30066ce675d3 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging kbuild-current/rc-fixes (989cea5c14be kbuild: prevent lib-ksyms.o rebuilds)
Merging arc-current/for-curr (9868c77a82f7 ARC: build: retire old toggles)
Merging arm-current/fixes (6127d124ee4e ARM: wire up new pkey syscalls)
Merging m68k-current/for-linus (6736e65effc3 m68k: Migrate exception table users off module.h and onto extable.h)
Merging metag-fixes/fixes (35d04077ad96 metag: Only define atomic_dec_if_positive conditionally)
Merging powerpc-fixes/fixes (09b7e37b18ee powerpc/64: Fix race condition in setting lock bit in idle/wakeup code)
Merging sparc/master (ee9e83973d54 sparc32: Fix old style declaration GCC warnings)
Merging net/master (44060abe1dd6 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth)
CONFLICT (content): Merge conflict in drivers/net/ethernet/qlogic/Kconfig
Applying: qed*: merge fix for CONFIG_INFINIBAND_QEDR Kconfig move
Merging ipsec/master (7f92083eb58f vti6: flush x-netns xfrm cache when vti interface is removed)
Merging netfilter/master (7034b566a4e7 netfilter: fix nf_queue handling)
Merging ipvs/master (ea43f860d984 Merge branch 'ethoc-fixes')
Merging wireless-drivers/master (1ea2643961b0 ath6kl: add Dell OEM SDIO I/O for the Venue 8 Pro)
Merging mac80211/master (b4f0fd4baa90 qed: Use list_move_tail instead of list_del/list_add_tail)
Merging sound-current/for-linus (9b50898ad96c ALSA: seq: Fix time account regression)
Merging pci-current/for-linus (02a1b8f4167e PCI: designware-plat: Update author email address)
Merging driver-core.current/driver-core-linus (07d9a380680d Linux 4.9-rc2)
Merging tty.current/tty-linus (1001354ca341 Linux 4.9-rc1)
Merging usb.current/usb-linus (b76032396d79 usb: renesas_usbhs: add wait after initialization for R-Car Gen3)
Merging usb-gadget-fixes/fixes (a1aa8cf6471b Revert "Documentation: devicetree: dwc2: Deprecate g-tx-fifo-size")
Merging usb-serial-fixes/usb-linus (07d9a380680d Linux 4.9-rc2)
Merging usb-chipidea-fixes/ci-for-usb-stable (6b7f456e67a1 usb: chipidea: host: fix NULL ptr dereference during shutdown)
Merging phy/fixes (1001354ca341 Linux 4.9-rc1)
Merging staging.current/staging-linus (e866dd8aab76 greybus: fix a leak on error in gb_module_create())
Merging char-misc.current/char-misc-linus (407a3aee6ee2 hv: do not lose pending heartbeat vmbus packets)
Merging input-current/for-linus (324ae0958cab Input: psmouse - cleanup Focaltech code)
Merging crypto-current/master (6d4952d9d9d4 hwrng: core - Don't use a stack buffer in add_early_randomness())
Merging ide/master (797cee982eef Merge branch 'stable-4.8' of git://g

Re: [RFC PATCH 0/6] UART slave devices using serio

2016-10-25 Thread Sebastian Reichel
Hi,

On Tue, Oct 25, 2016 at 05:02:23PM -0500, Rob Herring wrote:
> On Tue, Oct 25, 2016 at 4:55 PM, Sebastian Reichel wrote:
> > On Wed, Aug 24, 2016 at 06:24:30PM -0500, Rob Herring wrote:
> >> [...]
> > I had a more detailed look at the series during the last two weeks.
> > For me the approach looks ok and it should work for the nokia bluetooth
> > use case. Actually my work on that driver is more or less stalled until
> > this is solved, so it would be nice to get this forward. Whose feedback
> > is this waiting from? I guess
> 
> I think it is mainly waiting for me to spend more time on it and get
> the tty port part done.

The general approach could already be commented on.

> I could use help especially for converting the BT part properly.

Ok, I will have a look at that.

> >  * Alan & Greg for the serial parts
> >  * Marcel for the bluetooth parts
> >  * Dmitry for the serio parts
> >
> > Maybe you can try to find some minutes at the Kernel Summit to talk
> > about this?
> 
> Still waiting for my invite...
> But I will be at Plumbers if folks want to discuss this.

Ok. I obviously assumed invites have already been sent and that you
would be invited.

-- Sebastian




[PATCH v2 5/5] posix-timers: make it configurable

2016-10-25 Thread Nicolas Pitre
Some embedded systems have no use for POSIX timers.  Configuring them
out removes about 22KB from the kernel binary.

The corresponding syscalls are routed to a stub that logs the attempt
to use them, which should be enough of a clue if they were disabled
without proper consideration. They are: timer_create, timer_gettime,
timer_getoverrun, timer_settime, timer_delete and clock_adjtime.

The clock_settime, clock_gettime, clock_getres and clock_nanosleep
syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
majority of use cases with very little code.
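
As a rough illustration of what such a reduced wrapper does, here is a
user-space sketch. It is not the code from kernel/time/posix-stubs.c:
stub_clock_gettime() is a hypothetical name, and it simply forwards to the C
library's clock_gettime() for the three clock ids named above and rejects
everything else with -EINVAL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7	/* Linux value; assumption if the headers lack it */
#endif

/*
 * Hypothetical analogue of a reduced clock_gettime wrapper: only the three
 * clocks mentioned in the commit message are accepted.
 * Returns 0 on success or a negative errno value.
 */
static int stub_clock_gettime(clockid_t which, struct timespec *tp)
{
	switch (which) {
	case CLOCK_REALTIME:
	case CLOCK_MONOTONIC:
	case CLOCK_BOOTTIME:
		return clock_gettime(which, tp) ? -errno : 0;
	default:
		/* CPU-time and dynamic clocks are not handled here. */
		return -EINVAL;
	}
}
```

A caller asking for, say, CLOCK_PROCESS_CPUTIME_ID would get -EINVAL from this
sketch, while CLOCK_REALTIME and CLOCK_MONOTONIC keep working.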

Signed-off-by: Nicolas Pitre 
Reviewed-by: Josh Triplett 
---
 drivers/ptp/Kconfig  |   2 +-
 include/linux/posix-timers.h |  28 +-
 include/linux/sched.h|  10 
 init/Kconfig |  17 +++
 kernel/signal.c  |   4 ++
 kernel/time/Makefile |  10 +++-
 kernel/time/posix-stubs.c| 118 +++
 7 files changed, 184 insertions(+), 5 deletions(-)
 create mode 100644 kernel/time/posix-stubs.c

diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 0f7492f8ea..bdce332911 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -6,7 +6,7 @@ menu "PTP clock support"
 
 config PTP_1588_CLOCK
tristate "PTP clock support"
-   depends on NET
+   depends on NET && POSIX_TIMERS
select PPS
select NET_PTP_CLASSIFY
help
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 62d44c1760..2288c5c557 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -118,6 +118,8 @@ struct k_clock {
 extern struct k_clock clock_posix_cpu;
 extern struct k_clock clock_posix_dynamic;
 
+#ifdef CONFIG_POSIX_TIMERS
+
 void posix_timers_register_clock(const clockid_t clock_id, struct k_clock *new_clock);
 
 /* function to call to trigger timer event */
@@ -131,8 +133,30 @@ void posix_cpu_timers_exit_group(struct task_struct *task);
 void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
   cputime_t *newval, cputime_t *oldval);
 
-long clock_nanosleep_restart(struct restart_block *restart_block);
-
 void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new);
 
+#else
+
+#include 
+
+static inline void posix_timers_register_clock(const clockid_t clock_id,
+  struct k_clock *new_clock) {}
+static inline int posix_timer_event(struct k_itimer *timr, int si_private)
+{ return 0; }
+static inline void run_posix_cpu_timers(struct task_struct *task) {}
+static inline void posix_cpu_timers_exit(struct task_struct *task)
+{
+   add_device_randomness((const void*) &task->se.sum_exec_runtime,
+ sizeof(unsigned long long));
+}
+static inline void posix_cpu_timers_exit_group(struct task_struct *task) {}
+static inline void set_process_cpu_timer(struct task_struct *task,
+   unsigned int clock_idx, cputime_t *newval, cputime_t *oldval) {}
+static inline void update_rlimit_cpu(struct task_struct *task,
+unsigned long rlim_new) {}
+
+#endif
+
+long clock_nanosleep_restart(struct restart_block *restart_block);
+
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b0ec..ad716d5559 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2946,8 +2946,13 @@ static inline void exit_thread(struct task_struct *tsk)
 extern void exit_files(struct task_struct *);
 extern void __cleanup_sighand(struct sighand_struct *);
 
+#ifdef CONFIG_POSIX_TIMERS
 extern void exit_itimers(struct signal_struct *);
 extern void flush_itimer_signals(void);
+#else
+static inline void exit_itimers(struct signal_struct *s) {}
+static inline void flush_itimer_signals(void) {}
+#endif
 
 extern void do_group_exit(int);
 
@@ -3450,7 +3455,12 @@ static __always_inline bool need_resched(void)
  * Thread group CPU time accounting.
  */
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
+#ifdef CONFIG_POSIX_TIMERS
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
+#else
+static inline void thread_group_cputimer(struct task_struct *tsk,
+struct task_cputime *times) {}
+#endif
 
 /*
  * Reevaluate whether the task has signals pending delivery.
diff --git a/init/Kconfig b/init/Kconfig
index 34407f15e6..351d422252 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1445,6 +1445,23 @@ config SYSCTL_SYSCALL
 
  If unsure say N here.
 
+config POSIX_TIMERS
+   bool "Posix Clocks & timers" if EXPERT
+   default y
+   help
+ This includes native support for POSIX timers to the kernel.
+ Most embedded systems may have no use for them and therefore they
+ can be configured out to reduce the size of the kernel image.

[PATCH v2 3/5] kconfig: regenerate *.c_shipped files after previous changes

2016-10-25 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre 
---
 scripts/kconfig/zconf.hash.c_shipped |  228 ++---
 scripts/kconfig/zconf.tab.c_shipped  | 1631 --
 2 files changed, 888 insertions(+), 971 deletions(-)

diff --git a/scripts/kconfig/zconf.hash.c_shipped b/scripts/kconfig/zconf.hash.c_shipped
index 360a62df2b..bf7f1378b3 100644
--- a/scripts/kconfig/zconf.hash.c_shipped
+++ b/scripts/kconfig/zconf.hash.c_shipped
@@ -32,7 +32,7 @@
 struct kconf_id;
 
 static const struct kconf_id *kconf_id_lookup(register const char *str, register unsigned int len);
-/* maximum key range = 71, duplicates = 0 */
+/* maximum key range = 72, duplicates = 0 */
 
 #ifdef __GNUC__
 __inline
@@ -46,32 +46,32 @@ kconf_id_hash (register const char *str, register unsigned int len)
 {
   static const unsigned char asso_values[] =
 {
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73,  0, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73,  5, 25, 25,
-   0,  0,  0,  5,  0,  0, 73, 73,  5,  0,
-  10,  5, 45, 73, 20, 20,  0, 15, 15, 73,
-  20,  5, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74,  0, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74,  0, 20, 10,
+   0,  0,  0, 30,  0,  0, 74, 74,  5, 15,
+   0, 25, 40, 74, 15,  0,  0, 10, 35, 74,
+  10,  0, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74, 74, 74, 74, 74,
+  74, 74, 74, 74, 74, 74
 };
   register int hval = len;
 
@@ -97,33 +97,35 @@ struct kconf_id_strings_t
 char kconf_id_strings_str8[sizeof("tristate")];
 char kconf_id_strings_str9[sizeof("endchoice")];
 char kconf_id_strings_str10[sizeof("---help---")];
+char kconf_id_strings_str11[sizeof("select")];
 char kconf_id_strings_str12[sizeof("def_tristate")];
 char kconf_id_strings_str13[sizeof("def_bool")];
 char kconf_id_strings_str14[sizeof("defconfig_list")];
-char kconf_id_strings_str17[sizeof("on")];
-char kconf_id_strings_str18[sizeof("optional")];
-char kconf_id_strings_str21[sizeof("option")];
-char kconf_id_strings_str22[sizeof("endmenu")];
-char kconf_id_strings_str23[sizeof("mainmenu")];
-char kconf_id_strings_str25[sizeof("menuconfig")];
-char kconf_id_strings_str27[sizeof("modules")];
-char kconf_id_strings_str28[sizeof("allnoconfig_y")];
+char kconf_id_strings_str16[sizeof("source")];
+char kconf_id_strings_str17[sizeof("endmenu")];
+char kconf_id_strings_str18[sizeof("allnoconfig_y")];
+char kconf_id_strings_str20[sizeof("range")];
+char kconf_id_strings_str22[sizeof("modules")];
+char kconf_id_strings_str23[sizeof("hex")];
+char kconf_id_strings_str27[sizeof("on")];
 char kconf_id_strings_str29[sizeof("menu")];
-char kconf_id_strings_str31[sizeof("select")];
+char kconf_id_strings_str31[sizeof("option")];
 char kconf_id_strings_str32[sizeof("comment")];
-char kconf_id_strings_str33[sizeof("env")];
-char kconf_id_strings_str35[sizeof("range")];
-char kconf_id_strings_str36[sizeof("choice")];
-char kconf_id_strings_str39[sizeof("bool")];
-char kconf_id_strings_str41[sizeof("source")];
+char 

[PATCH v2 1/5] kconfig: introduce the "imply" keyword

2016-10-25 Thread Nicolas Pitre
The "imply" keyword is a weak version of "select" where the target
config symbol can still be turned off, avoiding those pitfalls that come
with the "select" keyword.

This is useful e.g. with multiple drivers that want to indicate their
ability to hook into a given subsystem while still being able to
configure that subsystem out and keep those drivers selected.

Currently, the same effect can almost be achieved with:

config DRIVER_A
	tristate

config DRIVER_B
	tristate

config DRIVER_C
	tristate

config DRIVER_D
	tristate

[...]

config SUBSYSTEM_X
	tristate
	default DRIVER_A || DRIVER_B || DRIVER_C || DRIVER_D || [...]

This is unwieldy to maintain, especially with a large number of drivers.
Furthermore, there is no easy way to restrict the choice for SUBSYSTEM_X
to y or n, excluding m, when some drivers are built-in. The "select"
keyword allows for excluding m, but it excludes n as well. Hence
this "imply" keyword.  The above becomes:

config DRIVER_A
	tristate
	imply SUBSYSTEM_X

config DRIVER_B
	tristate
	imply SUBSYSTEM_X

[...]

config SUBSYSTEM_X
	tristate

This is much cleaner, and way more flexible than "select". SUBSYSTEM_X
can still be configured out, and it can be set as a module when none of
the drivers are selected or all of them are also modular.
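
The difference from "select" can be modelled in a few lines of C. This is
only a sketch of the semantics stated in the documentation table of this
patch (FOO implies BAZ, BAZ depends on BAR), not the code added to
scripts/kconfig/symbol.c. The enum and both function names are hypothetical,
and BAR is reduced to a boolean because the table only lists BAR = y and
BAR = n.

```c
#include <assert.h>
#include <stdbool.h>

enum tristate { TRI_NO = 0, TRI_MOD = 1, TRI_YES = 2 };

/*
 * Effective default of the implied symbol (BAZ): it follows the implying
 * symbol (FOO) while the dependency BAR is y, and is n when BAR is n.
 */
static enum tristate implied_default(enum tristate foo, bool bar)
{
	return bar ? foo : TRI_NO;
}

/* May the user set BAZ to 'val' for the given FOO and BAR? */
static bool implied_value_allowed(enum tristate foo, bool bar,
				  enum tristate val)
{
	if (val == TRI_NO)
		return true;	/* unlike "select", n always stays possible */
	if (!bar)
		return false;	/* direct dependency not met: only n is left */
	if (foo == TRI_YES && val == TRI_MOD)
		return false;	/* a built-in FOO excludes m for BAZ */
	return true;
}
```

With FOO = y and BAR = y the model permits y and n for BAZ but not m, which
corresponds to the "Y/n" row of the table.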

Signed-off-by: Nicolas Pitre 
Reviewed-by: Josh Triplett 
---
 Documentation/kbuild/kconfig-language.txt | 28
 scripts/kconfig/expr.h                    |  2 ++
 scripts/kconfig/menu.c                    | 55 ++-
 scripts/kconfig/symbol.c                  | 24 +-
 scripts/kconfig/zconf.gperf               |  1 +
 scripts/kconfig/zconf.y                   | 16 +++--
 6 files changed, 107 insertions(+), 19 deletions(-)

diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt
index 069fcb3eef..5ee0dd3c85 100644
--- a/Documentation/kbuild/kconfig-language.txt
+++ b/Documentation/kbuild/kconfig-language.txt
@@ -113,6 +113,33 @@ applicable everywhere (see syntax).
That will limit the usefulness but on the other hand avoid
the illegal configurations all over.
 
+- weak reverse dependencies: "imply" <symbol> ["if" <expr>]
+  This is similar to "select" as it enforces a lower limit on another
+  symbol except that the "implied" config symbol's value may still be
+  set to n from a direct dependency or with a visible prompt.
+  Given the following example:
+
+  config FOO
+   tristate
+   imply BAZ
+
+  config BAZ
+   tristate
+   depends on BAR
+
+  The following values are possible:
+
+   FOO   BAR   BAZ's default   choice for BAZ
+   ---   ---   -------------   --------------
+   n     y     n               N/m/y
+   m     y     m               M/y/n
+   y     y     y               Y/n
+   y     n     *               N
+
+  This is useful e.g. with multiple drivers that want to indicate their
+  ability to hook into a given subsystem while still being able to
+  configure that subsystem out and keep those drivers selected.
+
 - limiting menu display: "visible if" <expr>
   This attribute is only applicable to menu blocks, if the condition is
   false, the menu block is not displayed to the user (the symbols
@@ -481,6 +508,7 @@ historical issues resolved through these different solutions.
   b) Match dependency semantics:
 	b1) Swap all "select FOO" to "depends on FOO" or,
 	b2) Swap all "depends on FOO" to "select FOO"
+  c) Consider the use of "imply" instead of "select"
 
 The resolution to a) can be tested with the sample Kconfig file
 Documentation/kbuild/Kconfig.recursion-issue-01 through the removal
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 973b6f7333..a73f762c48 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -85,6 +85,7 @@ struct symbol {
 	struct property *prop;
 	struct expr_value dir_dep;
 	struct expr_value rev_dep;
+	struct expr_value implied;
 };
 
 #define for_all_symbols(i, sym) for (i = 0; i < SYMBOL_HASHSIZE; i++) for (sym = symbol_hash[i]; sym; sym = sym->next) if (sym->type != S_OTHER)
@@ -136,6 +137,7 @@ enum prop_type {
 	P_DEFAULT,  /* default y */
 	P_CHOICE,   /* choice value */
 	P_SELECT,   /* select BAR */
+	P_IMPLY,    /* imply BAR */
 	P_RANGE,    /* range 7..100 (for a symbol) */
 	P_ENV,      /* value from environment variable */
 	P_SYMBOL,   /* where a symbol is defined */
diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c
index aed678e8a7..e9357931b4 100644
--- a/scripts/kconfig/menu.c
+++ b/scripts/kconfig/menu.c
@@ -233,6 +233,8 @@ static void sym_check_prop(struct symbol *sym)
 {
struct property *prop;
struct symbol *sym2;
+   char 

Re: [PATCH -next 1/2] Input: synaptics-rmi4 - add support for F55 sensor tuning

2016-10-25 Thread Guenter Roeck

On 10/25/2016 11:26 AM, Andrew Duggan wrote:

On 10/24/2016 08:13 PM, Guenter Roeck wrote:

Hi Andrew,

On 10/24/2016 05:59 PM, Andrew Duggan wrote:

Hi Guenter,

I have a couple of comments below.



Thanks a lot for the feedback.


On 09/30/2016 08:22 PM, Guenter Roeck wrote:

Sensor tuning support is needed to determine the number of enabled
tx and rx electrodes for use in F54 functions.

The number of enabled electrodes is not identical to the total number
of electrodes as reported with F55:Query0 and F55:Query1. It has to be
calculated by analyzing F55:Ctrl1 (sensor receiver assignment) and
F55:Ctrl2 (sensor transmitter assignment).
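
The counting step described above can be sketched as follows. The body of
rmi_f55.c is not shown at this point in the thread, so this is only an
illustration under stated assumptions: the function name f55_count_enabled()
is hypothetical, and so is the convention that an assignment byte of 0xff
marks an electrode that is not in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 0xff in an assignment register means "electrode not used". */
#define F55_ELECTRODE_UNUSED 0xff

/*
 * Count the enabled electrodes in an assignment table such as the one read
 * from F55:Ctrl1 (receivers) or F55:Ctrl2 (transmitters). 'total' is the
 * electrode count reported by the matching query register.
 */
static unsigned int f55_count_enabled(const uint8_t *assignment, size_t total)
{
	unsigned int enabled = 0;
	size_t i;

	for (i = 0; i < total; i++)
		if (assignment[i] != F55_ELECTRODE_UNUSED)
			enabled++;

	return enabled;
}
```

For a table of five receiver entries where the last two hold 0xff, the
function reports three enabled electrodes, which is the number F54 would
then work with.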

Support for additional sensor tuning functions may be added later.

Signed-off-by: Guenter Roeck 
---
This patch applies to next-20160930.

  drivers/input/rmi4/Kconfig  |   9 +++
  drivers/input/rmi4/Makefile |   1 +
  drivers/input/rmi4/rmi_bus.c|   3 +
  drivers/input/rmi4/rmi_driver.h |   1 +
  drivers/input/rmi4/rmi_f55.c| 127 
  5 files changed, 141 insertions(+)
  create mode 100644 drivers/input/rmi4/rmi_f55.c

diff --git a/drivers/input/rmi4/Kconfig b/drivers/input/rmi4/Kconfig
index 4c8a55857e00..11ede43c9936 100644
--- a/drivers/input/rmi4/Kconfig
+++ b/drivers/input/rmi4/Kconfig
@@ -72,3 +72,12 @@ config RMI4_F54
  Function 54 provides access to various diagnostic features in certain
RMI4 touch sensors.
+
+config RMI4_F55
+bool "RMI4 Function 55 (Sensor tuning)"
+depends on RMI4_CORE
+help
+  Say Y here if you want to add support for RMI4 function 55
+
+  Function 55 provides access to the RMI4 touch sensor tuning
+  mechanism.
diff --git a/drivers/input/rmi4/Makefile b/drivers/input/rmi4/Makefile
index 0bafc8502c4b..96f8e0c21e3b 100644
--- a/drivers/input/rmi4/Makefile
+++ b/drivers/input/rmi4/Makefile
@@ -8,6 +8,7 @@ rmi_core-$(CONFIG_RMI4_F11) += rmi_f11.o
  rmi_core-$(CONFIG_RMI4_F12) += rmi_f12.o
  rmi_core-$(CONFIG_RMI4_F30) += rmi_f30.o
  rmi_core-$(CONFIG_RMI4_F54) += rmi_f54.o
+rmi_core-$(CONFIG_RMI4_F55) += rmi_f55.o
# Transports
  obj-$(CONFIG_RMI4_I2C) += rmi_i2c.o
diff --git a/drivers/input/rmi4/rmi_bus.c b/drivers/input/rmi4/rmi_bus.c
index ef8c747c35e7..82b7d4960858 100644
--- a/drivers/input/rmi4/rmi_bus.c
+++ b/drivers/input/rmi4/rmi_bus.c
@@ -314,6 +314,9 @@ static struct rmi_function_handler *fn_handlers[] = {
  #ifdef CONFIG_RMI4_F54
  &rmi_f54_handler,
  #endif
+#ifdef CONFIG_RMI4_F55
+&rmi_f55_handler,
+#endif
  };
static void __rmi_unregister_function_handlers(int start_idx)
diff --git a/drivers/input/rmi4/rmi_driver.h b/drivers/input/rmi4/rmi_driver.h
index 8dfbebe9bf86..a65cf70f61e2 100644
--- a/drivers/input/rmi4/rmi_driver.h
+++ b/drivers/input/rmi4/rmi_driver.h
@@ -103,4 +103,5 @@ extern struct rmi_function_handler rmi_f11_handler;
  extern struct rmi_function_handler rmi_f12_handler;
  extern struct rmi_function_handler rmi_f30_handler;
  extern struct rmi_function_handler rmi_f54_handler;
+extern struct rmi_function_handler rmi_f55_handler;
  #endif
diff --git a/drivers/input/rmi4/rmi_f55.c b/drivers/input/rmi4/rmi_f55.c
new file mode 100644
index ..268fa904205a
--- /dev/null
+++ b/drivers/input/rmi4/rmi_f55.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright (c) 2012-2015 Synaptics Incorporated
+ * Copyright (C) 2016 Zodiac Inflight Innovations
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 


This is incidental, but I don't think i2c.h needs to be included here since
this file shouldn't contain anything i2c specific. It's not that big a deal, but
I noticed it so I thought I would mention it.



Makes sense. delay.h and input.h seem to be unnecessary too.
I'll remove those if/when I resubmit.


+#include 
+#include 
+#include 
+#include 
+#include "rmi_driver.h"
+
+#define F55_NAME		"rmi4_f55"
+
+/* F55 data offsets */
+#define F55_NUM_RX_OFFSET	0
+#define F55_NUM_TX_OFFSET	1
+#define F55_PHYS_CHAR_OFFSET	2
+
+/* Fixed sizes of reports */
+#define F55_QUERY_LEN		17


How did you choose the number 17? The number of F55 query registers present will
depend on how the firmware is configured, so the total length of query registers
can change. Right now this driver is only using the first three F55 query
registers, which will always be present, so that is not an issue. But beyond
query 2, not all query registers are guaranteed to be present.



According to the information I have, the maximum size is 17.

Do you have a better idea on how to handle the dynamic length? Or a better
number?
Should I only read the minimum? Or the number we actually need (3) at this
point?
Or just name the define F55_QUERY_MAXLEN and change the comment to "maximum size
of report"?



I would just read the three registers which you are using. Those 

[PATCH] Change the document about iowait

2016-10-25 Thread Chao Fan
The iowait value read from /proc/stat is not reliable, so using it to
measure I/O wait is not recommended. Note this in the documentation.

Signed-off-by: Cao Jin 
Signed-off-by: Chao Fan 
---
 Documentation/filesystems/proc.txt | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 74329fd..71f5096 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1305,7 +1305,16 @@ second).  The meanings of the columns are as follows, from left to right:
 - nice: niced processes executing in user mode
 - system: processes executing in kernel mode
 - idle: twiddling thumbs
-- iowait: waiting for I/O to complete
+- iowait: In a word, iowait stands for waiting for I/O to complete. But there
+  are several problems:
+  1. A CPU does not wait for I/O to complete; iowait is the time that a task
+     is waiting for I/O to complete. When a CPU goes idle because of
+     outstanding task I/O, another task will be scheduled on this CPU.
+  2. On a multi-core CPU, the task waiting for I/O to complete is not running
+     on any CPU, so the iowait of each CPU is difficult to calculate.
+  3. The value of the iowait field in /proc/stat will decrease in certain
+     conditions.
+  So, the iowait value read from /proc/stat is not reliable.
 - irq: servicing interrupts
 - softirq: servicing softirqs
 - steal: involuntary wait
-- 
2.7.4
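
For illustration only (not part of the patch): a minimal userspace sketch of
how the iowait column is typically pulled out of the aggregate "cpu" line,
assuming the column order documented in proc.txt (user, nice, system, idle,
iowait, ...). The struct and the helpers parse_cpu_line() and iowait_delta()
are hypothetical names. The delta is returned as a signed value because, per
point 3 in the patch, the counter may go backwards between two samples.

```c
#include <stdio.h>
#include <string.h>

/* First five columns of the aggregate "cpu" line, in documented order. */
struct cpu_times {
	unsigned long long user, nice, system, idle, iowait;
};

/*
 * Parse one aggregate "cpu ..." line as found in /proc/stat.
 * Per-CPU lines ("cpu0", "cpu1", ...) are rejected on purpose.
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_cpu_line(const char *line, struct cpu_times *t)
{
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;
	if (sscanf(line + 4, "%llu %llu %llu %llu %llu",
		   &t->user, &t->nice, &t->system, &t->idle, &t->iowait) != 5)
		return -1;
	return 0;
}

/* Signed difference between two samples; a negative result is possible. */
static long long iowait_delta(const struct cpu_times *prev,
			      const struct cpu_times *cur)
{
	return (long long)cur->iowait - (long long)prev->iowait;
}
```

A caller that treats the counter as monotonic and stores the difference in an
unsigned variable would see a huge bogus value whenever the field decreases,
which is one practical reason to treat the number with caution.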





Re: [PATCH] ARM: imx6: Fix GPC probe error path

2016-10-25 Thread Guenter Roeck

On 10/25/2016 10:34 AM, Guenter Roeck wrote:

GPC may fail to instantiate with

imx-gpc: probe of 20dc000.gpc failed with error -22

which is returned from of_genpd_add_provider_onecell(). The error path
does not call pm_genpd_remove(). This results in the following crash
later on.

Unhandled fault: page domain fault (0x01b) at 0x0040
pgd = c0204000
[0040] *pgd=
Internal error: : 1b [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 108 Comm: kworker/0:3 Not tainted 4.9.0-rc2 #8
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: pm genpd_power_off_work_fn
task: c759ea00 task.stack: c766a000
PC is at mutex_lock+0xc/0x4c
LR is at regulator_disable+0x28/0x64
...
[] (mutex_lock) from [] (regulator_disable+0x28/0x64)
[] (regulator_disable) from [] (imx6q_pm_pu_power_off+0x90/0x98)
[] (imx6q_pm_pu_power_off) from [] (genpd_poweroff+0x114/0x1d4)
[] (genpd_poweroff) from [] (genpd_power_off_work_fn+0x20/0x2c)
[] (genpd_power_off_work_fn) from [] (process_one_work+0x138/0x34c)
[] (process_one_work) from [] (worker_thread+0x38/0x510)
[] (worker_thread) from [] (kthread+0xdc/0xf4)
[] (kthread) from [] (ret_from_fork+0x14/0x3c)

This is seen with multi_v7_defconfig and imx6dl-sabrelite.dtb running in
qemu (v2.7 patched to fix a qemu related problem). The error return from
of_genpd_add_provider_onecell() is not seen in v4.8 and may be caused by
a devicetree change (this is a wild guess only), but that is a different
problem.

Fixes: 00eb60a8b4f7 ("ARM: imx6: gpc: Add PU power domain for GPU/VPU")
Cc: Philipp Zabel 
Cc: Arnd Bergmann 
Signed-off-by: Guenter Roeck 
---
Several bisect attempts trying to track down "imx-gpc: probe ... failed
with error -22" point to commit 00e729c93395 ("Merge tag 'armsoc-dt' of
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc"). I have not
been able to track down the real culprit. Part of the problem is that
CONFIG_REGULATOR_ANATOP must be enabled for the problem to be seen, and
CONFIG_ARCH_AT91 causes compile errors for some sequence of commits between
v4.8 and v4.9-rc1. But even after taking this into account, the bisect
results always point to 00e729c93395. If anyone has an idea how to track
down that problem, or what might be causing it, please let me know.



Looking into this some more, it turns out that of_genpd_add_provider_onecell()
now returns an error if one of the provided power domains does not exist.
In this case, the "ARM" power domain does not exist. I don't see where it is
created, so it may well be that this now fails for all imx6 boards with
multi_v7_defconfig. Looking into kernelci.org test results, this is confirmed
for at least imx6dl-riotboard. Overall I think it is quite safe to assume
that all imx6 boards crash with mainline kernels and multi_v7_defconfig.

The change can be tracked down to commit 0159ec67076 ("PM / Domains: Verify
the PM domain is present when adding a provider"). Adding everyone in the
commit log for feedback.

Guenter


 arch/arm/mach-imx/gpc.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-imx/gpc.c b/arch/arm/mach-imx/gpc.c
index 0df062d8b2c9..f3f40045b4c9 100644
--- a/arch/arm/mach-imx/gpc.c
+++ b/arch/arm/mach-imx/gpc.c
@@ -409,6 +409,7 @@ static int imx_gpc_genpd_init(struct device *dev, struct regulator *pu_reg)
 {
struct clk *clk;
int i;
+   int ret;

imx6q_pu_domain.reg = pu_reg;

@@ -431,9 +432,14 @@ static int imx_gpc_genpd_init(struct device *dev, struct regulator *pu_reg)
return 0;

pm_genpd_init(&imx6q_pu_domain.base, NULL, false);
-   return of_genpd_add_provider_onecell(dev->of_node,
-&imx_gpc_onecell_data);
+   ret = of_genpd_add_provider_onecell(dev->of_node,
+   &imx_gpc_onecell_data);
+   if (ret)
+   goto genpd_remove;
+   return 0;

+genpd_remove:
+   pm_genpd_remove(&imx6q_pu_domain.base);
 clk_err:
while (i--)
clk_put(imx6q_pu_domain.clk[i]);
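
For readers less familiar with the idiom, here is a small userspace sketch of
the unwind pattern the patch relies on: every step that succeeded before the
failure is undone in reverse order on the error path. None of this is kernel
code. The counters and the helpers get_clk(), put_clk(), domain_init(),
domain_remove(), add_provider() and demo_genpd_init() are hypothetical
stand-ins, chosen only so that leaks become observable.

```c
#include <errno.h>

/* Hypothetical bookkeeping: lets a caller observe leaked resources. */
static int clks_held;
static int domain_registered;

static void get_clk(void) { clks_held++; }
static void put_clk(void) { clks_held--; }
static void domain_init(void) { domain_registered = 1; }
static void domain_remove(void) { domain_registered = 0; }

/* Stand-in for the provider registration step; fails on request. */
static int add_provider(int should_fail)
{
	return should_fail ? -EINVAL : 0;
}

static int demo_genpd_init(int nclks, int provider_fails)
{
	int i, ret;

	/* Step 1: take references on all clocks. */
	for (i = 0; i < nclks; i++)
		get_clk();

	/* Step 2: initialize the domain. */
	domain_init();

	/* Step 3: register the provider; this is the step that may fail. */
	ret = add_provider(provider_fails);
	if (ret)
		goto remove_domain;
	return 0;

remove_domain:
	/* Undo step 2 first, then step 1, mirroring the setup order. */
	domain_remove();
	while (i--)
		put_clk();
	return ret;
}
```

Without the domain_remove() call on the failure path, the sketch would return
an error while still leaving the domain marked as registered, which is the
userspace analogue of the stale genpd that later triggers the crash above.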




