date:20180223

[rcu] df95cc69cc: BUG:KASAN:null-ptr-deref_in__lock_acquire

2018-02-23 Thread kernel test robot


FYI, we noticed the following commit (built with gcc-7):

commit: df95cc69cca894430640237d39453f5d96c40a7d ("rcu: Parallelize expedited 
grace-period initialization")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git rcu/dev

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -m 512M

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


+--+++
|  | 3fea14045a 
| df95cc69cc |
+--+++
| boot_successes   | 4  
| 0  |
| boot_failures| 4  
| 17 |
| invoked_oom-killer:gfp_mask=0x   | 4  
||
| Mem-Info | 4  
||
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 4  
||
| BUG:KASAN:null-ptr-deref_in__lock_acquire| 0  
| 17 |
| BUG:unable_to_handle_kernel  | 0  
| 17 |
| Oops:#[##]   | 0  
| 17 |
| RIP:__lock_acquire   | 0  
| 17 |
| Kernel_panic-not_syncing:Fatal_exception | 0  
| 17 |
+--+++



[0.030859] BUG: KASAN: null-ptr-deref in __lock_acquire+0x171/0x13d0
[0.031636] Read of size 8 at addr 0018 by task swapper/0/0
[0.032000] 
[0.032000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.16.0-rc1-00044-gdf95cc6 #1
[0.032000] Call Trace:
[0.032000]  dump_stack+0x81/0xb3
[0.032000]  kasan_report+0x22a/0x25a
[0.032000]  __lock_acquire+0x171/0x13d0
[0.032000]  ? lookup_chain_cache+0x42/0x6b
[0.032000]  ? mark_lock+0x25b/0x26d
[0.032000]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.032000]  ? debug_check_no_locks_freed+0x19f/0x19f
[0.032000]  ? debug_check_no_locks_freed+0x19f/0x19f
[0.032000]  ? acpi_hw_read+0x1a0/0x202
[0.032000]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.032000]  ? lock_acquire+0x1c0/0x209
[0.032000]  lock_acquire+0x1c0/0x209
[0.032000]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.032000]  ? sync_sched_exp_handler+0x111/0x111
[0.032000]  _raw_spin_lock_irqsave+0x43/0x56
[0.032000]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.032000]  rcu_report_exp_cpu_mult+0x21/0x6d
[0.032000]  ? sync_sched_exp_handler+0x111/0x111
[0.032000]  sync_rcu_exp_select_cpus+0x31b/0x44d
[0.032000]  ? rcu_read_lock_sched_held+0x60/0x66
[0.032000]  ? sync_sched_exp_handler+0x111/0x111
[0.032000]  _synchronize_rcu_expedited+0x427/0x5ba
[0.032000]  ? signal_pending+0x15/0x15
[0.032000]  ? acpi_hw_write_pm1_control+0x52/0x52
[0.032000]  ? acpi_hw_write_pm1_control+0x52/0x52
[0.032000]  ? __change_page_attr_set_clr+0x420/0x420
[0.032000]  ? printk+0x94/0xb0
[0.032000]  ? show_regs_print_info+0xa/0xa
[0.032000]  ? lock_downgrade+0x26a/0x26a
[0.032000]  ? acpi_read_bit_register+0xb1/0xde
[0.032000]  ? acpi_read+0xa/0xa
[0.032000]  ? acpi_read+0xa/0xa
[0.032000]  ? acpi_hw_get_mode+0x91/0xc2
[0.032000]  ? _find_next_bit+0x3f/0xe4
[0.032000]  ? __lock_is_held+0x2a/0x87
[0.032000]  ? lock_is_held_type+0x78/0x86
[0.032000]  rcu_test_sync_prims+0xa/0x23
[0.032000]  rest_init+0xb/0xcf
[0.032000]  start_kernel+0x59a/0x5be
[0.032000]  ? mem_encrypt_init+0x6/0x6
[0.032000]  ? memcpy_orig+0x54/0x110
[0.032000]  ? x86_family+0x5/0x1d
[0.032000]  ? load_ucode_bsp+0x3a/0xab
[0.032000]  secondary_startup_64+0xa5/0xb0
[0.032000] 
==
[0.032000] Disabling lock debugging due to kernel taint
[0.032000] BUG: unable to handle kernel NULL pointer dereference at 
0018
[0.032000] IP: __lock_acquire+0x171/0x13d0
[0.032000] PGD 0 P4D 0 
[0.032000] Oops:  [#1] PREEMPT SMP KASAN PTI
[0.032000] Modules linked in:
[0.032000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GB
4.16.0-rc1-00044-gdf95cc6 #1
[0.032000] RIP: 0010:__lock_acquire+0x171/0x13d0
[0.032000] RSP: :89e079a0 EFLAGS: 00010056
[0.032000] RAX: 0096 RBX:  RCX: 884c9e31
[0.032000] RDX: 0003 RSI: 0003 RDI:

[rcu] df95cc69cc: BUG:KASAN:null-ptr-deref_in__lock_acquire

2018-02-23 Thread kernel test robot


FYI, we noticed the following commit (built with gcc-7):

commit: df95cc69cca894430640237d39453f5d96c40a7d ("rcu: Parallelize expedited 
grace-period initialization")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git rcu/dev

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -m 512M

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


+--+++
|  | 3fea14045a 
| df95cc69cc |
+--+++
| boot_successes   | 4  
| 0  |
| boot_failures| 4  
| 17 |
| invoked_oom-killer:gfp_mask=0x   | 4  
||
| Mem-Info | 4  
||
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 4  
||
| BUG:KASAN:null-ptr-deref_in__lock_acquire| 0  
| 17 |
| BUG:unable_to_handle_kernel  | 0  
| 17 |
| Oops:#[##]   | 0  
| 17 |
| RIP:__lock_acquire   | 0  
| 17 |
| Kernel_panic-not_syncing:Fatal_exception | 0  
| 17 |
+--+++



[0.030859] BUG: KASAN: null-ptr-deref in __lock_acquire+0x171/0x13d0
[0.031636] Read of size 8 at addr 0018 by task swapper/0/0
[0.032000] 
[0.032000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.16.0-rc1-00044-gdf95cc6 #1
[0.032000] Call Trace:
[0.032000]  dump_stack+0x81/0xb3
[0.032000]  kasan_report+0x22a/0x25a
[0.032000]  __lock_acquire+0x171/0x13d0
[0.032000]  ? lookup_chain_cache+0x42/0x6b
[0.032000]  ? mark_lock+0x25b/0x26d
[0.032000]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.032000]  ? debug_check_no_locks_freed+0x19f/0x19f
[0.032000]  ? debug_check_no_locks_freed+0x19f/0x19f
[0.032000]  ? acpi_hw_read+0x1a0/0x202
[0.032000]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.032000]  ? lock_acquire+0x1c0/0x209
[0.032000]  lock_acquire+0x1c0/0x209
[0.032000]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.032000]  ? sync_sched_exp_handler+0x111/0x111
[0.032000]  _raw_spin_lock_irqsave+0x43/0x56
[0.032000]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.032000]  rcu_report_exp_cpu_mult+0x21/0x6d
[0.032000]  ? sync_sched_exp_handler+0x111/0x111
[0.032000]  sync_rcu_exp_select_cpus+0x31b/0x44d
[0.032000]  ? rcu_read_lock_sched_held+0x60/0x66
[0.032000]  ? sync_sched_exp_handler+0x111/0x111
[0.032000]  _synchronize_rcu_expedited+0x427/0x5ba
[0.032000]  ? signal_pending+0x15/0x15
[0.032000]  ? acpi_hw_write_pm1_control+0x52/0x52
[0.032000]  ? acpi_hw_write_pm1_control+0x52/0x52
[0.032000]  ? __change_page_attr_set_clr+0x420/0x420
[0.032000]  ? printk+0x94/0xb0
[0.032000]  ? show_regs_print_info+0xa/0xa
[0.032000]  ? lock_downgrade+0x26a/0x26a
[0.032000]  ? acpi_read_bit_register+0xb1/0xde
[0.032000]  ? acpi_read+0xa/0xa
[0.032000]  ? acpi_read+0xa/0xa
[0.032000]  ? acpi_hw_get_mode+0x91/0xc2
[0.032000]  ? _find_next_bit+0x3f/0xe4
[0.032000]  ? __lock_is_held+0x2a/0x87
[0.032000]  ? lock_is_held_type+0x78/0x86
[0.032000]  rcu_test_sync_prims+0xa/0x23
[0.032000]  rest_init+0xb/0xcf
[0.032000]  start_kernel+0x59a/0x5be
[0.032000]  ? mem_encrypt_init+0x6/0x6
[0.032000]  ? memcpy_orig+0x54/0x110
[0.032000]  ? x86_family+0x5/0x1d
[0.032000]  ? load_ucode_bsp+0x3a/0xab
[0.032000]  secondary_startup_64+0xa5/0xb0
[0.032000] 
==
[0.032000] Disabling lock debugging due to kernel taint
[0.032000] BUG: unable to handle kernel NULL pointer dereference at 
0018
[0.032000] IP: __lock_acquire+0x171/0x13d0
[0.032000] PGD 0 P4D 0 
[0.032000] Oops:  [#1] PREEMPT SMP KASAN PTI
[0.032000] Modules linked in:
[0.032000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GB
4.16.0-rc1-00044-gdf95cc6 #1
[0.032000] RIP: 0010:__lock_acquire+0x171/0x13d0
[0.032000] RSP: :89e079a0 EFLAGS: 00010056
[0.032000] RAX: 0096 RBX:  RCX: 884c9e31
[0.032000] RDX: 0003 RSI: 0003 RDI:

Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook

2018-02-23 Thread Paolo Valente



> Il giorno 23 feb 2018, alle ore 17:17, Ming Lei  ha 
> scritto:
> 
> Hi Paolo,
> 
> On Fri, Feb 23, 2018 at 04:41:36PM +0100, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 23 feb 2018, alle ore 16:07, Ming Lei  ha 
>>> scritto:
>>> 
>>> Hi Paolo,
>>> 
>>> On Wed, Feb 07, 2018 at 10:19:20PM +0100, Paolo Valente wrote:
 Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
 RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
 be re-inserted into the active I/O scheduler for that device. As a
>>> 
>>> No, this behaviour isn't related with commit a6a252e64914, and
>>> it has been there since blk_mq_requeue_request() is introduced.
>>> 
>> 
>> Hi Ming,
>> actually, we wrote the above statement after simply following the call
>> chain that led to the failure.  In this respect, the change in commit
>> a6a252e64914:
>> 
>> static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
>> +  bool has_sched,
>>   struct request *rq)
>> {
>> -   if (rq->tag == -1) {
>> +   /* dispatch flush rq directly */
>> +   if (rq->rq_flags & RQF_FLUSH_SEQ) {
>> +   spin_lock(>lock);
>> +   list_add(>queuelist, >dispatch);
>> +   spin_unlock(>lock);
>> +   return true;
>> +   }
>> +
>> +   if (has_sched) {
>>rq->rq_flags |= RQF_SORTED;
>> -   return false;
>> +   WARN_ON(rq->tag != -1);
>>}
>> 
>> -   /*
>> -* If we already have a real request tag, send directly to
>> -* the dispatch list.
>> -*/
>> -   spin_lock(>lock);
>> -   list_add(>queuelist, >dispatch);
>> -   spin_unlock(>lock);
>> -   return true;
>> +   return false;
>> }
>> 
>> makes blk_mq_sched_bypass_insert return false for all non-flush
>> requests.  From that, the anomaly described in our commit follows, for
>> bfq any stateful scheduler that waits for the completion of requests
>> that passed through it.  I'm elaborating again a little bit on this in
>> my replies to your next points below.
> 
> Before a6a252e64914, follows blk_mq_sched_bypass_insert()
> 
>   if (rq->tag == -1) {
>   rq->rq_flags |= RQF_SORTED;
>   return false;
>  }
> 
>   /*
>* If we already have a real request tag, send directly to
>* the dispatch list.
>*/
>   spin_lock(>lock);
>   list_add(>queuelist, >dispatch);
>   spin_unlock(>lock);
>   return true;
> 
> This function still returns false for all non-flush request, so nothing
> changes wrt. this kind of handling.
> 

Yep Ming.  I don't have the expertise to tell you why, but the failure
in the USB case was caused by an rq that is not flush, but for which
rq->tag != -1.  So, the previous version of blk_mq_sched_bypass_insert
returned true, and there was not failure, while after commit
a6a252e64914 the function returns true and the failure occurs if bfq
does not exploit the requeue hook.

You have actually shown it yourself, several months ago, through the
simple code instrumentation you made and used to show that bfq was
stuck.  And I guess it can still be reproduced very easily, unless
something else has changed in blk-mq.

BTW, if you can shed a light on this fact, that would be a great
occasion to learn for me.

>> 
>> I don't mean that this change is an error, it simply sends a stateful
>> scheduler in an inconsistent state, unless the scheduler properly
>> handles the requeue that precedes the re-insertion into the
>> scheduler.
>> 
>> If this clarifies the situation, but there is still some misleading
>> statement in the commit, just let me better understand, and I'll be
>> glad to rectify it, if possible somehow.
>> 
>> 
>>> And you can see blk_mq_requeue_request() is called by lots of drivers,
>>> especially it is often used in error handler, see SCSI's example.
>>> 
 consequence, I/O schedulers may get the same request inserted again,
 even several times, without a finish_request invoked on that request
 before each re-insertion.
 
>> 
>> ...
>> 
 @@ -5426,7 +5482,8 @@ static struct elevator_type iosched_bfq_mq = {
.ops.mq = {
.limit_depth= bfq_limit_depth,
.prepare_request= bfq_prepare_request,
 -  .finish_request = bfq_finish_request,
 +  .requeue_request= bfq_finish_requeue_request,
 +  .finish_request = bfq_finish_requeue_request,
.exit_icq   = bfq_exit_icq,
.insert_requests= bfq_insert_requests,
.dispatch_request   = bfq_dispatch_request,
>>> 
>>> This way may not be correct since blk_mq_sched_requeue_request() can be
>>> called for one request which won't enter io scheduler.
>>> 
>> 
>> Exactly, there are two

Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook

2018-02-23 Thread Paolo Valente



> Il giorno 23 feb 2018, alle ore 17:17, Ming Lei  ha 
> scritto:
> 
> Hi Paolo,
> 
> On Fri, Feb 23, 2018 at 04:41:36PM +0100, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 23 feb 2018, alle ore 16:07, Ming Lei  ha 
>>> scritto:
>>> 
>>> Hi Paolo,
>>> 
>>> On Wed, Feb 07, 2018 at 10:19:20PM +0100, Paolo Valente wrote:
 Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
 RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
 be re-inserted into the active I/O scheduler for that device. As a
>>> 
>>> No, this behaviour isn't related with commit a6a252e64914, and
>>> it has been there since blk_mq_requeue_request() is introduced.
>>> 
>> 
>> Hi Ming,
>> actually, we wrote the above statement after simply following the call
>> chain that led to the failure.  In this respect, the change in commit
>> a6a252e64914:
>> 
>> static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
>> +  bool has_sched,
>>   struct request *rq)
>> {
>> -   if (rq->tag == -1) {
>> +   /* dispatch flush rq directly */
>> +   if (rq->rq_flags & RQF_FLUSH_SEQ) {
>> +   spin_lock(>lock);
>> +   list_add(>queuelist, >dispatch);
>> +   spin_unlock(>lock);
>> +   return true;
>> +   }
>> +
>> +   if (has_sched) {
>>rq->rq_flags |= RQF_SORTED;
>> -   return false;
>> +   WARN_ON(rq->tag != -1);
>>}
>> 
>> -   /*
>> -* If we already have a real request tag, send directly to
>> -* the dispatch list.
>> -*/
>> -   spin_lock(>lock);
>> -   list_add(>queuelist, >dispatch);
>> -   spin_unlock(>lock);
>> -   return true;
>> +   return false;
>> }
>> 
>> makes blk_mq_sched_bypass_insert return false for all non-flush
>> requests.  From that, the anomaly described in our commit follows, for
>> bfq any stateful scheduler that waits for the completion of requests
>> that passed through it.  I'm elaborating again a little bit on this in
>> my replies to your next points below.
> 
> Before a6a252e64914, follows blk_mq_sched_bypass_insert()
> 
>   if (rq->tag == -1) {
>   rq->rq_flags |= RQF_SORTED;
>   return false;
>  }
> 
>   /*
>* If we already have a real request tag, send directly to
>* the dispatch list.
>*/
>   spin_lock(>lock);
>   list_add(>queuelist, >dispatch);
>   spin_unlock(>lock);
>   return true;
> 
> This function still returns false for all non-flush request, so nothing
> changes wrt. this kind of handling.
> 

Yep Ming.  I don't have the expertise to tell you why, but the failure
in the USB case was caused by an rq that is not flush, but for which
rq->tag != -1.  So, the previous version of blk_mq_sched_bypass_insert
returned true, and there was not failure, while after commit
a6a252e64914 the function returns true and the failure occurs if bfq
does not exploit the requeue hook.

You have actually shown it yourself, several months ago, through the
simple code instrumentation you made and used to show that bfq was
stuck.  And I guess it can still be reproduced very easily, unless
something else has changed in blk-mq.

BTW, if you can shed a light on this fact, that would be a great
occasion to learn for me.

>> 
>> I don't mean that this change is an error, it simply sends a stateful
>> scheduler in an inconsistent state, unless the scheduler properly
>> handles the requeue that precedes the re-insertion into the
>> scheduler.
>> 
>> If this clarifies the situation, but there is still some misleading
>> statement in the commit, just let me better understand, and I'll be
>> glad to rectify it, if possible somehow.
>> 
>> 
>>> And you can see blk_mq_requeue_request() is called by lots of drivers,
>>> especially it is often used in error handler, see SCSI's example.
>>> 
 consequence, I/O schedulers may get the same request inserted again,
 even several times, without a finish_request invoked on that request
 before each re-insertion.
 
>> 
>> ...
>> 
 @@ -5426,7 +5482,8 @@ static struct elevator_type iosched_bfq_mq = {
.ops.mq = {
.limit_depth= bfq_limit_depth,
.prepare_request= bfq_prepare_request,
 -  .finish_request = bfq_finish_request,
 +  .requeue_request= bfq_finish_requeue_request,
 +  .finish_request = bfq_finish_requeue_request,
.exit_icq   = bfq_exit_icq,
.insert_requests= bfq_insert_requests,
.dispatch_request   = bfq_dispatch_request,
>>> 
>>> This way may not be correct since blk_mq_sched_requeue_request() can be
>>> called for one request which won't enter io scheduler.
>>> 
>> 
>> Exactly, there are two cases: requeues that lead to subsequent
>>

Re: [v4] ARM: dts: imx: Add support for Advantech DMS-BA16

2018-02-23 Thread Shawn Guo

On Sat, Feb 10, 2018 at 05:55:15AM +0800, Ken Lin wrote:
> Add support for Advantech DMS-BA16 board, which uses
> the Advantech BA-16 module.
> 
> Signed-off-by: Ken Lin 

Applied, thanks.

Re: [v4] ARM: dts: imx: Add support for Advantech DMS-BA16

2018-02-23 Thread Shawn Guo

On Sat, Feb 10, 2018 at 05:55:15AM +0800, Ken Lin wrote:
> Add support for Advantech DMS-BA16 board, which uses
> the Advantech BA-16 module.
> 
> Signed-off-by: Ken Lin 

Applied, thanks.

Re: [PATCH v5 3/6] powerpc/mm/slice: Enhance for supporting PPC32

2018-02-23 Thread Nicholas Piggin

On Thu, 22 Feb 2018 15:27:24 +0100 (CET)
Christophe Leroy  wrote:

> In preparation for the following patch which will fix an issue on
> the 8xx by re-using the 'slices', this patch enhances the
> 'slices' implementation to support 32 bits CPUs.
> 
> On PPC32, the address space is limited to 4Gbytes, hence only the low
> slices will be used.
> 
> The high slices use bitmaps. As bitmap functions are not prepared to
> handle bitmaps of size 0, this patch ensures that bitmap functions
> are called only when SLICE_NUM_HIGH is not nul.
> 
> Signed-off-by: Christophe Leroy 

This looks good to me, thank you for taking my feedback into account.

Is the patch split and naming good? Yes I guess so, this adds support
for ppc32 archs that select PPC_MM_SLICES, and the next one implements
it for 8xx. There looks to be some generic arch/powerpc/mm bits in the
next patch. I wonder if you would move them over? Then the next patch
could be called powerpc/8xx: ?

Anyway it's not a big deal.

Reviewed-by: Nicholas Piggin

Re: [PATCH v5 3/6] powerpc/mm/slice: Enhance for supporting PPC32

2018-02-23 Thread Nicholas Piggin

On Thu, 22 Feb 2018 15:27:24 +0100 (CET)
Christophe Leroy  wrote:

> In preparation for the following patch which will fix an issue on
> the 8xx by re-using the 'slices', this patch enhances the
> 'slices' implementation to support 32 bits CPUs.
> 
> On PPC32, the address space is limited to 4Gbytes, hence only the low
> slices will be used.
> 
> The high slices use bitmaps. As bitmap functions are not prepared to
> handle bitmaps of size 0, this patch ensures that bitmap functions
> are called only when SLICE_NUM_HIGH is not nul.
> 
> Signed-off-by: Christophe Leroy 

This looks good to me, thank you for taking my feedback into account.

Is the patch split and naming good? Yes I guess so, this adds support
for ppc32 archs that select PPC_MM_SLICES, and the next one implements
it for 8xx. There looks to be some generic arch/powerpc/mm bits in the
next patch. I wonder if you would move them over? Then the next patch
could be called powerpc/8xx: ?

Anyway it's not a big deal.

Reviewed-by: Nicholas Piggin

Re: [PATCHv4 1/2] ARM: imx53: add secure-reg-access support for PMU

2018-02-23 Thread Shawn Guo

On Mon, Feb 12, 2018 at 01:39:44PM +0100, Sebastian Reichel wrote:
> On i.MX53 it is necessary to set the DBG_EN bit in the
> platform GPC register to enable access to PMU counters
> other than the cycle counter.
> 
> Signed-off-by: Sebastian Reichel 
> ---
>  arch/arm/mach-imx/mach-imx53.c | 39 ++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mach-imx/mach-imx53.c b/arch/arm/mach-imx/mach-imx53.c
> index 07c2e8dca494..658e28604dca 100644
> --- a/arch/arm/mach-imx/mach-imx53.c
> +++ b/arch/arm/mach-imx/mach-imx53.c
> @@ -28,10 +28,47 @@ static void __init imx53_init_early(void)
>   mxc_set_cpu_type(MXC_CPU_MX53);
>  }
>  
> +#define MXC_CORTEXA8_PLAT_GPC 0x63fa0004

The base address should be retrieved from device tree.

Shawn

> +#define GPC_DBG_EN BIT(16)
> +
> +/*
> + * This enables the DBGEN bit in ARM_GPC register, which is
> + * required for accessing some performance counter features.
> + * Technically it is only required while perf is used, but to
> + * keep the source code simple we just enable it all the time
> + * when the kernel configuration allows using the feature.
> + */
> +static void imx53_pmu_init(void)
> +{
> + void __iomem *gpc_reg;
> + struct device_node *node;
> + u32 gpc;
> +
> + if (!IS_ENABLED(CONFIG_ARM_PMU))
> + return;
> +
> + node = of_find_compatible_node(NULL, NULL, "arm,cortex-a8-pmu");
> + if (!node)
> + return;
> +
> + if (!of_property_read_bool(node, "secure-reg-access"))
> + return;
> +
> + gpc_reg = ioremap(MXC_CORTEXA8_PLAT_GPC, 4);
> + if (!gpc_reg) {
> + pr_warning("unable to map GPC to enable perf\n");
> + return;
> + }
> +
> + gpc = readl_relaxed(gpc_reg);
> + gpc |= GPC_DBG_EN;
> + writel_relaxed(gpc, gpc_reg);
> +}
> +
>  static void __init imx53_dt_init(void)
>  {
>   imx_src_init();
> -
> + imx53_pmu_init();
>   imx_aips_allow_unprivileged_access("fsl,imx53-aipstz");
>  }
>  
> -- 
> 2.15.1
>

Re: [PATCHv4 1/2] ARM: imx53: add secure-reg-access support for PMU

2018-02-23 Thread Shawn Guo

On Mon, Feb 12, 2018 at 01:39:44PM +0100, Sebastian Reichel wrote:
> On i.MX53 it is necessary to set the DBG_EN bit in the
> platform GPC register to enable access to PMU counters
> other than the cycle counter.
> 
> Signed-off-by: Sebastian Reichel 
> ---
>  arch/arm/mach-imx/mach-imx53.c | 39 ++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mach-imx/mach-imx53.c b/arch/arm/mach-imx/mach-imx53.c
> index 07c2e8dca494..658e28604dca 100644
> --- a/arch/arm/mach-imx/mach-imx53.c
> +++ b/arch/arm/mach-imx/mach-imx53.c
> @@ -28,10 +28,47 @@ static void __init imx53_init_early(void)
>   mxc_set_cpu_type(MXC_CPU_MX53);
>  }
>  
> +#define MXC_CORTEXA8_PLAT_GPC 0x63fa0004

The base address should be retrieved from device tree.

Shawn

> +#define GPC_DBG_EN BIT(16)
> +
> +/*
> + * This enables the DBGEN bit in ARM_GPC register, which is
> + * required for accessing some performance counter features.
> + * Technically it is only required while perf is used, but to
> + * keep the source code simple we just enable it all the time
> + * when the kernel configuration allows using the feature.
> + */
> +static void imx53_pmu_init(void)
> +{
> + void __iomem *gpc_reg;
> + struct device_node *node;
> + u32 gpc;
> +
> + if (!IS_ENABLED(CONFIG_ARM_PMU))
> + return;
> +
> + node = of_find_compatible_node(NULL, NULL, "arm,cortex-a8-pmu");
> + if (!node)
> + return;
> +
> + if (!of_property_read_bool(node, "secure-reg-access"))
> + return;
> +
> + gpc_reg = ioremap(MXC_CORTEXA8_PLAT_GPC, 4);
> + if (!gpc_reg) {
> + pr_warning("unable to map GPC to enable perf\n");
> + return;
> + }
> +
> + gpc = readl_relaxed(gpc_reg);
> + gpc |= GPC_DBG_EN;
> + writel_relaxed(gpc, gpc_reg);
> +}
> +
>  static void __init imx53_dt_init(void)
>  {
>   imx_src_init();
> -
> + imx53_pmu_init();
>   imx_aips_allow_unprivileged_access("fsl,imx53-aipstz");
>  }
>  
> -- 
> 2.15.1
>

Re: [PATCH RFC] ARM: imx: avic: set low-power interrupt mask for imx25

2018-02-23 Thread Shawn Guo

On Fri, Feb 09, 2018 at 01:43:23PM +0100, Martin Kaiser wrote:
> imx25 contains two registers (LPIMR0 and 1) to define which interrupts
> are enabled in low-power mode. As of today, those two registers are
> configured to enable all interrupts. Before going to low-power mode, the
> AVIC's INTENABLEH and INTENABLEL registers are configured to enable only
> those interrupts which are used as wakeup sources.
> 
> It turned out that this approach is not sufficient if we want the imx25
> to go into stop mode during suspend-to-ram. (Stop mode is the low-power
> mode that consumes the least power. The peripheral master clock is
> switched off in this mode). For stop mode to work, the LPIMR0 and 1
> registers have to be configured with the set of interrupts that are
> allowed in low-power mode. Fortunately, the bits in the LPIMR registers
> are assigned to the same interrups as the bits in INTENABLEH and
> INTENABLEL. However, LPIMR uses 1 to mask an interrupt whereas the
> INTENABLE registers use 1 to enable an interrupt.
> 
> This patch sets the LPIMR registers to the inverted bitmask of the
> INTENABLE registers during suspend and goes back to "all interrupts
> masked" when we wake up again. We also make this the default at startup.
> 
> As far as I know, the other supported imx architectures have no similar
> mechanism. Since the LPIMR registers are part of the CCM module, we
> query the device tree for an imx25 ccm node in order to detect if we're
> running on imx25.
> 
> Signed-off-by: Martin Kaiser 
> ---
> 
> Dear all,
> 
> could you have a look at this first draft? The approach to detect imx25
> looks a bit hackish, I'd appreciate your suggestions how to do this
> properly.
> 
> Thanks & best regards,
> 
>Martin
> 
>  arch/arm/mach-imx/avic.c | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mach-imx/avic.c b/arch/arm/mach-imx/avic.c
> index 1afccae..bd6d3f2 100644
> --- a/arch/arm/mach-imx/avic.c
> +++ b/arch/arm/mach-imx/avic.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -51,7 +52,10 @@
>  
>  #define AVIC_NUM_IRQS 64
>  
> -static void __iomem *avic_base;
> +#define MX25_CCM_LPIMR0  0x68
> +#define MX25_CCM_LPIMR1  0x6C
> +
> +static void __iomem *avic_base, *mx25_ccm_base;

Keep avic_base line untouched, and add a new one for mx25_ccm_base.

>  static struct irq_domain *domain;
>  
>  #ifdef CONFIG_FIQ
> @@ -93,6 +97,11 @@ static void avic_irq_suspend(struct irq_data *d)
>  
>   avic_saved_mask_reg[idx] = imx_readl(avic_base + ct->regs.mask);
>   imx_writel(gc->wake_active, avic_base + ct->regs.mask);

Have a newline here.

> + if (mx25_ccm_base) {
> + u8 offs = d->hwirq < AVIC_NUM_IRQS / 2 ?
> + MX25_CCM_LPIMR0 : MX25_CCM_LPIMR1;
> + imx_writel(~gc->wake_active, mx25_ccm_base + offs);
> + }
>  }
>  
>  static void avic_irq_resume(struct irq_data *d)
> @@ -102,6 +111,11 @@ static void avic_irq_resume(struct irq_data *d)
>   int idx = d->hwirq >> 5;
>  
>   imx_writel(avic_saved_mask_reg[idx], avic_base + ct->regs.mask);

Ditto

Shawn

> + if (mx25_ccm_base) {
> + u8 offs = d->hwirq < AVIC_NUM_IRQS / 2 ?
> + MX25_CCM_LPIMR0 : MX25_CCM_LPIMR1;
> + imx_writel(0x, mx25_ccm_base + offs);
> + }
>  }
>  
>  #else
> @@ -158,6 +172,14 @@ void __init mxc_init_irq(void __iomem *irqbase)
>  
>   avic_base = irqbase;
>  
> + np = of_find_compatible_node(NULL, NULL, "fsl,imx25-ccm");
> + mx25_ccm_base = of_iomap(np, 0);
> +
> + if (mx25_ccm_base) {
> + imx_writel(0x, mx25_ccm_base + MX25_CCM_LPIMR0);
> + imx_writel(0x, mx25_ccm_base + MX25_CCM_LPIMR1);
> + }
> +
>   /* put the AVIC into the reset value with
>* all interrupts disabled
>*/
> -- 
> 2.1.4
>

Re: [PATCH RFC] ARM: imx: avic: set low-power interrupt mask for imx25

2018-02-23 Thread Shawn Guo

On Fri, Feb 09, 2018 at 01:43:23PM +0100, Martin Kaiser wrote:
> imx25 contains two registers (LPIMR0 and 1) to define which interrupts
> are enabled in low-power mode. As of today, those two registers are
> configured to enable all interrupts. Before going to low-power mode, the
> AVIC's INTENABLEH and INTENABLEL registers are configured to enable only
> those interrupts which are used as wakeup sources.
> 
> It turned out that this approach is not sufficient if we want the imx25
> to go into stop mode during suspend-to-ram. (Stop mode is the low-power
> mode that consumes the least power. The peripheral master clock is
> switched off in this mode). For stop mode to work, the LPIMR0 and 1
> registers have to be configured with the set of interrupts that are
> allowed in low-power mode. Fortunately, the bits in the LPIMR registers
> are assigned to the same interrups as the bits in INTENABLEH and
> INTENABLEL. However, LPIMR uses 1 to mask an interrupt whereas the
> INTENABLE registers use 1 to enable an interrupt.
> 
> This patch sets the LPIMR registers to the inverted bitmask of the
> INTENABLE registers during suspend and goes back to "all interrupts
> masked" when we wake up again. We also make this the default at startup.
> 
> As far as I know, the other supported imx architectures have no similar
> mechanism. Since the LPIMR registers are part of the CCM module, we
> query the device tree for an imx25 ccm node in order to detect if we're
> running on imx25.
> 
> Signed-off-by: Martin Kaiser 
> ---
> 
> Dear all,
> 
> could you have a look at this first draft? The approach to detect imx25
> looks a bit hackish, I'd appreciate your suggestions how to do this
> properly.
> 
> Thanks & best regards,
> 
>Martin
> 
>  arch/arm/mach-imx/avic.c | 24 +++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mach-imx/avic.c b/arch/arm/mach-imx/avic.c
> index 1afccae..bd6d3f2 100644
> --- a/arch/arm/mach-imx/avic.c
> +++ b/arch/arm/mach-imx/avic.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -51,7 +52,10 @@
>  
>  #define AVIC_NUM_IRQS 64
>  
> -static void __iomem *avic_base;
> +#define MX25_CCM_LPIMR0  0x68
> +#define MX25_CCM_LPIMR1  0x6C
> +
> +static void __iomem *avic_base, *mx25_ccm_base;

Keep avic_base line untouched, and add a new one for mx25_ccm_base.

>  static struct irq_domain *domain;
>  
>  #ifdef CONFIG_FIQ
> @@ -93,6 +97,11 @@ static void avic_irq_suspend(struct irq_data *d)
>  
>   avic_saved_mask_reg[idx] = imx_readl(avic_base + ct->regs.mask);
>   imx_writel(gc->wake_active, avic_base + ct->regs.mask);

Have a newline here.

> + if (mx25_ccm_base) {
> + u8 offs = d->hwirq < AVIC_NUM_IRQS / 2 ?
> + MX25_CCM_LPIMR0 : MX25_CCM_LPIMR1;
> + imx_writel(~gc->wake_active, mx25_ccm_base + offs);
> + }
>  }
>  
>  static void avic_irq_resume(struct irq_data *d)
> @@ -102,6 +111,11 @@ static void avic_irq_resume(struct irq_data *d)
>   int idx = d->hwirq >> 5;
>  
>   imx_writel(avic_saved_mask_reg[idx], avic_base + ct->regs.mask);

Ditto

Shawn

> + if (mx25_ccm_base) {
> + u8 offs = d->hwirq < AVIC_NUM_IRQS / 2 ?
> + MX25_CCM_LPIMR0 : MX25_CCM_LPIMR1;
> + imx_writel(0x, mx25_ccm_base + offs);
> + }
>  }
>  
>  #else
> @@ -158,6 +172,14 @@ void __init mxc_init_irq(void __iomem *irqbase)
>  
>   avic_base = irqbase;
>  
> + np = of_find_compatible_node(NULL, NULL, "fsl,imx25-ccm");
> + mx25_ccm_base = of_iomap(np, 0);
> +
> + if (mx25_ccm_base) {
> + imx_writel(0x, mx25_ccm_base + MX25_CCM_LPIMR0);
> + imx_writel(0x, mx25_ccm_base + MX25_CCM_LPIMR1);
> + }
> +
>   /* put the AVIC into the reset value with
>* all interrupts disabled
>*/
> -- 
> 2.1.4
>

Re: [RFC tip/locking/lockdep v5 04/17] lockdep: Introduce lock_list::dep

2018-02-23 Thread Boqun Feng

On Sat, Feb 24, 2018 at 01:32:50PM +0800, Boqun Feng wrote:
[...]
> 
> I also reorder bit number for each kind of dependency, so that we have a
> simple __calc_dep_bit(), see the following:
> 
>   /*
>* DEP_*_BIT in lock_list::dep
>*
>* For dependency @prev -> @next:
>*
>*   RR: both @prev and @next are recursive read locks, i.e. ->read == 
> 2.
>*   RN: @prev is recursive and @next is non-recursive.
>*   NR: @prev is a not recursive and @next is recursive.
>*   NN: both @prev and @next are non-recursive.
>* 
>* Note that we define the value of DEP_*_BITs so that:
>*  bit0 is prev->read != 2
>*  bit1 is next->read != 2
>*/
>   #define DEP_RR_BIT 0
>   #define DEP_RN_BIT 1
>   #define DEP_NR_BIT 2

Oops, to make the following __cal_dep_bit() works, we should have:

#define DEP_NR_BIT 1
#define DEP_RN_BIT 2

instead.

Regards,
Boqun

>   #define DEP_NN_BIT 3
> 
>   #define DEP_RR_MASK (1U << (DEP_RR_BIT))
>   #define DEP_RN_MASK (1U << (DEP_RN_BIT))
>   #define DEP_NR_MASK (1U << (DEP_NR_BIT))
>   #define DEP_NN_MASK (1U << (DEP_NN_BIT))
> 
>   static inline unsigned int
>   __calc_dep_bit(struct held_lock *prev, struct held_lock *next)
>   {
>   return (prev->read != 2) + ((next->read != 2) << 1)
>   }
> 
>   static inline u8 calc_dep(struct held_lock *prev, struct held_lock 
> *next)
>   {
>   return 1U << __calc_dep_bit(prev, next);
>   }
> 


signature.asc
Description: PGP signature

Re: [RFC tip/locking/lockdep v5 04/17] lockdep: Introduce lock_list::dep

2018-02-23 Thread Boqun Feng

On Sat, Feb 24, 2018 at 01:32:50PM +0800, Boqun Feng wrote:
[...]
> 
> I also reorder bit number for each kind of dependency, so that we have a
> simple __calc_dep_bit(), see the following:
> 
>   /*
>* DEP_*_BIT in lock_list::dep
>*
>* For dependency @prev -> @next:
>*
>*   RR: both @prev and @next are recursive read locks, i.e. ->read == 
> 2.
>*   RN: @prev is recursive and @next is non-recursive.
>*   NR: @prev is a not recursive and @next is recursive.
>*   NN: both @prev and @next are non-recursive.
>* 
>* Note that we define the value of DEP_*_BITs so that:
>*  bit0 is prev->read != 2
>*  bit1 is next->read != 2
>*/
>   #define DEP_RR_BIT 0
>   #define DEP_RN_BIT 1
>   #define DEP_NR_BIT 2

Oops, to make the following __cal_dep_bit() works, we should have:

#define DEP_NR_BIT 1
#define DEP_RN_BIT 2

instead.

Regards,
Boqun

>   #define DEP_NN_BIT 3
> 
>   #define DEP_RR_MASK (1U << (DEP_RR_BIT))
>   #define DEP_RN_MASK (1U << (DEP_RN_BIT))
>   #define DEP_NR_MASK (1U << (DEP_NR_BIT))
>   #define DEP_NN_MASK (1U << (DEP_NN_BIT))
> 
>   static inline unsigned int
>   __calc_dep_bit(struct held_lock *prev, struct held_lock *next)
>   {
>   return (prev->read != 2) + ((next->read != 2) << 1)
>   }
> 
>   static inline u8 calc_dep(struct held_lock *prev, struct held_lock 
> *next)
>   {
>   return 1U << __calc_dep_bit(prev, next);
>   }
> 


signature.asc
Description: PGP signature

Re: [PATCH 2/2] efi/esrt: mark ESRT memory region as nomap

2018-02-23 Thread Dave Young

On 02/23/18 at 12:42pm, Tyler Baicar wrote:
> The ESRT memory region is being exposed as System RAM in /proc/iomem
> which is wrong because it cannot be overwritten. This memory is needed
> for kexec kernels in order to properly initialize ESRT, so if it is
> overwritten it will cause ESRT failures in the kexec kernel. Mark this
> region as nomap so that it is not overwritten.
> 
> Signed-off-by: Tyler Baicar 
> Tested-by: Jeffrey Hugo 
> ---
>  drivers/firmware/efi/esrt.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c
> index 504f3c3..f5f79c7 100644
> --- a/drivers/firmware/efi/esrt.c
> +++ b/drivers/firmware/efi/esrt.c
> @@ -335,6 +335,14 @@ void __init efi_esrt_init(void)
>   pr_info("Reserving ESRT space from %pa to %pa.\n", _data, );
>   efi_mem_reserve(esrt_data, esrt_data_size);
>  
> + /*
> +  * Mark the ESRT memory region as nomap to avoid it being exposed as
> +  * System RAM in /proc/iomem. Otherwise this block can be overwritten
> +  * which will then cause failures in kexec'd kernels since the ESRT
> +  * information is no longer there.
> +  */
> + memblock_mark_nomap(esrt_data, esrt_data_size);
> +

On my X86 machine, esrt region was marked as reserved /proc/iomem,
this issue could be a arm64 only problem, it is better to handle this in
arm init code.


>   pr_debug("esrt-init: loaded.\n");
>  err_memunmap:
>   early_memunmap(va, size);
> -- 
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks
Dave

Re: [PATCH 2/2] efi/esrt: mark ESRT memory region as nomap

2018-02-23 Thread Dave Young

On 02/23/18 at 12:42pm, Tyler Baicar wrote:
> The ESRT memory region is being exposed as System RAM in /proc/iomem
> which is wrong because it cannot be overwritten. This memory is needed
> for kexec kernels in order to properly initialize ESRT, so if it is
> overwritten it will cause ESRT failures in the kexec kernel. Mark this
> region as nomap so that it is not overwritten.
> 
> Signed-off-by: Tyler Baicar 
> Tested-by: Jeffrey Hugo 
> ---
>  drivers/firmware/efi/esrt.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c
> index 504f3c3..f5f79c7 100644
> --- a/drivers/firmware/efi/esrt.c
> +++ b/drivers/firmware/efi/esrt.c
> @@ -335,6 +335,14 @@ void __init efi_esrt_init(void)
>   pr_info("Reserving ESRT space from %pa to %pa.\n", _data, );
>   efi_mem_reserve(esrt_data, esrt_data_size);
>  
> + /*
> +  * Mark the ESRT memory region as nomap to avoid it being exposed as
> +  * System RAM in /proc/iomem. Otherwise this block can be overwritten
> +  * which will then cause failures in kexec'd kernels since the ESRT
> +  * information is no longer there.
> +  */
> + memblock_mark_nomap(esrt_data, esrt_data_size);
> +

On my X86 machine, esrt region was marked as reserved /proc/iomem,
this issue could be a arm64 only problem, it is better to handle this in
arm init code.


>   pr_debug("esrt-init: loaded.\n");
>  err_memunmap:
>   early_memunmap(va, size);
> -- 
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks
Dave

Re: [PATCH v5 2/6] powerpc/mm/slice: create header files dedicated to slices

2018-02-23 Thread Nicholas Piggin

On Thu, 22 Feb 2018 15:27:22 +0100 (CET)
Christophe Leroy  wrote:

> In preparation for the following patch which will enhance 'slices'
> for supporting PPC32 in order to fix an issue on hugepages on 8xx,
> this patch takes out of page*.h all bits related to 'slices' and put
> them into newly created slice.h header files.
> While common parts go into asm/slice.h, subarch specific
> parts go into respective books3s/64/slice.c and nohash/64/slice.c
> 'slices'
> 
> Signed-off-by: Christophe Leroy 

I don't see a problem with this. Even by itself it seems like
a good cleanup.

Reviewed-by: Nicholas Piggin 

> ---
>  v5: new - come from a split of patch 2 of v4
> 
>  arch/powerpc/include/asm/book3s/64/slice.h | 27 ++
>  arch/powerpc/include/asm/nohash/64/slice.h | 12 ++
>  arch/powerpc/include/asm/page.h|  1 +
>  arch/powerpc/include/asm/page_64.h | 59 
> --
>  arch/powerpc/include/asm/slice.h   | 40 
>  5 files changed, 80 insertions(+), 59 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/book3s/64/slice.h
>  create mode 100644 arch/powerpc/include/asm/nohash/64/slice.h
>  create mode 100644 arch/powerpc/include/asm/slice.h
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/slice.h 
> b/arch/powerpc/include/asm/book3s/64/slice.h
> new file mode 100644
> index ..db0dedab65ee
> --- /dev/null
> +++ b/arch/powerpc/include/asm/book3s/64/slice.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_BOOK3S_64_SLICE_H
> +#define _ASM_POWERPC_BOOK3S_64_SLICE_H
> +
> +#ifdef CONFIG_PPC_MM_SLICES
> +
> +#define SLICE_LOW_SHIFT  28
> +#define SLICE_LOW_TOP(0x1ul)
> +#define SLICE_NUM_LOW(SLICE_LOW_TOP >> SLICE_LOW_SHIFT)
> +#define GET_LOW_SLICE_INDEX(addr)((addr) >> SLICE_LOW_SHIFT)
> +
> +#define SLICE_HIGH_SHIFT 40
> +#define SLICE_NUM_HIGH   (H_PGTABLE_RANGE >> SLICE_HIGH_SHIFT)
> +#define GET_HIGH_SLICE_INDEX(addr)   ((addr) >> SLICE_HIGH_SHIFT)
> +
> +#else /* CONFIG_PPC_MM_SLICES */
> +
> +#define get_slice_psize(mm, addr)((mm)->context.user_psize)
> +#define slice_set_user_psize(mm, psize)  \
> +do { \
> + (mm)->context.user_psize = (psize); \
> + (mm)->context.sllp = SLB_VSID_USER | mmu_psize_defs[(psize)].sllp; \
> +} while (0)
> +
> +#endif /* CONFIG_PPC_MM_SLICES */
> +
> +#endif /* _ASM_POWERPC_BOOK3S_64_SLICE_H */
> diff --git a/arch/powerpc/include/asm/nohash/64/slice.h 
> b/arch/powerpc/include/asm/nohash/64/slice.h
> new file mode 100644
> index ..ad0d6e3cc1c5
> --- /dev/null
> +++ b/arch/powerpc/include/asm/nohash/64/slice.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_NOHASH_64_SLICE_H
> +#define _ASM_POWERPC_NOHASH_64_SLICE_H
> +
> +#ifdef CONFIG_PPC_64K_PAGES
> +#define get_slice_psize(mm, addr)MMU_PAGE_64K
> +#else /* CONFIG_PPC_64K_PAGES */
> +#define get_slice_psize(mm, addr)MMU_PAGE_4K
> +#endif /* !CONFIG_PPC_64K_PAGES */
> +#define slice_set_user_psize(mm, psize)  do { BUG(); } while (0)
> +
> +#endif /* _ASM_POWERPC_NOHASH_64_SLICE_H */
> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
> index 8da5d4c1cab2..d5f1c41b7dba 100644
> --- a/arch/powerpc/include/asm/page.h
> +++ b/arch/powerpc/include/asm/page.h
> @@ -344,5 +344,6 @@ typedef struct page *pgtable_t;
>  
>  #include 
>  #endif /* __ASSEMBLY__ */
> +#include 
>  
>  #endif /* _ASM_POWERPC_PAGE_H */
> diff --git a/arch/powerpc/include/asm/page_64.h 
> b/arch/powerpc/include/asm/page_64.h
> index 56234c6fcd61..af04acdb873f 100644
> --- a/arch/powerpc/include/asm/page_64.h
> +++ b/arch/powerpc/include/asm/page_64.h
> @@ -86,65 +86,6 @@ extern u64 ppc64_pft_size;
>  
>  #endif /* __ASSEMBLY__ */
>  
> -#ifdef CONFIG_PPC_MM_SLICES
> -
> -#define SLICE_LOW_SHIFT  28
> -#define SLICE_HIGH_SHIFT 40
> -
> -#define SLICE_LOW_TOP(0x1ul)
> -#define SLICE_NUM_LOW(SLICE_LOW_TOP >> SLICE_LOW_SHIFT)
> -#define SLICE_NUM_HIGH   (H_PGTABLE_RANGE >> SLICE_HIGH_SHIFT)
> -
> -#define GET_LOW_SLICE_INDEX(addr)((addr) >> SLICE_LOW_SHIFT)
> -#define GET_HIGH_SLICE_INDEX(addr)   ((addr) >> SLICE_HIGH_SHIFT)
> -
> -#ifndef __ASSEMBLY__
> -struct mm_struct;
> -
> -extern unsigned long slice_get_unmapped_area(unsigned long addr,
> -  unsigned long len,
> -  unsigned long flags,
> -  unsigned int psize,
> -  int topdown);
> -
> -extern unsigned int get_slice_psize(struct mm_struct *mm,
> - unsigned long addr);
> -
> -extern void slice_set_user_psize(struct

Re: [PATCH v5 2/6] powerpc/mm/slice: create header files dedicated to slices

2018-02-23 Thread Nicholas Piggin

On Thu, 22 Feb 2018 15:27:22 +0100 (CET)
Christophe Leroy  wrote:

> In preparation for the following patch which will enhance 'slices'
> for supporting PPC32 in order to fix an issue on hugepages on 8xx,
> this patch takes out of page*.h all bits related to 'slices' and put
> them into newly created slice.h header files.
> While common parts go into asm/slice.h, subarch specific
> parts go into respective books3s/64/slice.c and nohash/64/slice.c
> 'slices'
> 
> Signed-off-by: Christophe Leroy 

I don't see a problem with this. Even by itself it seems like
a good cleanup.

Reviewed-by: Nicholas Piggin 

> ---
>  v5: new - come from a split of patch 2 of v4
> 
>  arch/powerpc/include/asm/book3s/64/slice.h | 27 ++
>  arch/powerpc/include/asm/nohash/64/slice.h | 12 ++
>  arch/powerpc/include/asm/page.h|  1 +
>  arch/powerpc/include/asm/page_64.h | 59 
> --
>  arch/powerpc/include/asm/slice.h   | 40 
>  5 files changed, 80 insertions(+), 59 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/book3s/64/slice.h
>  create mode 100644 arch/powerpc/include/asm/nohash/64/slice.h
>  create mode 100644 arch/powerpc/include/asm/slice.h
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/slice.h 
> b/arch/powerpc/include/asm/book3s/64/slice.h
> new file mode 100644
> index ..db0dedab65ee
> --- /dev/null
> +++ b/arch/powerpc/include/asm/book3s/64/slice.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_BOOK3S_64_SLICE_H
> +#define _ASM_POWERPC_BOOK3S_64_SLICE_H
> +
> +#ifdef CONFIG_PPC_MM_SLICES
> +
> +#define SLICE_LOW_SHIFT  28
> +#define SLICE_LOW_TOP(0x1ul)
> +#define SLICE_NUM_LOW(SLICE_LOW_TOP >> SLICE_LOW_SHIFT)
> +#define GET_LOW_SLICE_INDEX(addr)((addr) >> SLICE_LOW_SHIFT)
> +
> +#define SLICE_HIGH_SHIFT 40
> +#define SLICE_NUM_HIGH   (H_PGTABLE_RANGE >> SLICE_HIGH_SHIFT)
> +#define GET_HIGH_SLICE_INDEX(addr)   ((addr) >> SLICE_HIGH_SHIFT)
> +
> +#else /* CONFIG_PPC_MM_SLICES */
> +
> +#define get_slice_psize(mm, addr)((mm)->context.user_psize)
> +#define slice_set_user_psize(mm, psize)  \
> +do { \
> + (mm)->context.user_psize = (psize); \
> + (mm)->context.sllp = SLB_VSID_USER | mmu_psize_defs[(psize)].sllp; \
> +} while (0)
> +
> +#endif /* CONFIG_PPC_MM_SLICES */
> +
> +#endif /* _ASM_POWERPC_BOOK3S_64_SLICE_H */
> diff --git a/arch/powerpc/include/asm/nohash/64/slice.h 
> b/arch/powerpc/include/asm/nohash/64/slice.h
> new file mode 100644
> index ..ad0d6e3cc1c5
> --- /dev/null
> +++ b/arch/powerpc/include/asm/nohash/64/slice.h
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_POWERPC_NOHASH_64_SLICE_H
> +#define _ASM_POWERPC_NOHASH_64_SLICE_H
> +
> +#ifdef CONFIG_PPC_64K_PAGES
> +#define get_slice_psize(mm, addr)MMU_PAGE_64K
> +#else /* CONFIG_PPC_64K_PAGES */
> +#define get_slice_psize(mm, addr)MMU_PAGE_4K
> +#endif /* !CONFIG_PPC_64K_PAGES */
> +#define slice_set_user_psize(mm, psize)  do { BUG(); } while (0)
> +
> +#endif /* _ASM_POWERPC_NOHASH_64_SLICE_H */
> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
> index 8da5d4c1cab2..d5f1c41b7dba 100644
> --- a/arch/powerpc/include/asm/page.h
> +++ b/arch/powerpc/include/asm/page.h
> @@ -344,5 +344,6 @@ typedef struct page *pgtable_t;
>  
>  #include 
>  #endif /* __ASSEMBLY__ */
> +#include 
>  
>  #endif /* _ASM_POWERPC_PAGE_H */
> diff --git a/arch/powerpc/include/asm/page_64.h 
> b/arch/powerpc/include/asm/page_64.h
> index 56234c6fcd61..af04acdb873f 100644
> --- a/arch/powerpc/include/asm/page_64.h
> +++ b/arch/powerpc/include/asm/page_64.h
> @@ -86,65 +86,6 @@ extern u64 ppc64_pft_size;
>  
>  #endif /* __ASSEMBLY__ */
>  
> -#ifdef CONFIG_PPC_MM_SLICES
> -
> -#define SLICE_LOW_SHIFT  28
> -#define SLICE_HIGH_SHIFT 40
> -
> -#define SLICE_LOW_TOP(0x1ul)
> -#define SLICE_NUM_LOW(SLICE_LOW_TOP >> SLICE_LOW_SHIFT)
> -#define SLICE_NUM_HIGH   (H_PGTABLE_RANGE >> SLICE_HIGH_SHIFT)
> -
> -#define GET_LOW_SLICE_INDEX(addr)((addr) >> SLICE_LOW_SHIFT)
> -#define GET_HIGH_SLICE_INDEX(addr)   ((addr) >> SLICE_HIGH_SHIFT)
> -
> -#ifndef __ASSEMBLY__
> -struct mm_struct;
> -
> -extern unsigned long slice_get_unmapped_area(unsigned long addr,
> -  unsigned long len,
> -  unsigned long flags,
> -  unsigned int psize,
> -  int topdown);
> -
> -extern unsigned int get_slice_psize(struct mm_struct *mm,
> - unsigned long addr);
> -
> -extern void slice_set_user_psize(struct mm_struct *mm, unsigned int psize);
> -extern void

Re: [PATCH 1/2] efi/esrt: fix unsupported version initialization failure

2018-02-23 Thread Dave Young

On 02/23/18 at 12:42pm, Tyler Baicar wrote:
> If ESRT initialization fails due to an unsupported version, the
> early_memremap allocation is never unmapped. This will cause an
> early ioremap leak. So, make sure to unmap the memory allocation
> before returning from efi_esrt_init().
> 
> Signed-off-by: Tyler Baicar 
> ---
>  drivers/firmware/efi/esrt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c
> index c47e0c6..504f3c3 100644
> --- a/drivers/firmware/efi/esrt.c
> +++ b/drivers/firmware/efi/esrt.c
> @@ -285,7 +285,7 @@ void __init efi_esrt_init(void)
>   } else {
>   pr_err("Unsupported ESRT version %lld.\n",
>  tmpesrt.fw_resource_version);
> - return;
> + goto err_memunmap;
>   }
>  
>   if (tmpesrt.fw_resource_count > 0 && max - size < entry_size) {
> -- 
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reviewed-by: Dave Young 

Thanks
Dave

Re: [PATCH 1/2] efi/esrt: fix unsupported version initialization failure

2018-02-23 Thread Dave Young

On 02/23/18 at 12:42pm, Tyler Baicar wrote:
> If ESRT initialization fails due to an unsupported version, the
> early_memremap allocation is never unmapped. This will cause an
> early ioremap leak. So, make sure to unmap the memory allocation
> before returning from efi_esrt_init().
> 
> Signed-off-by: Tyler Baicar 
> ---
>  drivers/firmware/efi/esrt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c
> index c47e0c6..504f3c3 100644
> --- a/drivers/firmware/efi/esrt.c
> +++ b/drivers/firmware/efi/esrt.c
> @@ -285,7 +285,7 @@ void __init efi_esrt_init(void)
>   } else {
>   pr_err("Unsupported ESRT version %lld.\n",
>  tmpesrt.fw_resource_version);
> - return;
> + goto err_memunmap;
>   }
>  
>   if (tmpesrt.fw_resource_count > 0 && max - size < entry_size) {
> -- 
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reviewed-by: Dave Young 

Thanks
Dave

Re: [PATCH 1/3] arm64: dts: ls1012a: add cpu idle support

2018-02-23 Thread Shawn Guo

On Sat, Feb 24, 2018 at 02:25:16PM +0800, Shawn Guo wrote:
> On Thu, Feb 08, 2018 at 03:54:34PM +0800, Ran Wang wrote:
> > From: Yuantian Tang 
> > 
> > Signed-off-by: Tang Yuantian 
> > ---
> >  arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi |   18 ++
> >  1 files changed, 18 insertions(+), 0 deletions(-)
> 
> Applied all, thanks.

Ran,

Just noticed that these patches are authored by Yuantian.  When you send
patches authored by someone else, you should have your SoB added to it.

I just append your SoB below to all 3 patches.

Signed-off-by: Ran Wang 

Shawn

Re: [PATCH 1/3] arm64: dts: ls1012a: add cpu idle support

2018-02-23 Thread Shawn Guo

On Sat, Feb 24, 2018 at 02:25:16PM +0800, Shawn Guo wrote:
> On Thu, Feb 08, 2018 at 03:54:34PM +0800, Ran Wang wrote:
> > From: Yuantian Tang 
> > 
> > Signed-off-by: Tang Yuantian 
> > ---
> >  arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi |   18 ++
> >  1 files changed, 18 insertions(+), 0 deletions(-)
> 
> Applied all, thanks.

Ran,

Just noticed that these patches are authored by Yuantian.  When you send
patches authored by someone else, you should have your SoB added to it.

I just append your SoB below to all 3 patches.

Signed-off-by: Ran Wang 

Shawn

lib/find_bit.c:203:15: error: redefinition of 'find_next_zero_bit_le'

2018-02-23 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   9cb9c07d6b0c5fd97d83b8ab14d7e308ba4b612f
commit: 101110f6271ce956a049250c907bc960030577f8 Kbuild: always define 
endianess in kconfig.h
date:   2 days ago
config: m32r-allyesconfig (attached as .config)
compiler: m32r-linux-gcc (GCC) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 101110f6271ce956a049250c907bc960030577f8
# save the attached .config to linux build tree
make.cross ARCH=m32r 

All errors (new ones prefixed by >>):

   In file included from arch/m32r/include/uapi/asm/byteorder.h:8:0,
from arch/m32r/include/asm/bitops.h:22,
from include/linux/bitops.h:38,
from lib/find_bit.c:19:
   include/linux/byteorder/big_endian.h:8:2: warning: #warning inconsistent 
configuration, needs CONFIG_CPU_BIG_ENDIAN [-Wcpp]
#warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN
 ^~~
>> lib/find_bit.c:203:15: error: redefinition of 'find_next_zero_bit_le'
unsigned long find_next_zero_bit_le(const void *addr, unsigned
  ^
   In file included from arch/m32r/include/asm/bitops.h:269:0,
from include/linux/bitops.h:38,
from lib/find_bit.c:19:
   include/asm-generic/bitops/le.h:12:29: note: previous definition of 
'find_next_zero_bit_le' was here
static inline unsigned long find_next_zero_bit_le(const void *addr,
^
>> lib/find_bit.c:212:15: error: redefinition of 'find_next_bit_le'
unsigned long find_next_bit_le(const void *addr, unsigned
  ^~~~
   In file included from arch/m32r/include/asm/bitops.h:269:0,
from include/linux/bitops.h:38,
from lib/find_bit.c:19:
   include/asm-generic/bitops/le.h:18:29: note: previous definition of 
'find_next_bit_le' was here
static inline unsigned long find_next_bit_le(const void *addr,
^~~~

vim +/find_next_zero_bit_le +203 lib/find_bit.c

^1da177e lib/find_next_bit.c Linus Torvalds 2005-04-16  @19  #include 

8f6f19dd lib/find_next_bit.c Yury Norov 2015-04-16   20  #include 

8bc3bcc9 lib/find_next_bit.c Paul Gortmaker 2011-11-16   21  #include 

2c57a0e2 lib/find_next_bit.c Yury Norov 2015-04-16   22  #include 

^1da177e lib/find_next_bit.c Linus Torvalds 2005-04-16   23  
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   24  #if 
!defined(find_next_bit) || !defined(find_next_zero_bit) || \
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   25 
!defined(find_next_and_bit)
c7f612cd lib/find_next_bit.c Akinobu Mita   2006-03-26   26  
64970b68 lib/find_next_bit.c Alexander van Heukelum 2008-03-11   27  /*
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   28   * This is 
a common helper function for find_next_bit, find_next_zero_bit, and
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   29   * 
find_next_and_bit. The differences are:
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   30   *  - The 
"invert" argument, which is XORed with each fetched word before
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   31   *
searching it for one bits.
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   32   *  - The 
optional "addr2", which is anded with "addr1" if present.
c7f612cd lib/find_next_bit.c Akinobu Mita   2006-03-26   33   */
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   34  static 
inline unsigned long _find_next_bit(const unsigned long *addr1,
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   35 
const unsigned long *addr2, unsigned long nbits,
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   36 
unsigned long start, unsigned long invert)
^1da177e lib/find_next_bit.c Linus Torvalds 2005-04-16   37  {
^1da177e lib/find_next_bit.c Linus Torvalds 2005-04-16   38 
unsigned long tmp;
^1da177e lib/find_next_bit.c Linus Torvalds 2005-04-16   39  
e4afd2e5 lib/find_bit.c  Matthew Wilcox 2017-02-24   40 if 
(unlikely(start >= nbits))
2c57a0e2 lib/find_next_bit.c Yury Norov 2015-04-16   41 
return nbits;
2c57a0e2 lib/find_next_bit.c Yury Norov 2015-04-16   42  
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   43 tmp = 
addr1[start / BITS_PER_LONG];
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   44 if 
(addr2)
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   45 
tmp &= addr2[start / BITS_PER_LONG];
0ade34c3 lib/find_bit.c

lib/find_bit.c:203:15: error: redefinition of 'find_next_zero_bit_le'

2018-02-23 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   9cb9c07d6b0c5fd97d83b8ab14d7e308ba4b612f
commit: 101110f6271ce956a049250c907bc960030577f8 Kbuild: always define 
endianess in kconfig.h
date:   2 days ago
config: m32r-allyesconfig (attached as .config)
compiler: m32r-linux-gcc (GCC) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 101110f6271ce956a049250c907bc960030577f8
# save the attached .config to linux build tree
make.cross ARCH=m32r 

All errors (new ones prefixed by >>):

   In file included from arch/m32r/include/uapi/asm/byteorder.h:8:0,
from arch/m32r/include/asm/bitops.h:22,
from include/linux/bitops.h:38,
from lib/find_bit.c:19:
   include/linux/byteorder/big_endian.h:8:2: warning: #warning inconsistent 
configuration, needs CONFIG_CPU_BIG_ENDIAN [-Wcpp]
#warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN
 ^~~
>> lib/find_bit.c:203:15: error: redefinition of 'find_next_zero_bit_le'
unsigned long find_next_zero_bit_le(const void *addr, unsigned
  ^
   In file included from arch/m32r/include/asm/bitops.h:269:0,
from include/linux/bitops.h:38,
from lib/find_bit.c:19:
   include/asm-generic/bitops/le.h:12:29: note: previous definition of 
'find_next_zero_bit_le' was here
static inline unsigned long find_next_zero_bit_le(const void *addr,
^
>> lib/find_bit.c:212:15: error: redefinition of 'find_next_bit_le'
unsigned long find_next_bit_le(const void *addr, unsigned
  ^~~~
   In file included from arch/m32r/include/asm/bitops.h:269:0,
from include/linux/bitops.h:38,
from lib/find_bit.c:19:
   include/asm-generic/bitops/le.h:18:29: note: previous definition of 
'find_next_bit_le' was here
static inline unsigned long find_next_bit_le(const void *addr,
^~~~

vim +/find_next_zero_bit_le +203 lib/find_bit.c

^1da177e lib/find_next_bit.c Linus Torvalds 2005-04-16  @19  #include 

8f6f19dd lib/find_next_bit.c Yury Norov 2015-04-16   20  #include 

8bc3bcc9 lib/find_next_bit.c Paul Gortmaker 2011-11-16   21  #include 

2c57a0e2 lib/find_next_bit.c Yury Norov 2015-04-16   22  #include 

^1da177e lib/find_next_bit.c Linus Torvalds 2005-04-16   23  
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   24  #if 
!defined(find_next_bit) || !defined(find_next_zero_bit) || \
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   25 
!defined(find_next_and_bit)
c7f612cd lib/find_next_bit.c Akinobu Mita   2006-03-26   26  
64970b68 lib/find_next_bit.c Alexander van Heukelum 2008-03-11   27  /*
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   28   * This is 
a common helper function for find_next_bit, find_next_zero_bit, and
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   29   * 
find_next_and_bit. The differences are:
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   30   *  - The 
"invert" argument, which is XORed with each fetched word before
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   31   *
searching it for one bits.
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   32   *  - The 
optional "addr2", which is anded with "addr1" if present.
c7f612cd lib/find_next_bit.c Akinobu Mita   2006-03-26   33   */
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   34  static 
inline unsigned long _find_next_bit(const unsigned long *addr1,
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   35 
const unsigned long *addr2, unsigned long nbits,
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   36 
unsigned long start, unsigned long invert)
^1da177e lib/find_next_bit.c Linus Torvalds 2005-04-16   37  {
^1da177e lib/find_next_bit.c Linus Torvalds 2005-04-16   38 
unsigned long tmp;
^1da177e lib/find_next_bit.c Linus Torvalds 2005-04-16   39  
e4afd2e5 lib/find_bit.c  Matthew Wilcox 2017-02-24   40 if 
(unlikely(start >= nbits))
2c57a0e2 lib/find_next_bit.c Yury Norov 2015-04-16   41 
return nbits;
2c57a0e2 lib/find_next_bit.c Yury Norov 2015-04-16   42  
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   43 tmp = 
addr1[start / BITS_PER_LONG];
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   44 if 
(addr2)
0ade34c3 lib/find_bit.c  Clement Courbet2018-02-06   45 
tmp &= addr2[start / BITS_PER_LONG];
0ade34c3 lib/find_bit.c

RE: [Intel-wired-lan] [PATCH net-queue] e1000e: Fix check_for_link return value with autoneg off.

2018-02-23 Thread Brown, Aaron F

> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of Benjamin Poirier
> Sent: Monday, February 19, 2018 10:12 PM
> To: Kirsher, Jeffrey T 
> Cc: net...@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-
> ker...@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH net-queue] e1000e: Fix check_for_link
> return value with autoneg off.
> 
> When autoneg is off, the .check_for_link callback functions clear the
> get_link_status flag and systematically return a "pseudo-error". This means
> that the link is not detected as up until the next execution of the
> e1000_watchdog_task() 2 seconds later.
> 
> Fixes: 19110cfbb34d ("e1000e: Separate signaling for link check/link up")
> Signed-off-by: Benjamin Poirier 
> ---
>  drivers/net/ethernet/intel/e1000e/ich8lan.c | 2 +-
>  drivers/net/ethernet/intel/e1000e/mac.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 

Tested-by: Aaron Brown

RE: [Intel-wired-lan] [PATCH net-queue] e1000e: Fix check_for_link return value with autoneg off.

2018-02-23 Thread Brown, Aaron F

> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of Benjamin Poirier
> Sent: Monday, February 19, 2018 10:12 PM
> To: Kirsher, Jeffrey T 
> Cc: net...@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-
> ker...@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH net-queue] e1000e: Fix check_for_link
> return value with autoneg off.
> 
> When autoneg is off, the .check_for_link callback functions clear the
> get_link_status flag and systematically return a "pseudo-error". This means
> that the link is not detected as up until the next execution of the
> e1000_watchdog_task() 2 seconds later.
> 
> Fixes: 19110cfbb34d ("e1000e: Separate signaling for link check/link up")
> Signed-off-by: Benjamin Poirier 
> ---
>  drivers/net/ethernet/intel/e1000e/ich8lan.c | 2 +-
>  drivers/net/ethernet/intel/e1000e/mac.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 

Tested-by: Aaron Brown

Re: [RFC tip/locking/lockdep v5 04/17] lockdep: Introduce lock_list::dep

2018-02-23 Thread Boqun Feng

On Sat, Feb 24, 2018 at 01:32:50PM +0800, Boqun Feng wrote:
> On Fri, Feb 23, 2018 at 08:37:32PM +0800, Boqun Feng wrote:
> > On Fri, Feb 23, 2018 at 12:55:20PM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 22, 2018 at 03:08:51PM +0800, Boqun Feng wrote:
> > > > @@ -1012,6 +1013,33 @@ static inline bool bfs_error(enum bfs_result res)
> > > > return res < 0;
> > > >  }
> > > >  
> > > > +#define DEP_NN_BIT 0
> > > > +#define DEP_RN_BIT 1
> > > > +#define DEP_NR_BIT 2
> > > > +#define DEP_RR_BIT 3
> > > > +
> > > > +#define DEP_NN_MASK (1U << (DEP_NN_BIT))
> > > > +#define DEP_RN_MASK (1U << (DEP_RN_BIT))
> > > > +#define DEP_NR_MASK (1U << (DEP_NR_BIT))
> > > > +#define DEP_RR_MASK (1U << (DEP_RR_BIT))
> > > > +
> > > > +static inline unsigned int __calc_dep_bit(int prev, int next)
> > > > +{
> > > > +   if (prev == 2 && next != 2)
> > > > +   return DEP_RN_BIT;
> > > > +   if (prev != 2 && next == 2)
> > > > +   return DEP_NR_BIT;
> > > > +   if (prev == 2 && next == 2)
> > > > +   return DEP_RR_BIT;
> > > > +   else
> > > > +   return DEP_NN_BIT;
> > > > +}
> > > > +
> > > > +static inline unsigned int calc_dep(int prev, int next)
> > > > +{
> > > > +   return 1U << __calc_dep_bit(prev, next);
> > > > +}
> > > > +
> > > >  static enum bfs_result __bfs(struct lock_list *source_entry,
> > > >  void *data,
> > > >  int (*match)(struct lock_list *entry, void 
> > > > *data),
> > > > @@ -1921,6 +1949,16 @@ check_prev_add(struct task_struct *curr, struct 
> > > > held_lock *prev,
> > > > if (entry->class == hlock_class(next)) {
> > > > if (distance == 1)
> > > > entry->distance = 1;
> > > > +   entry->dep |= calc_dep(prev->read, next->read);
> > > > +   }
> > > > +   }
> > > > +
> > > > +   /* Also, update the reverse dependency in @next's 
> > > > ->locks_before list */
> > > > +   list_for_each_entry(entry, _class(next)->locks_before, 
> > > > entry) {
> > > > +   if (entry->class == hlock_class(prev)) {
> > > > +   if (distance == 1)
> > > > +   entry->distance = 1;
> > > > +   entry->dep |= calc_dep(next->read, prev->read);
> > > > return 1;
> > > > }
> > > > }
> > > 
> > > I think it all becomes simpler if you use only 2 bits. Such that:
> > > 
> > >   bit0 is the prev R (0) or N (1) value,
> > >   bit1 is the next R (0) or N (1) value.
> > > 
> > > I think this should work because we don't care about the empty set
> > > (currently ) and all the complexity in patch 5 is because we can
> > > have R bits set when there's also N bits. The concequence of that is
> > > that we cannot replace ! with ~ (which is what I kept doing).
> > > 
> > > But with only 2 bits, we only track the strongest relation in the set,
> > > which is exactly what we appear to need.
> > > 
> > 
> > But if we only have RN and NR, both bits will be set, we can not check
> > whether we have NN or not. Consider we have:
> > 
> > A -(RR)-> B
> > B -(NR)-> C and B -(RN)-> C
> > C -(RN)-> A
> > 
> > this is not a deadlock case, but with "two bits" approach, we can not
> > differ this with:
> > 
> > A -(RR)-> B
> > B -(NN)-> C
> > C -(RN)-> A
> > 
> > , which is a deadlock.
> > 
> > But maybe "three bits" (NR, RN and NN bits) approach works, that is if
> > ->dep is 0, we indicates this is only RR, and is_rx() becomes:
> > 
> > static inline bool is_rx(u8 dep)
> > {
> > return !(dep & (NR_MASK | NN_MASK));
> > }
> > 
> > and is_xr() becomes:
> > 
> > static inline bool is_xr(u8 dep)
> > {
> > return !(dep & (RN_MASK | NN_MASK));
> > }
> > 
> > , with this I think your simplification with have_xr works, thanks!
> > 
> 
> Ah! I see. Actually your very first approach works, except the
> definitions of is_rx() and ir_xr() are wrong. In that approach, you
> define
>   
>   static inline bool is_rx(u8 dep)
>   {
>   return !!(dep & (DEP_RR_MASK | DEP_RN_MASK);
>   }
> 
> , which means "whether we have a R* dependency?". But in fact, what we
> need to check is "whether we _only_ have R* dependencies?", if so and
> have_xr is true, that means we could only have a -(*R)-> A -(R*)-> if we
> pick the next dependency, and that means we should skip. So my new
> definition above works, and I think we better name it as only_rx() to
> avoid confusion? Ditto for is_xr().
> 
> I also reorder bit number for each kind of dependency, so that we have a
> simple __calc_dep_bit(), see the following:
> 
>   /*
>* DEP_*_BIT in lock_list::dep
>*
>* For dependency @prev -> @next:
>*
>*   RR: both @prev and @next are recursive read locks, i.e. ->read == 
>

Re: [RFC tip/locking/lockdep v5 04/17] lockdep: Introduce lock_list::dep

2018-02-23 Thread Boqun Feng

On Sat, Feb 24, 2018 at 01:32:50PM +0800, Boqun Feng wrote:
> On Fri, Feb 23, 2018 at 08:37:32PM +0800, Boqun Feng wrote:
> > On Fri, Feb 23, 2018 at 12:55:20PM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 22, 2018 at 03:08:51PM +0800, Boqun Feng wrote:
> > > > @@ -1012,6 +1013,33 @@ static inline bool bfs_error(enum bfs_result res)
> > > > return res < 0;
> > > >  }
> > > >  
> > > > +#define DEP_NN_BIT 0
> > > > +#define DEP_RN_BIT 1
> > > > +#define DEP_NR_BIT 2
> > > > +#define DEP_RR_BIT 3
> > > > +
> > > > +#define DEP_NN_MASK (1U << (DEP_NN_BIT))
> > > > +#define DEP_RN_MASK (1U << (DEP_RN_BIT))
> > > > +#define DEP_NR_MASK (1U << (DEP_NR_BIT))
> > > > +#define DEP_RR_MASK (1U << (DEP_RR_BIT))
> > > > +
> > > > +static inline unsigned int __calc_dep_bit(int prev, int next)
> > > > +{
> > > > +   if (prev == 2 && next != 2)
> > > > +   return DEP_RN_BIT;
> > > > +   if (prev != 2 && next == 2)
> > > > +   return DEP_NR_BIT;
> > > > +   if (prev == 2 && next == 2)
> > > > +   return DEP_RR_BIT;
> > > > +   else
> > > > +   return DEP_NN_BIT;
> > > > +}
> > > > +
> > > > +static inline unsigned int calc_dep(int prev, int next)
> > > > +{
> > > > +   return 1U << __calc_dep_bit(prev, next);
> > > > +}
> > > > +
> > > >  static enum bfs_result __bfs(struct lock_list *source_entry,
> > > >  void *data,
> > > >  int (*match)(struct lock_list *entry, void 
> > > > *data),
> > > > @@ -1921,6 +1949,16 @@ check_prev_add(struct task_struct *curr, struct 
> > > > held_lock *prev,
> > > > if (entry->class == hlock_class(next)) {
> > > > if (distance == 1)
> > > > entry->distance = 1;
> > > > +   entry->dep |= calc_dep(prev->read, next->read);
> > > > +   }
> > > > +   }
> > > > +
> > > > +   /* Also, update the reverse dependency in @next's 
> > > > ->locks_before list */
> > > > +   list_for_each_entry(entry, _class(next)->locks_before, 
> > > > entry) {
> > > > +   if (entry->class == hlock_class(prev)) {
> > > > +   if (distance == 1)
> > > > +   entry->distance = 1;
> > > > +   entry->dep |= calc_dep(next->read, prev->read);
> > > > return 1;
> > > > }
> > > > }
> > > 
> > > I think it all becomes simpler if you use only 2 bits. Such that:
> > > 
> > >   bit0 is the prev R (0) or N (1) value,
> > >   bit1 is the next R (0) or N (1) value.
> > > 
> > > I think this should work because we don't care about the empty set
> > > (currently ) and all the complexity in patch 5 is because we can
> > > have R bits set when there's also N bits. The concequence of that is
> > > that we cannot replace ! with ~ (which is what I kept doing).
> > > 
> > > But with only 2 bits, we only track the strongest relation in the set,
> > > which is exactly what we appear to need.
> > > 
> > 
> > But if we only have RN and NR, both bits will be set, we can not check
> > whether we have NN or not. Consider we have:
> > 
> > A -(RR)-> B
> > B -(NR)-> C and B -(RN)-> C
> > C -(RN)-> A
> > 
> > this is not a deadlock case, but with "two bits" approach, we can not
> > differ this with:
> > 
> > A -(RR)-> B
> > B -(NN)-> C
> > C -(RN)-> A
> > 
> > , which is a deadlock.
> > 
> > But maybe "three bits" (NR, RN and NN bits) approach works, that is if
> > ->dep is 0, we indicates this is only RR, and is_rx() becomes:
> > 
> > static inline bool is_rx(u8 dep)
> > {
> > return !(dep & (NR_MASK | NN_MASK));
> > }
> > 
> > and is_xr() becomes:
> > 
> > static inline bool is_xr(u8 dep)
> > {
> > return !(dep & (RN_MASK | NN_MASK));
> > }
> > 
> > , with this I think your simplification with have_xr works, thanks!
> > 
> 
> Ah! I see. Actually your very first approach works, except the
> definitions of is_rx() and ir_xr() are wrong. In that approach, you
> define
>   
>   static inline bool is_rx(u8 dep)
>   {
>   return !!(dep & (DEP_RR_MASK | DEP_RN_MASK);
>   }
> 
> , which means "whether we have a R* dependency?". But in fact, what we
> need to check is "whether we _only_ have R* dependencies?", if so and
> have_xr is true, that means we could only have a -(*R)-> A -(R*)-> if we
> pick the next dependency, and that means we should skip. So my new
> definition above works, and I think we better name it as only_rx() to
> avoid confusion? Ditto for is_xr().
> 
> I also reorder bit number for each kind of dependency, so that we have a
> simple __calc_dep_bit(), see the following:
> 
>   /*
>* DEP_*_BIT in lock_list::dep
>*
>* For dependency @prev -> @next:
>*
>*   RR: both @prev and @next are recursive read locks, i.e. ->read == 
>

Re: [PATCH 1/3] arm64: dts: ls1012a: add cpu idle support

2018-02-23 Thread Shawn Guo

On Thu, Feb 08, 2018 at 03:54:34PM +0800, Ran Wang wrote:
> From: Yuantian Tang 
> 
> Signed-off-by: Tang Yuantian 
> ---
>  arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi |   18 ++
>  1 files changed, 18 insertions(+), 0 deletions(-)

Applied all, thanks.

Re: [PATCH 1/3] arm64: dts: ls1012a: add cpu idle support

2018-02-23 Thread Shawn Guo

On Thu, Feb 08, 2018 at 03:54:34PM +0800, Ran Wang wrote:
> From: Yuantian Tang 
> 
> Signed-off-by: Tang Yuantian 
> ---
>  arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi |   18 ++
>  1 files changed, 18 insertions(+), 0 deletions(-)

Applied all, thanks.

[lkp-robot] [rcu] 056becf54e: BUG:KASAN:null-ptr-deref_in__lock_acquire

2018-02-23 Thread kernel test robot

TO: Paul E. McKenney 
CC: LKML , Paul E. McKenney 
, linux-kernel@vger.kernel.org, l...@01.org



FYI, we noticed the following commit (built with gcc-7):

commit: 056becf54ef1ab39db14a66625353899dba6762f ("rcu: Parallelize expedited 
grace-period initialization")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git rcu/dev

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -smp 2 -m 512M

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


+--+++
|  | 28ea7ed1b3 
| 056becf54e |
+--+++
| boot_successes   | 2  
| 0  |
| boot_failures| 6  
| 41 |
| invoked_oom-killer:gfp_mask=0x   | 6  
||
| Mem-Info | 6  
||
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 6  
||
| BUG:KASAN:null-ptr-deref_in__lock_acquire| 0  
| 41 |
| BUG:unable_to_handle_kernel  | 0  
| 41 |
| Oops:#[##]   | 0  
| 41 |
| RIP:__lock_acquire   | 0  
| 41 |
| Kernel_panic-not_syncing:Fatal_exception | 0  
| 41 |
+--+++



[0.037875] BUG: KASAN: null-ptr-deref in __lock_acquire+0x171/0x13d0
[0.04] Read of size 8 at addr 0018 by task swapper/0/0
[0.04] 
[0.04] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.16.0-rc1-00044-g056becf #1
[0.04] Call Trace:
[0.04]  dump_stack+0x81/0xb3
[0.04]  kasan_report+0x22a/0x25a
[0.04]  __lock_acquire+0x171/0x13d0
[0.04]  ? lookup_chain_cache+0x42/0x6b
[0.04]  ? mark_lock+0x25b/0x26d
[0.04]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.04]  ? debug_check_no_locks_freed+0x19f/0x19f
[0.04]  ? debug_check_no_locks_freed+0x19f/0x19f
[0.04]  ? acpi_hw_read+0x1a0/0x202
[0.04]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.04]  ? lock_acquire+0x1c0/0x209
[0.04]  lock_acquire+0x1c0/0x209
[0.04]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.04]  _raw_spin_lock_irqsave+0x43/0x56
[0.04]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.04]  rcu_report_exp_cpu_mult+0x21/0x6d
[0.04]  ? sync_sched_exp_handler+0x111/0x111
[0.04]  sync_rcu_exp_select_cpus+0x2ff/0x412
[0.04]  ? rcu_read_lock_sched_held+0x60/0x66
[0.04]  ? sync_sched_exp_handler+0x111/0x111
[0.04]  _synchronize_rcu_expedited+0x427/0x5ba
[0.04]  ? signal_pending+0x15/0x15
[0.04]  ? acpi_hw_write_pm1_control+0x52/0x52
[0.04]  ? acpi_hw_write_pm1_control+0x52/0x52
[0.04]  ? __change_page_attr_set_clr+0x420/0x420
[0.04]  ? printk+0x94/0xb0
[0.04]  ? show_regs_print_info+0xa/0xa
[0.04]  ? lock_downgrade+0x26a/0x26a
[0.04]  ? acpi_read_bit_register+0xb1/0xde
[0.04]  ? acpi_read+0xa/0xa
[0.04]  ? acpi_read+0xa/0xa
[0.04]  ? acpi_hw_get_mode+0x91/0xc2
[0.04]  ? _find_next_bit+0x3f/0xe4
[0.04]  ? __lock_is_held+0x2a/0x87
[0.04]  ? lock_is_held_type+0x78/0x86
[0.04]  rcu_test_sync_prims+0xa/0x23
[0.04]  rest_init+0xb/0xcf
[0.04]  start_kernel+0x59a/0x5be
[0.04]  ? mem_encrypt_init+0x6/0x6
[0.04]  ? memcpy_orig+0x54/0x110
[0.04]  ? x86_family+0x5/0x1d
[0.04]  ? load_ucode_bsp+0x3a/0xab
[0.04]  secondary_startup_64+0xa5/0xb0
[0.04] 
==
[0.04] Disabling lock debugging due to kernel taint
[0.04] BUG: unable to handle kernel NULL pointer dereference at 
0018
[0.04] IP: __lock_acquire+0x171/0x13d0
[0.04] PGD 0 P4D 0 
[0.04] Oops:  [#1] PREEMPT SMP KASAN PTI
[0.04] Modules linked in:
[0.04] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GB
4.16.0-rc1-00044-g056becf #1
[0.04] RIP: 0010:__lock_acquire+0x171/0x13d0
[0.04] RSP: :b6a079a0 EFLAGS: 00010056
[0.04] RAX: 0096 RBX:  RCX: b50c9e31
[0.04] RDX: 0003 RSI: 0003 RDI: 0001
[0.04]

[lkp-robot] [rcu] 056becf54e: BUG:KASAN:null-ptr-deref_in__lock_acquire

2018-02-23 Thread kernel test robot

TO: Paul E. McKenney 
CC: LKML , Paul E. McKenney 
, linux-kernel@vger.kernel.org, l...@01.org



FYI, we noticed the following commit (built with gcc-7):

commit: 056becf54ef1ab39db14a66625353899dba6762f ("rcu: Parallelize expedited 
grace-period initialization")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git rcu/dev

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -smp 2 -m 512M

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


+--+++
|  | 28ea7ed1b3 
| 056becf54e |
+--+++
| boot_successes   | 2  
| 0  |
| boot_failures| 6  
| 41 |
| invoked_oom-killer:gfp_mask=0x   | 6  
||
| Mem-Info | 6  
||
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 6  
||
| BUG:KASAN:null-ptr-deref_in__lock_acquire| 0  
| 41 |
| BUG:unable_to_handle_kernel  | 0  
| 41 |
| Oops:#[##]   | 0  
| 41 |
| RIP:__lock_acquire   | 0  
| 41 |
| Kernel_panic-not_syncing:Fatal_exception | 0  
| 41 |
+--+++



[0.037875] BUG: KASAN: null-ptr-deref in __lock_acquire+0x171/0x13d0
[0.04] Read of size 8 at addr 0018 by task swapper/0/0
[0.04] 
[0.04] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.16.0-rc1-00044-g056becf #1
[0.04] Call Trace:
[0.04]  dump_stack+0x81/0xb3
[0.04]  kasan_report+0x22a/0x25a
[0.04]  __lock_acquire+0x171/0x13d0
[0.04]  ? lookup_chain_cache+0x42/0x6b
[0.04]  ? mark_lock+0x25b/0x26d
[0.04]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.04]  ? debug_check_no_locks_freed+0x19f/0x19f
[0.04]  ? debug_check_no_locks_freed+0x19f/0x19f
[0.04]  ? acpi_hw_read+0x1a0/0x202
[0.04]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.04]  ? lock_acquire+0x1c0/0x209
[0.04]  lock_acquire+0x1c0/0x209
[0.04]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.04]  _raw_spin_lock_irqsave+0x43/0x56
[0.04]  ? rcu_report_exp_cpu_mult+0x21/0x6d
[0.04]  rcu_report_exp_cpu_mult+0x21/0x6d
[0.04]  ? sync_sched_exp_handler+0x111/0x111
[0.04]  sync_rcu_exp_select_cpus+0x2ff/0x412
[0.04]  ? rcu_read_lock_sched_held+0x60/0x66
[0.04]  ? sync_sched_exp_handler+0x111/0x111
[0.04]  _synchronize_rcu_expedited+0x427/0x5ba
[0.04]  ? signal_pending+0x15/0x15
[0.04]  ? acpi_hw_write_pm1_control+0x52/0x52
[0.04]  ? acpi_hw_write_pm1_control+0x52/0x52
[0.04]  ? __change_page_attr_set_clr+0x420/0x420
[0.04]  ? printk+0x94/0xb0
[0.04]  ? show_regs_print_info+0xa/0xa
[0.04]  ? lock_downgrade+0x26a/0x26a
[0.04]  ? acpi_read_bit_register+0xb1/0xde
[0.04]  ? acpi_read+0xa/0xa
[0.04]  ? acpi_read+0xa/0xa
[0.04]  ? acpi_hw_get_mode+0x91/0xc2
[0.04]  ? _find_next_bit+0x3f/0xe4
[0.04]  ? __lock_is_held+0x2a/0x87
[0.04]  ? lock_is_held_type+0x78/0x86
[0.04]  rcu_test_sync_prims+0xa/0x23
[0.04]  rest_init+0xb/0xcf
[0.04]  start_kernel+0x59a/0x5be
[0.04]  ? mem_encrypt_init+0x6/0x6
[0.04]  ? memcpy_orig+0x54/0x110
[0.04]  ? x86_family+0x5/0x1d
[0.04]  ? load_ucode_bsp+0x3a/0xab
[0.04]  secondary_startup_64+0xa5/0xb0
[0.04] 
==
[0.04] Disabling lock debugging due to kernel taint
[0.04] BUG: unable to handle kernel NULL pointer dereference at 
0018
[0.04] IP: __lock_acquire+0x171/0x13d0
[0.04] PGD 0 P4D 0 
[0.04] Oops:  [#1] PREEMPT SMP KASAN PTI
[0.04] Modules linked in:
[0.04] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GB
4.16.0-rc1-00044-g056becf #1
[0.04] RIP: 0010:__lock_acquire+0x171/0x13d0
[0.04] RSP: :b6a079a0 EFLAGS: 00010056
[0.04] RAX: 0096 RBX:  RCX: b50c9e31
[0.04] RDX: 0003 RSI: 0003 RDI: 0001
[0.04] RBP: b6a07b50 R08: dc00 R09: 
[0.04] R10:

[PATCH 1/1] iommu/vt-d: Fix a potential memory leak

2018-02-23 Thread Lu Baolu

A memory block was allocated in intel_svm_bind_mm() but never freed
in a failure path. This patch fixes this by free it to avoid memory
leakage.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc:  # v4.4+
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-svm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 35a408d..3d4b924 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -396,6 +396,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int 
flags, struct svm_dev_
pasid_max - 1, GFP_KERNEL);
if (ret < 0) {
kfree(svm);
+   kfree(sdev);
goto out;
}
svm->pasid = ret;
-- 
2.7.4

[PATCH 1/1] iommu/vt-d: Fix a potential memory leak

2018-02-23 Thread Lu Baolu

A memory block was allocated in intel_svm_bind_mm() but never freed
in a failure path. This patch fixes this by free it to avoid memory
leakage.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc:  # v4.4+
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-svm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 35a408d..3d4b924 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -396,6 +396,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int 
flags, struct svm_dev_
pasid_max - 1, GFP_KERNEL);
if (ret < 0) {
kfree(svm);
+   kfree(sdev);
goto out;
}
svm->pasid = ret;
-- 
2.7.4

[RFC] vfio iommu type1: improve memory pinning process for raw PFN mapping

2018-02-23 Thread jason

When using vfio to pass through a PCIe device (e.g. a GPU card) that
has a huge BAR (e.g. 16GB), a lot of cycles are wasted on memory
pinning because PFNs of PCI BAR are not backed by struct page, and
the corresponding VMA has flags VM_IO|VM_PFNMAP.

With this change, memory pinning process will firstly try to figure
out whether the corresponding region is a raw PFN mapping, and if so
it can skip unnecessary user memory pinning process.

Even though it commes with a little overhead, finding vma and testing
flags, on each call, it can significantly improve VM's boot up time
when passing through devices via VFIO.
---
 drivers/vfio/vfio_iommu_type1.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29ae4819..1a471ece3f9c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -374,6 +374,24 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned 
long vaddr,
return ret;
 }

+static int try_io_pfnmap(struct mm_struct *mm, unsigned long vaddr, long npage,
+unsigned long *pfn)
+{
+   struct vm_area_struct *vma;
+   int pinned = 0;
+
+   down_read(>mmap_sem);
+   vma = find_vma_intersection(mm, vaddr, vaddr + 1);
+   if (vma && vma->vm_flags & (VM_IO | VM_PFNMAP)) {
+   *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+   if (is_invalid_reserved_pfn(*pfn))
+   pinned = min(npage, (long)vma_pages(vma));
+   }
+   up_read(>mmap_sem);
+
+   return pinned;
+}
+
 /*
  * Attempt to pin pages.  We really don't want to track all the pfns and
  * the iommu can only map chunks of consecutive pfns anyway, so get the
@@ -392,6 +410,10 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, 
unsigned long vaddr,
if (!current->mm)
return -ENODEV;

+   ret = try_io_pfnmap(current->mm, vaddr, npage, pfn_base);
+   if (ret)
+   return ret;
+
ret = vaddr_get_pfn(current->mm, vaddr, dma->prot, pfn_base);
if (ret)
return ret;
--
2.13.6

[RFC] vfio iommu type1: improve memory pinning process for raw PFN mapping

2018-02-23 Thread jason

When using vfio to pass through a PCIe device (e.g. a GPU card) that
has a huge BAR (e.g. 16GB), a lot of cycles are wasted on memory
pinning because PFNs of PCI BAR are not backed by struct page, and
the corresponding VMA has flags VM_IO|VM_PFNMAP.

With this change, memory pinning process will firstly try to figure
out whether the corresponding region is a raw PFN mapping, and if so
it can skip unnecessary user memory pinning process.

Even though it commes with a little overhead, finding vma and testing
flags, on each call, it can significantly improve VM's boot up time
when passing through devices via VFIO.
---
 drivers/vfio/vfio_iommu_type1.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29ae4819..1a471ece3f9c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -374,6 +374,24 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned 
long vaddr,
return ret;
 }

+static int try_io_pfnmap(struct mm_struct *mm, unsigned long vaddr, long npage,
+unsigned long *pfn)
+{
+   struct vm_area_struct *vma;
+   int pinned = 0;
+
+   down_read(>mmap_sem);
+   vma = find_vma_intersection(mm, vaddr, vaddr + 1);
+   if (vma && vma->vm_flags & (VM_IO | VM_PFNMAP)) {
+   *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
+   if (is_invalid_reserved_pfn(*pfn))
+   pinned = min(npage, (long)vma_pages(vma));
+   }
+   up_read(>mmap_sem);
+
+   return pinned;
+}
+
 /*
  * Attempt to pin pages.  We really don't want to track all the pfns and
  * the iommu can only map chunks of consecutive pfns anyway, so get the
@@ -392,6 +410,10 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, 
unsigned long vaddr,
if (!current->mm)
return -ENODEV;

+   ret = try_io_pfnmap(current->mm, vaddr, npage, pfn_base);
+   if (ret)
+   return ret;
+
ret = vaddr_get_pfn(current->mm, vaddr, dma->prot, pfn_base);
if (ret)
return ret;
--
2.13.6

[PATCH v3 18/18] selftests: ftrace: Add a testcase for array type with kprobe_event

2018-02-23 Thread Masami Hiramatsu

Add a testcase for array type with kprobe event.
This tests good/bad syntax combinations and also
the traced data is correct in several way.
If the kernel doesn't support array type, it skips
the test as UNSUPPORTED.

Signed-off-by: Masami Hiramatsu 
---
 .../ftrace/test.d/kprobe/kprobe_args_array.tc  |   75 
 1 file changed, 75 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_array.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_array.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_array.tc
new file mode 100644
index ..27c9628c1fff
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_array.tc
@@ -0,0 +1,75 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event array argument
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+grep -q "\[\]" README || exit_unsupported # version issue
+
+GOODSYM="_sdata"
+if ! grep -qw ${GOODSYM} /proc/kallsyms ; then
+  GOODSYM="create_trace_kprobe"
+fi
+case `uname -m` in
+x86_64)
+  ARG2=%si
+  OFFS=8
+;;
+i[3456]86)
+  ARG2=%cx
+  OFFS=4
+;;
+aarch64)
+  ARG2=%x1
+  OFFS=8
+;;
+arm*)
+  ARG2=%r1
+  OFFS=4
+;;
+*)
+  echo "Please implement other architecture here"
+  exit_untested
+esac
+
+create_testprobe() { # args
+  echo "p:testprobe create_trace_kprobe $*" > kprobe_events
+}
+
+echo 0 > events/enable
+echo > kprobe_events
+
+: "Syntax test"
+create_testprobe "+0(${ARG2}):x8[1] +0(${ARG2}):s16[1] +0(${ARG2}):u32[1]"
+create_testprobe "+0(${ARG2}):x64[1] +0(${ARG2}):symbol[1]"
+create_testprobe "+0(${ARG2}):b2@3/8[1] +0(${ARG2}):string[1]"
+create_testprobe "+0(${ARG2}):x8[64] @${GOODSYM}:x8[4]"
+
+! create_testprobe "${ARG2}:x8[1]" # Can not use array type on register
+! create_testprobe "\$comm:x8[1]" # Can not use array type on \$comm
+! create_testprobe "\$comm:string[1]" # No, even if it is string array
+! create_testprobe "+0(${ARG2}):x64[0]" # array size >= 1
+! create_testprobe "+0(${ARG2}):x64[65]" # array size <= 64
+
+: "Test get argument (1)"
+create_testprobe "arg1=+0(${ARG2}):string[1]" > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo test >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1={\"test\"}"
+echo 0 > events/kprobes/testprobe/enable
+
+: "Test get argument (2)"
+create_testprobe "arg1=+0(${ARG2}):string[3]" > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo foo bar buzz >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1={\"foo\",\"bar\",\"buzz\"}"
+echo 0 > events/kprobes/testprobe/enable
+
+: "Test get argument (3)"
+create_testprobe "arg1=+0(+0(${ARG2})):u8[4]" > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo 1234 >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1={49,50,51,52}" # ascii code
+echo 0 > events/kprobes/testprobe/enable
+
+echo > kprobe_events

[PATCH v3 18/18] selftests: ftrace: Add a testcase for array type with kprobe_event

2018-02-23 Thread Masami Hiramatsu

Add a testcase for array type with kprobe event.
This tests good/bad syntax combinations and also
the traced data is correct in several way.
If the kernel doesn't support array type, it skips
the test as UNSUPPORTED.

Signed-off-by: Masami Hiramatsu 
---
 .../ftrace/test.d/kprobe/kprobe_args_array.tc  |   75 
 1 file changed, 75 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_array.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_array.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_array.tc
new file mode 100644
index ..27c9628c1fff
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_array.tc
@@ -0,0 +1,75 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event array argument
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+grep -q "\[\]" README || exit_unsupported # version issue
+
+GOODSYM="_sdata"
+if ! grep -qw ${GOODSYM} /proc/kallsyms ; then
+  GOODSYM="create_trace_kprobe"
+fi
+case `uname -m` in
+x86_64)
+  ARG2=%si
+  OFFS=8
+;;
+i[3456]86)
+  ARG2=%cx
+  OFFS=4
+;;
+aarch64)
+  ARG2=%x1
+  OFFS=8
+;;
+arm*)
+  ARG2=%r1
+  OFFS=4
+;;
+*)
+  echo "Please implement other architecture here"
+  exit_untested
+esac
+
+create_testprobe() { # args
+  echo "p:testprobe create_trace_kprobe $*" > kprobe_events
+}
+
+echo 0 > events/enable
+echo > kprobe_events
+
+: "Syntax test"
+create_testprobe "+0(${ARG2}):x8[1] +0(${ARG2}):s16[1] +0(${ARG2}):u32[1]"
+create_testprobe "+0(${ARG2}):x64[1] +0(${ARG2}):symbol[1]"
+create_testprobe "+0(${ARG2}):b2@3/8[1] +0(${ARG2}):string[1]"
+create_testprobe "+0(${ARG2}):x8[64] @${GOODSYM}:x8[4]"
+
+! create_testprobe "${ARG2}:x8[1]" # Can not use array type on register
+! create_testprobe "\$comm:x8[1]" # Can not use array type on \$comm
+! create_testprobe "\$comm:string[1]" # No, even if it is string array
+! create_testprobe "+0(${ARG2}):x64[0]" # array size >= 1
+! create_testprobe "+0(${ARG2}):x64[65]" # array size <= 64
+
+: "Test get argument (1)"
+create_testprobe "arg1=+0(${ARG2}):string[1]" > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo test >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1={\"test\"}"
+echo 0 > events/kprobes/testprobe/enable
+
+: "Test get argument (2)"
+create_testprobe "arg1=+0(${ARG2}):string[3]" > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo foo bar buzz >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1={\"foo\",\"bar\",\"buzz\"}"
+echo 0 > events/kprobes/testprobe/enable
+
+: "Test get argument (3)"
+create_testprobe "arg1=+0(+0(${ARG2})):u8[4]" > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo 1234 >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1={49,50,51,52}" # ascii code
+echo 0 > events/kprobes/testprobe/enable
+
+echo > kprobe_events

[PATCH v3 17/18] selftests: ftrace: Add a testcase for $argN with kprobe_event

2018-02-23 Thread Masami Hiramatsu

Add a testcase for array type with kprobe event.
This tests whether the traced data is correct or not.
If the kernel doesn't support array type, it skips
the test as UNSUPPORTED.

Signed-off-by: Masami Hiramatsu 
---
 .../ftrace/test.d/kprobe/kprobe_args_argN.tc   |   25 
 1 file changed, 25 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_argN.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_argN.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_argN.tc
new file mode 100644
index ..d5c5c8c3a51e
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_argN.tc
@@ -0,0 +1,25 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event argN argument
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+grep -q "arg" README || exit_unsupported # version issue
+
+echo 0 > events/enable
+echo > kprobe_events
+
+: "Test bad pattern : arg0 is not allowed"
+! echo 'p:testprobe create_trace_kprobe $arg0' > kprobe_events
+
+: "Test get argument"
+echo 'p:testprobe create_trace_kprobe $arg1' > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo test >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1=0x1"
+
+! echo test test test >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1=0x3"
+
+echo 0 > events/enable
+echo > kprobe_events

[PATCH v3 17/18] selftests: ftrace: Add a testcase for $argN with kprobe_event

2018-02-23 Thread Masami Hiramatsu

Add a testcase for array type with kprobe event.
This tests whether the traced data is correct or not.
If the kernel doesn't support array type, it skips
the test as UNSUPPORTED.

Signed-off-by: Masami Hiramatsu 
---
 .../ftrace/test.d/kprobe/kprobe_args_argN.tc   |   25 
 1 file changed, 25 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_argN.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_argN.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_argN.tc
new file mode 100644
index ..d5c5c8c3a51e
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_argN.tc
@@ -0,0 +1,25 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event argN argument
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+grep -q "arg" README || exit_unsupported # version issue
+
+echo 0 > events/enable
+echo > kprobe_events
+
+: "Test bad pattern : arg0 is not allowed"
+! echo 'p:testprobe create_trace_kprobe $arg0' > kprobe_events
+
+: "Test get argument"
+echo 'p:testprobe create_trace_kprobe $arg1' > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo test >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1=0x1"
+
+! echo test test test >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1=0x3"
+
+echo 0 > events/enable
+echo > kprobe_events

[PATCH v3 15/18] tracing: probeevent: Add array type support

2018-02-23 Thread Masami Hiramatsu

Add array type support for probe events.
This allows user to get arraied types from memory address.
The array type syntax is

TYPE[N]

Where TYPE is one of types (u8/16/32/64,s8/16/32/64,
x8/16/32/64, symbol, string) and N is a fixed value less
than 64.

The string array type is a bit different from other types. For
other base types, [1] is equal to 
(e.g. +0(%di):x32[1] is same as +0(%di):x32.) But string[1] is not
equal to string. The string type itself represents "char array",
but string array type represents "char * array". So, for example,
+0(%di):string[1] is equal to +0(+0(%di)):string.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v2:
  - Add array description in README file
  - Fix to init s3 code out of loop.
  - Fix to proceed code when the last code is OP_ARRAY.
  - Add string array type and bitfield array type.
---
 Documentation/trace/kprobetrace.txt |   13 
 kernel/trace/trace.c|3 +
 kernel/trace/trace_probe.c  |  129 +++
 kernel/trace/trace_probe.h  |   14 
 kernel/trace/trace_probe_tmpl.h |   63 +++--
 5 files changed, 183 insertions(+), 39 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index 1d082f8ffeee..8bf752dfc072 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -65,9 +65,22 @@ in decimal ('s' and 'u') or hexadecimal ('x'). Without type 
casting, 'x32'
 or 'x64' is used depends on the architecture (e.g. x86-32 uses x32, and
 x86-64 uses x64).
 
+These value types can be an array. To record array data, you can add '[N]'
+(where N is a fixed number, less than 64) to the base type.
+E.g. 'x16[4]' means an array of x16 (2bytes hex) with 4 elements.
+Note that the array can be applied to memory type fetchargs, you can not
+apply it to registers/stack-entries etc. (for example, '$stack1:x8[8]' is
+wrong, but '+8($stack):x8[8]' is OK.)
+
 String type is a special type, which fetches a "null-terminated" string from
 kernel space. This means it will fail and store NULL if the string container
 has been paged out.
+The string array type is a bit different from other types. For other base
+types, [1] is equal to  (e.g. +0(%di):x32[1] is same
+as +0(%di):x32.) But string[1] is not equal to string. The string type itself
+represents "char array", but string array type represents "char * array".
+So, for example, +0(%di):string[1] is equal to +0(+0(%di)):string.
+
 Bitfield is another special type, which takes 3 parameters, bit-width, bit-
 offset, and container-size (usually 32). The syntax is;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index bcd1fd87082d..b7c6698265e5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4614,7 +4614,8 @@ static const char readme_msg[] =
"\t   $stack, $stack, $retval, $comm\n"
 #endif
"\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
-   "\t   b@/\n"
+   "\t   b@/,\n"
+   "\t   []\n"
 #endif
"  events/\t\t- Directory containing all trace event subsystems:\n"
"  enable\t\t- Write 0/1 to enable/disable tracing of all events\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 84887754702a..73359d248523 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -363,9 +363,9 @@ static int __parse_bitfield_probe_arg(const char *bf,
 int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
struct probe_arg *parg, unsigned int flags)
 {
-   struct fetch_insn *code, *tmp = NULL;
-   const char *t;
-   int ret;
+   struct fetch_insn *code, *scode, *tmp = NULL;
+   char *t, *t2;
+   int ret, len;
 
if (strlen(arg) > MAX_ARGSTR_LEN) {
pr_info("Argument is too long.: %s\n",  arg);
@@ -376,24 +376,42 @@ int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
pr_info("Failed to allocate memory for command '%s'.\n", arg);
return -ENOMEM;
}
-   t = strchr(parg->comm, ':');
+   t = strchr(arg, ':');
if (t) {
-   arg[t - parg->comm] = '\0';
-   t++;
+   *t = '\0';
+   t2 = strchr(++t, '[');
+   if (t2) {
+   *t2 = '\0';
+   parg->count = simple_strtoul(t2 + 1, , 0);
+   if (strcmp(t2, "]") || parg->count == 0)
+   return -EINVAL;
+   if (parg->count > MAX_ARRAY_LEN)
+   return -E2BIG;
+   }
}
/*
 * The default type of $comm should be "string", and it can't be
 * dereferenced.
 */
if (!t && strcmp(arg, "$comm") == 0)
-   t = "string";
-   parg->type = find_fetch_type(t);
+   parg->type =

[PATCH v3 15/18] tracing: probeevent: Add array type support

2018-02-23 Thread Masami Hiramatsu

Add array type support for probe events.
This allows user to get arraied types from memory address.
The array type syntax is

TYPE[N]

Where TYPE is one of types (u8/16/32/64,s8/16/32/64,
x8/16/32/64, symbol, string) and N is a fixed value less
than 64.

The string array type is a bit different from other types. For
other base types, [1] is equal to 
(e.g. +0(%di):x32[1] is same as +0(%di):x32.) But string[1] is not
equal to string. The string type itself represents "char array",
but string array type represents "char * array". So, for example,
+0(%di):string[1] is equal to +0(+0(%di)):string.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v2:
  - Add array description in README file
  - Fix to init s3 code out of loop.
  - Fix to proceed code when the last code is OP_ARRAY.
  - Add string array type and bitfield array type.
---
 Documentation/trace/kprobetrace.txt |   13 
 kernel/trace/trace.c|3 +
 kernel/trace/trace_probe.c  |  129 +++
 kernel/trace/trace_probe.h  |   14 
 kernel/trace/trace_probe_tmpl.h |   63 +++--
 5 files changed, 183 insertions(+), 39 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index 1d082f8ffeee..8bf752dfc072 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -65,9 +65,22 @@ in decimal ('s' and 'u') or hexadecimal ('x'). Without type 
casting, 'x32'
 or 'x64' is used depends on the architecture (e.g. x86-32 uses x32, and
 x86-64 uses x64).
 
+These value types can be an array. To record array data, you can add '[N]'
+(where N is a fixed number, less than 64) to the base type.
+E.g. 'x16[4]' means an array of x16 (2bytes hex) with 4 elements.
+Note that the array can be applied to memory type fetchargs, you can not
+apply it to registers/stack-entries etc. (for example, '$stack1:x8[8]' is
+wrong, but '+8($stack):x8[8]' is OK.)
+
 String type is a special type, which fetches a "null-terminated" string from
 kernel space. This means it will fail and store NULL if the string container
 has been paged out.
+The string array type is a bit different from other types. For other base
+types, [1] is equal to  (e.g. +0(%di):x32[1] is same
+as +0(%di):x32.) But string[1] is not equal to string. The string type itself
+represents "char array", but string array type represents "char * array".
+So, for example, +0(%di):string[1] is equal to +0(+0(%di)):string.
+
 Bitfield is another special type, which takes 3 parameters, bit-width, bit-
 offset, and container-size (usually 32). The syntax is;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index bcd1fd87082d..b7c6698265e5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4614,7 +4614,8 @@ static const char readme_msg[] =
"\t   $stack, $stack, $retval, $comm\n"
 #endif
"\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
-   "\t   b@/\n"
+   "\t   b@/,\n"
+   "\t   []\n"
 #endif
"  events/\t\t- Directory containing all trace event subsystems:\n"
"  enable\t\t- Write 0/1 to enable/disable tracing of all events\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 84887754702a..73359d248523 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -363,9 +363,9 @@ static int __parse_bitfield_probe_arg(const char *bf,
 int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
struct probe_arg *parg, unsigned int flags)
 {
-   struct fetch_insn *code, *tmp = NULL;
-   const char *t;
-   int ret;
+   struct fetch_insn *code, *scode, *tmp = NULL;
+   char *t, *t2;
+   int ret, len;
 
if (strlen(arg) > MAX_ARGSTR_LEN) {
pr_info("Argument is too long.: %s\n",  arg);
@@ -376,24 +376,42 @@ int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
pr_info("Failed to allocate memory for command '%s'.\n", arg);
return -ENOMEM;
}
-   t = strchr(parg->comm, ':');
+   t = strchr(arg, ':');
if (t) {
-   arg[t - parg->comm] = '\0';
-   t++;
+   *t = '\0';
+   t2 = strchr(++t, '[');
+   if (t2) {
+   *t2 = '\0';
+   parg->count = simple_strtoul(t2 + 1, , 0);
+   if (strcmp(t2, "]") || parg->count == 0)
+   return -EINVAL;
+   if (parg->count > MAX_ARRAY_LEN)
+   return -E2BIG;
+   }
}
/*
 * The default type of $comm should be "string", and it can't be
 * dereferenced.
 */
if (!t && strcmp(arg, "$comm") == 0)
-   t = "string";
-   parg->type = find_fetch_type(t);
+   parg->type = find_fetch_type("string");

[PATCH v3 16/18] selftests: ftrace: Add a testcase for symbol type

2018-02-23 Thread Masami Hiramatsu

Add a testcase for symbol type with kprobe event.
This tests good/bad syntax combinations and also
the traced data.
If the kernel doesn't support symbol type, it skips
the test as UNSUPPORTED.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v3:
  - Use IP/PC register to test the symbol type
---
 .../ftrace/test.d/kprobe/kprobe_args_symbol.tc |   77 
 1 file changed, 77 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc
new file mode 100644
index ..20a8664a838b
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc
@@ -0,0 +1,77 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event argument symbol type
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+grep -qe "type:.* symbol" README || exit_unsupported # version issue
+
+echo 0 > events/enable
+echo > kprobe_events
+
+PROBEFUNC="vfs_read"
+GOODREG=
+BADREG=
+REG=
+GOODSYM="_sdata"
+if ! grep -qw ${GOODSYM} /proc/kallsyms ; then
+  GOODSYM=$PROBEFUNC
+fi
+
+case `uname -m` in
+x86_64|i[3456]86)
+  GOODREG=%ax
+  BADREG=%ex
+  REG=%ip
+;;
+aarch64)
+  GOODREG=%x0
+  BADREG=%ax
+  REG=%pc
+;;
+arm*)
+  GOODREG=%r0
+  BADREG=%ax
+  REG=%pc
+;;
+*)
+  echo "Please implement other architecture here"
+  exit_untested
+esac
+
+test_goodarg() # Good-args
+{
+  while [ "$1" ]; do
+echo "p ${PROBEFUNC} $1" > kprobe_events
+shift 1
+  done;
+}
+
+test_badarg() # Bad-args
+{
+  while [ "$1" ]; do
+! echo "p ${PROBEFUNC} $1" > kprobe_events
+shift 1
+  done;
+}
+
+echo > kprobe_events
+
+: "Symbol type"
+test_goodarg "${GOODREG}:symbol" "@${GOODSYM}:symbol" "@${GOODSYM}+10:symbol" \
+"\$stack0:symbol" "+0(\$stack):symbol"
+test_badarg "\$comm:symbol"
+
+: "Retval with symbol type"
+echo "r ${PROBEFUNC} \$retval:symbol" > kprobe_events
+
+echo > kprobe_events
+
+: "Test get symbol"
+echo "p:testprobe create_trace_kprobe ${REG}:symbol" > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo test >> kprobe_events
+tail -n 1 trace | grep -q "arg1=create_trace_kprobe"
+
+echo 0 > events/enable
+echo > kprobe_events

[PATCH v3 16/18] selftests: ftrace: Add a testcase for symbol type

2018-02-23 Thread Masami Hiramatsu

Add a testcase for symbol type with kprobe event.
This tests good/bad syntax combinations and also
the traced data.
If the kernel doesn't support symbol type, it skips
the test as UNSUPPORTED.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v3:
  - Use IP/PC register to test the symbol type
---
 .../ftrace/test.d/kprobe/kprobe_args_symbol.tc |   77 
 1 file changed, 77 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc
new file mode 100644
index ..20a8664a838b
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc
@@ -0,0 +1,77 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event argument symbol type
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+grep -qe "type:.* symbol" README || exit_unsupported # version issue
+
+echo 0 > events/enable
+echo > kprobe_events
+
+PROBEFUNC="vfs_read"
+GOODREG=
+BADREG=
+REG=
+GOODSYM="_sdata"
+if ! grep -qw ${GOODSYM} /proc/kallsyms ; then
+  GOODSYM=$PROBEFUNC
+fi
+
+case `uname -m` in
+x86_64|i[3456]86)
+  GOODREG=%ax
+  BADREG=%ex
+  REG=%ip
+;;
+aarch64)
+  GOODREG=%x0
+  BADREG=%ax
+  REG=%pc
+;;
+arm*)
+  GOODREG=%r0
+  BADREG=%ax
+  REG=%pc
+;;
+*)
+  echo "Please implement other architecture here"
+  exit_untested
+esac
+
+test_goodarg() # Good-args
+{
+  while [ "$1" ]; do
+echo "p ${PROBEFUNC} $1" > kprobe_events
+shift 1
+  done;
+}
+
+test_badarg() # Bad-args
+{
+  while [ "$1" ]; do
+! echo "p ${PROBEFUNC} $1" > kprobe_events
+shift 1
+  done;
+}
+
+echo > kprobe_events
+
+: "Symbol type"
+test_goodarg "${GOODREG}:symbol" "@${GOODSYM}:symbol" "@${GOODSYM}+10:symbol" \
+"\$stack0:symbol" "+0(\$stack):symbol"
+test_badarg "\$comm:symbol"
+
+: "Retval with symbol type"
+echo "r ${PROBEFUNC} \$retval:symbol" > kprobe_events
+
+echo > kprobe_events
+
+: "Test get symbol"
+echo "p:testprobe create_trace_kprobe ${REG}:symbol" > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo test >> kprobe_events
+tail -n 1 trace | grep -q "arg1=create_trace_kprobe"
+
+echo 0 > events/enable
+echo > kprobe_events

[PATCH v3 14/18] tracing: probeevent: Add $argN for accessing function args

2018-02-23 Thread Masami Hiramatsu

Add $argN special fetch variable for accessing function
arguments. This allows user to trace the Nth argument easily
at the function entry.

Note that this returns most probably assignment of registers
and stacks. In some case, it may not work well. If you need
to access correct registers or stacks you should use perf-probe.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v2:
  - Add $argN in README file
  - Make N start from 1 as same as auto-generate event argument
names.
 Changes in v3:
  - Show $arg in README only when this feature is supported.
---
 Documentation/trace/kprobetrace.txt |   10 ++
 kernel/trace/trace.c|4 
 kernel/trace/trace_kprobe.c |   18 +-
 kernel/trace/trace_probe.c  |   36 ++-
 kernel/trace/trace_probe.h  |9 -
 kernel/trace/trace_uprobe.c |2 +-
 6 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index d49381f2e411..1d082f8ffeee 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -43,16 +43,18 @@ Synopsis of kprobe_events
   @SYM[+|-offs]: Fetch memory at SYM +|- offs (SYM should be a data 
symbol)
   $stackN  : Fetch Nth entry of stack (N >= 0)
   $stack   : Fetch stack address.
-  $retval  : Fetch return value.(*)
+  $argN: Fetch the Nth function argument. (N >= 1) (*1)
+  $retval  : Fetch return value.(*2)
   $comm: Fetch current task comm.
-  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
+  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(*3)
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
  (x8/x16/x32/x64), "string" and bitfield are supported.
 
-  (*) only for return probe.
-  (**) this is useful for fetching a field of data structures.
+  (*1) only for the probe on function entry (offs == 0).
+  (*2) only for return probe.
+  (*3) this is useful for fetching a field of data structures.
 
 Types
 -
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8f08811d15b8..bcd1fd87082d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4608,7 +4608,11 @@ static const char readme_msg[] =
 #endif
"\t args: =fetcharg[:type]\n"
"\t fetcharg: %, @, @[+|-],\n"
+#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
+   "\t   $stack, $stack, $retval, $comm, $arg\n"
+#else
"\t   $stack, $stack, $retval, $comm\n"
+#endif
"\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
"\t   b@/\n"
 #endif
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 292d5ae6d18b..e5a52b5f70ec 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -490,13 +490,15 @@ static int create_trace_kprobe(int argc, char **argv)
long offset = 0;
void *addr = NULL;
char buf[MAX_EVENT_NAME_LEN];
+   unsigned int flags = TPARG_FL_KERNEL;
 
/* argc must be >= 1 */
if (argv[0][0] == 'p')
is_return = false;
-   else if (argv[0][0] == 'r')
+   else if (argv[0][0] == 'r') {
is_return = true;
-   else if (argv[0][0] == '-')
+   flags |= TPARG_FL_RETURN;
+   } else if (argv[0][0] == '-')
is_delete = true;
else {
pr_info("Probe definition must be started with 'p', 'r' or"
@@ -579,8 +581,9 @@ static int create_trace_kprobe(int argc, char **argv)
pr_info("Failed to parse either an address or a 
symbol.\n");
return ret;
}
-   if (offset && is_return &&
-   !kprobe_on_func_entry(NULL, symbol, offset)) {
+   if (kprobe_on_func_entry(NULL, symbol, offset))
+   flags |= TPARG_FL_FENTRY;
+   if (offset && is_return && !(flags & TPARG_FL_FENTRY)) {
pr_info("Given offset is not valid for return 
probe.\n");
return -EINVAL;
}
@@ -650,7 +653,7 @@ static int create_trace_kprobe(int argc, char **argv)
 
/* Parse fetch argument */
ret = traceprobe_parse_probe_arg(arg, >tp.size, parg,
-is_return, true);
+flags);
if (ret) {
pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
goto error;
@@ -890,6 +893,11 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs 
*regs, void *dest,
case FETCH_OP_COMM:

[PATCH v3 14/18] tracing: probeevent: Add $argN for accessing function args

2018-02-23 Thread Masami Hiramatsu

Add $argN special fetch variable for accessing function
arguments. This allows user to trace the Nth argument easily
at the function entry.

Note that this returns most probably assignment of registers
and stacks. In some case, it may not work well. If you need
to access correct registers or stacks you should use perf-probe.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v2:
  - Add $argN in README file
  - Make N start from 1 as same as auto-generate event argument
names.
 Changes in v3:
  - Show $arg in README only when this feature is supported.
---
 Documentation/trace/kprobetrace.txt |   10 ++
 kernel/trace/trace.c|4 
 kernel/trace/trace_kprobe.c |   18 +-
 kernel/trace/trace_probe.c  |   36 ++-
 kernel/trace/trace_probe.h  |9 -
 kernel/trace/trace_uprobe.c |2 +-
 6 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index d49381f2e411..1d082f8ffeee 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -43,16 +43,18 @@ Synopsis of kprobe_events
   @SYM[+|-offs]: Fetch memory at SYM +|- offs (SYM should be a data 
symbol)
   $stackN  : Fetch Nth entry of stack (N >= 0)
   $stack   : Fetch stack address.
-  $retval  : Fetch return value.(*)
+  $argN: Fetch the Nth function argument. (N >= 1) (*1)
+  $retval  : Fetch return value.(*2)
   $comm: Fetch current task comm.
-  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
+  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(*3)
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
  (x8/x16/x32/x64), "string" and bitfield are supported.
 
-  (*) only for return probe.
-  (**) this is useful for fetching a field of data structures.
+  (*1) only for the probe on function entry (offs == 0).
+  (*2) only for return probe.
+  (*3) this is useful for fetching a field of data structures.
 
 Types
 -
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8f08811d15b8..bcd1fd87082d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4608,7 +4608,11 @@ static const char readme_msg[] =
 #endif
"\t args: =fetcharg[:type]\n"
"\t fetcharg: %, @, @[+|-],\n"
+#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
+   "\t   $stack, $stack, $retval, $comm, $arg\n"
+#else
"\t   $stack, $stack, $retval, $comm\n"
+#endif
"\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
"\t   b@/\n"
 #endif
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 292d5ae6d18b..e5a52b5f70ec 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -490,13 +490,15 @@ static int create_trace_kprobe(int argc, char **argv)
long offset = 0;
void *addr = NULL;
char buf[MAX_EVENT_NAME_LEN];
+   unsigned int flags = TPARG_FL_KERNEL;
 
/* argc must be >= 1 */
if (argv[0][0] == 'p')
is_return = false;
-   else if (argv[0][0] == 'r')
+   else if (argv[0][0] == 'r') {
is_return = true;
-   else if (argv[0][0] == '-')
+   flags |= TPARG_FL_RETURN;
+   } else if (argv[0][0] == '-')
is_delete = true;
else {
pr_info("Probe definition must be started with 'p', 'r' or"
@@ -579,8 +581,9 @@ static int create_trace_kprobe(int argc, char **argv)
pr_info("Failed to parse either an address or a 
symbol.\n");
return ret;
}
-   if (offset && is_return &&
-   !kprobe_on_func_entry(NULL, symbol, offset)) {
+   if (kprobe_on_func_entry(NULL, symbol, offset))
+   flags |= TPARG_FL_FENTRY;
+   if (offset && is_return && !(flags & TPARG_FL_FENTRY)) {
pr_info("Given offset is not valid for return 
probe.\n");
return -EINVAL;
}
@@ -650,7 +653,7 @@ static int create_trace_kprobe(int argc, char **argv)
 
/* Parse fetch argument */
ret = traceprobe_parse_probe_arg(arg, >tp.size, parg,
-is_return, true);
+flags);
if (ret) {
pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
goto error;
@@ -890,6 +893,11 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs 
*regs, void *dest,
case FETCH_OP_COMM:
val = (unsigned

[PATCH v3 13/18] x86: ptrace: Add function argument access API

2018-02-23 Thread Masami Hiramatsu

Add regs_get_argument() which returns N th argument of the
function call.
Note that this chooses most probably assignment, in some case
it can be incorrect (e.g. passing data structure or floating
point etc.)

This is expected to be called from kprobes or ftrace with regs
where the top of stack is the return address.

Signed-off-by: Masami Hiramatsu 
---
 arch/Kconfig  |7 +++
 arch/x86/Kconfig  |1 +
 arch/x86/include/asm/ptrace.h |   38 ++
 3 files changed, 46 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 76c0b54443b1..4126ad4b122c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -272,6 +272,13 @@ config HAVE_REGS_AND_STACK_ACCESS_API
  declared in asm/ptrace.h
  For example the kprobes-based event tracer needs this API.
 
+config HAVE_FUNCTION_ARG_ACCESS_API
+   bool
+   help
+ This symbol should be selected by an architecure if it supports
+ the API needed to access function arguments from pt_regs,
+ declared in asm/ptrace.h
+
 config HAVE_CLK
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 552b3d0eae36..eb0cad381ace 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -176,6 +176,7 @@ config X86
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
+   select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if X86_64 && 
UNWINDER_FRAME_POINTER && STACK_VALIDATION
select HAVE_STACK_VALIDATIONif X86_64
select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 6de1fd3d0097..c2304b25e2fd 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -256,6 +256,44 @@ static inline unsigned long 
regs_get_kernel_stack_nth(struct pt_regs *regs,
return 0;
 }
 
+/**
+ * regs_get_kernel_argument() - get Nth function argument in kernel
+ * @regs:  pt_regs of that context
+ * @n: function argument number (start from 0)
+ *
+ * regs_get_argument() returns @n th argument of the function call.
+ * Note that this chooses most probably assignment, in some case
+ * it can be incorrect.
+ * This is expected to be called from kprobes or ftrace with regs
+ * where the top of stack is the return address.
+ */
+static inline unsigned long regs_get_kernel_argument(struct pt_regs *regs,
+unsigned int n)
+{
+   static const unsigned int argument_offs[] = {
+#ifdef __i386__
+   offsetof(struct pt_regs, ax),
+   offsetof(struct pt_regs, cx),
+   offsetof(struct pt_regs, dx),
+#define NR_REG_ARGUMENTS 3
+#else
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, dx),
+   offsetof(struct pt_regs, cx),
+   offsetof(struct pt_regs, r8),
+   offsetof(struct pt_regs, r9),
+#define NR_REG_ARGUMENTS 6
+#endif
+   };
+
+   if (n >= NR_REG_ARGUMENTS) {
+   n -= NR_REG_ARGUMENTS - 1;
+   return regs_get_kernel_stack_nth(regs, n);
+   } else
+   return regs_get_register(regs, argument_offs[n]);
+}
+
 #define arch_has_single_step() (1)
 #ifdef CONFIG_X86_DEBUGCTLMSR
 #define arch_has_block_step()  (1)

[PATCH v3 13/18] x86: ptrace: Add function argument access API

2018-02-23 Thread Masami Hiramatsu

Add regs_get_argument() which returns N th argument of the
function call.
Note that this chooses most probably assignment, in some case
it can be incorrect (e.g. passing data structure or floating
point etc.)

This is expected to be called from kprobes or ftrace with regs
where the top of stack is the return address.

Signed-off-by: Masami Hiramatsu 
---
 arch/Kconfig  |7 +++
 arch/x86/Kconfig  |1 +
 arch/x86/include/asm/ptrace.h |   38 ++
 3 files changed, 46 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 76c0b54443b1..4126ad4b122c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -272,6 +272,13 @@ config HAVE_REGS_AND_STACK_ACCESS_API
  declared in asm/ptrace.h
  For example the kprobes-based event tracer needs this API.
 
+config HAVE_FUNCTION_ARG_ACCESS_API
+   bool
+   help
+ This symbol should be selected by an architecure if it supports
+ the API needed to access function arguments from pt_regs,
+ declared in asm/ptrace.h
+
 config HAVE_CLK
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 552b3d0eae36..eb0cad381ace 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -176,6 +176,7 @@ config X86
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
+   select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if X86_64 && 
UNWINDER_FRAME_POINTER && STACK_VALIDATION
select HAVE_STACK_VALIDATIONif X86_64
select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 6de1fd3d0097..c2304b25e2fd 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -256,6 +256,44 @@ static inline unsigned long 
regs_get_kernel_stack_nth(struct pt_regs *regs,
return 0;
 }
 
+/**
+ * regs_get_kernel_argument() - get Nth function argument in kernel
+ * @regs:  pt_regs of that context
+ * @n: function argument number (start from 0)
+ *
+ * regs_get_argument() returns @n th argument of the function call.
+ * Note that this chooses most probably assignment, in some case
+ * it can be incorrect.
+ * This is expected to be called from kprobes or ftrace with regs
+ * where the top of stack is the return address.
+ */
+static inline unsigned long regs_get_kernel_argument(struct pt_regs *regs,
+unsigned int n)
+{
+   static const unsigned int argument_offs[] = {
+#ifdef __i386__
+   offsetof(struct pt_regs, ax),
+   offsetof(struct pt_regs, cx),
+   offsetof(struct pt_regs, dx),
+#define NR_REG_ARGUMENTS 3
+#else
+   offsetof(struct pt_regs, di),
+   offsetof(struct pt_regs, si),
+   offsetof(struct pt_regs, dx),
+   offsetof(struct pt_regs, cx),
+   offsetof(struct pt_regs, r8),
+   offsetof(struct pt_regs, r9),
+#define NR_REG_ARGUMENTS 6
+#endif
+   };
+
+   if (n >= NR_REG_ARGUMENTS) {
+   n -= NR_REG_ARGUMENTS - 1;
+   return regs_get_kernel_stack_nth(regs, n);
+   } else
+   return regs_get_register(regs, argument_offs[n]);
+}
+
 #define arch_has_single_step() (1)
 #ifdef CONFIG_X86_DEBUGCTLMSR
 #define arch_has_block_step()  (1)

[PATCH v3 11/18] tracing: probeevent: Unify fetch_insn processing common part

2018-02-23 Thread Masami Hiramatsu

Unify the fetch_insn bottom process (from stage 2: dereference
indirect data) from kprobe and uprobe events, since those are
mostly same.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |   47 +
 kernel/trace/trace_probe_tmpl.h |   55 ++-
 kernel/trace/trace_uprobe.c |   43 +-
 3 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 35f81e1eb70e..292d5ae6d18b 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -857,13 +857,18 @@ fetch_store_string(unsigned long addr, void *dest, void 
*base)
return ret;
 }
 
+static nokprobe_inline int
+probe_mem_read(void *dest, void *src, size_t size)
+{
+   return probe_kernel_read(dest, src, size);
+}
+
 /* Note that we don't verify it, since the code does not come from user space 
*/
 static int
 process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs, void *dest,
   void *base)
 {
unsigned long val;
-   int ret = 0;
 
/* 1st stage: get value from context */
switch (code->op) {
@@ -890,45 +895,7 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs 
*regs, void *dest,
}
code++;
 
-   /* 2nd stage: dereference memory if needed */
-   while (code->op == FETCH_OP_DEREF) {
-   ret = probe_kernel_read(, (void *)val + code->offset,
-   sizeof(val));
-   if (ret)
-   return ret;
-   code++;
-   }
-
-   /* 3rd stage: store value to buffer */
-   if (unlikely(!dest)) {
-   if (code->op == FETCH_OP_ST_STRING)
-   return fetch_store_strlen(val + code->offset);
-   else
-   return -EILSEQ;
-   }
-
-   switch (code->op) {
-   case FETCH_OP_ST_RAW:
-   fetch_store_raw(val, code, dest);
-   break;
-   case FETCH_OP_ST_MEM:
-   probe_kernel_read(dest, (void *)val + code->offset, code->size);
-   break;
-   case FETCH_OP_ST_STRING:
-   ret = fetch_store_string(val + code->offset, dest, base);
-   break;
-   default:
-   return -EILSEQ;
-   }
-   code++;
-
-   /* 4th stage: modify stored value if needed */
-   if (code->op == FETCH_OP_MOD_BF) {
-   fetch_apply_bitfield(code, dest);
-   code++;
-   }
-
-   return code->op == FETCH_OP_END ? ret : -EILSEQ;
+   return process_fetch_insn_bottom(code, val, dest, base);
 }
 NOKPROBE_SYMBOL(process_fetch_insn)
 
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index d9aebd395a9d..32ae2fc78190 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -49,13 +49,66 @@ fetch_apply_bitfield(struct fetch_insn *code, void *buf)
 }
 
 /*
- * This must be defined for each callsite.
+ * These functions must be defined for each callsite.
  * Return consumed dynamic data size (>= 0), or error (< 0).
  * If dest is NULL, don't store result and return required dynamic data size.
  */
 static int
 process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs,
   void *dest, void *base);
+static nokprobe_inline int fetch_store_strlen(unsigned long addr);
+static nokprobe_inline int
+fetch_store_string(unsigned long addr, void *dest, void *base);
+static nokprobe_inline int
+probe_mem_read(void *dest, void *src, size_t size);
+
+/* From the 2nd stage, routine is same */
+static nokprobe_inline int
+process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
+  void *dest, void *base)
+{
+   int ret = 0;
+
+   /* 2nd stage: dereference memory if needed */
+   while (code->op == FETCH_OP_DEREF) {
+   ret = probe_mem_read(, (void *)val + code->offset,
+   sizeof(val));
+   if (ret)
+   return ret;
+   code++;
+   }
+
+   /* 3rd stage: store value to buffer */
+   if (unlikely(!dest)) {
+   if (code->op == FETCH_OP_ST_STRING)
+   return fetch_store_strlen(val + code->offset);
+   else
+   return -EILSEQ;
+   }
+
+   switch (code->op) {
+   case FETCH_OP_ST_RAW:
+   fetch_store_raw(val, code, dest);
+   break;
+   case FETCH_OP_ST_MEM:
+   probe_mem_read(dest, (void *)val + code->offset, code->size);
+   break;
+   case FETCH_OP_ST_STRING:
+   ret = fetch_store_string(val + code->offset, dest, base);
+   break;
+   default:
+   return -EILSEQ;
+   }
+   code++;
+
+   /* 4th stage: modify stored value if

[PATCH v3 11/18] tracing: probeevent: Unify fetch_insn processing common part

2018-02-23 Thread Masami Hiramatsu

Unify the fetch_insn bottom process (from stage 2: dereference
indirect data) from kprobe and uprobe events, since those are
mostly same.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |   47 +
 kernel/trace/trace_probe_tmpl.h |   55 ++-
 kernel/trace/trace_uprobe.c |   43 +-
 3 files changed, 63 insertions(+), 82 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 35f81e1eb70e..292d5ae6d18b 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -857,13 +857,18 @@ fetch_store_string(unsigned long addr, void *dest, void 
*base)
return ret;
 }
 
+static nokprobe_inline int
+probe_mem_read(void *dest, void *src, size_t size)
+{
+   return probe_kernel_read(dest, src, size);
+}
+
 /* Note that we don't verify it, since the code does not come from user space 
*/
 static int
 process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs, void *dest,
   void *base)
 {
unsigned long val;
-   int ret = 0;
 
/* 1st stage: get value from context */
switch (code->op) {
@@ -890,45 +895,7 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs 
*regs, void *dest,
}
code++;
 
-   /* 2nd stage: dereference memory if needed */
-   while (code->op == FETCH_OP_DEREF) {
-   ret = probe_kernel_read(, (void *)val + code->offset,
-   sizeof(val));
-   if (ret)
-   return ret;
-   code++;
-   }
-
-   /* 3rd stage: store value to buffer */
-   if (unlikely(!dest)) {
-   if (code->op == FETCH_OP_ST_STRING)
-   return fetch_store_strlen(val + code->offset);
-   else
-   return -EILSEQ;
-   }
-
-   switch (code->op) {
-   case FETCH_OP_ST_RAW:
-   fetch_store_raw(val, code, dest);
-   break;
-   case FETCH_OP_ST_MEM:
-   probe_kernel_read(dest, (void *)val + code->offset, code->size);
-   break;
-   case FETCH_OP_ST_STRING:
-   ret = fetch_store_string(val + code->offset, dest, base);
-   break;
-   default:
-   return -EILSEQ;
-   }
-   code++;
-
-   /* 4th stage: modify stored value if needed */
-   if (code->op == FETCH_OP_MOD_BF) {
-   fetch_apply_bitfield(code, dest);
-   code++;
-   }
-
-   return code->op == FETCH_OP_END ? ret : -EILSEQ;
+   return process_fetch_insn_bottom(code, val, dest, base);
 }
 NOKPROBE_SYMBOL(process_fetch_insn)
 
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index d9aebd395a9d..32ae2fc78190 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -49,13 +49,66 @@ fetch_apply_bitfield(struct fetch_insn *code, void *buf)
 }
 
 /*
- * This must be defined for each callsite.
+ * These functions must be defined for each callsite.
  * Return consumed dynamic data size (>= 0), or error (< 0).
  * If dest is NULL, don't store result and return required dynamic data size.
  */
 static int
 process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs,
   void *dest, void *base);
+static nokprobe_inline int fetch_store_strlen(unsigned long addr);
+static nokprobe_inline int
+fetch_store_string(unsigned long addr, void *dest, void *base);
+static nokprobe_inline int
+probe_mem_read(void *dest, void *src, size_t size);
+
+/* From the 2nd stage, routine is same */
+static nokprobe_inline int
+process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
+  void *dest, void *base)
+{
+   int ret = 0;
+
+   /* 2nd stage: dereference memory if needed */
+   while (code->op == FETCH_OP_DEREF) {
+   ret = probe_mem_read(, (void *)val + code->offset,
+   sizeof(val));
+   if (ret)
+   return ret;
+   code++;
+   }
+
+   /* 3rd stage: store value to buffer */
+   if (unlikely(!dest)) {
+   if (code->op == FETCH_OP_ST_STRING)
+   return fetch_store_strlen(val + code->offset);
+   else
+   return -EILSEQ;
+   }
+
+   switch (code->op) {
+   case FETCH_OP_ST_RAW:
+   fetch_store_raw(val, code, dest);
+   break;
+   case FETCH_OP_ST_MEM:
+   probe_mem_read(dest, (void *)val + code->offset, code->size);
+   break;
+   case FETCH_OP_ST_STRING:
+   ret = fetch_store_string(val + code->offset, dest, base);
+   break;
+   default:
+   return -EILSEQ;
+   }
+   code++;
+
+   /* 4th stage: modify stored value if needed */
+   if

[PATCH v3 12/18] tracing: probeevent: Add symbol type

2018-02-23 Thread Masami Hiramatsu

Add "symbol" type to probeevent, which is an alias of u32 or u64
(depends on BITS_PER_LONG). This shows the result value in
symbol+offset style. This type is only available with kprobe
events.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v2:
  - Add symbol type to README file.
---
 Documentation/trace/kprobetrace.txt |3 +++
 kernel/trace/trace.c|2 +-
 kernel/trace/trace_probe.c  |8 
 kernel/trace/trace_probe.h  |   12 +---
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index 1a3a3d6bc2a8..d49381f2e411 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -62,6 +62,7 @@ respectively. 'x' prefix implies it is unsigned. Traced 
arguments are shown
 in decimal ('s' and 'u') or hexadecimal ('x'). Without type casting, 'x32'
 or 'x64' is used depends on the architecture (e.g. x86-32 uses x32, and
 x86-64 uses x64).
+
 String type is a special type, which fetches a "null-terminated" string from
 kernel space. This means it will fail and store NULL if the string container
 has been paged out.
@@ -70,6 +71,8 @@ offset, and container-size (usually 32). The syntax is;
 
  b@/
 
+Symbol type('symbol') is an alias of u32 or u64 type (depends on BITS_PER_LONG)
+which shows given pointer in "symbol+offset" style.
 For $comm, the default type is "string"; any other type is invalid.
 
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 20a2300ae4e8..8f08811d15b8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4609,7 +4609,7 @@ static const char readme_msg[] =
"\t args: =fetcharg[:type]\n"
"\t fetcharg: %, @, @[+|-],\n"
"\t   $stack, $stack, $retval, $comm\n"
-   "\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string,\n"
+   "\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
"\t   b@/\n"
 #endif
"  events/\t\t- Directory containing all trace event subsystems:\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 9c1b55e0ccfd..4c83b00ebec0 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -58,6 +58,13 @@ DEFINE_BASIC_PRINT_TYPE_FUNC(x16, u16, "0x%x")
 DEFINE_BASIC_PRINT_TYPE_FUNC(x32, u32, "0x%x")
 DEFINE_BASIC_PRINT_TYPE_FUNC(x64, u64, "0x%Lx")
 
+int PRINT_TYPE_FUNC_NAME(symbol)(struct trace_seq *s, void *data, void *ent)
+{
+   trace_seq_printf(s, "%pS", (void *)*(unsigned long *)data);
+   return !trace_seq_has_overflowed(s);
+}
+const char PRINT_TYPE_FMT_NAME(symbol)[] = "%pS";
+
 /* Print type function for string type */
 int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s, void *data, void *ent)
 {
@@ -91,6 +98,7 @@ static const struct fetch_type probe_fetch_types[] = {
ASSIGN_FETCH_TYPE_ALIAS(x16, u16, u16, 0),
ASSIGN_FETCH_TYPE_ALIAS(x32, u32, u32, 0),
ASSIGN_FETCH_TYPE_ALIAS(x64, u64, u64, 0),
+   ASSIGN_FETCH_TYPE_ALIAS(symbol, ADDR_FETCH_TYPE, ADDR_FETCH_TYPE, 0),
 
ASSIGN_FETCH_TYPE_END
 };
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 3bc43c1ce628..ef477bd8468a 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -157,6 +157,7 @@ DECLARE_BASIC_PRINT_TYPE_FUNC(x32);
 DECLARE_BASIC_PRINT_TYPE_FUNC(x64);
 
 DECLARE_BASIC_PRINT_TYPE_FUNC(string);
+DECLARE_BASIC_PRINT_TYPE_FUNC(symbol);
 
 /* Default (unsigned long) fetch type */
 #define __DEFAULT_FETCH_TYPE(t) x##t
@@ -164,6 +165,10 @@ DECLARE_BASIC_PRINT_TYPE_FUNC(string);
 #define DEFAULT_FETCH_TYPE _DEFAULT_FETCH_TYPE(BITS_PER_LONG)
 #define DEFAULT_FETCH_TYPE_STR __stringify(DEFAULT_FETCH_TYPE)
 
+#define __ADDR_FETCH_TYPE(t) u##t
+#define _ADDR_FETCH_TYPE(t) __ADDR_FETCH_TYPE(t)
+#define ADDR_FETCH_TYPE _ADDR_FETCH_TYPE(BITS_PER_LONG)
+
 #define __ASSIGN_FETCH_TYPE(_name, ptype, ftype, _size, sign, _fmttype)
\
{.name = _name, \
 .size = _size, \
@@ -172,13 +177,14 @@ DECLARE_BASIC_PRINT_TYPE_FUNC(string);
 .fmt = PRINT_TYPE_FMT_NAME(ptype), \
 .fmttype = _fmttype,   \
}
-
+#define _ASSIGN_FETCH_TYPE(_name, ptype, ftype, _size, sign, _fmttype) \
+   __ASSIGN_FETCH_TYPE(_name, ptype, ftype, _size, sign, #_fmttype)
 #define ASSIGN_FETCH_TYPE(ptype, ftype, sign)  \
-   __ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, #ptype)
+   _ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, ptype)
 
 /* If ptype is an alias of atype, use this macro (show atype in format) */
 #define ASSIGN_FETCH_TYPE_ALIAS(ptype, atype, ftype, sign) \
-   __ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, #atype)
+   _ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, atype)

[PATCH v3 12/18] tracing: probeevent: Add symbol type

2018-02-23 Thread Masami Hiramatsu

Add "symbol" type to probeevent, which is an alias of u32 or u64
(depends on BITS_PER_LONG). This shows the result value in
symbol+offset style. This type is only available with kprobe
events.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v2:
  - Add symbol type to README file.
---
 Documentation/trace/kprobetrace.txt |3 +++
 kernel/trace/trace.c|2 +-
 kernel/trace/trace_probe.c  |8 
 kernel/trace/trace_probe.h  |   12 +---
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index 1a3a3d6bc2a8..d49381f2e411 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -62,6 +62,7 @@ respectively. 'x' prefix implies it is unsigned. Traced 
arguments are shown
 in decimal ('s' and 'u') or hexadecimal ('x'). Without type casting, 'x32'
 or 'x64' is used depends on the architecture (e.g. x86-32 uses x32, and
 x86-64 uses x64).
+
 String type is a special type, which fetches a "null-terminated" string from
 kernel space. This means it will fail and store NULL if the string container
 has been paged out.
@@ -70,6 +71,8 @@ offset, and container-size (usually 32). The syntax is;
 
  b@/
 
+Symbol type('symbol') is an alias of u32 or u64 type (depends on BITS_PER_LONG)
+which shows given pointer in "symbol+offset" style.
 For $comm, the default type is "string"; any other type is invalid.
 
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 20a2300ae4e8..8f08811d15b8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4609,7 +4609,7 @@ static const char readme_msg[] =
"\t args: =fetcharg[:type]\n"
"\t fetcharg: %, @, @[+|-],\n"
"\t   $stack, $stack, $retval, $comm\n"
-   "\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string,\n"
+   "\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
"\t   b@/\n"
 #endif
"  events/\t\t- Directory containing all trace event subsystems:\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 9c1b55e0ccfd..4c83b00ebec0 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -58,6 +58,13 @@ DEFINE_BASIC_PRINT_TYPE_FUNC(x16, u16, "0x%x")
 DEFINE_BASIC_PRINT_TYPE_FUNC(x32, u32, "0x%x")
 DEFINE_BASIC_PRINT_TYPE_FUNC(x64, u64, "0x%Lx")
 
+int PRINT_TYPE_FUNC_NAME(symbol)(struct trace_seq *s, void *data, void *ent)
+{
+   trace_seq_printf(s, "%pS", (void *)*(unsigned long *)data);
+   return !trace_seq_has_overflowed(s);
+}
+const char PRINT_TYPE_FMT_NAME(symbol)[] = "%pS";
+
 /* Print type function for string type */
 int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s, void *data, void *ent)
 {
@@ -91,6 +98,7 @@ static const struct fetch_type probe_fetch_types[] = {
ASSIGN_FETCH_TYPE_ALIAS(x16, u16, u16, 0),
ASSIGN_FETCH_TYPE_ALIAS(x32, u32, u32, 0),
ASSIGN_FETCH_TYPE_ALIAS(x64, u64, u64, 0),
+   ASSIGN_FETCH_TYPE_ALIAS(symbol, ADDR_FETCH_TYPE, ADDR_FETCH_TYPE, 0),
 
ASSIGN_FETCH_TYPE_END
 };
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 3bc43c1ce628..ef477bd8468a 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -157,6 +157,7 @@ DECLARE_BASIC_PRINT_TYPE_FUNC(x32);
 DECLARE_BASIC_PRINT_TYPE_FUNC(x64);
 
 DECLARE_BASIC_PRINT_TYPE_FUNC(string);
+DECLARE_BASIC_PRINT_TYPE_FUNC(symbol);
 
 /* Default (unsigned long) fetch type */
 #define __DEFAULT_FETCH_TYPE(t) x##t
@@ -164,6 +165,10 @@ DECLARE_BASIC_PRINT_TYPE_FUNC(string);
 #define DEFAULT_FETCH_TYPE _DEFAULT_FETCH_TYPE(BITS_PER_LONG)
 #define DEFAULT_FETCH_TYPE_STR __stringify(DEFAULT_FETCH_TYPE)
 
+#define __ADDR_FETCH_TYPE(t) u##t
+#define _ADDR_FETCH_TYPE(t) __ADDR_FETCH_TYPE(t)
+#define ADDR_FETCH_TYPE _ADDR_FETCH_TYPE(BITS_PER_LONG)
+
 #define __ASSIGN_FETCH_TYPE(_name, ptype, ftype, _size, sign, _fmttype)
\
{.name = _name, \
 .size = _size, \
@@ -172,13 +177,14 @@ DECLARE_BASIC_PRINT_TYPE_FUNC(string);
 .fmt = PRINT_TYPE_FMT_NAME(ptype), \
 .fmttype = _fmttype,   \
}
-
+#define _ASSIGN_FETCH_TYPE(_name, ptype, ftype, _size, sign, _fmttype) \
+   __ASSIGN_FETCH_TYPE(_name, ptype, ftype, _size, sign, #_fmttype)
 #define ASSIGN_FETCH_TYPE(ptype, ftype, sign)  \
-   __ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, #ptype)
+   _ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, ptype)
 
 /* If ptype is an alias of atype, use this macro (show atype in format) */
 #define ASSIGN_FETCH_TYPE_ALIAS(ptype, atype, ftype, sign) \
-   __ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, #atype)
+   _ASSIGN_FETCH_TYPE(#ptype, ptype, ftype, sizeof(ftype), sign, atype)
 
 #define

[PATCH v3 10/18] tracing: probeevent: Append traceprobe_ for exported function

2018-02-23 Thread Masami Hiramatsu

Append traceprobe_ for exported function set_print_fmt() as
same as other functions.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |4 ++--
 kernel/trace/trace_probe.c  |2 +-
 kernel/trace/trace_probe.h  |2 +-
 kernel/trace/trace_uprobe.c |4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 02d19eff49bb..35f81e1eb70e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1305,7 +1305,7 @@ static int register_kprobe_event(struct trace_kprobe *tk)
 
init_trace_event_call(tk, call);
 
-   if (set_print_fmt(>tp, trace_kprobe_is_return(tk)) < 0)
+   if (traceprobe_set_print_fmt(>tp, trace_kprobe_is_return(tk)) < 0)
return -ENOMEM;
ret = register_trace_event(>event);
if (!ret) {
@@ -1362,7 +1362,7 @@ create_local_trace_kprobe(char *func, void *addr, 
unsigned long offs,
 
init_trace_event_call(tk, >tp.call);
 
-   if (set_print_fmt(>tp, trace_kprobe_is_return(tk)) < 0) {
+   if (traceprobe_set_print_fmt(>tp, trace_kprobe_is_return(tk)) < 0) {
ret = -ENOMEM;
goto error;
}
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 9ffc07c7c949..9c1b55e0ccfd 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -502,7 +502,7 @@ static int __set_print_fmt(struct trace_probe *tp, char 
*buf, int len,
return pos;
 }
 
-int set_print_fmt(struct trace_probe *tp, bool is_return)
+int traceprobe_set_print_fmt(struct trace_probe *tp, bool is_return)
 {
int len;
char *print_fmt;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index d1b8bd74bf56..3bc43c1ce628 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -266,7 +266,7 @@ extern void traceprobe_free_probe_arg(struct probe_arg 
*arg);
 
 extern int traceprobe_split_symbol_offset(char *symbol, long *offset);
 
-extern int set_print_fmt(struct trace_probe *tp, bool is_return);
+extern int traceprobe_set_print_fmt(struct trace_probe *tp, bool is_return);
 
 #ifdef CONFIG_PERF_EVENTS
 extern struct trace_event_call *
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index de4f91bb313a..fafd48310823 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1314,7 +1314,7 @@ static int register_uprobe_event(struct trace_uprobe *tu)
 
init_trace_event_call(tu, call);
 
-   if (set_print_fmt(>tp, is_ret_probe(tu)) < 0)
+   if (traceprobe_set_print_fmt(>tp, is_ret_probe(tu)) < 0)
return -ENOMEM;
 
ret = register_trace_event(>event);
@@ -1388,7 +1388,7 @@ create_local_trace_uprobe(char *name, unsigned long offs, 
bool is_return)
tu->filename = kstrdup(name, GFP_KERNEL);
init_trace_event_call(tu, >tp.call);
 
-   if (set_print_fmt(>tp, is_ret_probe(tu)) < 0) {
+   if (traceprobe_set_print_fmt(>tp, is_ret_probe(tu)) < 0) {
ret = -ENOMEM;
goto error;
}

[PATCH v3 10/18] tracing: probeevent: Append traceprobe_ for exported function

2018-02-23 Thread Masami Hiramatsu

Append traceprobe_ for exported function set_print_fmt() as
same as other functions.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |4 ++--
 kernel/trace/trace_probe.c  |2 +-
 kernel/trace/trace_probe.h  |2 +-
 kernel/trace/trace_uprobe.c |4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 02d19eff49bb..35f81e1eb70e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1305,7 +1305,7 @@ static int register_kprobe_event(struct trace_kprobe *tk)
 
init_trace_event_call(tk, call);
 
-   if (set_print_fmt(>tp, trace_kprobe_is_return(tk)) < 0)
+   if (traceprobe_set_print_fmt(>tp, trace_kprobe_is_return(tk)) < 0)
return -ENOMEM;
ret = register_trace_event(>event);
if (!ret) {
@@ -1362,7 +1362,7 @@ create_local_trace_kprobe(char *func, void *addr, 
unsigned long offs,
 
init_trace_event_call(tk, >tp.call);
 
-   if (set_print_fmt(>tp, trace_kprobe_is_return(tk)) < 0) {
+   if (traceprobe_set_print_fmt(>tp, trace_kprobe_is_return(tk)) < 0) {
ret = -ENOMEM;
goto error;
}
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 9ffc07c7c949..9c1b55e0ccfd 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -502,7 +502,7 @@ static int __set_print_fmt(struct trace_probe *tp, char 
*buf, int len,
return pos;
 }
 
-int set_print_fmt(struct trace_probe *tp, bool is_return)
+int traceprobe_set_print_fmt(struct trace_probe *tp, bool is_return)
 {
int len;
char *print_fmt;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index d1b8bd74bf56..3bc43c1ce628 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -266,7 +266,7 @@ extern void traceprobe_free_probe_arg(struct probe_arg 
*arg);
 
 extern int traceprobe_split_symbol_offset(char *symbol, long *offset);
 
-extern int set_print_fmt(struct trace_probe *tp, bool is_return);
+extern int traceprobe_set_print_fmt(struct trace_probe *tp, bool is_return);
 
 #ifdef CONFIG_PERF_EVENTS
 extern struct trace_event_call *
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index de4f91bb313a..fafd48310823 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1314,7 +1314,7 @@ static int register_uprobe_event(struct trace_uprobe *tu)
 
init_trace_event_call(tu, call);
 
-   if (set_print_fmt(>tp, is_ret_probe(tu)) < 0)
+   if (traceprobe_set_print_fmt(>tp, is_ret_probe(tu)) < 0)
return -ENOMEM;
 
ret = register_trace_event(>event);
@@ -1388,7 +1388,7 @@ create_local_trace_uprobe(char *name, unsigned long offs, 
bool is_return)
tu->filename = kstrdup(name, GFP_KERNEL);
init_trace_event_call(tu, >tp.call);
 
-   if (set_print_fmt(>tp, is_ret_probe(tu)) < 0) {
+   if (traceprobe_set_print_fmt(>tp, is_ret_probe(tu)) < 0) {
ret = -ENOMEM;
goto error;
}

[lkp-robot] [printk] c162d5b433: BUG:KASAN:use-after-scope_in_c

2018-02-23 Thread kernel test robot

TO: Petr Mladek 
CC: Cong Wang , Dave Hansen , 
Johannes Weiner , Mel Gorman , Michal 
Hocko , Vlastimil Babka , Peter Zijlstra 
, Linus Torvalds , Jan 
Kara , Mathieu Desnoyers , Tetsuo 
Handa , Byungchul Park 
, Tejun Heo , Pavel Machek 
, Steven Rostedt (VMware) , Sergey 
Senozhatsky , LKML 
, linux-kernel@vger.kernel.org, l...@01.org



FYI, we noticed the following commit (built with gcc-7):

commit: c162d5b4338d72deed61aa65ed0f2f4ba2bbc8ab ("printk: Hide console waiter 
logic into helpers")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 1G

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


++++
|| dbdda842fe | c162d5b433 |
++++
| boot_successes | 0  | 0  |
| boot_failures  | 18 | 16 |
| BUG:KASAN:use-after-scope_in_p | 18 ||
| BUG:KASAN:use-after-scope_in_c | 0  | 16 |
++++



[0.00] BUG: KASAN: use-after-scope in console_unlock+0x185/0x960
[0.00] BUG: KASAN: use-after-scope in console_unlock+0x185/0x960
[0.00] Write of size 1 at addr 828079b8 by task swapper/0
[0.00] Write of size 1 at addr 828079b8 by task swapper/0
[0.00] 
[0.00] 
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0-12953-gc162d5b #1
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0-12953-gc162d5b #1
[0.00] Call Trace:
[0.00] Call Trace:
[0.00]  ? dump_stack+0x11d/0x1c5
[0.00]  ? dump_stack+0x11d/0x1c5
[0.00]  ? printk+0xb5/0xd1
[0.00]  ? printk+0xb5/0xd1
[0.00]  ? arch_local_irq_restore+0x17/0x17
[0.00]  ? arch_local_irq_restore+0x17/0x17
[0.00]  ? do_raw_spin_unlock+0x137/0x169
[0.00]  ? do_raw_spin_unlock+0x137/0x169
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? print_address_description+0x6e/0x23b
[0.00]  ? print_address_description+0x6e/0x23b
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? kasan_report+0x223/0x249
[0.00]  ? kasan_report+0x223/0x249
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? wake_up_klogd+0xdf/0xdf
[0.00]  ? wake_up_klogd+0xdf/0xdf
[0.00]  ? do_raw_spin_unlock+0x145/0x169
[0.00]  ? do_raw_spin_unlock+0x145/0x169
[0.00]  ? do_raw_spin_trylock+0xed/0xed
[0.00]  ? do_raw_spin_trylock+0xed/0xed
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? _raw_spin_unlock_irqrestore+0x3b/0x54
[0.00]  ? _raw_spin_unlock_irqrestore+0x3b/0x54
[0.00]  ? time_hardirqs_off+0x12/0x2d
[0.00]  ? time_hardirqs_off+0x12/0x2d
[0.00]  ? arch_local_save_flags+0x7/0x8
[0.00]  ? arch_local_save_flags+0x7/0x8
[0.00]  ? trace_hardirqs_off_caller+0x127/0x139
[0.00]  ? trace_hardirqs_off_caller+0x127/0x139
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? vprintk_emit+0x579/0x823
[0.00]  ? vprintk_emit+0x579/0x823
[0.00]  ? __down_trylock_console_sem+0x90/0xa4
[0.00]  ? __down_trylock_console_sem+0x90/0xa4
[0.00]  ? __down_trylock_console_sem+0x9d/0xa4
[0.00]  ? __down_trylock_console_sem+0x9d/0xa4
[0.00]  ? vprintk_emit+0x7ec/0x823
[0.00]  ? vprintk_emit+0x7ec/0x823
[0.00]  ? console_unlock+0x960/0x960
[0.00]  ? console_unlock+0x960/0x960
[0.00]  ? memblock_merge_regions+0x2d/0x154
[0.00]  ? memblock_merge_regions+0x2d/0x154
[0.00]  ? memblock_add_range+0x322/0x333
[0.00]  ? memblock_add_range+0x322/0x333
[0.00]  ? memblock_reserve+0xbb/0xe1
[0.00]  ? memblock_reserve+0xbb/0xe1
[0.00]  ? memblock_add+0xe1/0xe1
[0.00]  ? memblock_add+0xe1/0xe1
[0.00]  ? set_pte+0x24/0x27
[0.00]  ? set_pte+0x24/0x27
[0.00]  ? vprintk_func+0x94/0xa5
[0.00]  ? vprintk_func+0x94/0xa5
[0.00]  ? printk+0xb5/0xd1
[0.00]  ?

[PATCH v3 09/18] tracing: probeevent: Return consumed bytes of dynamic area

2018-02-23 Thread Masami Hiramatsu

Cleanup string fetching routine so that returns the consumed
bytes of dynamic area and store the string information as
data_loc format instead of data_rloc.
This simplifies the fetcharg loop.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |   51 +++---
 kernel/trace/trace_probe.h  |   26 ---
 kernel/trace/trace_probe_tmpl.h |   52 +-
 kernel/trace/trace_uprobe.c |   53 ---
 4 files changed, 82 insertions(+), 100 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 4caf019b5917..02d19eff49bb 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -807,8 +807,8 @@ static const struct file_operations kprobe_profile_ops = {
 /* Kprobe specific fetch functions */
 
 /* Return the length of string -- including null terminal byte */
-static nokprobe_inline void
-fetch_store_strlen(unsigned long addr, void *dest)
+static nokprobe_inline int
+fetch_store_strlen(unsigned long addr)
 {
mm_segment_t old_fs;
int ret, len = 0;
@@ -826,25 +826,22 @@ fetch_store_strlen(unsigned long addr, void *dest)
pagefault_enable();
set_fs(old_fs);
 
-   if (ret < 0)/* Failed to check the length */
-   *(u32 *)dest = 0;
-   else
-   *(u32 *)dest = len;
+   return (ret < 0) ? ret : len;
 }
 
 /*
  * Fetch a null-terminated string. Caller MUST set *(u32 *)buf with max
  * length and relative data location.
  */
-static nokprobe_inline void
-fetch_store_string(unsigned long addr, void *dest)
+static nokprobe_inline int
+fetch_store_string(unsigned long addr, void *dest, void *base)
 {
-   int maxlen = get_rloc_len(*(u32 *)dest);
-   u8 *dst = get_rloc_data(dest);
+   int maxlen = get_loc_len(*(u32 *)dest);
+   u8 *dst = get_loc_data(dest, base);
long ret;
 
if (!maxlen)
-   return;
+   return -ENOMEM;
 
/*
 * Try to get string again, since the string can be changed while
@@ -854,19 +851,19 @@ fetch_store_string(unsigned long addr, void *dest)
 
if (ret < 0) {  /* Failed to fetch string */
dst[0] = '\0';
-   *(u32 *)dest = make_data_rloc(0, get_rloc_offs(*(u32 *)dest));
-   } else {
-   *(u32 *)dest = make_data_rloc(ret, get_rloc_offs(*(u32 *)dest));
+   ret = 0;
}
+   *(u32 *)dest = make_data_loc(ret, (void *)dst - base);
+   return ret;
 }
 
 /* Note that we don't verify it, since the code does not come from user space 
*/
 static int
 process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs, void *dest,
-  bool pre)
+  void *base)
 {
unsigned long val;
-   int ret;
+   int ret = 0;
 
/* 1st stage: get value from context */
switch (code->op) {
@@ -903,6 +900,13 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs 
*regs, void *dest,
}
 
/* 3rd stage: store value to buffer */
+   if (unlikely(!dest)) {
+   if (code->op == FETCH_OP_ST_STRING)
+   return fetch_store_strlen(val + code->offset);
+   else
+   return -EILSEQ;
+   }
+
switch (code->op) {
case FETCH_OP_ST_RAW:
fetch_store_raw(val, code, dest);
@@ -911,10 +915,7 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs 
*regs, void *dest,
probe_kernel_read(dest, (void *)val + code->offset, code->size);
break;
case FETCH_OP_ST_STRING:
-   if (pre)
-   fetch_store_strlen(val + code->offset, dest);
-   else
-   fetch_store_string(val + code->offset, dest);
+   ret = fetch_store_string(val + code->offset, dest, base);
break;
default:
return -EILSEQ;
@@ -927,7 +928,7 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs 
*regs, void *dest,
code++;
}
 
-   return code->op == FETCH_OP_END ? 0 : -EILSEQ;
+   return code->op == FETCH_OP_END ? ret : -EILSEQ;
 }
 NOKPROBE_SYMBOL(process_fetch_insn)
 
@@ -962,7 +963,7 @@ __kprobe_trace_func(struct trace_kprobe *tk, struct pt_regs 
*regs,
 
entry = ring_buffer_event_data(event);
entry->ip = (unsigned long)tk->rp.kp.addr;
-   store_trace_args(sizeof(*entry), >tp, regs, (u8 *)[1], dsize);
+   store_trace_args([1], >tp, regs, sizeof(*entry), dsize);
 
event_trigger_unlock_commit_regs(trace_file, buffer, event,
 entry, irq_flags, pc, regs);
@@ -1011,7 +1012,7 @@ __kretprobe_trace_func(struct trace_kprobe *tk, struct 
kretprobe_instance *ri,
entry = ring_buffer_event_data(event);
entry->func = (unsigned

[lkp-robot] [printk] c162d5b433: BUG:KASAN:use-after-scope_in_c

2018-02-23 Thread kernel test robot

TO: Petr Mladek 
CC: Cong Wang , Dave Hansen , 
Johannes Weiner , Mel Gorman , Michal 
Hocko , Vlastimil Babka , Peter Zijlstra 
, Linus Torvalds , Jan 
Kara , Mathieu Desnoyers , Tetsuo 
Handa , Byungchul Park 
, Tejun Heo , Pavel Machek 
, Steven Rostedt (VMware) , Sergey 
Senozhatsky , LKML 
, linux-kernel@vger.kernel.org, l...@01.org



FYI, we noticed the following commit (built with gcc-7):

commit: c162d5b4338d72deed61aa65ed0f2f4ba2bbc8ab ("printk: Hide console waiter 
logic into helpers")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 1G

caused below changes (please refer to attached dmesg/kmsg for entire 
log/backtrace):


++++
|| dbdda842fe | c162d5b433 |
++++
| boot_successes | 0  | 0  |
| boot_failures  | 18 | 16 |
| BUG:KASAN:use-after-scope_in_p | 18 ||
| BUG:KASAN:use-after-scope_in_c | 0  | 16 |
++++



[0.00] BUG: KASAN: use-after-scope in console_unlock+0x185/0x960
[0.00] BUG: KASAN: use-after-scope in console_unlock+0x185/0x960
[0.00] Write of size 1 at addr 828079b8 by task swapper/0
[0.00] Write of size 1 at addr 828079b8 by task swapper/0
[0.00] 
[0.00] 
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0-12953-gc162d5b #1
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0-12953-gc162d5b #1
[0.00] Call Trace:
[0.00] Call Trace:
[0.00]  ? dump_stack+0x11d/0x1c5
[0.00]  ? dump_stack+0x11d/0x1c5
[0.00]  ? printk+0xb5/0xd1
[0.00]  ? printk+0xb5/0xd1
[0.00]  ? arch_local_irq_restore+0x17/0x17
[0.00]  ? arch_local_irq_restore+0x17/0x17
[0.00]  ? do_raw_spin_unlock+0x137/0x169
[0.00]  ? do_raw_spin_unlock+0x137/0x169
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? print_address_description+0x6e/0x23b
[0.00]  ? print_address_description+0x6e/0x23b
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? kasan_report+0x223/0x249
[0.00]  ? kasan_report+0x223/0x249
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? console_unlock+0x185/0x960
[0.00]  ? wake_up_klogd+0xdf/0xdf
[0.00]  ? wake_up_klogd+0xdf/0xdf
[0.00]  ? do_raw_spin_unlock+0x145/0x169
[0.00]  ? do_raw_spin_unlock+0x145/0x169
[0.00]  ? do_raw_spin_trylock+0xed/0xed
[0.00]  ? do_raw_spin_trylock+0xed/0xed
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? _raw_spin_unlock_irqrestore+0x3b/0x54
[0.00]  ? _raw_spin_unlock_irqrestore+0x3b/0x54
[0.00]  ? time_hardirqs_off+0x12/0x2d
[0.00]  ? time_hardirqs_off+0x12/0x2d
[0.00]  ? arch_local_save_flags+0x7/0x8
[0.00]  ? arch_local_save_flags+0x7/0x8
[0.00]  ? trace_hardirqs_off_caller+0x127/0x139
[0.00]  ? trace_hardirqs_off_caller+0x127/0x139
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? irq_trace+0x2e/0x32
[0.00]  ? vprintk_emit+0x579/0x823
[0.00]  ? vprintk_emit+0x579/0x823
[0.00]  ? __down_trylock_console_sem+0x90/0xa4
[0.00]  ? __down_trylock_console_sem+0x90/0xa4
[0.00]  ? __down_trylock_console_sem+0x9d/0xa4
[0.00]  ? __down_trylock_console_sem+0x9d/0xa4
[0.00]  ? vprintk_emit+0x7ec/0x823
[0.00]  ? vprintk_emit+0x7ec/0x823
[0.00]  ? console_unlock+0x960/0x960
[0.00]  ? console_unlock+0x960/0x960
[0.00]  ? memblock_merge_regions+0x2d/0x154
[0.00]  ? memblock_merge_regions+0x2d/0x154
[0.00]  ? memblock_add_range+0x322/0x333
[0.00]  ? memblock_add_range+0x322/0x333
[0.00]  ? memblock_reserve+0xbb/0xe1
[0.00]  ? memblock_reserve+0xbb/0xe1
[0.00]  ? memblock_add+0xe1/0xe1
[0.00]  ? memblock_add+0xe1/0xe1
[0.00]  ? set_pte+0x24/0x27
[0.00]  ? set_pte+0x24/0x27
[0.00]  ? vprintk_func+0x94/0xa5
[0.00]  ? vprintk_func+0x94/0xa5
[0.00]  ? printk+0xb5/0xd1
[0.00]  ? printk+0xb5/0xd1
[0.00]  ? show_regs_print_info+0x41/0x41
[0.00]  ? show_regs_print_info+0x41/0x41
[0.00]  ? kasan_populate_zero_shadow+0x37b/0x3f6
[0.00]  ? kasan_populate_zero_shadow+0x37b/0x3f6
[0.00]  ? native_flush_tlb_global+0x74/0x80
[0.00]  ? native_flush_tlb_global+0x74/0x80
[0.00]  ? kasan_init+0x211/0x22d
[0.00]  ? kasan_init+0x211/0x22d

[PATCH v3 09/18] tracing: probeevent: Return consumed bytes of dynamic area

2018-02-23 Thread Masami Hiramatsu

Cleanup string fetching routine so that returns the consumed
bytes of dynamic area and store the string information as
data_loc format instead of data_rloc.
This simplifies the fetcharg loop.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |   51 +++---
 kernel/trace/trace_probe.h  |   26 ---
 kernel/trace/trace_probe_tmpl.h |   52 +-
 kernel/trace/trace_uprobe.c |   53 ---
 4 files changed, 82 insertions(+), 100 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 4caf019b5917..02d19eff49bb 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -807,8 +807,8 @@ static const struct file_operations kprobe_profile_ops = {
 /* Kprobe specific fetch functions */
 
 /* Return the length of string -- including null terminal byte */
-static nokprobe_inline void
-fetch_store_strlen(unsigned long addr, void *dest)
+static nokprobe_inline int
+fetch_store_strlen(unsigned long addr)
 {
mm_segment_t old_fs;
int ret, len = 0;
@@ -826,25 +826,22 @@ fetch_store_strlen(unsigned long addr, void *dest)
pagefault_enable();
set_fs(old_fs);
 
-   if (ret < 0)/* Failed to check the length */
-   *(u32 *)dest = 0;
-   else
-   *(u32 *)dest = len;
+   return (ret < 0) ? ret : len;
 }
 
 /*
  * Fetch a null-terminated string. Caller MUST set *(u32 *)buf with max
  * length and relative data location.
  */
-static nokprobe_inline void
-fetch_store_string(unsigned long addr, void *dest)
+static nokprobe_inline int
+fetch_store_string(unsigned long addr, void *dest, void *base)
 {
-   int maxlen = get_rloc_len(*(u32 *)dest);
-   u8 *dst = get_rloc_data(dest);
+   int maxlen = get_loc_len(*(u32 *)dest);
+   u8 *dst = get_loc_data(dest, base);
long ret;
 
if (!maxlen)
-   return;
+   return -ENOMEM;
 
/*
 * Try to get string again, since the string can be changed while
@@ -854,19 +851,19 @@ fetch_store_string(unsigned long addr, void *dest)
 
if (ret < 0) {  /* Failed to fetch string */
dst[0] = '\0';
-   *(u32 *)dest = make_data_rloc(0, get_rloc_offs(*(u32 *)dest));
-   } else {
-   *(u32 *)dest = make_data_rloc(ret, get_rloc_offs(*(u32 *)dest));
+   ret = 0;
}
+   *(u32 *)dest = make_data_loc(ret, (void *)dst - base);
+   return ret;
 }
 
 /* Note that we don't verify it, since the code does not come from user space 
*/
 static int
 process_fetch_insn(struct fetch_insn *code, struct pt_regs *regs, void *dest,
-  bool pre)
+  void *base)
 {
unsigned long val;
-   int ret;
+   int ret = 0;
 
/* 1st stage: get value from context */
switch (code->op) {
@@ -903,6 +900,13 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs 
*regs, void *dest,
}
 
/* 3rd stage: store value to buffer */
+   if (unlikely(!dest)) {
+   if (code->op == FETCH_OP_ST_STRING)
+   return fetch_store_strlen(val + code->offset);
+   else
+   return -EILSEQ;
+   }
+
switch (code->op) {
case FETCH_OP_ST_RAW:
fetch_store_raw(val, code, dest);
@@ -911,10 +915,7 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs 
*regs, void *dest,
probe_kernel_read(dest, (void *)val + code->offset, code->size);
break;
case FETCH_OP_ST_STRING:
-   if (pre)
-   fetch_store_strlen(val + code->offset, dest);
-   else
-   fetch_store_string(val + code->offset, dest);
+   ret = fetch_store_string(val + code->offset, dest, base);
break;
default:
return -EILSEQ;
@@ -927,7 +928,7 @@ process_fetch_insn(struct fetch_insn *code, struct pt_regs 
*regs, void *dest,
code++;
}
 
-   return code->op == FETCH_OP_END ? 0 : -EILSEQ;
+   return code->op == FETCH_OP_END ? ret : -EILSEQ;
 }
 NOKPROBE_SYMBOL(process_fetch_insn)
 
@@ -962,7 +963,7 @@ __kprobe_trace_func(struct trace_kprobe *tk, struct pt_regs 
*regs,
 
entry = ring_buffer_event_data(event);
entry->ip = (unsigned long)tk->rp.kp.addr;
-   store_trace_args(sizeof(*entry), >tp, regs, (u8 *)[1], dsize);
+   store_trace_args([1], >tp, regs, sizeof(*entry), dsize);
 
event_trigger_unlock_commit_regs(trace_file, buffer, event,
 entry, irq_flags, pc, regs);
@@ -1011,7 +1012,7 @@ __kretprobe_trace_func(struct trace_kprobe *tk, struct 
kretprobe_instance *ri,
entry = ring_buffer_event_data(event);
entry->func = (unsigned long)tk->rp.kp.addr;

[PATCH v3 07/18] tracing: probeevent: Introduce new argument fetching code

2018-02-23 Thread Masami Hiramatsu

Replace {k,u}probe event argument fetching framework
with switch-case based. Currently that is implemented
with structures, macros and chain of function-pointers,
which is more complicated than necessary and may get
a performance penalty by retpoline.

This simplify that with an array of "fetch_insn" (opcode
and oprands), and make process_fetch_insn() just
interprets it. No function pointers are used.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v3:
  - Split out probe type table unification.
---
 kernel/trace/trace_kprobe.c |  291 +---
 kernel/trace/trace_probe.c  |  401 +++
 kernel/trace/trace_probe.h  |  230 --
 kernel/trace/trace_probe_tmpl.h |  120 
 kernel/trace/trace_uprobe.c |  127 
 5 files changed, 491 insertions(+), 678 deletions(-)
 create mode 100644 kernel/trace/trace_probe_tmpl.h

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1018867022b6..8423815ff986 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -24,6 +24,7 @@
 #include 
 
 #include "trace_probe.h"
+#include "trace_probe_tmpl.h"
 
 #define KPROBE_EVENT_SYSTEM "kprobes"
 #define KRETPROBE_MAXACTIVE_MAX 4096
@@ -121,160 +122,6 @@ static int kprobe_dispatcher(struct kprobe *kp, struct 
pt_regs *regs);
 static int kretprobe_dispatcher(struct kretprobe_instance *ri,
struct pt_regs *regs);
 
-/* Memory fetching by symbol */
-struct symbol_cache {
-   char*symbol;
-   longoffset;
-   unsigned long   addr;
-};
-
-unsigned long update_symbol_cache(struct symbol_cache *sc)
-{
-   sc->addr = (unsigned long)kallsyms_lookup_name(sc->symbol);
-
-   if (sc->addr)
-   sc->addr += sc->offset;
-
-   return sc->addr;
-}
-
-void free_symbol_cache(struct symbol_cache *sc)
-{
-   kfree(sc->symbol);
-   kfree(sc);
-}
-
-struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
-{
-   struct symbol_cache *sc;
-
-   if (!sym || strlen(sym) == 0)
-   return NULL;
-
-   sc = kzalloc(sizeof(struct symbol_cache), GFP_KERNEL);
-   if (!sc)
-   return NULL;
-
-   sc->symbol = kstrdup(sym, GFP_KERNEL);
-   if (!sc->symbol) {
-   kfree(sc);
-   return NULL;
-   }
-   sc->offset = offset;
-   update_symbol_cache(sc);
-
-   return sc;
-}
-
-/*
- * Kprobes-specific fetch functions
- */
-#define DEFINE_FETCH_stack(type)   \
-static void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs, \
- void *offset, void *dest) \
-{  \
-   *(type *)dest = (type)regs_get_kernel_stack_nth(regs,   \
-   (unsigned int)((unsigned long)offset)); \
-}  \
-NOKPROBE_SYMBOL(FETCH_FUNC_NAME(stack, type));
-
-DEFINE_BASIC_FETCH_FUNCS(stack)
-/* No string on the stack entry */
-#define fetch_stack_string NULL
-#define fetch_stack_string_sizeNULL
-
-#define DEFINE_FETCH_memory(type)  \
-static void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,
\
- void *addr, void *dest)   \
-{  \
-   type retval;\
-   if (probe_kernel_address(addr, retval)) \
-   *(type *)dest = 0;  \
-   else\
-   *(type *)dest = retval; \
-}  \
-NOKPROBE_SYMBOL(FETCH_FUNC_NAME(memory, type));
-
-DEFINE_BASIC_FETCH_FUNCS(memory)
-/*
- * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
- * length and relative data location.
- */
-static void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
-   void *addr, void *dest)
-{
-   int maxlen = get_rloc_len(*(u32 *)dest);
-   u8 *dst = get_rloc_data(dest);
-   long ret;
-
-   if (!maxlen)
-   return;
-
-   /*
-* Try to get string again, since the string can be changed while
-* probing.
-*/
-   ret = strncpy_from_unsafe(dst, addr, maxlen);
-
-   if (ret < 0) {  /* Failed to fetch string */
-   dst[0] = '\0';
-   *(u32 *)dest = make_data_rloc(0, get_rloc_offs(*(u32 *)dest));
-   } else {
-   *(u32 *)dest = make_data_rloc(ret, get_rloc_offs(*(u32 *)dest));
-   }
-}

[PATCH v3 07/18] tracing: probeevent: Introduce new argument fetching code

2018-02-23 Thread Masami Hiramatsu

Replace {k,u}probe event argument fetching framework
with switch-case based. Currently that is implemented
with structures, macros and chain of function-pointers,
which is more complicated than necessary and may get
a performance penalty by retpoline.

This simplify that with an array of "fetch_insn" (opcode
and oprands), and make process_fetch_insn() just
interprets it. No function pointers are used.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v3:
  - Split out probe type table unification.
---
 kernel/trace/trace_kprobe.c |  291 +---
 kernel/trace/trace_probe.c  |  401 +++
 kernel/trace/trace_probe.h  |  230 --
 kernel/trace/trace_probe_tmpl.h |  120 
 kernel/trace/trace_uprobe.c |  127 
 5 files changed, 491 insertions(+), 678 deletions(-)
 create mode 100644 kernel/trace/trace_probe_tmpl.h

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1018867022b6..8423815ff986 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -24,6 +24,7 @@
 #include 
 
 #include "trace_probe.h"
+#include "trace_probe_tmpl.h"
 
 #define KPROBE_EVENT_SYSTEM "kprobes"
 #define KRETPROBE_MAXACTIVE_MAX 4096
@@ -121,160 +122,6 @@ static int kprobe_dispatcher(struct kprobe *kp, struct 
pt_regs *regs);
 static int kretprobe_dispatcher(struct kretprobe_instance *ri,
struct pt_regs *regs);
 
-/* Memory fetching by symbol */
-struct symbol_cache {
-   char*symbol;
-   longoffset;
-   unsigned long   addr;
-};
-
-unsigned long update_symbol_cache(struct symbol_cache *sc)
-{
-   sc->addr = (unsigned long)kallsyms_lookup_name(sc->symbol);
-
-   if (sc->addr)
-   sc->addr += sc->offset;
-
-   return sc->addr;
-}
-
-void free_symbol_cache(struct symbol_cache *sc)
-{
-   kfree(sc->symbol);
-   kfree(sc);
-}
-
-struct symbol_cache *alloc_symbol_cache(const char *sym, long offset)
-{
-   struct symbol_cache *sc;
-
-   if (!sym || strlen(sym) == 0)
-   return NULL;
-
-   sc = kzalloc(sizeof(struct symbol_cache), GFP_KERNEL);
-   if (!sc)
-   return NULL;
-
-   sc->symbol = kstrdup(sym, GFP_KERNEL);
-   if (!sc->symbol) {
-   kfree(sc);
-   return NULL;
-   }
-   sc->offset = offset;
-   update_symbol_cache(sc);
-
-   return sc;
-}
-
-/*
- * Kprobes-specific fetch functions
- */
-#define DEFINE_FETCH_stack(type)   \
-static void FETCH_FUNC_NAME(stack, type)(struct pt_regs *regs, \
- void *offset, void *dest) \
-{  \
-   *(type *)dest = (type)regs_get_kernel_stack_nth(regs,   \
-   (unsigned int)((unsigned long)offset)); \
-}  \
-NOKPROBE_SYMBOL(FETCH_FUNC_NAME(stack, type));
-
-DEFINE_BASIC_FETCH_FUNCS(stack)
-/* No string on the stack entry */
-#define fetch_stack_string NULL
-#define fetch_stack_string_sizeNULL
-
-#define DEFINE_FETCH_memory(type)  \
-static void FETCH_FUNC_NAME(memory, type)(struct pt_regs *regs,
\
- void *addr, void *dest)   \
-{  \
-   type retval;\
-   if (probe_kernel_address(addr, retval)) \
-   *(type *)dest = 0;  \
-   else\
-   *(type *)dest = retval; \
-}  \
-NOKPROBE_SYMBOL(FETCH_FUNC_NAME(memory, type));
-
-DEFINE_BASIC_FETCH_FUNCS(memory)
-/*
- * Fetch a null-terminated string. Caller MUST set *(u32 *)dest with max
- * length and relative data location.
- */
-static void FETCH_FUNC_NAME(memory, string)(struct pt_regs *regs,
-   void *addr, void *dest)
-{
-   int maxlen = get_rloc_len(*(u32 *)dest);
-   u8 *dst = get_rloc_data(dest);
-   long ret;
-
-   if (!maxlen)
-   return;
-
-   /*
-* Try to get string again, since the string can be changed while
-* probing.
-*/
-   ret = strncpy_from_unsafe(dst, addr, maxlen);
-
-   if (ret < 0) {  /* Failed to fetch string */
-   dst[0] = '\0';
-   *(u32 *)dest = make_data_rloc(0, get_rloc_offs(*(u32 *)dest));
-   } else {
-   *(u32 *)dest = make_data_rloc(ret, get_rloc_offs(*(u32 *)dest));
-   }
-}

[PATCH v3 08/18] tracing: probeevent: Unify fetch type tables

2018-02-23 Thread Masami Hiramatsu

Unify {k,u}probe_fetch_type_table to probe_fetch_type_table
because the main difference of those type tables (fetcharg
methods) are gone. Now we can consolidate it.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |   27 +-
 kernel/trace/trace_probe.c  |   54 +--
 kernel/trace/trace_probe.h  |6 +
 kernel/trace/trace_uprobe.c |   27 +-
 4 files changed, 39 insertions(+), 75 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 8423815ff986..4caf019b5917 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -122,30 +122,6 @@ static int kprobe_dispatcher(struct kprobe *kp, struct 
pt_regs *regs);
 static int kretprobe_dispatcher(struct kretprobe_instance *ri,
struct pt_regs *regs);
 
-/* Fetch type information table */
-static const struct fetch_type kprobes_fetch_type_table[] = {
-   /* Special types */
-   [FETCH_TYPE_STRING] = __ASSIGN_FETCH_TYPE("string", string, string,
-   sizeof(u32), 1, "__data_loc char[]"),
-   [FETCH_TYPE_STRSIZE] = __ASSIGN_FETCH_TYPE("string_size", u32,
-   string_size, sizeof(u32), 0, "u32"),
-   /* Basic types */
-   ASSIGN_FETCH_TYPE(u8,  u8,  0),
-   ASSIGN_FETCH_TYPE(u16, u16, 0),
-   ASSIGN_FETCH_TYPE(u32, u32, 0),
-   ASSIGN_FETCH_TYPE(u64, u64, 0),
-   ASSIGN_FETCH_TYPE(s8,  u8,  1),
-   ASSIGN_FETCH_TYPE(s16, u16, 1),
-   ASSIGN_FETCH_TYPE(s32, u32, 1),
-   ASSIGN_FETCH_TYPE(s64, u64, 1),
-   ASSIGN_FETCH_TYPE_ALIAS(x8,  u8,  u8,  0),
-   ASSIGN_FETCH_TYPE_ALIAS(x16, u16, u16, 0),
-   ASSIGN_FETCH_TYPE_ALIAS(x32, u32, u32, 0),
-   ASSIGN_FETCH_TYPE_ALIAS(x64, u64, u64, 0),
-
-   ASSIGN_FETCH_TYPE_END
-};
-
 /*
  * Allocate new trace_probe and initialize it (including kprobes).
  */
@@ -674,8 +650,7 @@ static int create_trace_kprobe(int argc, char **argv)
 
/* Parse fetch argument */
ret = traceprobe_parse_probe_arg(arg, >tp.size, parg,
-is_return, true,
-kprobes_fetch_type_table);
+is_return, true);
if (ret) {
pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
goto error;
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index a7e36606718a..9ffc07c7c949 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -73,8 +73,29 @@ int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s, void 
*data, void *ent)
 
 const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";
 
-static const struct fetch_type *find_fetch_type(const char *type,
-   const struct fetch_type *ftbl)
+/* Fetch type information table */
+static const struct fetch_type probe_fetch_types[] = {
+   /* Special types */
+   __ASSIGN_FETCH_TYPE("string", string, string, sizeof(u32), 1,
+   "__data_loc char[]"),
+   /* Basic types */
+   ASSIGN_FETCH_TYPE(u8,  u8,  0),
+   ASSIGN_FETCH_TYPE(u16, u16, 0),
+   ASSIGN_FETCH_TYPE(u32, u32, 0),
+   ASSIGN_FETCH_TYPE(u64, u64, 0),
+   ASSIGN_FETCH_TYPE(s8,  u8,  1),
+   ASSIGN_FETCH_TYPE(s16, u16, 1),
+   ASSIGN_FETCH_TYPE(s32, u32, 1),
+   ASSIGN_FETCH_TYPE(s64, u64, 1),
+   ASSIGN_FETCH_TYPE_ALIAS(x8,  u8,  u8,  0),
+   ASSIGN_FETCH_TYPE_ALIAS(x16, u16, u16, 0),
+   ASSIGN_FETCH_TYPE_ALIAS(x32, u32, u32, 0),
+   ASSIGN_FETCH_TYPE_ALIAS(x64, u64, u64, 0),
+
+   ASSIGN_FETCH_TYPE_END
+};
+
+static const struct fetch_type *find_fetch_type(const char *type)
 {
int i;
 
@@ -95,21 +116,21 @@ static const struct fetch_type *find_fetch_type(const char 
*type,
 
switch (bs) {
case 8:
-   return find_fetch_type("u8", ftbl);
+   return find_fetch_type("u8");
case 16:
-   return find_fetch_type("u16", ftbl);
+   return find_fetch_type("u16");
case 32:
-   return find_fetch_type("u32", ftbl);
+   return find_fetch_type("u32");
case 64:
-   return find_fetch_type("u64", ftbl);
+   return find_fetch_type("u64");
default:
goto fail;
}
}
 
-   for (i = 0; ftbl[i].name; i++) {
-   if (strcmp(type, ftbl[i].name) == 0)
-   return [i];
+   for (i = 0; probe_fetch_types[i].name; i++) {
+   if (strcmp(type, probe_fetch_types[i].name) == 0)
+   return

[PATCH v3 08/18] tracing: probeevent: Unify fetch type tables

2018-02-23 Thread Masami Hiramatsu

Unify {k,u}probe_fetch_type_table to probe_fetch_type_table
because the main difference of those type tables (fetcharg
methods) are gone. Now we can consolidate it.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |   27 +-
 kernel/trace/trace_probe.c  |   54 +--
 kernel/trace/trace_probe.h  |6 +
 kernel/trace/trace_uprobe.c |   27 +-
 4 files changed, 39 insertions(+), 75 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 8423815ff986..4caf019b5917 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -122,30 +122,6 @@ static int kprobe_dispatcher(struct kprobe *kp, struct 
pt_regs *regs);
 static int kretprobe_dispatcher(struct kretprobe_instance *ri,
struct pt_regs *regs);
 
-/* Fetch type information table */
-static const struct fetch_type kprobes_fetch_type_table[] = {
-   /* Special types */
-   [FETCH_TYPE_STRING] = __ASSIGN_FETCH_TYPE("string", string, string,
-   sizeof(u32), 1, "__data_loc char[]"),
-   [FETCH_TYPE_STRSIZE] = __ASSIGN_FETCH_TYPE("string_size", u32,
-   string_size, sizeof(u32), 0, "u32"),
-   /* Basic types */
-   ASSIGN_FETCH_TYPE(u8,  u8,  0),
-   ASSIGN_FETCH_TYPE(u16, u16, 0),
-   ASSIGN_FETCH_TYPE(u32, u32, 0),
-   ASSIGN_FETCH_TYPE(u64, u64, 0),
-   ASSIGN_FETCH_TYPE(s8,  u8,  1),
-   ASSIGN_FETCH_TYPE(s16, u16, 1),
-   ASSIGN_FETCH_TYPE(s32, u32, 1),
-   ASSIGN_FETCH_TYPE(s64, u64, 1),
-   ASSIGN_FETCH_TYPE_ALIAS(x8,  u8,  u8,  0),
-   ASSIGN_FETCH_TYPE_ALIAS(x16, u16, u16, 0),
-   ASSIGN_FETCH_TYPE_ALIAS(x32, u32, u32, 0),
-   ASSIGN_FETCH_TYPE_ALIAS(x64, u64, u64, 0),
-
-   ASSIGN_FETCH_TYPE_END
-};
-
 /*
  * Allocate new trace_probe and initialize it (including kprobes).
  */
@@ -674,8 +650,7 @@ static int create_trace_kprobe(int argc, char **argv)
 
/* Parse fetch argument */
ret = traceprobe_parse_probe_arg(arg, >tp.size, parg,
-is_return, true,
-kprobes_fetch_type_table);
+is_return, true);
if (ret) {
pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
goto error;
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index a7e36606718a..9ffc07c7c949 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -73,8 +73,29 @@ int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s, void 
*data, void *ent)
 
 const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";
 
-static const struct fetch_type *find_fetch_type(const char *type,
-   const struct fetch_type *ftbl)
+/* Fetch type information table */
+static const struct fetch_type probe_fetch_types[] = {
+   /* Special types */
+   __ASSIGN_FETCH_TYPE("string", string, string, sizeof(u32), 1,
+   "__data_loc char[]"),
+   /* Basic types */
+   ASSIGN_FETCH_TYPE(u8,  u8,  0),
+   ASSIGN_FETCH_TYPE(u16, u16, 0),
+   ASSIGN_FETCH_TYPE(u32, u32, 0),
+   ASSIGN_FETCH_TYPE(u64, u64, 0),
+   ASSIGN_FETCH_TYPE(s8,  u8,  1),
+   ASSIGN_FETCH_TYPE(s16, u16, 1),
+   ASSIGN_FETCH_TYPE(s32, u32, 1),
+   ASSIGN_FETCH_TYPE(s64, u64, 1),
+   ASSIGN_FETCH_TYPE_ALIAS(x8,  u8,  u8,  0),
+   ASSIGN_FETCH_TYPE_ALIAS(x16, u16, u16, 0),
+   ASSIGN_FETCH_TYPE_ALIAS(x32, u32, u32, 0),
+   ASSIGN_FETCH_TYPE_ALIAS(x64, u64, u64, 0),
+
+   ASSIGN_FETCH_TYPE_END
+};
+
+static const struct fetch_type *find_fetch_type(const char *type)
 {
int i;
 
@@ -95,21 +116,21 @@ static const struct fetch_type *find_fetch_type(const char 
*type,
 
switch (bs) {
case 8:
-   return find_fetch_type("u8", ftbl);
+   return find_fetch_type("u8");
case 16:
-   return find_fetch_type("u16", ftbl);
+   return find_fetch_type("u16");
case 32:
-   return find_fetch_type("u32", ftbl);
+   return find_fetch_type("u32");
case 64:
-   return find_fetch_type("u64", ftbl);
+   return find_fetch_type("u64");
default:
goto fail;
}
}
 
-   for (i = 0; ftbl[i].name; i++) {
-   if (strcmp(type, ftbl[i].name) == 0)
-   return [i];
+   for (i = 0; probe_fetch_types[i].name; i++) {
+   if (strcmp(type, probe_fetch_types[i].name) == 0)
+   return _fetch_types[i];
}

[PATCH v3 06/18] tracing: probeevent: Remove NOKPROBE_SYMBOL from print functions

2018-02-23 Thread Masami Hiramatsu

Remove unneeded NOKPROBE_SYMBOL from print functions since
the print functions are only used when printing out the
trace data, and not from kprobe handler.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_probe.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 92d96ed6a2bf..0a809efabeb6 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -43,8 +43,7 @@ int PRINT_TYPE_FUNC_NAME(tname)(struct trace_seq *s, void 
*data, void *ent)\
trace_seq_printf(s, fmt, *(type *)data);\
return !trace_seq_has_overflowed(s);\
 }  \
-const char PRINT_TYPE_FMT_NAME(tname)[] = fmt; \
-NOKPROBE_SYMBOL(PRINT_TYPE_FUNC_NAME(tname));
+const char PRINT_TYPE_FMT_NAME(tname)[] = fmt;
 
 DEFINE_BASIC_PRINT_TYPE_FUNC(u8,  u8,  "%u")
 DEFINE_BASIC_PRINT_TYPE_FUNC(u16, u16, "%u")
@@ -71,7 +70,6 @@ int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s, void 
*data, void *ent)
 (const char *)get_loc_data(data, ent));
return !trace_seq_has_overflowed(s);
 }
-NOKPROBE_SYMBOL(PRINT_TYPE_FUNC_NAME(string));
 
 const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";

[PATCH v3 05/18] tracing: probeevent: Cleanup argument field definition

2018-02-23 Thread Masami Hiramatsu

Cleanup event argument definition code in one place for
maintenancability.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |   32 
 kernel/trace/trace_probe.c  |   21 +
 kernel/trace/trace_probe.h  |2 ++
 kernel/trace/trace_uprobe.c |   15 ++-
 4 files changed, 29 insertions(+), 41 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 503827750446..1018867022b6 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1146,49 +1146,25 @@ print_kretprobe_event(struct trace_iterator *iter, int 
flags,
 
 static int kprobe_event_define_fields(struct trace_event_call *event_call)
 {
-   int ret, i;
+   int ret;
struct kprobe_trace_entry_head field;
struct trace_kprobe *tk = (struct trace_kprobe *)event_call->data;
 
DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);
-   /* Set argument names as fields */
-   for (i = 0; i < tk->tp.nr_args; i++) {
-   struct probe_arg *parg = >tp.args[i];
 
-   ret = trace_define_field(event_call, parg->type->fmttype,
-parg->name,
-sizeof(field) + parg->offset,
-parg->type->size,
-parg->type->is_signed,
-FILTER_OTHER);
-   if (ret)
-   return ret;
-   }
-   return 0;
+   return traceprobe_define_arg_fields(event_call, sizeof(field), >tp);
 }
 
 static int kretprobe_event_define_fields(struct trace_event_call *event_call)
 {
-   int ret, i;
+   int ret;
struct kretprobe_trace_entry_head field;
struct trace_kprobe *tk = (struct trace_kprobe *)event_call->data;
 
DEFINE_FIELD(unsigned long, func, FIELD_STRING_FUNC, 0);
DEFINE_FIELD(unsigned long, ret_ip, FIELD_STRING_RETIP, 0);
-   /* Set argument names as fields */
-   for (i = 0; i < tk->tp.nr_args; i++) {
-   struct probe_arg *parg = >tp.args[i];
 
-   ret = trace_define_field(event_call, parg->type->fmttype,
-parg->name,
-sizeof(field) + parg->offset,
-parg->type->size,
-parg->type->is_signed,
-FILTER_OTHER);
-   if (ret)
-   return ret;
-   }
-   return 0;
+   return traceprobe_define_arg_fields(event_call, sizeof(field), >tp);
 }
 
 #ifdef CONFIG_PERF_EVENTS
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index e90cccf0adc7..92d96ed6a2bf 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -680,3 +680,24 @@ int set_print_fmt(struct trace_probe *tp, bool is_return)
 
return 0;
 }
+
+int traceprobe_define_arg_fields(struct trace_event_call *event_call,
+size_t offset, struct trace_probe *tp)
+{
+   int ret, i;
+
+   /* Set argument names as fields */
+   for (i = 0; i < tp->nr_args; i++) {
+   struct probe_arg *parg = >args[i];
+
+   ret = trace_define_field(event_call, parg->type->fmttype,
+parg->name,
+offset + parg->offset,
+parg->type->size,
+parg->type->is_signed,
+FILTER_OTHER);
+   if (ret)
+   return ret;
+   }
+   return 0;
+}
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 0c8e66f9c855..de928052926b 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -440,3 +440,5 @@ extern struct trace_event_call *
 create_local_trace_uprobe(char *name, unsigned long offs, bool is_return);
 extern void destroy_local_trace_uprobe(struct trace_event_call *event_call);
 #endif
+extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
+   size_t offset, struct trace_probe *tp);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 4c006a693663..887da2bb63aa 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -982,7 +982,7 @@ probe_event_disable(struct trace_uprobe *tu, struct 
trace_event_file *file)
 
 static int uprobe_event_define_fields(struct trace_event_call *event_call)
 {
-   int ret, i, size;
+   int ret, size;
struct uprobe_trace_entry_head field;
struct trace_uprobe *tu = event_call->data;
 
@@ -994,19 +994,8 @@ static int uprobe_event_define_fields(struct 
trace_event_call *event_call)

[PATCH v3 06/18] tracing: probeevent: Remove NOKPROBE_SYMBOL from print functions

2018-02-23 Thread Masami Hiramatsu

Remove unneeded NOKPROBE_SYMBOL from print functions since
the print functions are only used when printing out the
trace data, and not from kprobe handler.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_probe.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 92d96ed6a2bf..0a809efabeb6 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -43,8 +43,7 @@ int PRINT_TYPE_FUNC_NAME(tname)(struct trace_seq *s, void 
*data, void *ent)\
trace_seq_printf(s, fmt, *(type *)data);\
return !trace_seq_has_overflowed(s);\
 }  \
-const char PRINT_TYPE_FMT_NAME(tname)[] = fmt; \
-NOKPROBE_SYMBOL(PRINT_TYPE_FUNC_NAME(tname));
+const char PRINT_TYPE_FMT_NAME(tname)[] = fmt;
 
 DEFINE_BASIC_PRINT_TYPE_FUNC(u8,  u8,  "%u")
 DEFINE_BASIC_PRINT_TYPE_FUNC(u16, u16, "%u")
@@ -71,7 +70,6 @@ int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s, void 
*data, void *ent)
 (const char *)get_loc_data(data, ent));
return !trace_seq_has_overflowed(s);
 }
-NOKPROBE_SYMBOL(PRINT_TYPE_FUNC_NAME(string));
 
 const char PRINT_TYPE_FMT_NAME(string)[] = "\\\"%s\\\"";

[PATCH v3 05/18] tracing: probeevent: Cleanup argument field definition

2018-02-23 Thread Masami Hiramatsu

Cleanup event argument definition code in one place for
maintenancability.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |   32 
 kernel/trace/trace_probe.c  |   21 +
 kernel/trace/trace_probe.h  |2 ++
 kernel/trace/trace_uprobe.c |   15 ++-
 4 files changed, 29 insertions(+), 41 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 503827750446..1018867022b6 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1146,49 +1146,25 @@ print_kretprobe_event(struct trace_iterator *iter, int 
flags,
 
 static int kprobe_event_define_fields(struct trace_event_call *event_call)
 {
-   int ret, i;
+   int ret;
struct kprobe_trace_entry_head field;
struct trace_kprobe *tk = (struct trace_kprobe *)event_call->data;
 
DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);
-   /* Set argument names as fields */
-   for (i = 0; i < tk->tp.nr_args; i++) {
-   struct probe_arg *parg = >tp.args[i];
 
-   ret = trace_define_field(event_call, parg->type->fmttype,
-parg->name,
-sizeof(field) + parg->offset,
-parg->type->size,
-parg->type->is_signed,
-FILTER_OTHER);
-   if (ret)
-   return ret;
-   }
-   return 0;
+   return traceprobe_define_arg_fields(event_call, sizeof(field), >tp);
 }
 
 static int kretprobe_event_define_fields(struct trace_event_call *event_call)
 {
-   int ret, i;
+   int ret;
struct kretprobe_trace_entry_head field;
struct trace_kprobe *tk = (struct trace_kprobe *)event_call->data;
 
DEFINE_FIELD(unsigned long, func, FIELD_STRING_FUNC, 0);
DEFINE_FIELD(unsigned long, ret_ip, FIELD_STRING_RETIP, 0);
-   /* Set argument names as fields */
-   for (i = 0; i < tk->tp.nr_args; i++) {
-   struct probe_arg *parg = >tp.args[i];
 
-   ret = trace_define_field(event_call, parg->type->fmttype,
-parg->name,
-sizeof(field) + parg->offset,
-parg->type->size,
-parg->type->is_signed,
-FILTER_OTHER);
-   if (ret)
-   return ret;
-   }
-   return 0;
+   return traceprobe_define_arg_fields(event_call, sizeof(field), >tp);
 }
 
 #ifdef CONFIG_PERF_EVENTS
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index e90cccf0adc7..92d96ed6a2bf 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -680,3 +680,24 @@ int set_print_fmt(struct trace_probe *tp, bool is_return)
 
return 0;
 }
+
+int traceprobe_define_arg_fields(struct trace_event_call *event_call,
+size_t offset, struct trace_probe *tp)
+{
+   int ret, i;
+
+   /* Set argument names as fields */
+   for (i = 0; i < tp->nr_args; i++) {
+   struct probe_arg *parg = >args[i];
+
+   ret = trace_define_field(event_call, parg->type->fmttype,
+parg->name,
+offset + parg->offset,
+parg->type->size,
+parg->type->is_signed,
+FILTER_OTHER);
+   if (ret)
+   return ret;
+   }
+   return 0;
+}
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 0c8e66f9c855..de928052926b 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -440,3 +440,5 @@ extern struct trace_event_call *
 create_local_trace_uprobe(char *name, unsigned long offs, bool is_return);
 extern void destroy_local_trace_uprobe(struct trace_event_call *event_call);
 #endif
+extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
+   size_t offset, struct trace_probe *tp);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 4c006a693663..887da2bb63aa 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -982,7 +982,7 @@ probe_event_disable(struct trace_uprobe *tu, struct 
trace_event_file *file)
 
 static int uprobe_event_define_fields(struct trace_event_call *event_call)
 {
-   int ret, i, size;
+   int ret, size;
struct uprobe_trace_entry_head field;
struct trace_uprobe *tu = event_call->data;
 
@@ -994,19 +994,8 @@ static int uprobe_event_define_fields(struct 
trace_event_call *event_call)
DEFINE_FIELD(unsigned long, vaddr[0],

[PATCH v3 04/18] tracing: probeevent: Cleanup print argument functions

2018-02-23 Thread Masami Hiramatsu

Current print argument functions prints the argument
name too. It is not good for printing out multiple
values for one argument. This change it to just print
out the value.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |   20 ++--
 kernel/trace/trace_probe.c  |   12 +---
 kernel/trace/trace_probe.h  |   19 ---
 kernel/trace/trace_uprobe.c |9 ++---
 4 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index b5b1d8aa47d6..503827750446 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1090,8 +1090,6 @@ print_kprobe_event(struct trace_iterator *iter, int flags,
struct kprobe_trace_entry_head *field;
struct trace_seq *s = >seq;
struct trace_probe *tp;
-   u8 *data;
-   int i;
 
field = (struct kprobe_trace_entry_head *)iter->ent;
tp = container_of(event, struct trace_probe, call.event);
@@ -1103,11 +1101,9 @@ print_kprobe_event(struct trace_iterator *iter, int 
flags,
 
trace_seq_putc(s, ')');
 
-   data = (u8 *)[1];
-   for (i = 0; i < tp->nr_args; i++)
-   if (!tp->args[i].type->print(s, tp->args[i].name,
-data + tp->args[i].offset, field))
-   goto out;
+   if (print_probe_args(s, tp->args, tp->nr_args,
+(u8 *)[1], field) < 0)
+   goto out;
 
trace_seq_putc(s, '\n');
  out:
@@ -1121,8 +1117,6 @@ print_kretprobe_event(struct trace_iterator *iter, int 
flags,
struct kretprobe_trace_entry_head *field;
struct trace_seq *s = >seq;
struct trace_probe *tp;
-   u8 *data;
-   int i;
 
field = (struct kretprobe_trace_entry_head *)iter->ent;
tp = container_of(event, struct trace_probe, call.event);
@@ -1139,11 +1133,9 @@ print_kretprobe_event(struct trace_iterator *iter, int 
flags,
 
trace_seq_putc(s, ')');
 
-   data = (u8 *)[1];
-   for (i = 0; i < tp->nr_args; i++)
-   if (!tp->args[i].type->print(s, tp->args[i].name,
-data + tp->args[i].offset, field))
-   goto out;
+   if (print_probe_args(s, tp->args, tp->nr_args,
+(u8 *)[1], field) < 0)
+   goto out;
 
trace_seq_putc(s, '\n');
 
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index d3d1ee820336..e90cccf0adc7 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -38,10 +38,9 @@ const char *reserved_field_names[] = {
 
 /* Printing  in basic type function template */
 #define DEFINE_BASIC_PRINT_TYPE_FUNC(tname, type, fmt) \
-int PRINT_TYPE_FUNC_NAME(tname)(struct trace_seq *s, const char *name, \
-   void *data, void *ent)  \
+int PRINT_TYPE_FUNC_NAME(tname)(struct trace_seq *s, void *data, void *ent)\
 {  \
-   trace_seq_printf(s, " %s=" fmt, name, *(type *)data);   \
+   trace_seq_printf(s, fmt, *(type *)data);\
return !trace_seq_has_overflowed(s);\
 }  \
 const char PRINT_TYPE_FMT_NAME(tname)[] = fmt; \
@@ -61,15 +60,14 @@ DEFINE_BASIC_PRINT_TYPE_FUNC(x32, u32, "0x%x")
 DEFINE_BASIC_PRINT_TYPE_FUNC(x64, u64, "0x%Lx")
 
 /* Print type function for string type */
-int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s, const char *name,
-void *data, void *ent)
+int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s, void *data, void *ent)
 {
int len = *(u32 *)data >> 16;
 
if (!len)
-   trace_seq_printf(s, " %s=(fault)", name);
+   trace_seq_puts(s, "(fault)");
else
-   trace_seq_printf(s, " %s=\"%s\"", name,
+   trace_seq_printf(s, "\"%s\"",
 (const char *)get_loc_data(data, ent));
return !trace_seq_has_overflowed(s);
 }
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 75daff22ccea..0c8e66f9c855 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -94,7 +94,7 @@ static nokprobe_inline void *get_loc_data(u32 *dl, void *ent)
 /* Data fetch function type */
 typedefvoid (*fetch_func_t)(struct pt_regs *, void *, void *);
 /* Printing function type */
-typedef int (*print_type_func_t)(struct trace_seq *, const char *, void *, 
void *);
+typedef int (*print_type_func_t)(struct trace_seq *, void *, void *);
 
 /* Fetch types */
 enum {
@@ -136,8 +136,7 @@ typedef u32 string_size;
 
 /* Printing  in basic type function template */
 #define DECLARE_BASIC_PRINT_TYPE_FUNC(type)

[PATCH v3 04/18] tracing: probeevent: Cleanup print argument functions

2018-02-23 Thread Masami Hiramatsu

Current print argument functions prints the argument
name too. It is not good for printing out multiple
values for one argument. This change it to just print
out the value.

Signed-off-by: Masami Hiramatsu 
---
 kernel/trace/trace_kprobe.c |   20 ++--
 kernel/trace/trace_probe.c  |   12 +---
 kernel/trace/trace_probe.h  |   19 ---
 kernel/trace/trace_uprobe.c |9 ++---
 4 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index b5b1d8aa47d6..503827750446 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1090,8 +1090,6 @@ print_kprobe_event(struct trace_iterator *iter, int flags,
struct kprobe_trace_entry_head *field;
struct trace_seq *s = >seq;
struct trace_probe *tp;
-   u8 *data;
-   int i;
 
field = (struct kprobe_trace_entry_head *)iter->ent;
tp = container_of(event, struct trace_probe, call.event);
@@ -1103,11 +1101,9 @@ print_kprobe_event(struct trace_iterator *iter, int 
flags,
 
trace_seq_putc(s, ')');
 
-   data = (u8 *)[1];
-   for (i = 0; i < tp->nr_args; i++)
-   if (!tp->args[i].type->print(s, tp->args[i].name,
-data + tp->args[i].offset, field))
-   goto out;
+   if (print_probe_args(s, tp->args, tp->nr_args,
+(u8 *)[1], field) < 0)
+   goto out;
 
trace_seq_putc(s, '\n');
  out:
@@ -1121,8 +1117,6 @@ print_kretprobe_event(struct trace_iterator *iter, int 
flags,
struct kretprobe_trace_entry_head *field;
struct trace_seq *s = >seq;
struct trace_probe *tp;
-   u8 *data;
-   int i;
 
field = (struct kretprobe_trace_entry_head *)iter->ent;
tp = container_of(event, struct trace_probe, call.event);
@@ -1139,11 +1133,9 @@ print_kretprobe_event(struct trace_iterator *iter, int 
flags,
 
trace_seq_putc(s, ')');
 
-   data = (u8 *)[1];
-   for (i = 0; i < tp->nr_args; i++)
-   if (!tp->args[i].type->print(s, tp->args[i].name,
-data + tp->args[i].offset, field))
-   goto out;
+   if (print_probe_args(s, tp->args, tp->nr_args,
+(u8 *)[1], field) < 0)
+   goto out;
 
trace_seq_putc(s, '\n');
 
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index d3d1ee820336..e90cccf0adc7 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -38,10 +38,9 @@ const char *reserved_field_names[] = {
 
 /* Printing  in basic type function template */
 #define DEFINE_BASIC_PRINT_TYPE_FUNC(tname, type, fmt) \
-int PRINT_TYPE_FUNC_NAME(tname)(struct trace_seq *s, const char *name, \
-   void *data, void *ent)  \
+int PRINT_TYPE_FUNC_NAME(tname)(struct trace_seq *s, void *data, void *ent)\
 {  \
-   trace_seq_printf(s, " %s=" fmt, name, *(type *)data);   \
+   trace_seq_printf(s, fmt, *(type *)data);\
return !trace_seq_has_overflowed(s);\
 }  \
 const char PRINT_TYPE_FMT_NAME(tname)[] = fmt; \
@@ -61,15 +60,14 @@ DEFINE_BASIC_PRINT_TYPE_FUNC(x32, u32, "0x%x")
 DEFINE_BASIC_PRINT_TYPE_FUNC(x64, u64, "0x%Lx")
 
 /* Print type function for string type */
-int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s, const char *name,
-void *data, void *ent)
+int PRINT_TYPE_FUNC_NAME(string)(struct trace_seq *s, void *data, void *ent)
 {
int len = *(u32 *)data >> 16;
 
if (!len)
-   trace_seq_printf(s, " %s=(fault)", name);
+   trace_seq_puts(s, "(fault)");
else
-   trace_seq_printf(s, " %s=\"%s\"", name,
+   trace_seq_printf(s, "\"%s\"",
 (const char *)get_loc_data(data, ent));
return !trace_seq_has_overflowed(s);
 }
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 75daff22ccea..0c8e66f9c855 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -94,7 +94,7 @@ static nokprobe_inline void *get_loc_data(u32 *dl, void *ent)
 /* Data fetch function type */
 typedefvoid (*fetch_func_t)(struct pt_regs *, void *, void *);
 /* Printing function type */
-typedef int (*print_type_func_t)(struct trace_seq *, const char *, void *, 
void *);
+typedef int (*print_type_func_t)(struct trace_seq *, void *, void *);
 
 /* Fetch types */
 enum {
@@ -136,8 +136,7 @@ typedef u32 string_size;
 
 /* Printing  in basic type function template */
 #define DECLARE_BASIC_PRINT_TYPE_FUNC(type)

[PATCH v3 03/18] selftests: ftrace: Add a testcase for string type with kprobe_event

2018-02-23 Thread Masami Hiramatsu

Add a testcase for string type with kprobe event.
This tests good/bad syntax combinations and also
the traced data is correct in several way.

Signed-off-by: Masami Hiramatsu 
---
 .../ftrace/test.d/kprobe/kprobe_args_string.tc |   46 
 1 file changed, 46 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
new file mode 100644
index ..5ba73035e1d9
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
@@ -0,0 +1,46 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event string type argument
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+echo 0 > events/enable
+echo > kprobe_events
+
+case `uname -m` in
+x86_64)
+  ARG2=%si
+  OFFS=8
+;;
+i[3456]86)
+  ARG2=%cx
+  OFFS=4
+;;
+aarch64)
+  ARG2=%x1
+  OFFS=8
+;;
+arm*)
+  ARG2=%r1
+  OFFS=4
+;;
+*)
+  echo "Please implement other architecture here"
+  exit_untested
+esac
+
+: "Test get argument (1)"
+echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string" > 
kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo test >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1=\"test\""
+
+echo 0 > events/kprobes/testprobe/enable
+: "Test get argument (2)"
+echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string 
arg2=+0(+${OFFS}(${ARG2})):string" > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo test1 test2 >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1=\"test1\" arg2=\"test2\""
+
+echo 0 > events/enable
+echo > kprobe_events

[PATCH v3 03/18] selftests: ftrace: Add a testcase for string type with kprobe_event

2018-02-23 Thread Masami Hiramatsu

Add a testcase for string type with kprobe event.
This tests good/bad syntax combinations and also
the traced data is correct in several way.

Signed-off-by: Masami Hiramatsu 
---
 .../ftrace/test.d/kprobe/kprobe_args_string.tc |   46 
 1 file changed, 46 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
new file mode 100644
index ..5ba73035e1d9
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
@@ -0,0 +1,46 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event string type argument
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+echo 0 > events/enable
+echo > kprobe_events
+
+case `uname -m` in
+x86_64)
+  ARG2=%si
+  OFFS=8
+;;
+i[3456]86)
+  ARG2=%cx
+  OFFS=4
+;;
+aarch64)
+  ARG2=%x1
+  OFFS=8
+;;
+arm*)
+  ARG2=%r1
+  OFFS=4
+;;
+*)
+  echo "Please implement other architecture here"
+  exit_untested
+esac
+
+: "Test get argument (1)"
+echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string" > 
kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo test >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1=\"test\""
+
+echo 0 > events/kprobes/testprobe/enable
+: "Test get argument (2)"
+echo "p:testprobe create_trace_kprobe arg1=+0(+0(${ARG2})):string 
arg2=+0(+${OFFS}(${ARG2})):string" > kprobe_events
+echo 1 > events/kprobes/testprobe/enable
+! echo test1 test2 >> kprobe_events
+tail -n 1 trace | grep -qe "testprobe.* arg1=\"test1\" arg2=\"test2\""
+
+echo 0 > events/enable
+echo > kprobe_events

Re: [RFC tip/locking/lockdep v5 04/17] lockdep: Introduce lock_list::dep

2018-02-23 Thread Boqun Feng

On Fri, Feb 23, 2018 at 08:37:32PM +0800, Boqun Feng wrote:
> On Fri, Feb 23, 2018 at 12:55:20PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 22, 2018 at 03:08:51PM +0800, Boqun Feng wrote:
> > > @@ -1012,6 +1013,33 @@ static inline bool bfs_error(enum bfs_result res)
> > >   return res < 0;
> > >  }
> > >  
> > > +#define DEP_NN_BIT 0
> > > +#define DEP_RN_BIT 1
> > > +#define DEP_NR_BIT 2
> > > +#define DEP_RR_BIT 3
> > > +
> > > +#define DEP_NN_MASK (1U << (DEP_NN_BIT))
> > > +#define DEP_RN_MASK (1U << (DEP_RN_BIT))
> > > +#define DEP_NR_MASK (1U << (DEP_NR_BIT))
> > > +#define DEP_RR_MASK (1U << (DEP_RR_BIT))
> > > +
> > > +static inline unsigned int __calc_dep_bit(int prev, int next)
> > > +{
> > > + if (prev == 2 && next != 2)
> > > + return DEP_RN_BIT;
> > > + if (prev != 2 && next == 2)
> > > + return DEP_NR_BIT;
> > > + if (prev == 2 && next == 2)
> > > + return DEP_RR_BIT;
> > > + else
> > > + return DEP_NN_BIT;
> > > +}
> > > +
> > > +static inline unsigned int calc_dep(int prev, int next)
> > > +{
> > > + return 1U << __calc_dep_bit(prev, next);
> > > +}
> > > +
> > >  static enum bfs_result __bfs(struct lock_list *source_entry,
> > >void *data,
> > >int (*match)(struct lock_list *entry, void *data),
> > > @@ -1921,6 +1949,16 @@ check_prev_add(struct task_struct *curr, struct 
> > > held_lock *prev,
> > >   if (entry->class == hlock_class(next)) {
> > >   if (distance == 1)
> > >   entry->distance = 1;
> > > + entry->dep |= calc_dep(prev->read, next->read);
> > > + }
> > > + }
> > > +
> > > + /* Also, update the reverse dependency in @next's ->locks_before list */
> > > + list_for_each_entry(entry, _class(next)->locks_before, entry) {
> > > + if (entry->class == hlock_class(prev)) {
> > > + if (distance == 1)
> > > + entry->distance = 1;
> > > + entry->dep |= calc_dep(next->read, prev->read);
> > >   return 1;
> > >   }
> > >   }
> > 
> > I think it all becomes simpler if you use only 2 bits. Such that:
> > 
> >   bit0 is the prev R (0) or N (1) value,
> >   bit1 is the next R (0) or N (1) value.
> > 
> > I think this should work because we don't care about the empty set
> > (currently ) and all the complexity in patch 5 is because we can
> > have R bits set when there's also N bits. The concequence of that is
> > that we cannot replace ! with ~ (which is what I kept doing).
> > 
> > But with only 2 bits, we only track the strongest relation in the set,
> > which is exactly what we appear to need.
> > 
> 
> But if we only have RN and NR, both bits will be set, we can not check
> whether we have NN or not. Consider we have:
> 
>   A -(RR)-> B
>   B -(NR)-> C and B -(RN)-> C
>   C -(RN)-> A
> 
> this is not a deadlock case, but with "two bits" approach, we can not
> differ this with:
> 
>   A -(RR)-> B
>   B -(NN)-> C
>   C -(RN)-> A
> 
> , which is a deadlock.
> 
> But maybe "three bits" (NR, RN and NN bits) approach works, that is if
> ->dep is 0, we indicates this is only RR, and is_rx() becomes:
> 
>   static inline bool is_rx(u8 dep)
>   {
>   return !(dep & (NR_MASK | NN_MASK));
>   }
> 
> and is_xr() becomes:
> 
>   static inline bool is_xr(u8 dep)
>   {
>   return !(dep & (RN_MASK | NN_MASK));
>   }
> 
> , with this I think your simplification with have_xr works, thanks!
> 

Ah! I see. Actually your very first approach works, except the
definitions of is_rx() and ir_xr() are wrong. In that approach, you
define

static inline bool is_rx(u8 dep)
{
return !!(dep & (DEP_RR_MASK | DEP_RN_MASK);
}

, which means "whether we have a R* dependency?". But in fact, what we
need to check is "whether we _only_ have R* dependencies?", if so and
have_xr is true, that means we could only have a -(*R)-> A -(R*)-> if we
pick the next dependency, and that means we should skip. So my new
definition above works, and I think we better name it as only_rx() to
avoid confusion? Ditto for is_xr().

I also reorder bit number for each kind of dependency, so that we have a
simple __calc_dep_bit(), see the following:

/*
 * DEP_*_BIT in lock_list::dep
 *
 * For dependency @prev -> @next:
 *
 *   RR: both @prev and @next are recursive read locks, i.e. ->read == 
2.
 *   RN: @prev is recursive and @next is non-recursive.
 *   NR: @prev is a not recursive and @next is recursive.
 *   NN: both @prev and @next are non-recursive.
 * 
 * Note that we define the value of DEP_*_BITs so that:
 *  bit0 is prev->read != 2
 *  bit1 is next->read != 2
 */
#define DEP_RR_BIT 0
#define DEP_RN_BIT 1
#define DEP_NR_BIT 2

[PATCH v3 01/18] [BUGFIX] tracing: probeevent: Fix to support minus offset from symbol

2018-02-23 Thread Masami Hiramatsu

In Documentation/trace/kprobetrace.txt, it says

 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)

However, the parser doesn't parse minus offset correctly, since
commit 2fba0c8867af ("tracing/kprobes: Fix probe offset to be
unsigned") drops minus ("-") offset support for kprobe probe
address usage.

This fixes the traceprobe_split_symbol_offset() to parse minus
offset again with checking the offset range, and add a minus
offset check in kprobe probe address usage.

Fixes: 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned")
Signed-off-by: Masami Hiramatsu 
---
 Changes in v3:
  - Use kstrtol instead of kstrtoul. (Thanks Namhyung!)
---
 kernel/trace/trace_kprobe.c |4 ++--
 kernel/trace/trace_probe.c  |8 +++-
 kernel/trace/trace_probe.h  |2 +-
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5ce9b8cf7be3..b5b1d8aa47d6 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -667,7 +667,7 @@ static int create_trace_kprobe(int argc, char **argv)
char *symbol = NULL, *event = NULL, *group = NULL;
int maxactive = 0;
char *arg;
-   unsigned long offset = 0;
+   long offset = 0;
void *addr = NULL;
char buf[MAX_EVENT_NAME_LEN];
 
@@ -755,7 +755,7 @@ static int create_trace_kprobe(int argc, char **argv)
symbol = argv[1];
/* TODO: support .init module functions */
ret = traceprobe_split_symbol_offset(symbol, );
-   if (ret) {
+   if (ret || offset < 0) {
pr_info("Failed to parse either an address or a 
symbol.\n");
return ret;
}
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index d59357308677..d3d1ee820336 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -320,7 +320,7 @@ static fetch_func_t get_fetch_size_function(const struct 
fetch_type *type,
 }
 
 /* Split symbol and offset. */
-int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
+int traceprobe_split_symbol_offset(char *symbol, long *offset)
 {
char *tmp;
int ret;
@@ -328,13 +328,11 @@ int traceprobe_split_symbol_offset(char *symbol, unsigned 
long *offset)
if (!offset)
return -EINVAL;
 
-   tmp = strchr(symbol, '+');
+   tmp = strpbrk(symbol, "+-");
if (tmp) {
-   /* skip sign because kstrtoul doesn't accept '+' */
-   ret = kstrtoul(tmp + 1, 0, offset);
+   ret = kstrtol(tmp + 1, 0, offset);
if (ret)
return ret;
-
*tmp = '\0';
} else
*offset = 0;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 0745f895f780..75daff22ccea 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -365,7 +365,7 @@ extern int traceprobe_conflict_field_name(const char *name,
 extern void traceprobe_update_arg(struct probe_arg *arg);
 extern void traceprobe_free_probe_arg(struct probe_arg *arg);
 
-extern int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset);
+extern int traceprobe_split_symbol_offset(char *symbol, long *offset);
 
 /* Sum up total data length for dynamic arraies (strings) */
 static nokprobe_inline int

Re: [RFC tip/locking/lockdep v5 04/17] lockdep: Introduce lock_list::dep

2018-02-23 Thread Boqun Feng

On Fri, Feb 23, 2018 at 08:37:32PM +0800, Boqun Feng wrote:
> On Fri, Feb 23, 2018 at 12:55:20PM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 22, 2018 at 03:08:51PM +0800, Boqun Feng wrote:
> > > @@ -1012,6 +1013,33 @@ static inline bool bfs_error(enum bfs_result res)
> > >   return res < 0;
> > >  }
> > >  
> > > +#define DEP_NN_BIT 0
> > > +#define DEP_RN_BIT 1
> > > +#define DEP_NR_BIT 2
> > > +#define DEP_RR_BIT 3
> > > +
> > > +#define DEP_NN_MASK (1U << (DEP_NN_BIT))
> > > +#define DEP_RN_MASK (1U << (DEP_RN_BIT))
> > > +#define DEP_NR_MASK (1U << (DEP_NR_BIT))
> > > +#define DEP_RR_MASK (1U << (DEP_RR_BIT))
> > > +
> > > +static inline unsigned int __calc_dep_bit(int prev, int next)
> > > +{
> > > + if (prev == 2 && next != 2)
> > > + return DEP_RN_BIT;
> > > + if (prev != 2 && next == 2)
> > > + return DEP_NR_BIT;
> > > + if (prev == 2 && next == 2)
> > > + return DEP_RR_BIT;
> > > + else
> > > + return DEP_NN_BIT;
> > > +}
> > > +
> > > +static inline unsigned int calc_dep(int prev, int next)
> > > +{
> > > + return 1U << __calc_dep_bit(prev, next);
> > > +}
> > > +
> > >  static enum bfs_result __bfs(struct lock_list *source_entry,
> > >void *data,
> > >int (*match)(struct lock_list *entry, void *data),
> > > @@ -1921,6 +1949,16 @@ check_prev_add(struct task_struct *curr, struct 
> > > held_lock *prev,
> > >   if (entry->class == hlock_class(next)) {
> > >   if (distance == 1)
> > >   entry->distance = 1;
> > > + entry->dep |= calc_dep(prev->read, next->read);
> > > + }
> > > + }
> > > +
> > > + /* Also, update the reverse dependency in @next's ->locks_before list */
> > > + list_for_each_entry(entry, _class(next)->locks_before, entry) {
> > > + if (entry->class == hlock_class(prev)) {
> > > + if (distance == 1)
> > > + entry->distance = 1;
> > > + entry->dep |= calc_dep(next->read, prev->read);
> > >   return 1;
> > >   }
> > >   }
> > 
> > I think it all becomes simpler if you use only 2 bits. Such that:
> > 
> >   bit0 is the prev R (0) or N (1) value,
> >   bit1 is the next R (0) or N (1) value.
> > 
> > I think this should work because we don't care about the empty set
> > (currently ) and all the complexity in patch 5 is because we can
> > have R bits set when there's also N bits. The concequence of that is
> > that we cannot replace ! with ~ (which is what I kept doing).
> > 
> > But with only 2 bits, we only track the strongest relation in the set,
> > which is exactly what we appear to need.
> > 
> 
> But if we only have RN and NR, both bits will be set, we can not check
> whether we have NN or not. Consider we have:
> 
>   A -(RR)-> B
>   B -(NR)-> C and B -(RN)-> C
>   C -(RN)-> A
> 
> this is not a deadlock case, but with "two bits" approach, we can not
> differ this with:
> 
>   A -(RR)-> B
>   B -(NN)-> C
>   C -(RN)-> A
> 
> , which is a deadlock.
> 
> But maybe "three bits" (NR, RN and NN bits) approach works, that is if
> ->dep is 0, we indicates this is only RR, and is_rx() becomes:
> 
>   static inline bool is_rx(u8 dep)
>   {
>   return !(dep & (NR_MASK | NN_MASK));
>   }
> 
> and is_xr() becomes:
> 
>   static inline bool is_xr(u8 dep)
>   {
>   return !(dep & (RN_MASK | NN_MASK));
>   }
> 
> , with this I think your simplification with have_xr works, thanks!
> 

Ah! I see. Actually your very first approach works, except the
definitions of is_rx() and ir_xr() are wrong. In that approach, you
define

static inline bool is_rx(u8 dep)
{
return !!(dep & (DEP_RR_MASK | DEP_RN_MASK);
}

, which means "whether we have a R* dependency?". But in fact, what we
need to check is "whether we _only_ have R* dependencies?", if so and
have_xr is true, that means we could only have a -(*R)-> A -(R*)-> if we
pick the next dependency, and that means we should skip. So my new
definition above works, and I think we better name it as only_rx() to
avoid confusion? Ditto for is_xr().

I also reorder bit number for each kind of dependency, so that we have a
simple __calc_dep_bit(), see the following:

/*
 * DEP_*_BIT in lock_list::dep
 *
 * For dependency @prev -> @next:
 *
 *   RR: both @prev and @next are recursive read locks, i.e. ->read == 
2.
 *   RN: @prev is recursive and @next is non-recursive.
 *   NR: @prev is a not recursive and @next is recursive.
 *   NN: both @prev and @next are non-recursive.
 * 
 * Note that we define the value of DEP_*_BITs so that:
 *  bit0 is prev->read != 2
 *  bit1 is next->read != 2
 */
#define DEP_RR_BIT 0
#define DEP_RN_BIT 1
#define DEP_NR_BIT 2

[PATCH v3 01/18] [BUGFIX] tracing: probeevent: Fix to support minus offset from symbol

2018-02-23 Thread Masami Hiramatsu

In Documentation/trace/kprobetrace.txt, it says

 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)

However, the parser doesn't parse minus offset correctly, since
commit 2fba0c8867af ("tracing/kprobes: Fix probe offset to be
unsigned") drops minus ("-") offset support for kprobe probe
address usage.

This fixes the traceprobe_split_symbol_offset() to parse minus
offset again with checking the offset range, and add a minus
offset check in kprobe probe address usage.

Fixes: 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned")
Signed-off-by: Masami Hiramatsu 
---
 Changes in v3:
  - Use kstrtol instead of kstrtoul. (Thanks Namhyung!)
---
 kernel/trace/trace_kprobe.c |4 ++--
 kernel/trace/trace_probe.c  |8 +++-
 kernel/trace/trace_probe.h  |2 +-
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5ce9b8cf7be3..b5b1d8aa47d6 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -667,7 +667,7 @@ static int create_trace_kprobe(int argc, char **argv)
char *symbol = NULL, *event = NULL, *group = NULL;
int maxactive = 0;
char *arg;
-   unsigned long offset = 0;
+   long offset = 0;
void *addr = NULL;
char buf[MAX_EVENT_NAME_LEN];
 
@@ -755,7 +755,7 @@ static int create_trace_kprobe(int argc, char **argv)
symbol = argv[1];
/* TODO: support .init module functions */
ret = traceprobe_split_symbol_offset(symbol, );
-   if (ret) {
+   if (ret || offset < 0) {
pr_info("Failed to parse either an address or a 
symbol.\n");
return ret;
}
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index d59357308677..d3d1ee820336 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -320,7 +320,7 @@ static fetch_func_t get_fetch_size_function(const struct 
fetch_type *type,
 }
 
 /* Split symbol and offset. */
-int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset)
+int traceprobe_split_symbol_offset(char *symbol, long *offset)
 {
char *tmp;
int ret;
@@ -328,13 +328,11 @@ int traceprobe_split_symbol_offset(char *symbol, unsigned 
long *offset)
if (!offset)
return -EINVAL;
 
-   tmp = strchr(symbol, '+');
+   tmp = strpbrk(symbol, "+-");
if (tmp) {
-   /* skip sign because kstrtoul doesn't accept '+' */
-   ret = kstrtoul(tmp + 1, 0, offset);
+   ret = kstrtol(tmp + 1, 0, offset);
if (ret)
return ret;
-
*tmp = '\0';
} else
*offset = 0;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 0745f895f780..75daff22ccea 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -365,7 +365,7 @@ extern int traceprobe_conflict_field_name(const char *name,
 extern void traceprobe_update_arg(struct probe_arg *arg);
 extern void traceprobe_free_probe_arg(struct probe_arg *arg);
 
-extern int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset);
+extern int traceprobe_split_symbol_offset(char *symbol, long *offset);
 
 /* Sum up total data length for dynamic arraies (strings) */
 static nokprobe_inline int

[PATCH v3 02/18] selftests: ftrace: Add probe event argument syntax testcase

2018-02-23 Thread Masami Hiramatsu

Add a testcase for probe event argument syntax which
ensures the kprobe_events interface correctly parses
given event arguments.

Signed-off-by: Masami Hiramatsu 
---
 .../ftrace/test.d/kprobe/kprobe_args_syntax.tc |   97 
 1 file changed, 97 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_syntax.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_syntax.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_syntax.tc
new file mode 100644
index ..231bcd2c4eb5
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_syntax.tc
@@ -0,0 +1,97 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event argument syntax
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+grep "x8/16/32/64" README > /dev/null || exit_unsupported # version issue
+
+echo 0 > events/enable
+echo > kprobe_events
+
+PROBEFUNC="vfs_read"
+GOODREG=
+BADREG=
+GOODSYM="_sdata"
+if ! grep -qw ${GOODSYM} /proc/kallsyms ; then
+  GOODSYM=$PROBEFUNC
+fi
+BADSYM="deaqswdefr"
+SYMADDR=0x`grep -w ${GOODSYM} /proc/kallsyms | cut -f 1 -d " "`
+GOODTYPE="x16"
+BADTYPE="y16"
+
+case `uname -m` in
+x86_64|i[3456]86)
+  GOODREG=%ax
+  BADREG=%ex
+;;
+aarch64)
+  GOODREG=%x0
+  BADREG=%ax
+;;
+arm*)
+  GOODREG=%r0
+  BADREG=%ax
+;;
+esac
+
+test_goodarg() # Good-args
+{
+  while [ "$1" ]; do
+echo "p ${PROBEFUNC} $1" > kprobe_events
+shift 1
+  done;
+}
+
+test_badarg() # Bad-args
+{
+  while [ "$1" ]; do
+! echo "p ${PROBEFUNC} $1" > kprobe_events
+shift 1
+  done;
+}
+
+echo > kprobe_events
+
+: "Register access"
+test_goodarg ${GOODREG}
+test_badarg ${BADREG}
+
+: "Symbol access"
+test_goodarg "@${GOODSYM}" "@${SYMADDR}" "@${GOODSYM}+10" "@${GOODSYM}-10"
+test_badarg "@" "@${BADSYM}" "@${GOODSYM}*10" "@${GOODSYM}/10" \
+   "@${GOODSYM}%10" "@${GOODSYM}&10" "@${GOODSYM}|10"
+
+: "Stack access"
+test_goodarg "\$stack" "\$stack0" "\$stack1"
+test_badarg "\$stackp" "\$stack0+10" "\$stack1-10"
+
+: "Retval access"
+echo "r ${PROBEFUNC} \$retval" > kprobe_events
+! echo "p ${PROBEFUNC} \$retval" > kprobe_events
+
+: "Comm access"
+test_goodarg "\$comm"
+
+: "Indirect memory access"
+test_goodarg "+0(${GOODREG})" "-0(${GOODREG})" "+10(\$stack)" \
+   "+0(\$stack1)" "+10(@${GOODSYM}-10)" "+0(+10(+20(\$stack)))"
+test_badarg "+(${GOODREG})" "(${GOODREG}+10)" "-(${GOODREG})" "(${GOODREG})" \
+   "+10(\$comm)" "+0(${GOODREG})+10"
+
+: "Name assignment"
+test_goodarg "varname=${GOODREG}"
+test_badarg "varname=varname2=${GOODREG}"
+
+: "Type syntax"
+test_goodarg "${GOODREG}:${GOODTYPE}"
+test_badarg "${GOODREG}::${GOODTYPE}" "${GOODREG}:${BADTYPE}" \
+   "${GOODTYPE}:${GOODREG}"
+
+: "Combination check"
+
+test_goodarg "\$comm:string" "+0(\$stack):string"
+test_badarg "\$comm:x64" "\$stack:string" "${GOODREG}:string"
+
+echo > kprobe_events

[PATCH v3 02/18] selftests: ftrace: Add probe event argument syntax testcase

2018-02-23 Thread Masami Hiramatsu

Add a testcase for probe event argument syntax which
ensures the kprobe_events interface correctly parses
given event arguments.

Signed-off-by: Masami Hiramatsu 
---
 .../ftrace/test.d/kprobe/kprobe_args_syntax.tc |   97 
 1 file changed, 97 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_syntax.tc

diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_syntax.tc 
b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_syntax.tc
new file mode 100644
index ..231bcd2c4eb5
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_syntax.tc
@@ -0,0 +1,97 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Kprobe event argument syntax
+
+[ -f kprobe_events ] || exit_unsupported # this is configurable
+
+grep "x8/16/32/64" README > /dev/null || exit_unsupported # version issue
+
+echo 0 > events/enable
+echo > kprobe_events
+
+PROBEFUNC="vfs_read"
+GOODREG=
+BADREG=
+GOODSYM="_sdata"
+if ! grep -qw ${GOODSYM} /proc/kallsyms ; then
+  GOODSYM=$PROBEFUNC
+fi
+BADSYM="deaqswdefr"
+SYMADDR=0x`grep -w ${GOODSYM} /proc/kallsyms | cut -f 1 -d " "`
+GOODTYPE="x16"
+BADTYPE="y16"
+
+case `uname -m` in
+x86_64|i[3456]86)
+  GOODREG=%ax
+  BADREG=%ex
+;;
+aarch64)
+  GOODREG=%x0
+  BADREG=%ax
+;;
+arm*)
+  GOODREG=%r0
+  BADREG=%ax
+;;
+esac
+
+test_goodarg() # Good-args
+{
+  while [ "$1" ]; do
+echo "p ${PROBEFUNC} $1" > kprobe_events
+shift 1
+  done;
+}
+
+test_badarg() # Bad-args
+{
+  while [ "$1" ]; do
+! echo "p ${PROBEFUNC} $1" > kprobe_events
+shift 1
+  done;
+}
+
+echo > kprobe_events
+
+: "Register access"
+test_goodarg ${GOODREG}
+test_badarg ${BADREG}
+
+: "Symbol access"
+test_goodarg "@${GOODSYM}" "@${SYMADDR}" "@${GOODSYM}+10" "@${GOODSYM}-10"
+test_badarg "@" "@${BADSYM}" "@${GOODSYM}*10" "@${GOODSYM}/10" \
+   "@${GOODSYM}%10" "@${GOODSYM}&10" "@${GOODSYM}|10"
+
+: "Stack access"
+test_goodarg "\$stack" "\$stack0" "\$stack1"
+test_badarg "\$stackp" "\$stack0+10" "\$stack1-10"
+
+: "Retval access"
+echo "r ${PROBEFUNC} \$retval" > kprobe_events
+! echo "p ${PROBEFUNC} \$retval" > kprobe_events
+
+: "Comm access"
+test_goodarg "\$comm"
+
+: "Indirect memory access"
+test_goodarg "+0(${GOODREG})" "-0(${GOODREG})" "+10(\$stack)" \
+   "+0(\$stack1)" "+10(@${GOODSYM}-10)" "+0(+10(+20(\$stack)))"
+test_badarg "+(${GOODREG})" "(${GOODREG}+10)" "-(${GOODREG})" "(${GOODREG})" \
+   "+10(\$comm)" "+0(${GOODREG})+10"
+
+: "Name assignment"
+test_goodarg "varname=${GOODREG}"
+test_badarg "varname=varname2=${GOODREG}"
+
+: "Type syntax"
+test_goodarg "${GOODREG}:${GOODTYPE}"
+test_badarg "${GOODREG}::${GOODTYPE}" "${GOODREG}:${BADTYPE}" \
+   "${GOODTYPE}:${GOODREG}"
+
+: "Combination check"
+
+test_goodarg "\$comm:string" "+0(\$stack):string"
+test_badarg "\$comm:x64" "\$stack:string" "${GOODREG}:string"
+
+echo > kprobe_events

[PATCH v3 00/18] tracing: probeevent: Improve fetcharg features

2018-02-23 Thread Masami Hiramatsu

Hi,

This is the 3rd version of the fetch-arg improvement series.
This includes variable changes on fetcharg framework like,

- Add fetcharg testcases (syntax, argN, symbol, string and array)
- Rewrite fetcharg framework with fetch_insn, switch-case based
  instead of function pointer.
- Add "symbol" type support, which shows symbol+offset instead of
  address value.
- Add "$argN" fetcharg, which fetches function parameters.
  (currently only for x86-64)
- Add array type support (including string arrary :) ) ,
  which enables to get fixed length array from probe-events.

V2 is here:
 https://lkml.org/lkml/2018/2/21/863

Changes from the v2 are here:

 - [1/18] Use kstrtol instead of kstrtoul. (Thanks Namhyung!)
 - [7/18][8/18] Split out probe type table unification.
 - [14/18] Show $arg in README only when this feature is supported
 - [16/18] Use ip register instead of top of stack for symbol type test

Last 2 changes are for non x86 environment, I've tested on arm/arm64.

Here are examples:

o 'symbol' type

 # echo 'p vfs_read $stack0:symbol' > kprobe_events 
 # echo 1 > events/kprobes/p_vfs_read_0/enable 
 # tail -n 3 trace
  sh-729   [007] ...2   105.753637: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=SyS_read+0x42/0x90
tail-736   [000] ...2   105.754904: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=kernel_read+0x2c/0x40
tail-736   [000] ...2   105.754929: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=kernel_read+0x2c/0x40


o $argN 

 # echo 'p vfs_read $arg0 $arg1 $arg2' > kprobe_events
 # echo 1 > events/kprobes/p_vfs_read_0/enable 
 # tail -n 3 trace
  sh-726   [007] ...2   134.288973: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x88001d98ec00 arg2=0x7ffeb4330f79 arg3=0x1
tail-731   [000] ...2   134.289987: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x88001d9dd200 arg2=0x88001d8a0a00 arg3=0x80
tail-731   [000] ...2   134.290016: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x88001d9dd200 arg2=0x88001faf4a00 arg3=0x150


o Array type

 # echo 'p vfs_read +0($stack):x64 +0($stack):x8[8]' > kprobe_events 
 # echo 1 > events/kprobes/p_vfs_read_0/enable 
 # tail -n 3 trace
  sh-729   [007] ...291.701664: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x811b1252 
arg2={0x52,0x12,0x1b,0x81,0xff,0xff,0xff,0xff}
tail-734   [000] ...291.702366: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x811b0dec 
arg2={0xec,0xd,0x1b,0x81,0xff,0xff,0xff,0xff}
tail-734   [000] ...291.702386: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x811b0dec 
arg2={0xec,0xd,0x1b,0x81,0xff,0xff,0xff,0xff}
 #
 # cat events/kprobes/p_vfs_read_0/format 
name: p_vfs_read_0
ID: 1069
format:
field:unsigned short common_type;   offset:0;   size:2; 
signed:0;
field:unsigned char common_flags;   offset:2;   size:1; 
signed:0;
field:unsigned char common_preempt_count;   offset:3;   size:1; 
signed:0;
field:int common_pid;   offset:4;   size:4; signed:1;

field:unsigned long __probe_ip; offset:8;   size:8; signed:0;
field:u64 arg1; offset:16;  size:0; signed:0;
field:u8 arg2[8];   offset:24;  size:8; signed:0;

print fmt: "(%lx) arg1=0x%Lx arg2={0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x}", 
REC->__probe_ip, REC->arg1, REC->arg2[0], REC->arg2[1], REC->arg2[2], 
REC->arg2[3], REC->arg2[4], REC->arg2[5], REC->arg2[6], REC->arg2[7]

o String Array type

 # echo "p create_trace_kprobe arg1=+0(%si):string[3]" > kprobe_events 
 # echo test1 test2 test3 >> kprobe_events 
sh: write error: Invalid argument
 # echo 'p vfs_read $stack' >> kprobe_events 
 # tail -n 2 trace 
  sh-744   [007] ...1   183.382407: p_create_trace_kprobe_0: 
(create_trace_kprobe+0x0/0x890) arg1={"test1","test2","test3"}
  sh-744   [007] ...1   230.487809: p_create_trace_kprobe_0: 
(create_trace_kprobe+0x0/0x890) arg1={"p","vfs_read","$stack"}


Thank you,

---

Masami Hiramatsu (18):
  [BUGFIX] tracing: probeevent: Fix to support minus offset from symbol
  selftests: ftrace: Add probe event argument syntax testcase
  selftests: ftrace: Add a testcase for string type with kprobe_event
  tracing: probeevent: Cleanup print argument functions
  tracing: probeevent: Cleanup argument field definition
  tracing: probeevent: Remove NOKPROBE_SYMBOL from print functions
  tracing: probeevent: Introduce new argument fetching code
  tracing: probeevent: Unify fetch type tables
  tracing: probeevent: Return consumed bytes of dynamic area
  tracing: probeevent: Append traceprobe_ for exported function
  tracing: probeevent: Unify fetch_insn processing common part
  tracing: probeevent: Add symbol type
  x86: ptrace: Add function argument access API
  tracing: probeevent: Add $argN for accessing function args
  tracing: probeevent: Add array type support
  selftests: ftrace: Add a

[PATCH v3 00/18] tracing: probeevent: Improve fetcharg features

2018-02-23 Thread Masami Hiramatsu

Hi,

This is the 3rd version of the fetch-arg improvement series.
This includes variable changes on fetcharg framework like,

- Add fetcharg testcases (syntax, argN, symbol, string and array)
- Rewrite fetcharg framework with fetch_insn, switch-case based
  instead of function pointer.
- Add "symbol" type support, which shows symbol+offset instead of
  address value.
- Add "$argN" fetcharg, which fetches function parameters.
  (currently only for x86-64)
- Add array type support (including string arrary :) ) ,
  which enables to get fixed length array from probe-events.

V2 is here:
 https://lkml.org/lkml/2018/2/21/863

Changes from the v2 are here:

 - [1/18] Use kstrtol instead of kstrtoul. (Thanks Namhyung!)
 - [7/18][8/18] Split out probe type table unification.
 - [14/18] Show $arg in README only when this feature is supported
 - [16/18] Use ip register instead of top of stack for symbol type test

Last 2 changes are for non x86 environment, I've tested on arm/arm64.

Here are examples:

o 'symbol' type

 # echo 'p vfs_read $stack0:symbol' > kprobe_events 
 # echo 1 > events/kprobes/p_vfs_read_0/enable 
 # tail -n 3 trace
  sh-729   [007] ...2   105.753637: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=SyS_read+0x42/0x90
tail-736   [000] ...2   105.754904: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=kernel_read+0x2c/0x40
tail-736   [000] ...2   105.754929: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=kernel_read+0x2c/0x40


o $argN 

 # echo 'p vfs_read $arg0 $arg1 $arg2' > kprobe_events
 # echo 1 > events/kprobes/p_vfs_read_0/enable 
 # tail -n 3 trace
  sh-726   [007] ...2   134.288973: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x88001d98ec00 arg2=0x7ffeb4330f79 arg3=0x1
tail-731   [000] ...2   134.289987: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x88001d9dd200 arg2=0x88001d8a0a00 arg3=0x80
tail-731   [000] ...2   134.290016: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x88001d9dd200 arg2=0x88001faf4a00 arg3=0x150


o Array type

 # echo 'p vfs_read +0($stack):x64 +0($stack):x8[8]' > kprobe_events 
 # echo 1 > events/kprobes/p_vfs_read_0/enable 
 # tail -n 3 trace
  sh-729   [007] ...291.701664: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x811b1252 
arg2={0x52,0x12,0x1b,0x81,0xff,0xff,0xff,0xff}
tail-734   [000] ...291.702366: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x811b0dec 
arg2={0xec,0xd,0x1b,0x81,0xff,0xff,0xff,0xff}
tail-734   [000] ...291.702386: p_vfs_read_0: 
(vfs_read+0x0/0x130) arg1=0x811b0dec 
arg2={0xec,0xd,0x1b,0x81,0xff,0xff,0xff,0xff}
 #
 # cat events/kprobes/p_vfs_read_0/format 
name: p_vfs_read_0
ID: 1069
format:
field:unsigned short common_type;   offset:0;   size:2; 
signed:0;
field:unsigned char common_flags;   offset:2;   size:1; 
signed:0;
field:unsigned char common_preempt_count;   offset:3;   size:1; 
signed:0;
field:int common_pid;   offset:4;   size:4; signed:1;

field:unsigned long __probe_ip; offset:8;   size:8; signed:0;
field:u64 arg1; offset:16;  size:0; signed:0;
field:u8 arg2[8];   offset:24;  size:8; signed:0;

print fmt: "(%lx) arg1=0x%Lx arg2={0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x}", 
REC->__probe_ip, REC->arg1, REC->arg2[0], REC->arg2[1], REC->arg2[2], 
REC->arg2[3], REC->arg2[4], REC->arg2[5], REC->arg2[6], REC->arg2[7]

o String Array type

 # echo "p create_trace_kprobe arg1=+0(%si):string[3]" > kprobe_events 
 # echo test1 test2 test3 >> kprobe_events 
sh: write error: Invalid argument
 # echo 'p vfs_read $stack' >> kprobe_events 
 # tail -n 2 trace 
  sh-744   [007] ...1   183.382407: p_create_trace_kprobe_0: 
(create_trace_kprobe+0x0/0x890) arg1={"test1","test2","test3"}
  sh-744   [007] ...1   230.487809: p_create_trace_kprobe_0: 
(create_trace_kprobe+0x0/0x890) arg1={"p","vfs_read","$stack"}


Thank you,

---

Masami Hiramatsu (18):
  [BUGFIX] tracing: probeevent: Fix to support minus offset from symbol
  selftests: ftrace: Add probe event argument syntax testcase
  selftests: ftrace: Add a testcase for string type with kprobe_event
  tracing: probeevent: Cleanup print argument functions
  tracing: probeevent: Cleanup argument field definition
  tracing: probeevent: Remove NOKPROBE_SYMBOL from print functions
  tracing: probeevent: Introduce new argument fetching code
  tracing: probeevent: Unify fetch type tables
  tracing: probeevent: Return consumed bytes of dynamic area
  tracing: probeevent: Append traceprobe_ for exported function
  tracing: probeevent: Unify fetch_insn processing common part
  tracing: probeevent: Add symbol type
  x86: ptrace: Add function argument access API
  tracing: probeevent: Add $argN for accessing function args
  tracing: probeevent: Add array type support
  selftests: ftrace: Add a

Re: [PATCH] drivers/virt: vm_gen_counter: initial driver implementation

2018-02-23 Thread kbuild test robot

Hi Or,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on char-misc/char-misc-testing]
[also build test ERROR on v4.16-rc2 next-20180223]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Or-Idgar/drivers-virt-vm_gen_counter-initial-driver-implementation/20180224-112017
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh 

All error/warnings (new ones prefixed by >>):

   drivers/misc/vmgenid.c: In function 'generation_id_show':
>> drivers/misc/vmgenid.c:35:10: error: implicit declaration of function 
>> 'acpi_os_map_iomem'; did you mean 'acpi_os_read_iomem'? 
>> [-Werror=implicit-function-declaration]
 uuidp = acpi_os_map_iomem(phy_addr, sizeof(uuid_t));
 ^
 acpi_os_read_iomem
>> drivers/misc/vmgenid.c:35:8: warning: assignment makes pointer from integer 
>> without a cast [-Wint-conversion]
 uuidp = acpi_os_map_iomem(phy_addr, sizeof(uuid_t));
   ^
>> drivers/misc/vmgenid.c:40:2: error: implicit declaration of function 
>> 'acpi_os_unmap_iomem'; did you mean 'acpi_os_read_iomem'? 
>> [-Werror=implicit-function-declaration]
 acpi_os_unmap_iomem(uuidp, sizeof(uuid_t));
 ^~~
 acpi_os_read_iomem
   drivers/misc/vmgenid.c: In function 'raw_show':
   drivers/misc/vmgenid.c:51:8: warning: assignment makes pointer from integer 
without a cast [-Wint-conversion]
 uuidp = acpi_os_map_iomem(phy_addr, sizeof(uuid_t));
   ^
   drivers/misc/vmgenid.c: In function 'acpi_vmgenid_add':
>> drivers/misc/vmgenid.c:103:29: error: dereferencing pointer to incomplete 
>> type 'struct acpi_device'
 retval = get_vmgenid(device->handle);
^~
   drivers/misc/vmgenid.c: At top level:
>> drivers/misc/vmgenid.c:115:36: error: array type has incomplete element type 
>> 'struct acpi_device_id'
static const struct acpi_device_id vmgenid_ids[] = {
   ^~~
>> drivers/misc/vmgenid.c:120:15: error: variable 'acpi_vmgenid_driver' has 
>> initializer but incomplete type
static struct acpi_driver acpi_vmgenid_driver = {
  ^~~
>> drivers/misc/vmgenid.c:121:3: error: 'struct acpi_driver' has no member 
>> named 'name'
 .name = "vm_gen_counter",
  ^~~~
>> drivers/misc/vmgenid.c:121:10: warning: excess elements in struct initializer
 .name = "vm_gen_counter",
 ^~~~
   drivers/misc/vmgenid.c:121:10: note: (near initialization for 
'acpi_vmgenid_driver')
>> drivers/misc/vmgenid.c:122:3: error: 'struct acpi_driver' has no member 
>> named 'ids'
 .ids = vmgenid_ids,
  ^~~
   drivers/misc/vmgenid.c:122:9: warning: excess elements in struct initializer
 .ids = vmgenid_ids,
^~~
   drivers/misc/vmgenid.c:122:9: note: (near initialization for 
'acpi_vmgenid_driver')
>> drivers/misc/vmgenid.c:123:3: error: 'struct acpi_driver' has no member 
>> named 'owner'
 .owner = THIS_MODULE,
  ^
   In file included from include/linux/linkage.h:7:0,
from include/linux/kernel.h:7,
from include/linux/list.h:9,
from include/linux/module.h:9,
from drivers/misc/vmgenid.c:14:
   include/linux/export.h:35:21: warning: excess elements in struct initializer
#define THIS_MODULE (&__this_module)
^
>> drivers/misc/vmgenid.c:123:11: note: in expansion of macro 'THIS_MODULE'
 .owner = THIS_MODULE,
  ^~~
   include/linux/export.h:35:21: note: (near initialization for 
'acpi_vmgenid_driver')
#define THIS_MODULE (&__this_module)
^
>> drivers/misc/vmgenid.c:123:11: note: in expansion of macro 'THIS_MODULE'
 .owner = THIS_MODULE,
  ^~~
>> drivers/misc/vmgenid.c:124:3: error: 'struct acpi_driver' has no member 
>> named 'ops'
 .ops = {
  ^~~
>> drivers/misc/vmgenid.c:124:9: error: extra brace group at end of initializer
 .ops = {
^
   drivers/misc/vmgenid.c:124:9: note: (near initialization for 
'acpi_vmgenid_driver')
   drivers/misc/vmgenid.c:124:9: warning: excess elements in struct initializer
   drivers/misc/vmgenid.c:124:9: note: (near initialization for 
'acpi_vmgenid_driver')
   drivers/misc/vmgenid.c: In function 'vmgenid_init':
>> drivers/misc/vmgenid.c:132:9: err

Re: [PATCH] drivers/virt: vm_gen_counter: initial driver implementation

2018-02-23 Thread kbuild test robot

Hi Or,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on char-misc/char-misc-testing]
[also build test ERROR on v4.16-rc2 next-20180223]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Or-Idgar/drivers-virt-vm_gen_counter-initial-driver-implementation/20180224-112017
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh 

All error/warnings (new ones prefixed by >>):

   drivers/misc/vmgenid.c: In function 'generation_id_show':
>> drivers/misc/vmgenid.c:35:10: error: implicit declaration of function 
>> 'acpi_os_map_iomem'; did you mean 'acpi_os_read_iomem'? 
>> [-Werror=implicit-function-declaration]
 uuidp = acpi_os_map_iomem(phy_addr, sizeof(uuid_t));
 ^
 acpi_os_read_iomem
>> drivers/misc/vmgenid.c:35:8: warning: assignment makes pointer from integer 
>> without a cast [-Wint-conversion]
 uuidp = acpi_os_map_iomem(phy_addr, sizeof(uuid_t));
   ^
>> drivers/misc/vmgenid.c:40:2: error: implicit declaration of function 
>> 'acpi_os_unmap_iomem'; did you mean 'acpi_os_read_iomem'? 
>> [-Werror=implicit-function-declaration]
 acpi_os_unmap_iomem(uuidp, sizeof(uuid_t));
 ^~~
 acpi_os_read_iomem
   drivers/misc/vmgenid.c: In function 'raw_show':
   drivers/misc/vmgenid.c:51:8: warning: assignment makes pointer from integer 
without a cast [-Wint-conversion]
 uuidp = acpi_os_map_iomem(phy_addr, sizeof(uuid_t));
   ^
   drivers/misc/vmgenid.c: In function 'acpi_vmgenid_add':
>> drivers/misc/vmgenid.c:103:29: error: dereferencing pointer to incomplete 
>> type 'struct acpi_device'
 retval = get_vmgenid(device->handle);
^~
   drivers/misc/vmgenid.c: At top level:
>> drivers/misc/vmgenid.c:115:36: error: array type has incomplete element type 
>> 'struct acpi_device_id'
static const struct acpi_device_id vmgenid_ids[] = {
   ^~~
>> drivers/misc/vmgenid.c:120:15: error: variable 'acpi_vmgenid_driver' has 
>> initializer but incomplete type
static struct acpi_driver acpi_vmgenid_driver = {
  ^~~
>> drivers/misc/vmgenid.c:121:3: error: 'struct acpi_driver' has no member 
>> named 'name'
 .name = "vm_gen_counter",
  ^~~~
>> drivers/misc/vmgenid.c:121:10: warning: excess elements in struct initializer
 .name = "vm_gen_counter",
 ^~~~
   drivers/misc/vmgenid.c:121:10: note: (near initialization for 
'acpi_vmgenid_driver')
>> drivers/misc/vmgenid.c:122:3: error: 'struct acpi_driver' has no member 
>> named 'ids'
 .ids = vmgenid_ids,
  ^~~
   drivers/misc/vmgenid.c:122:9: warning: excess elements in struct initializer
 .ids = vmgenid_ids,
^~~
   drivers/misc/vmgenid.c:122:9: note: (near initialization for 
'acpi_vmgenid_driver')
>> drivers/misc/vmgenid.c:123:3: error: 'struct acpi_driver' has no member 
>> named 'owner'
 .owner = THIS_MODULE,
  ^
   In file included from include/linux/linkage.h:7:0,
from include/linux/kernel.h:7,
from include/linux/list.h:9,
from include/linux/module.h:9,
from drivers/misc/vmgenid.c:14:
   include/linux/export.h:35:21: warning: excess elements in struct initializer
#define THIS_MODULE (&__this_module)
^
>> drivers/misc/vmgenid.c:123:11: note: in expansion of macro 'THIS_MODULE'
 .owner = THIS_MODULE,
  ^~~
   include/linux/export.h:35:21: note: (near initialization for 
'acpi_vmgenid_driver')
#define THIS_MODULE (&__this_module)
^
>> drivers/misc/vmgenid.c:123:11: note: in expansion of macro 'THIS_MODULE'
 .owner = THIS_MODULE,
  ^~~
>> drivers/misc/vmgenid.c:124:3: error: 'struct acpi_driver' has no member 
>> named 'ops'
 .ops = {
  ^~~
>> drivers/misc/vmgenid.c:124:9: error: extra brace group at end of initializer
 .ops = {
^
   drivers/misc/vmgenid.c:124:9: note: (near initialization for 
'acpi_vmgenid_driver')
   drivers/misc/vmgenid.c:124:9: warning: excess elements in struct initializer
   drivers/misc/vmgenid.c:124:9: note: (near initialization for 
'acpi_vmgenid_driver')
   drivers/misc/vmgenid.c: In function 'vmgenid_init':
>> drivers/misc/vmgenid.c:132:9: err

Re: [PATCH v3 5/5] linux/const.h: move BIT(_ULL) to linux/const.h for use in assembly

2018-02-23 Thread kbuild test robot

Hi Masahiro,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.16-rc2 next-20180223]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Masahiro-Yamada/linux-const-h-cleanups-of-macros-such-as-UL-_BITUL-BIT-etc/20180224-110702
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/const.h:4:0,
from include/linux/bitops.h:4,
from include/linux/kernel.h:11,
from include/linux/interrupt.h:6,
from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39:
   drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device':
>> include/uapi/linux/const.h:28:27: warning: left shift count >= width of type 
>> [-Wshift-count-overflow]
#define _BITUL(x) (_UL(1) << (x))
  ^
>> include/linux/const.h:9:18: note: in expansion of macro '_BITUL'
#define BIT(x)  (_BITUL(x))
 ^~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 
'BIT'
#define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39)
 ^~~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 
'BNXT_RE_MAX_MR_SIZE_HIGH'
#define BNXT_RE_MAX_MR_SIZE  BNXT_RE_MAX_MR_SIZE_HIGH
 ^~~~
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:25: note: in expansion of macro 
'BNXT_RE_MAX_MR_SIZE'
 ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE;
^~~
   drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_reg_user_mr':
>> include/uapi/linux/const.h:28:27: warning: left shift count >= width of type 
>> [-Wshift-count-overflow]
#define _BITUL(x) (_UL(1) << (x))
  ^
>> include/linux/const.h:9:18: note: in expansion of macro '_BITUL'
#define BIT(x)  (_BITUL(x))
 ^~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 
'BIT'
#define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39)
 ^~~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 
'BNXT_RE_MAX_MR_SIZE_HIGH'
#define BNXT_RE_MAX_MR_SIZE  BNXT_RE_MAX_MR_SIZE_HIGH
 ^~~~
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:3588:15: note: in expansion of 
macro 'BNXT_RE_MAX_MR_SIZE'
 if (length > BNXT_RE_MAX_MR_SIZE) {
  ^~~
>> include/uapi/linux/const.h:28:27: warning: left shift count >= width of type 
>> [-Wshift-count-overflow]
#define _BITUL(x) (_UL(1) << (x))
  ^
>> include/linux/const.h:9:18: note: in expansion of macro '_BITUL'
#define BIT(x)  (_BITUL(x))
 ^~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 
'BIT'
#define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39)
 ^~~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 
'BNXT_RE_MAX_MR_SIZE_HIGH'
#define BNXT_RE_MAX_MR_SIZE  BNXT_RE_MAX_MR_SIZE_HIGH
 ^~~~
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:3590:12: note: in expansion of 
macro 'BNXT_RE_MAX_MR_SIZE'
   length, BNXT_RE_MAX_MR_SIZE);
   ^~~

vim +28 include/uapi/linux/const.h

5289f87f3 Masahiro Yamada 2018-02-22  27  
4fac4e1b2 Masahiro Yamada 2018-02-22 @28  #define _BITUL(x) (_UL(1) << (x))
4fac4e1b2 Masahiro Yamada 2018-02-22  29  #define _BITULL(x)(_ULL(1) << (x))
2fc016c5b H. Peter Anvin  2013-04-27  30  

:: The code at line 28 was first introduced by commit
:: 4fac4e1b26bc6cfec630fb48920c391d99a44940 linux/const.h: refactor _BITUL 
and _BITULL a bit

:: TO: Masahiro Yamada <yamada.masah...@socionext.com>
:: CC: 0day robot <fengguang...@intel.com>

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH v3 5/5] linux/const.h: move BIT(_ULL) to linux/const.h for use in assembly

2018-02-23 Thread kbuild test robot

Hi Masahiro,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.16-rc2 next-20180223]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Masahiro-Yamada/linux-const-h-cleanups-of-macros-such-as-UL-_BITUL-BIT-etc/20180224-110702
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/const.h:4:0,
from include/linux/bitops.h:4,
from include/linux/kernel.h:11,
from include/linux/interrupt.h:6,
from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39:
   drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device':
>> include/uapi/linux/const.h:28:27: warning: left shift count >= width of type 
>> [-Wshift-count-overflow]
#define _BITUL(x) (_UL(1) << (x))
  ^
>> include/linux/const.h:9:18: note: in expansion of macro '_BITUL'
#define BIT(x)  (_BITUL(x))
 ^~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 
'BIT'
#define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39)
 ^~~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 
'BNXT_RE_MAX_MR_SIZE_HIGH'
#define BNXT_RE_MAX_MR_SIZE  BNXT_RE_MAX_MR_SIZE_HIGH
 ^~~~
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:25: note: in expansion of macro 
'BNXT_RE_MAX_MR_SIZE'
 ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE;
^~~
   drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_reg_user_mr':
>> include/uapi/linux/const.h:28:27: warning: left shift count >= width of type 
>> [-Wshift-count-overflow]
#define _BITUL(x) (_UL(1) << (x))
  ^
>> include/linux/const.h:9:18: note: in expansion of macro '_BITUL'
#define BIT(x)  (_BITUL(x))
 ^~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 
'BIT'
#define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39)
 ^~~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 
'BNXT_RE_MAX_MR_SIZE_HIGH'
#define BNXT_RE_MAX_MR_SIZE  BNXT_RE_MAX_MR_SIZE_HIGH
 ^~~~
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:3588:15: note: in expansion of 
macro 'BNXT_RE_MAX_MR_SIZE'
 if (length > BNXT_RE_MAX_MR_SIZE) {
  ^~~
>> include/uapi/linux/const.h:28:27: warning: left shift count >= width of type 
>> [-Wshift-count-overflow]
#define _BITUL(x) (_UL(1) << (x))
  ^
>> include/linux/const.h:9:18: note: in expansion of macro '_BITUL'
#define BIT(x)  (_BITUL(x))
 ^~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 
'BIT'
#define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39)
 ^~~
   drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 
'BNXT_RE_MAX_MR_SIZE_HIGH'
#define BNXT_RE_MAX_MR_SIZE  BNXT_RE_MAX_MR_SIZE_HIGH
 ^~~~
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:3590:12: note: in expansion of 
macro 'BNXT_RE_MAX_MR_SIZE'
   length, BNXT_RE_MAX_MR_SIZE);
   ^~~

vim +28 include/uapi/linux/const.h

5289f87f3 Masahiro Yamada 2018-02-22  27  
4fac4e1b2 Masahiro Yamada 2018-02-22 @28  #define _BITUL(x) (_UL(1) << (x))
4fac4e1b2 Masahiro Yamada 2018-02-22  29  #define _BITULL(x)(_ULL(1) << (x))
2fc016c5b H. Peter Anvin  2013-04-27  30  

:: The code at line 28 was first introduced by commit
:: 4fac4e1b26bc6cfec630fb48920c391d99a44940 linux/const.h: refactor _BITUL 
and _BITULL a bit

:: TO: Masahiro Yamada 
:: CC: 0day robot 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-02-23 Thread Jeremy Linton


On 02/23/2018 05:02 AM, Lorenzo Pieralisi wrote:

On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:

Hi,

On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:

Hi Jeremy,

I have tested the patch with the newest UEFI. It prints the below error:

[4.017371] BUG: arch topology borken
[4.021069] BUG: arch topology borken
[4.024764] BUG: arch topology borken
[4.028460] BUG: arch topology borken
[4.032153] BUG: arch topology borken
[4.035849] BUG: arch topology borken
[4.039543] BUG: arch topology borken
[4.043239] BUG: arch topology borken
[4.046932] BUG: arch topology borken
[4.050629] BUG: arch topology borken
[4.054322] BUG: arch topology borken

I checked the code and found that the newest UEFI set PPTT 
physical_package_flag on a physical package node and
the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board 
is core->cluster->die->package).


I commented about that on the EDK2 mailing list. While the current spec
doesn't explicitly ban having the flag set multiple times between the leaf
and the root I consider it a "bug" and there is an effort to clarify the
spec and the use of that flag.


When the kernel starts to build sched_domain, the multi-core sched_domain 
contains all the cores within a package,
and the lowest NUMA sched_domain contains all the cores within a die. But the 
kernel requires that the multi-core
sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG 
info is printed.


Right. I've mentioned this problem a couple of times.

At at the moment, the spec isn't clear about how the proximity domain is
detected/located within the PPTT topology (a node with a 1:1 correspondence
isn't even required). As you can see from this patch set, we are making the
general assumption that the proximity domains are at the same level as the
physical socket. This isn't ideal for NUMA topologies, like the D05, that
don't align with the physical socket.

There are efforts underway to clarify and expand upon the specification to
deal with this general problem. The simple solution is another flag (say
PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
find nodes with 1:1 correspondence. At that point we could add a fairly
trivial patch to correct just the scheduler topology without affecting the
rest of the system topology code.


I think Morten asked already but isn't this the same end result we end
up having if we remove the DIE level if NUMA-within-package is detected
(instead of using the default_topology[]) and we create our own ARM64
domain hierarchy (with DIE level removed) through set_sched_topology()
accordingly ?


I'm not sure what removing the die level does for you, but its not 
really the problem AFAIK, the problem is because MC layer is larger than 
the NUMA domains.




Put it differently: do we really need to rely on another PPTT flag to
collect this information ?


Strictly no, and I have a partial patch around here i've been meaning to 
flush out which uses the early node information to detect if there are 
nodes smaller than the package. Initially I've been claiming i was going 
to stay away from making scheduler topology changes in this patch set, 
but it seems that at least providing a patch which does the minimal bits 
is in the cards. The PXN flag was is more of a shortcut to finding the 
cache levels at or below the numa domains, rather than any hard 
requirement. Similarly, to the request someone else was making for a 
leaf node flag (or node ordering) to avoid multiple passes in the table. 
That request would simplify the posted code a bit but it works without it.




I can't merge code that breaks a platform with legitimate firmware
bindings.


Breaks in this case is a BUG warning that shows up right before it 
"corrects" a scheduler domain.


Basically, as i've mentioned a few times, this patch set corrects the 
existing topology problems, in doing so it uncovers issues with the way 
we are mapping that topology for the scheduler. That is actually not 
difficult thing to fix, my assumption originally is that we would 
already be at the point of discussion the finer points of the scheduler 
changes but we are still here.


Anyway, I was planning on posting a v7 this week, but time flys... I 
will include a further scheduler tweak to work around the inverted numa 
domain problem in that set early next week.


Thanks,



Thanks,
Lorenzo





If we modify the UEFI to make NUMA sched_domain start from the layer of 
package, then all the topology information
within the package will be discarded. I think we need to build the multi-core 
sched_domain using the cores within
the cluster instead of the cores within the package. I think that's what 
'multi-core' means. Multi cores form a cluster. I guess.
If we build the multi-core sched_domain using the cores within a cluster, I 
think we need to add fields in struct cpu_topology
to record which cores are in each

Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-02-23 Thread Jeremy Linton


On 02/23/2018 05:02 AM, Lorenzo Pieralisi wrote:

On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:

Hi,

On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:

Hi Jeremy,

I have tested the patch with the newest UEFI. It prints the below error:

[4.017371] BUG: arch topology borken
[4.021069] BUG: arch topology borken
[4.024764] BUG: arch topology borken
[4.028460] BUG: arch topology borken
[4.032153] BUG: arch topology borken
[4.035849] BUG: arch topology borken
[4.039543] BUG: arch topology borken
[4.043239] BUG: arch topology borken
[4.046932] BUG: arch topology borken
[4.050629] BUG: arch topology borken
[4.054322] BUG: arch topology borken

I checked the code and found that the newest UEFI set PPTT 
physical_package_flag on a physical package node and
the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board 
is core->cluster->die->package).


I commented about that on the EDK2 mailing list. While the current spec
doesn't explicitly ban having the flag set multiple times between the leaf
and the root I consider it a "bug" and there is an effort to clarify the
spec and the use of that flag.


When the kernel starts to build sched_domain, the multi-core sched_domain 
contains all the cores within a package,
and the lowest NUMA sched_domain contains all the cores within a die. But the 
kernel requires that the multi-core
sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG 
info is printed.


Right. I've mentioned this problem a couple of times.

At at the moment, the spec isn't clear about how the proximity domain is
detected/located within the PPTT topology (a node with a 1:1 correspondence
isn't even required). As you can see from this patch set, we are making the
general assumption that the proximity domains are at the same level as the
physical socket. This isn't ideal for NUMA topologies, like the D05, that
don't align with the physical socket.

There are efforts underway to clarify and expand upon the specification to
deal with this general problem. The simple solution is another flag (say
PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
find nodes with 1:1 correspondence. At that point we could add a fairly
trivial patch to correct just the scheduler topology without affecting the
rest of the system topology code.


I think Morten asked already but isn't this the same end result we end
up having if we remove the DIE level if NUMA-within-package is detected
(instead of using the default_topology[]) and we create our own ARM64
domain hierarchy (with DIE level removed) through set_sched_topology()
accordingly ?


I'm not sure what removing the die level does for you, but its not 
really the problem AFAIK, the problem is because MC layer is larger than 
the NUMA domains.




Put it differently: do we really need to rely on another PPTT flag to
collect this information ?


Strictly no, and I have a partial patch around here i've been meaning to 
flush out which uses the early node information to detect if there are 
nodes smaller than the package. Initially I've been claiming i was going 
to stay away from making scheduler topology changes in this patch set, 
but it seems that at least providing a patch which does the minimal bits 
is in the cards. The PXN flag was is more of a shortcut to finding the 
cache levels at or below the numa domains, rather than any hard 
requirement. Similarly, to the request someone else was making for a 
leaf node flag (or node ordering) to avoid multiple passes in the table. 
That request would simplify the posted code a bit but it works without it.




I can't merge code that breaks a platform with legitimate firmware
bindings.


Breaks in this case is a BUG warning that shows up right before it 
"corrects" a scheduler domain.


Basically, as i've mentioned a few times, this patch set corrects the 
existing topology problems, in doing so it uncovers issues with the way 
we are mapping that topology for the scheduler. That is actually not 
difficult thing to fix, my assumption originally is that we would 
already be at the point of discussion the finer points of the scheduler 
changes but we are still here.


Anyway, I was planning on posting a v7 this week, but time flys... I 
will include a further scheduler tweak to work around the inverted numa 
domain problem in that set early next week.


Thanks,



Thanks,
Lorenzo





If we modify the UEFI to make NUMA sched_domain start from the layer of 
package, then all the topology information
within the package will be discarded. I think we need to build the multi-core 
sched_domain using the cores within
the cluster instead of the cores within the package. I think that's what 
'multi-core' means. Multi cores form a cluster. I guess.
If we build the multi-core sched_domain using the cores within a cluster, I 
think we need to add fields in struct cpu_topology
to record which cores are in each

Re: [RFC][PATCH 00/10] Use global pages with PTI - Truth about the white man.

2018-02-23 Thread thetruthbeforeus


Linus, this talk about the memory map bullshit is interesting and all,
with that binary encoding and shit. But I want you to take a moment and
reflect. I want you to reflect on truth.

Ask yourself. "Am I a white man" and then listen to those who...
who see you ALL for what you are and couldn't be.

Take a listen:
http://www.liveleak.com/view?i=017_1519418755

Or are you too ... well lets just let that one slide.

On 2018-02-24 04:20, Linus Torvalds wrote:

On Fri, Feb 23, 2018 at 5:49 PM, Dave Hansen
 wrote:

On 02/22/2018 01:52 PM, Linus Torvalds wrote:

Side note - and this may be crazy talk - I wonder if it might make
sense to have a mode where we allow executable read-only kernel pages
to be marked global too (but only in the kernel mapping).


We did that accidentally, somewhere.  It causes machine checks on K8's
iirc, which is fun (52994c256df fixed it).  So, we'd need to make sure
we avoid it there, or just make it global in the user mapping too.


They'd be missing _entirely_ in the user maps, which should be fine.
The problem that commit 52994c256df3 fixed was that they actually
existed in the user maps, just with different data, and then you can
have a ITLB and a DTLB entry for the same address that don't match
(because one has been loaded from the kernel mapping and the other
from the user one).

But when the address isn't mapped at all in the user map, that should
be fine - because there's no associated TLB entry to get mixed up
about.

It's no different from clearing a page from the page table before then
flushing the TLB entry later - which is the normal (and required)
behavior for unmapping a page. For a while it exists in the TLB
without existing in the page tables.


Just for fun, I tried a 4-core Skylake system with KPTI and nopcid and
compiled a random kernel 10 times.  I did three configs: no global, 
all

kernel text global + cpu_entry_area, and only cpu_entry_area + entry
text.  The delta percentages are from the Baseline.  The deltas are
measurable, but the largest bang for our buck is obviously the entry 
text.


User Time   Kernel Time Clock Elapsed
Baseline (33 GLB PTEs)  907.6   81.6264.7
Entry(28 GLB PTEs)  910.9 (+0.4%)   84.0 (+2.9%)265.2 (+0.2%)
No global( 0 GLB PTEs)  914.2 (+0.7%)   89.2 (+9.3%)267.8 (+1.2%)


That's actually noticeable. Maybe not so much in the final elapsed
time itself, but almost 3% for just the kernel side sounds meaningful.

Of course, that's with nopcid, so it's a fairly small special case, but 
still.


It's a single line of code to go from the "33" to "28" configuration, 
so

it's totally doable.  But, it means having and parsing another boot
option that confuses people and then I have to go write actual
documentation, which I detest. :)


Heh.

Ok, maybe the complexity isn't in the code, but in the concept.

   Linus

Re: [RFC][PATCH 00/10] Use global pages with PTI - Truth about the white man.

2018-02-23 Thread thetruthbeforeus


Linus, this talk about the memory map bullshit is interesting and all,
with that binary encoding and shit. But I want you to take a moment and
reflect. I want you to reflect on truth.

Ask yourself. "Am I a white man" and then listen to those who...
who see you ALL for what you are and couldn't be.

Take a listen:
http://www.liveleak.com/view?i=017_1519418755

Or are you too ... well lets just let that one slide.

On 2018-02-24 04:20, Linus Torvalds wrote:

On Fri, Feb 23, 2018 at 5:49 PM, Dave Hansen
 wrote:

On 02/22/2018 01:52 PM, Linus Torvalds wrote:

Side note - and this may be crazy talk - I wonder if it might make
sense to have a mode where we allow executable read-only kernel pages
to be marked global too (but only in the kernel mapping).


We did that accidentally, somewhere.  It causes machine checks on K8's
iirc, which is fun (52994c256df fixed it).  So, we'd need to make sure
we avoid it there, or just make it global in the user mapping too.


They'd be missing _entirely_ in the user maps, which should be fine.
The problem that commit 52994c256df3 fixed was that they actually
existed in the user maps, just with different data, and then you can
have a ITLB and a DTLB entry for the same address that don't match
(because one has been loaded from the kernel mapping and the other
from the user one).

But when the address isn't mapped at all in the user map, that should
be fine - because there's no associated TLB entry to get mixed up
about.

It's no different from clearing a page from the page table before then
flushing the TLB entry later - which is the normal (and required)
behavior for unmapping a page. For a while it exists in the TLB
without existing in the page tables.


Just for fun, I tried a 4-core Skylake system with KPTI and nopcid and
compiled a random kernel 10 times.  I did three configs: no global, 
all

kernel text global + cpu_entry_area, and only cpu_entry_area + entry
text.  The delta percentages are from the Baseline.  The deltas are
measurable, but the largest bang for our buck is obviously the entry 
text.


User Time   Kernel Time Clock Elapsed
Baseline (33 GLB PTEs)  907.6   81.6264.7
Entry(28 GLB PTEs)  910.9 (+0.4%)   84.0 (+2.9%)265.2 (+0.2%)
No global( 0 GLB PTEs)  914.2 (+0.7%)   89.2 (+9.3%)267.8 (+1.2%)


That's actually noticeable. Maybe not so much in the final elapsed
time itself, but almost 3% for just the kernel side sounds meaningful.

Of course, that's with nopcid, so it's a fairly small special case, but 
still.


It's a single line of code to go from the "33" to "28" configuration, 
so

it's totally doable.  But, it means having and parsing another boot
option that confuses people and then I have to go write actual
documentation, which I detest. :)


Heh.

Ok, maybe the complexity isn't in the code, but in the concept.

   Linus

The white man -- Who he is. A Discussion. (Removed by Youtube - Trying to squelch truth spoken to power))

2018-02-23 Thread thetruthbeforeus


www.liveleak.com/view?i=017_1519418755

This had to be reposted because the youtube people took down the video 
describing the truth about white society. Speaking truth to power isn't 
something they was fastioning to slide by.


So, dear OpenSource folks, people that run the show on the computing 
side these days; we need to talk about some things regarding how this 
society is and why it treats some people in a poor-cross-situational 
fastion.


Alot of folks have been wondering: why are things they way that they 
are? Many have been pondering this notion. All things have roots: you, 
me, we all do, but when we are cut from those roots we feel adrift in 
this world. Like this world is not our own. In the past many of us have 
been forcefully cut from those roots through no fault of our own nor of 
our ancestors.


To know why things are, you must know Who the people that do them, are. 
What makes them do these things. Here's some straight truth from someone 
who has been in the game for some time.


The white man -- Just Who, who really, he is (and why he do what he 
does):


http://www.liveleak.com/view?i=017_1519418755

#blackpanther, #wealth, #power, #shaolin

(Also how do you feel about youtube taking down this discussion? They 
couldn't let the power of the people to decide anything).

The white man -- Who he is. A Discussion. (Removed by Youtube - Trying to squelch truth spoken to power))

2018-02-23 Thread thetruthbeforeus


www.liveleak.com/view?i=017_1519418755

This had to be reposted because the youtube people took down the video 
describing the truth about white society. Speaking truth to power isn't 
something they was fastioning to slide by.


So, dear OpenSource folks, people that run the show on the computing 
side these days; we need to talk about some things regarding how this 
society is and why it treats some people in a poor-cross-situational 
fastion.


Alot of folks have been wondering: why are things they way that they 
are? Many have been pondering this notion. All things have roots: you, 
me, we all do, but when we are cut from those roots we feel adrift in 
this world. Like this world is not our own. In the past many of us have 
been forcefully cut from those roots through no fault of our own nor of 
our ancestors.


To know why things are, you must know Who the people that do them, are. 
What makes them do these things. Here's some straight truth from someone 
who has been in the game for some time.


The white man -- Just Who, who really, he is (and why he do what he 
does):


http://www.liveleak.com/view?i=017_1519418755

#blackpanther, #wealth, #power, #shaolin

(Also how do you feel about youtube taking down this discussion? They 
couldn't let the power of the people to decide anything).

Re: [RFC][PATCH 00/10] Use global pages with PTI

2018-02-23 Thread Linus Torvalds

On Fri, Feb 23, 2018 at 5:49 PM, Dave Hansen
 wrote:
> On 02/22/2018 01:52 PM, Linus Torvalds wrote:
>> Side note - and this may be crazy talk - I wonder if it might make
>> sense to have a mode where we allow executable read-only kernel pages
>> to be marked global too (but only in the kernel mapping).
>
> We did that accidentally, somewhere.  It causes machine checks on K8's
> iirc, which is fun (52994c256df fixed it).  So, we'd need to make sure
> we avoid it there, or just make it global in the user mapping too.

They'd be missing _entirely_ in the user maps, which should be fine.
The problem that commit 52994c256df3 fixed was that they actually
existed in the user maps, just with different data, and then you can
have a ITLB and a DTLB entry for the same address that don't match
(because one has been loaded from the kernel mapping and the other
from the user one).

But when the address isn't mapped at all in the user map, that should
be fine - because there's no associated TLB entry to get mixed up
about.

It's no different from clearing a page from the page table before then
flushing the TLB entry later - which is the normal (and required)
behavior for unmapping a page. For a while it exists in the TLB
without existing in the page tables.

> Just for fun, I tried a 4-core Skylake system with KPTI and nopcid and
> compiled a random kernel 10 times.  I did three configs: no global, all
> kernel text global + cpu_entry_area, and only cpu_entry_area + entry
> text.  The delta percentages are from the Baseline.  The deltas are
> measurable, but the largest bang for our buck is obviously the entry text.
>
> User Time   Kernel Time Clock Elapsed
> Baseline (33 GLB PTEs)  907.6   81.6264.7
> Entry(28 GLB PTEs)  910.9 (+0.4%)   84.0 (+2.9%)265.2 (+0.2%)
> No global( 0 GLB PTEs)  914.2 (+0.7%)   89.2 (+9.3%)267.8 (+1.2%)

That's actually noticeable. Maybe not so much in the final elapsed
time itself, but almost 3% for just the kernel side sounds meaningful.

Of course, that's with nopcid, so it's a fairly small special case, but still.

> It's a single line of code to go from the "33" to "28" configuration, so
> it's totally doable.  But, it means having and parsing another boot
> option that confuses people and then I have to go write actual
> documentation, which I detest. :)

Heh.

Ok, maybe the complexity isn't in the code, but in the concept.

   Linus

Re: [RFC][PATCH 00/10] Use global pages with PTI

2018-02-23 Thread Linus Torvalds

On Fri, Feb 23, 2018 at 5:49 PM, Dave Hansen
 wrote:
> On 02/22/2018 01:52 PM, Linus Torvalds wrote:
>> Side note - and this may be crazy talk - I wonder if it might make
>> sense to have a mode where we allow executable read-only kernel pages
>> to be marked global too (but only in the kernel mapping).
>
> We did that accidentally, somewhere.  It causes machine checks on K8's
> iirc, which is fun (52994c256df fixed it).  So, we'd need to make sure
> we avoid it there, or just make it global in the user mapping too.

They'd be missing _entirely_ in the user maps, which should be fine.
The problem that commit 52994c256df3 fixed was that they actually
existed in the user maps, just with different data, and then you can
have a ITLB and a DTLB entry for the same address that don't match
(because one has been loaded from the kernel mapping and the other
from the user one).

But when the address isn't mapped at all in the user map, that should
be fine - because there's no associated TLB entry to get mixed up
about.

It's no different from clearing a page from the page table before then
flushing the TLB entry later - which is the normal (and required)
behavior for unmapping a page. For a while it exists in the TLB
without existing in the page tables.

> Just for fun, I tried a 4-core Skylake system with KPTI and nopcid and
> compiled a random kernel 10 times.  I did three configs: no global, all
> kernel text global + cpu_entry_area, and only cpu_entry_area + entry
> text.  The delta percentages are from the Baseline.  The deltas are
> measurable, but the largest bang for our buck is obviously the entry text.
>
> User Time   Kernel Time Clock Elapsed
> Baseline (33 GLB PTEs)  907.6   81.6264.7
> Entry(28 GLB PTEs)  910.9 (+0.4%)   84.0 (+2.9%)265.2 (+0.2%)
> No global( 0 GLB PTEs)  914.2 (+0.7%)   89.2 (+9.3%)267.8 (+1.2%)

That's actually noticeable. Maybe not so much in the final elapsed
time itself, but almost 3% for just the kernel side sounds meaningful.

Of course, that's with nopcid, so it's a fairly small special case, but still.

> It's a single line of code to go from the "33" to "28" configuration, so
> it's totally doable.  But, it means having and parsing another boot
> option that confuses people and then I have to go write actual
> documentation, which I detest. :)

Heh.

Ok, maybe the complexity isn't in the code, but in the concept.

   Linus

Re: Removing architectures without upstream gcc support

2018-02-23 Thread Guenter Roeck


On 02/23/2018 01:34 PM, Adam Borowski wrote:

On Fri, Feb 23, 2018 at 02:32:08PM -0500, James Bottomley wrote:

On Fri, 2018-02-23 at 18:19 +, Al Viro wrote:
[...]

IIRC, parisc/qemu stuff had been announced a while ago;


I have, but it didn't work sufficiently for me to either boot a kernel
using system emulation or start an architecture container using user
emulation.  I'll try again now that qemu has gone through several
revisions.


Doesn't seem to work.  Debian package (1:2.11+dfsg-1) ships hppa support but
it doesn't even install binfmt; otherwise, -user is functional enough for a
minimal executable (so arch-test reports it as working[1]), but not for
anything libc:

[/srv/chroots/hppa]# chroot . /usr/bin/qemu-hppa-static /bin/true
qemu-hppa-static: /build/qemu-v8TF72/qemu-2.11+dfsg/target/hppa/translate.c:422: 
nullify_end: Assertion `status != DISAS_NORETURN && status != 
DISAS_IAQ_N_UPDATED' failed.
Segmentation fault

This looks bad enough that I didn't even look at qemu-system.



qemu-system-hppa support was added to qemu end of January. It seems to boot 
fine,
only I lost my ability to build a root file system :-( so it may take a bit
for me to create one.

Guenter

Re: Removing architectures without upstream gcc support

2018-02-23 Thread Guenter Roeck


On 02/23/2018 01:34 PM, Adam Borowski wrote:

On Fri, Feb 23, 2018 at 02:32:08PM -0500, James Bottomley wrote:

On Fri, 2018-02-23 at 18:19 +, Al Viro wrote:
[...]

IIRC, parisc/qemu stuff had been announced a while ago;


I have, but it didn't work sufficiently for me to either boot a kernel
using system emulation or start an architecture container using user
emulation.  I'll try again now that qemu has gone through several
revisions.


Doesn't seem to work.  Debian package (1:2.11+dfsg-1) ships hppa support but
it doesn't even install binfmt; otherwise, -user is functional enough for a
minimal executable (so arch-test reports it as working[1]), but not for
anything libc:

[/srv/chroots/hppa]# chroot . /usr/bin/qemu-hppa-static /bin/true
qemu-hppa-static: /build/qemu-v8TF72/qemu-2.11+dfsg/target/hppa/translate.c:422: 
nullify_end: Assertion `status != DISAS_NORETURN && status != 
DISAS_IAQ_N_UPDATED' failed.
Segmentation fault

This looks bad enough that I didn't even look at qemu-system.



qemu-system-hppa support was added to qemu end of January. It seems to boot 
fine,
only I lost my ability to build a root file system :-( so it may take a bit
for me to create one.

Guenter

RE: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and swapout.

2018-02-23 Thread He, Roger

I missed the Per-VM-BO share the reservation object with root bo. So context is 
not NULL here.
So,  this patch is:

Reviewed-by: Roger He 

Thanks
Roger(Hongbo.He)
-Original Message-
From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
Sent: Friday, February 23, 2018 8:06 PM
To: He, Roger ; amd-...@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and 
swapout.

Am 23.02.2018 um 10:46 schrieb He, Roger:
>
> -Original Message-
> From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On 
> Behalf Of Christian K?nig
> Sent: Tuesday, February 20, 2018 8:58 PM
> To: amd-...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
> linux-kernel@vger.kernel.org
> Subject: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and 
> swapout.
>
> This solves the problem that when we swapout a BO from a domain we sometimes 
> couldn't make room for it because holding the lock blocks all other BOs with 
> this reservation object.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 33 -
>   1 file changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
> b/drivers/gpu/drm/ttm/ttm_bo.c index d90b1cf10b27..3a44c2ee4155 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -713,31 +713,30 @@ bool ttm_bo_eviction_valuable(struct 
> ttm_buffer_object *bo,  EXPORT_SYMBOL(ttm_bo_eviction_valuable);
>   
>   /**
> - * Check the target bo is allowable to be evicted or swapout, including 
> cases:
> - *
> - * a. if share same reservation object with ctx->resv, have 
> assumption
> - * reservation objects should already be locked, so not lock again 
> and
> - * return true directly when either the opreation 
> allow_reserved_eviction
> - * or the target bo already is in delayed free list;
> - *
> - * b. Otherwise, trylock it.
> + * Check if the target bo is allowed to be evicted or swapedout.
>*/
>   static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
> - struct ttm_operation_ctx *ctx, bool *locked)
> +struct ttm_operation_ctx *ctx,
> +bool *locked)
>   {
> - bool ret = false;
> + /* First check if we can lock it */
> + *locked = reservation_object_trylock(bo->resv);
> + if (*locked)
> + return true;
>   
> - *locked = false;
> + /* Check if it's locked because it is part of the current operation 
> +*/
>   if (bo->resv == ctx->resv) {
>   reservation_object_assert_held(bo->resv);
> - if (ctx->allow_reserved_eviction || !list_empty(>ddestroy))
> - ret = true;
> - } else {
> - *locked = reservation_object_trylock(bo->resv);
> - ret = *locked;
> + return ctx->allow_reserved_eviction ||
> + !list_empty(>ddestroy);
>   }
>   
> - return ret;
> + /* Check if it's locked because it was already evicted */
> + if (ww_mutex_is_owned_by(>resv->lock, NULL))
> + return true;
>
> For the special case: when command submission with Per-VM-BO enabled, 
> All BOs  a/b/c are always valid BO. After the validation of BOs a and 
> b, when validation of BO c, is it possible to return true and then evict BO a 
> and b by mistake ?
> Because a/b/c share same task_struct.

No, that's why I check the context as well. BOs explicitly reserved 
have a non NULL context while BOs trylocked for swapout havea NULL 
context.

BOs have a non NULL context only when command submission and reserved 
by ttm_eu_re6serve_buffers  .
But for Per-VM-BO a/b/c they always are not in BO list, so they will be 
not reserved and have always NULL context.
So above case also can happen. Anything missing here?  

>
> + /* Some other thread is using it, don't touch it */
> + return false;
>   }
>   
>   static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
> --
> 2.14.1
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

RE: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and swapout.

2018-02-23 Thread He, Roger

I missed the Per-VM-BO share the reservation object with root bo. So context is 
not NULL here.
So,  this patch is:

Reviewed-by: Roger He 

Thanks
Roger(Hongbo.He)
-Original Message-
From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
Sent: Friday, February 23, 2018 8:06 PM
To: He, Roger ; amd-...@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and 
swapout.

Am 23.02.2018 um 10:46 schrieb He, Roger:
>
> -Original Message-
> From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On 
> Behalf Of Christian K?nig
> Sent: Tuesday, February 20, 2018 8:58 PM
> To: amd-...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
> linux-kernel@vger.kernel.org
> Subject: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and 
> swapout.
>
> This solves the problem that when we swapout a BO from a domain we sometimes 
> couldn't make room for it because holding the lock blocks all other BOs with 
> this reservation object.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 33 -
>   1 file changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
> b/drivers/gpu/drm/ttm/ttm_bo.c index d90b1cf10b27..3a44c2ee4155 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -713,31 +713,30 @@ bool ttm_bo_eviction_valuable(struct 
> ttm_buffer_object *bo,  EXPORT_SYMBOL(ttm_bo_eviction_valuable);
>   
>   /**
> - * Check the target bo is allowable to be evicted or swapout, including 
> cases:
> - *
> - * a. if share same reservation object with ctx->resv, have 
> assumption
> - * reservation objects should already be locked, so not lock again 
> and
> - * return true directly when either the opreation 
> allow_reserved_eviction
> - * or the target bo already is in delayed free list;
> - *
> - * b. Otherwise, trylock it.
> + * Check if the target bo is allowed to be evicted or swapedout.
>*/
>   static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
> - struct ttm_operation_ctx *ctx, bool *locked)
> +struct ttm_operation_ctx *ctx,
> +bool *locked)
>   {
> - bool ret = false;
> + /* First check if we can lock it */
> + *locked = reservation_object_trylock(bo->resv);
> + if (*locked)
> + return true;
>   
> - *locked = false;
> + /* Check if it's locked because it is part of the current operation 
> +*/
>   if (bo->resv == ctx->resv) {
>   reservation_object_assert_held(bo->resv);
> - if (ctx->allow_reserved_eviction || !list_empty(>ddestroy))
> - ret = true;
> - } else {
> - *locked = reservation_object_trylock(bo->resv);
> - ret = *locked;
> + return ctx->allow_reserved_eviction ||
> + !list_empty(>ddestroy);
>   }
>   
> - return ret;
> + /* Check if it's locked because it was already evicted */
> + if (ww_mutex_is_owned_by(>resv->lock, NULL))
> + return true;
>
> For the special case: when command submission with Per-VM-BO enabled, 
> All BOs  a/b/c are always valid BO. After the validation of BOs a and 
> b, when validation of BO c, is it possible to return true and then evict BO a 
> and b by mistake ?
> Because a/b/c share same task_struct.

No, that's why I check the context as well. BOs explicitly reserved 
have a non NULL context while BOs trylocked for swapout havea NULL 
context.

BOs have a non NULL context only when command submission and reserved 
by ttm_eu_re6serve_buffers  .
But for Per-VM-BO a/b/c they always are not in BO list, so they will be 
not reserved and have always NULL context.
So above case also can happen. Anything missing here?  

>
> + /* Some other thread is using it, don't touch it */
> + return false;
>   }
>   
>   static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
> --
> 2.14.1
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

RE: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and swapout.

2018-02-23 Thread He, Roger



-Original Message-
From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] 
Sent: Friday, February 23, 2018 8:06 PM
To: He, Roger ; amd-...@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and 
swapout.

Am 23.02.2018 um 10:46 schrieb He, Roger:
>
> -Original Message-
> From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On 
> Behalf Of Christian K?nig
> Sent: Tuesday, February 20, 2018 8:58 PM
> To: amd-...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
> linux-kernel@vger.kernel.org
> Subject: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and 
> swapout.
>
> This solves the problem that when we swapout a BO from a domain we sometimes 
> couldn't make room for it because holding the lock blocks all other BOs with 
> this reservation object.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 33 -
>   1 file changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
> b/drivers/gpu/drm/ttm/ttm_bo.c index d90b1cf10b27..3a44c2ee4155 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -713,31 +713,30 @@ bool ttm_bo_eviction_valuable(struct 
> ttm_buffer_object *bo,  EXPORT_SYMBOL(ttm_bo_eviction_valuable);
>   
>   /**
> - * Check the target bo is allowable to be evicted or swapout, including 
> cases:
> - *
> - * a. if share same reservation object with ctx->resv, have 
> assumption
> - * reservation objects should already be locked, so not lock again 
> and
> - * return true directly when either the opreation 
> allow_reserved_eviction
> - * or the target bo already is in delayed free list;
> - *
> - * b. Otherwise, trylock it.
> + * Check if the target bo is allowed to be evicted or swapedout.
>*/
>   static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
> - struct ttm_operation_ctx *ctx, bool *locked)
> +struct ttm_operation_ctx *ctx,
> +bool *locked)
>   {
> - bool ret = false;
> + /* First check if we can lock it */
> + *locked = reservation_object_trylock(bo->resv);
> + if (*locked)
> + return true;
>   
> - *locked = false;
> + /* Check if it's locked because it is part of the current operation 
> +*/
>   if (bo->resv == ctx->resv) {
>   reservation_object_assert_held(bo->resv);
> - if (ctx->allow_reserved_eviction || !list_empty(>ddestroy))
> - ret = true;
> - } else {
> - *locked = reservation_object_trylock(bo->resv);
> - ret = *locked;
> + return ctx->allow_reserved_eviction ||
> + !list_empty(>ddestroy);
>   }
>   
> - return ret;
> + /* Check if it's locked because it was already evicted */
> + if (ww_mutex_is_owned_by(>resv->lock, NULL))
> + return true;
>
> For the special case: when command submission with Per-VM-BO enabled, 
> All BOs  a/b/c are always valid BO. After the validation of BOs a and 
> b, when validation of BO c, is it possible to return true and then evict BO a 
> and b by mistake ?
> Because a/b/c share same task_struct.

No, that's why I check the context as well. BOs explicitly reserved 
have a non NULL context while BOs trylocked for swapout have a NULL context.

When BOs have a non NULL context only when command submission and reserved by 
ttm_eu_re6serve_buffers  .
But for Per-VM-BO a/b/c they always are not in BO list, so they will be not 
reserved and have always NULL context.
So above case also can happen. Anything missing here?  

Thanks
Roger(Hongbo.He)
>
> + /* Some other thread is using it, don't touch it */
> + return false;
>   }
>   
>   static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
> --
> 2.14.1
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

RE: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and swapout.

2018-02-23 Thread He, Roger



-Original Message-
From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] 
Sent: Friday, February 23, 2018 8:06 PM
To: He, Roger ; amd-...@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and 
swapout.

Am 23.02.2018 um 10:46 schrieb He, Roger:
>
> -Original Message-
> From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On 
> Behalf Of Christian K?nig
> Sent: Tuesday, February 20, 2018 8:58 PM
> To: amd-...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
> linux-kernel@vger.kernel.org
> Subject: [PATCH 3/4] drm/ttm: handle already locked BOs during eviction and 
> swapout.
>
> This solves the problem that when we swapout a BO from a domain we sometimes 
> couldn't make room for it because holding the lock blocks all other BOs with 
> this reservation object.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 33 -
>   1 file changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
> b/drivers/gpu/drm/ttm/ttm_bo.c index d90b1cf10b27..3a44c2ee4155 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -713,31 +713,30 @@ bool ttm_bo_eviction_valuable(struct 
> ttm_buffer_object *bo,  EXPORT_SYMBOL(ttm_bo_eviction_valuable);
>   
>   /**
> - * Check the target bo is allowable to be evicted or swapout, including 
> cases:
> - *
> - * a. if share same reservation object with ctx->resv, have 
> assumption
> - * reservation objects should already be locked, so not lock again 
> and
> - * return true directly when either the opreation 
> allow_reserved_eviction
> - * or the target bo already is in delayed free list;
> - *
> - * b. Otherwise, trylock it.
> + * Check if the target bo is allowed to be evicted or swapedout.
>*/
>   static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
> - struct ttm_operation_ctx *ctx, bool *locked)
> +struct ttm_operation_ctx *ctx,
> +bool *locked)
>   {
> - bool ret = false;
> + /* First check if we can lock it */
> + *locked = reservation_object_trylock(bo->resv);
> + if (*locked)
> + return true;
>   
> - *locked = false;
> + /* Check if it's locked because it is part of the current operation 
> +*/
>   if (bo->resv == ctx->resv) {
>   reservation_object_assert_held(bo->resv);
> - if (ctx->allow_reserved_eviction || !list_empty(>ddestroy))
> - ret = true;
> - } else {
> - *locked = reservation_object_trylock(bo->resv);
> - ret = *locked;
> + return ctx->allow_reserved_eviction ||
> + !list_empty(>ddestroy);
>   }
>   
> - return ret;
> + /* Check if it's locked because it was already evicted */
> + if (ww_mutex_is_owned_by(>resv->lock, NULL))
> + return true;
>
> For the special case: when command submission with Per-VM-BO enabled, 
> All BOs  a/b/c are always valid BO. After the validation of BOs a and 
> b, when validation of BO c, is it possible to return true and then evict BO a 
> and b by mistake ?
> Because a/b/c share same task_struct.

No, that's why I check the context as well. BOs explicitly reserved 
have a non NULL context while BOs trylocked for swapout have a NULL context.

When BOs have a non NULL context only when command submission and reserved by 
ttm_eu_re6serve_buffers  .
But for Per-VM-BO a/b/c they always are not in BO list, so they will be not 
reserved and have always NULL context.
So above case also can happen. Anything missing here?  

Thanks
Roger(Hongbo.He)
>
> + /* Some other thread is using it, don't touch it */
> + return false;
>   }
>   
>   static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
> --
> 2.14.1
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH V4 net 3/3] tuntap: correctly add the missing XDP flush

2018-02-23 Thread Jason Wang

We don't flush batched XDP packets through xdp_do_flush_map(), this
will cause packets stall at TX queue. Consider we don't do XDP on NAPI
poll(), the only possible fix is to call xdp_do_flush_map()
immediately after xdp_do_redirect().

Note, this in fact won't try to batch packets through devmap, we could
address in the future.

Reported-by: Christoffer Dall 
Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Jason Wang 
---
 drivers/net/tun.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 63d39fe6..7433bb2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1663,6 +1663,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
*tun,
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
err = xdp_do_redirect(tun->dev, , xdp_prog);
+   xdp_do_flush_map();
if (err)
goto err_redirect;
rcu_read_unlock();
-- 
2.7.4

[PATCH V4 net 3/3] tuntap: correctly add the missing XDP flush

2018-02-23 Thread Jason Wang

We don't flush batched XDP packets through xdp_do_flush_map(), this
will cause packets stall at TX queue. Consider we don't do XDP on NAPI
poll(), the only possible fix is to call xdp_do_flush_map()
immediately after xdp_do_redirect().

Note, this in fact won't try to batch packets through devmap, we could
address in the future.

Reported-by: Christoffer Dall 
Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Jason Wang 
---
 drivers/net/tun.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 63d39fe6..7433bb2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1663,6 +1663,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
*tun,
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
err = xdp_do_redirect(tun->dev, , xdp_prog);
+   xdp_do_flush_map();
if (err)
goto err_redirect;
rcu_read_unlock();
-- 
2.7.4

[PATCH V4 net 1/3] Revert "tuntap: add missing xdp flush"

2018-02-23 Thread Jason Wang

This reverts commit 762c330d670e3d4b795cf7a8d761866fdd1eef49. The
reason is we try to batch packets for devmap which causes calling
xdp_do_flush() in the process context. Simply disabling preemption
may not work since process may move among processors which lead
xdp_do_flush() to miss some flushes on some processors.

So simply revert the patch, a follow-up patch will add the xdp flush
correctly.

Reported-by: Christoffer Dall 
Fixes: 762c330d670e ("tuntap: add missing xdp flush")
Signed-off-by: Jason Wang 
---
 drivers/net/tun.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b52258c..2823a4a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -181,7 +181,6 @@ struct tun_file {
struct tun_struct *detached;
struct ptr_ring tx_ring;
struct xdp_rxq_info xdp_rxq;
-   int xdp_pending_pkts;
 };
 
 struct tun_flow_entry {
@@ -1662,7 +1661,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
*tun,
case XDP_REDIRECT:
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
-   ++tfile->xdp_pending_pkts;
err = xdp_do_redirect(tun->dev, , xdp_prog);
if (err)
goto err_redirect;
@@ -1984,11 +1982,6 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
result = tun_get_user(tun, tfile, NULL, from,
  file->f_flags & O_NONBLOCK, false);
 
-   if (tfile->xdp_pending_pkts) {
-   tfile->xdp_pending_pkts = 0;
-   xdp_do_flush_map();
-   }
-
tun_put(tun);
return result;
 }
@@ -2325,13 +2318,6 @@ static int tun_sendmsg(struct socket *sock, struct 
msghdr *m, size_t total_len)
ret = tun_get_user(tun, tfile, m->msg_control, >msg_iter,
   m->msg_flags & MSG_DONTWAIT,
   m->msg_flags & MSG_MORE);
-
-   if (tfile->xdp_pending_pkts >= NAPI_POLL_WEIGHT ||
-   !(m->msg_flags & MSG_MORE)) {
-   tfile->xdp_pending_pkts = 0;
-   xdp_do_flush_map();
-   }
-
tun_put(tun);
return ret;
 }
@@ -3163,7 +3149,6 @@ static int tun_chr_open(struct inode *inode, struct file 
* file)
sock_set_flag(>sk, SOCK_ZEROCOPY);
 
memset(>tx_ring, 0, sizeof(tfile->tx_ring));
-   tfile->xdp_pending_pkts = 0;
 
return 0;
 }
-- 
2.7.4

[PATCH V4 net 1/3] Revert "tuntap: add missing xdp flush"

2018-02-23 Thread Jason Wang

This reverts commit 762c330d670e3d4b795cf7a8d761866fdd1eef49. The
reason is we try to batch packets for devmap which causes calling
xdp_do_flush() in the process context. Simply disabling preemption
may not work since process may move among processors which lead
xdp_do_flush() to miss some flushes on some processors.

So simply revert the patch, a follow-up patch will add the xdp flush
correctly.

Reported-by: Christoffer Dall 
Fixes: 762c330d670e ("tuntap: add missing xdp flush")
Signed-off-by: Jason Wang 
---
 drivers/net/tun.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b52258c..2823a4a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -181,7 +181,6 @@ struct tun_file {
struct tun_struct *detached;
struct ptr_ring tx_ring;
struct xdp_rxq_info xdp_rxq;
-   int xdp_pending_pkts;
 };
 
 struct tun_flow_entry {
@@ -1662,7 +1661,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
*tun,
case XDP_REDIRECT:
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
-   ++tfile->xdp_pending_pkts;
err = xdp_do_redirect(tun->dev, , xdp_prog);
if (err)
goto err_redirect;
@@ -1984,11 +1982,6 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
result = tun_get_user(tun, tfile, NULL, from,
  file->f_flags & O_NONBLOCK, false);
 
-   if (tfile->xdp_pending_pkts) {
-   tfile->xdp_pending_pkts = 0;
-   xdp_do_flush_map();
-   }
-
tun_put(tun);
return result;
 }
@@ -2325,13 +2318,6 @@ static int tun_sendmsg(struct socket *sock, struct 
msghdr *m, size_t total_len)
ret = tun_get_user(tun, tfile, m->msg_control, >msg_iter,
   m->msg_flags & MSG_DONTWAIT,
   m->msg_flags & MSG_MORE);
-
-   if (tfile->xdp_pending_pkts >= NAPI_POLL_WEIGHT ||
-   !(m->msg_flags & MSG_MORE)) {
-   tfile->xdp_pending_pkts = 0;
-   xdp_do_flush_map();
-   }
-
tun_put(tun);
return ret;
 }
@@ -3163,7 +3149,6 @@ static int tun_chr_open(struct inode *inode, struct file 
* file)
sock_set_flag(>sk, SOCK_ZEROCOPY);
 
memset(>tx_ring, 0, sizeof(tfile->tx_ring));
-   tfile->xdp_pending_pkts = 0;
 
return 0;
 }
-- 
2.7.4

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 2774 matches

Mail list logo