Re: [PATCH v2] RISC-V: Don't use a global include guard for uapi/asm/syscalls.h
This actually seems to break the compilation for me in for-next:

hch@carbon:~/work/linux$ make ARCH=riscv
  CALL    scripts/checksyscalls.sh
:1335:2: warning: #warning syscall rseq not implemented [-Wcpp]
  CHK     include/generated/compile.h
  CC      arch/riscv/kernel/syscall_table.o
./arch/riscv/include/uapi/asm/syscalls.h:29:36: error: 'sys_riscv_flush_icache' undeclared here (not in a function); did you mean '__NR_riscv_flush_icache'?
 __SYSCALL(__NR_riscv_flush_icache, sys_riscv_flush_icache)
                                    ^
arch/riscv/kernel/syscall_table.c:21:37: note: in definition of macro '__SYSCALL'
 #define __SYSCALL(nr, call) [nr] = (call),
                                     ^~~~
make[1]: *** [scripts/Makefile.build:318: arch/riscv/kernel/syscall_table.o] Error 1
make: *** [Makefile:1029: arch/riscv/kernel] Error 2

Reverting it for now, but I guess this might hit others too..
Re: [PATCH v8 07/22] KVM: s390: refactor crypto initialization
On 08.08.2018 16:44, Tony Krowiak wrote:
> From: Tony Krowiak
>
> This patch refactors the code that initializes and sets up the
> crypto configuration for a guest. The following changes are
> implemented via this patch:
>
> 1. Prior to the introduction of AP device virtualization, it
>    was not necessary to provide guest access to the CRYCB
>    unless the MSA extension 3 (MSAX3) facility was installed
>    on the host system. With the introduction of AP device
>    virtualization, the CRYCB must be made accessible to the
>    guest if the AP instructions are installed on the host
>    and are to be provided to the guest.
>
> 2. Introduces a flag indicating AP instructions executed on
>    the guest shall be interpreted by the firmware. It is
>    initialized to indicate AP instructions are to be
>    interpreted and is used to set the SIE bit for
>    each vcpu during vcpu setup.
>
> Signed-off-by: Tony Krowiak
> Reviewed-by: Halil Pasic
> Acked-by: Christian Borntraeger
> Tested-by: Michael Mueller
> Tested-by: Farhan Ali
> Signed-off-by: Christian Borntraeger

Acked-by: Janosch Frank

> ---
>  arch/s390/include/asm/kvm_host.h |  3 +
>  arch/s390/include/uapi/asm/kvm.h |  1 +
>  arch/s390/kvm/kvm-s390.c         | 86
>  3 files changed, 49 insertions(+), 41 deletions(-)
>
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index af39561..0c13f61 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -187,6 +187,7 @@ struct kvm_s390_sie_block {
>  #define ECA_AIV		0x0020
>  #define ECA_VX		0x0002
>  #define ECA_PROTEXCI	0x2000
> +#define ECA_APIE	0x0008
>  #define ECA_SII		0x0001
>  	__u32	eca;			/* 0x004c */
>  #define ICPT_INST	0x04
> @@ -256,6 +257,7 @@ struct kvm_s390_sie_block {
>  	__u8	reservede4[4];		/* 0x00e4 */
>  	__u64	tecmc;			/* 0x00e8 */
>  	__u8	reservedf0[12];		/* 0x00f0 */
> +#define CRYCB_FORMAT_MASK 0x0003
>  #define CRYCB_FORMAT1 0x0001
>  #define CRYCB_FORMAT2 0x0003
>  	__u32	crycbd;			/* 0x00fc */
> @@ -714,6 +716,7 @@ struct kvm_s390_crypto {
>  	__u32 crycbd;
>  	__u8 aes_kw;
>  	__u8 dea_kw;
> +	__u8 apie;

In the last review I wanted a comment here to know what they do.

>  static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
>  {
> -	if (!test_kvm_facility(vcpu->kvm, 76))
> +	/*
> +	 * If neither the AP instructions nor the MSAX3 facility are installed
> +	 * on the host, then there is no need for a CRYCB in SIE because the
> +	 * they will not be installed on the guest either.

the they

> +	 */
> +	if (ap_instructions_available() && !test_facility(76))
>  		return;

I know you're not responsible for that one :) but 0 being the wanted
value here is a bit counter-intuitive.

>
> -	vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
> +	vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
> +
> +	vcpu->arch.sie_block->eca &= ~ECA_APIE;

The scb is zero allocated, are the ECA and the ECB3s set somewhere
in-between, or is that your way of making sure the controls are
definitely gone for good?

> +	if (vcpu->kvm->arch.crypto.apie &&
> +	    test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_AP))
> +		vcpu->arch.sie_block->eca |= ECA_APIE;
>
> -	if (vcpu->kvm->arch.crypto.aes_kw)
> -		vcpu->arch.sie_block->ecb3 |= ECB3_AES;
> -	if (vcpu->kvm->arch.crypto.dea_kw)
> -		vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
> +	/* If MSAX3 is installed on the guest, set up protected key support */
> +	if (test_kvm_facility(vcpu->kvm, 76)) {
> +		vcpu->arch.sie_block->ecb3 &= ~(ECB3_AES | ECB3_DEA);
>
> -	vcpu->arch.sie_block->crycbd = vcpu->kvm->arch.crypto.crycbd;
> +		if (vcpu->kvm->arch.crypto.aes_kw)
> +			vcpu->arch.sie_block->ecb3 |= ECB3_AES;
> +		if (vcpu->kvm->arch.crypto.dea_kw)
> +			vcpu->arch.sie_block->ecb3 |= ECB3_DEA;
> +	}
>  }
>
>  void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu)
Re: [PATCH v5 1/2] leds: core: Introduce LED pattern trigger
Hi Jacek,

On 9 August 2018 at 05:28, Jacek Anaszewski wrote:
> Hi Baolin,
>
> On 08/08/2018 08:01 AM, Baolin Wang wrote:
>> Hi Jacek,
>>
>> On 8 August 2018 at 05:54, Jacek Anaszewski wrote:
>>> Hi Baolin,
>>>
>>> Thank you for addressing the review remarks.
>>> Since the patch set is targeted for 4.19, then we have three weeks
>>> before it will be merged to the for-next anyway. That said, I propose
>>> one more modification, please take a look below.
>>
>> Sure.
>>
>>> On 08/06/2018 02:05 PM, Baolin Wang wrote:
>>>> Some LED controllers have support for autonomously controlling
>>>> brightness over time, according to some preprogrammed pattern or
>>>> function. This patch adds a pattern trigger so that the LED device
>>>> can configure the pattern and trigger it.
>>>>
>>>> Signed-off-by: Raphael Teysseyre
>>>> Signed-off-by: Baolin Wang
>>>> ---
>>>> Changes from v4:
>>>>  - Change the repeat file to return the originally written number.
>>>>  - Improve comments.
>>>>  - Fix some build warnings.
>>>>
>>>> Changes from v3:
>>>>  - Reset pattern number to 0 if user provides incorrect pattern string.
>>>>  - Support one pattern.
>>>>
>>>> Changes from v2:
>>>>  - Remove hardware_pattern boolean.
>>>>  - Change the pattern string format.
>>>>
>>>> Changes from v1:
>>>>  - Use ATTRIBUTE_GROUPS() to define attributes.
>>>>  - Introduce hardware_pattern flag to determine if software pattern
>>>>    or hardware pattern.
>>>>  - Re-implement pattern_trig_store_pattern() function.
>>>>  - Remove pattern_get() interface.
>>>>  - Improve comments.
>>>>  - Other small optimization.
>>>> ---
>>>>  .../ABI/testing/sysfs-class-led-trigger-pattern |  24 ++
>>>>  drivers/leds/trigger/Kconfig                    |   7 +
>>>>  drivers/leds/trigger/Makefile                   |   1 +
>>>>  drivers/leds/trigger/ledtrig-pattern.c          | 266
>>>>  include/linux/leds.h                            |  16 ++
>>>>  5 files changed, 314 insertions(+)
>>>>  create mode 100644 Documentation/ABI/testing/sysfs-class-led-trigger-pattern
>>>>  create mode 100644 drivers/leds/trigger/ledtrig-pattern.c
>>>>
>>>> diff --git a/Documentation/ABI/testing/sysfs-class-led-trigger-pattern b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern
>>>> new file mode 100644
>>>> index 000..bc7475f
>>>> --- /dev/null
>>>> +++ b/Documentation/ABI/testing/sysfs-class-led-trigger-pattern
>>>> @@ -0,0 +1,24 @@
>>>> +What:		/sys/class/leds/<led>/pattern
>>>> +Date:		August 2018
>>>> +KernelVersion:	4.19
>>>> +Description:
>>>> +	Specify a pattern for the LED, for LED hardware that support
>>>> +	altering the brightness as a function of time.
>>>> +
>>>> +	The pattern is given by a series of tuples, of brightness and
>>>> +	duration (ms). The LED is expected to traverse the series and
>>>> +	each brightness value for the specified duration. Duration of
>>>> +	0 means brightness should immediately change to new value.
>>>> +
>>>> +	The format of the pattern values should be:
>>>> +	"brightness_1 duration_1 brightness_2 duration_2 brightness_3
>>>> +	duration_3 ...".
>>>> +
>>>> +What:		/sys/class/leds/<led>/repeat
>>>> +Date:		August 2018
>>>> +KernelVersion:	4.19
>>>> +Description:
>>>> +	Specify a pattern repeat number. 0 means repeat indefinitely.
>>>> +
>>>> +	This file will always return the originally written repeat
>>>> +	number.
>>>> diff --git a/drivers/leds/trigger/Kconfig b/drivers/leds/trigger/Kconfig
>>>> index 4018af7..b76fc3c 100644
>>>> --- a/drivers/leds/trigger/Kconfig
>>>> +++ b/drivers/leds/trigger/Kconfig
>>>> @@ -129,4 +129,11 @@ config LEDS_TRIGGER_NETDEV
>>>>  	  This allows LEDs to be controlled by network device activity.
>>>>  	  If unsure, say Y.
>>>>
>>>> +config LEDS_TRIGGER_PATTERN
>>>> +	tristate "LED Pattern Trigger"
>>>> +	help
>>>> +	  This allows LEDs to be controlled by a software or hardware pattern
>>>> +	  which is a series of tuples, of brightness and duration (ms).
>>>> +	  If unsure, say N
>>>> +
>>>>  endif # LEDS_TRIGGERS
>>>> diff --git a/drivers/leds/trigger/Makefile b/drivers/leds/trigger/Makefile
>>>> index f3cfe19..9bcb64e 100644
>>>> --- a/drivers/leds/trigger/Makefile
>>>> +++ b/drivers/leds/trigger/Makefile
>>>> @@ -13,3 +13,4 @@ obj-$(CONFIG_LEDS_TRIGGER_TRANSIENT)	+= ledtrig-transient.o
>>>>  obj-$(CONFIG_LEDS_TRIGGER_CAMERA)	+= ledtrig-camera.o
>>>>  obj-$(CONFIG_LEDS_TRIGGER_PANIC)	+= ledtrig-panic.o
>>>>  obj-$(CONFIG_LEDS_TRIGGER_NETDEV)	+= ledtrig-netdev.o
>>>> +obj-$(CONFIG_LEDS_TRIGGER_PATTERN)	+= ledtrig-pattern.o
>>>> diff --git a/drivers/leds/trigger/ledtrig-pattern.c b/drivers/leds/trigger/ledtrig-pattern.c
>>>> new file mode 100644
>>>> index 000..63b94a2
[PATCH] sched: idle: Reenable sched tick for cpuidle request
The idle loop stops the tick according to the decision from the cpuidle
framework. If 'need_resched()' stays false without any task scheduling,
the CPU keeps running in the do_idle() loop and never gets a chance to
call tick_nohz_idle_exit() to re-enable the tick. As a result, the idle
loop cannot re-enable the sched tick afterwards; if the idle governor
then selects a shallow state, the powernightmares issue can occur again.

This issue can easily be reproduced on an Arm Hikey board: use CPU0 to
send an IPI to CPU7; CPU7 receives the IPI and in the callback function
starts a hrtimer with a 4ms interval, so the 4ms timer delta value lets
the 'menu' governor choose the deepest state on the next idle entry.
From then on, CPU7 restarts the hrtimer with a 1ms interval for 10
times in total; this exploits the typical-pattern logic in the 'menu'
governor to predict a 1ms duration, so the idle governor easily selects
a shallow state, which on the Hikey board is usually the CPU off state.
From then on, CPU7 stays in this shallow state for a long time, until
some other interrupt arrives on it.

  C2: cluster off; C1: CPU off

  Idle state:  C2    C2    C2    C2    C2    C2    C2    C1  ->
  Interrupt:  ^     ^     ^     ^     ^     ^     ^     ^     ^
              IPI   Timer Timer Timer Timer Timer Timer Timer Timer
                    4ms   1ms   1ms   1ms   1ms   1ms   1ms   1ms

To fix this issue, the idle loop needs to support re-enabling the sched
tick. This patch checks for the condition 'stop_tick' being false while
the tick is stopped: this indicates the cpuidle governor asks to
re-enable the tick, and tick_nohz_idle_restart_tick() can be used for
that purpose.

A synthetic case is used to verify this patch: we use CPU0 to send an
IPI to wake up CPU7 at a 50ms interval, and CPU7 generates a series of
hrtimer events (the first interval is 4ms, then the sequential 10 timer
events use a 1ms interval, as described above).
We collected statistics for the idle state durations; the unit is
seconds (s). The testing result shows the C2 state (deepest state)
residency can be improved significantly for CPU7 (+7.942s out of 10s
execution time on CPU7) and system-wide (+13.360s out of ~80s of total
CPU execution time).

       Without patches          With patches             Difference
  CPU  C0     C1     C2        C0     C1     C2        C0      C1      C2
  0    0.000  0.027  9.941     0.055  0.038  9.700     +0.055  +0.010  -0.240
  1    0.045  0.000  9.964     0.019  0.000  9.943     -0.026  +0.000  -0.020
  2    0.002  0.003  10.007    0.035  0.053  9.916     +0.033  +0.049  -0.090
  3    0.000  0.023  9.994     0.024  0.246  9.732     +0.024  +0.222  -0.261
  4    0.032  0.000  9.985     0.015  0.007  9.993     -0.016  +0.007  +0.008
  5    0.001  0.000  9.226     0.039  0.000  9.971     +0.038  +0.000  +0.744
  6    0.000  0.000  0.000     0.036  0.000  5.278     +0.036  +0.000  +5.278
  7    1.894  8.013  0.059     1.509  0.026  8.002     -0.384  -7.987  +7.942
  All  1.976  8.068  59.179    1.737  0.372  72.539    -0.239  -7.695  +13.360

Cc: Daniel Lezcano
Cc: Vincent Guittot
Signed-off-by: Leo Yan
---
 kernel/sched/idle.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 1a3e9bd..802286e 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -190,10 +190,18 @@ static void cpuidle_idle_call(void)
 	 */
 	next_state = cpuidle_select(drv, dev, &stop_tick);
 
-	if (stop_tick)
+	if (stop_tick) {
 		tick_nohz_idle_stop_tick();
-	else
+	} else {
+		/*
+		 * The cpuidle framework says not to stop the tick, but
+		 * the tick has already been stopped, so restart it.
+		 */
+		if (tick_nohz_tick_stopped())
+			tick_nohz_idle_restart_tick();
+
 		tick_nohz_idle_retain_tick();
+	}
 
 	rcu_idle_enter();
-- 
2.7.4
Crypto Fixes for 4.18
Hi Linus:

This push fixes a performance regression in arm64 NEON crypto as well
as a crash in x86 aegis/morus on unsupported CPUs.

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git linus

Ard Biesheuvel (1):
      crypto: arm64 - revert NEON yield for fast AEAD implementations

Ondrej Mosnacek (1):
      crypto: x86/aegis,morus - Fix and simplify CPUID checks

 arch/arm64/crypto/aes-ce-ccm-core.S    | 150
 arch/arm64/crypto/ghash-ce-core.S      |  76 ++--
 arch/x86/crypto/aegis128-aesni-glue.c  |  12 +--
 arch/x86/crypto/aegis128l-aesni-glue.c |  12 +--
 arch/x86/crypto/aegis256-aesni-glue.c  |  12 +--
 arch/x86/crypto/morus1280-avx2-glue.c  |  10 +--
 arch/x86/crypto/morus1280-sse2-glue.c  |  10 +--
 arch/x86/crypto/morus640-sse2-glue.c   |  10 +--
 8 files changed, 101 insertions(+), 191 deletions(-)

Thanks,
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH RFC 2/2] KVM: s390: introduce and use KVM_REQ_VSIE_RESTART
On 07.08.2018 14:51, David Hildenbrand wrote:
> When we change the crycb (or execution controls), we also have to make
> sure that the vSIE shadow datastructures properly consider the changed
> values before rerunning the vSIE. We can achieve that by simply using a
> VCPU request now.
>
> This has to be a synchronous request (== handled before entering the
> (v)SIE again).
>
> The request will make sure that the vSIE handler is left, and that the
> request will be processed (NOP), therefore forcing a reload of all
> vSIE data (including rebuilding the crycb) when re-entering the vSIE
> interception handler the next time.
>
> Signed-off-by: David Hildenbrand

Reviewed-by: Janosch Frank
Re: [PATCH RFC 1/2] KVM: s390: vsie: simulate VCPU SIE entry/exit
On 07.08.2018 14:51, David Hildenbrand wrote:
> VCPU requests and VCPU blocking right now don't take care of the vSIE
> (as it was not necessary until now). But we want to have VCPU requests
> that will also be handled before running the vSIE again.
>
> So let's simulate a SIE entry when entering the vSIE loop and check
> for PROG_ flags. The existing infrastructure (e.g. exit_sie()) will then
> detect that the SIE (in form of the vSIE execution loop) is running and
> properly kick the vSIE CPU, resulting in it leaving the vSIE loop and
> therefore the vSIE interception handler, allowing it to handle VCPU
> requests.
>
> E.g. if we want to modify the crycb of the VCPU and make sure that any
> masks also get applied to the VSIE crycb shadow (which uses masks from the
> VCPU crycb), we will need a way to hinder the vSIE from running and make
> sure to process the updated crycb before reentering the vSIE again.
>
> Signed-off-by: David Hildenbrand

Finally found some time:
Reviewed-by: Janosch Frank

As the first user will be AP, I guess the patches will be queued with
them. Thanks for helping out :)
[PATCH] kbuild: remove deprecated host-progs variable
host-progs has been supported as an alias of hostprogs-y at least
since the beginning of the Git era, with the clear prompt:

  Usage of host-progs is deprecated. Please replace with hostprogs-y!

Enough time for the migration has passed.

Signed-off-by: Masahiro Yamada
---
 arch/xtensa/boot/Makefile | 3 +--
 scripts/Makefile.build    | 7 -------
 scripts/Makefile.clean    | 1 -
 3 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/xtensa/boot/Makefile b/arch/xtensa/boot/Makefile
index 53e4178..dc9e0ba 100644
--- a/arch/xtensa/boot/Makefile
+++ b/arch/xtensa/boot/Makefile
@@ -30,8 +30,7 @@ Image: boot-elf
 zImage: boot-redboot
 uImage: $(obj)/uImage
 
-boot-elf boot-redboot: $(addprefix $(obj)/,$(subdir-y)) \
-		       $(addprefix $(obj)/,$(host-progs))
+boot-elf boot-redboot: $(addprefix $(obj)/,$(subdir-y))
 	$(Q)$(MAKE) $(build)=$(obj)/$@ $(MAKECMDGOALS)
 
 OBJCOPYFLAGS = --strip-all -R .comment -R .note.gnu.build-id -O binary
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index ba1c83b..b134b37 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -53,13 +53,6 @@ endif
 
 include scripts/Makefile.lib
 
-ifdef host-progs
-ifneq ($(hostprogs-y),$(host-progs))
-$(warning kbuild: $(obj)/Makefile - Usage of host-progs is deprecated. Please replace with hostprogs-y!)
-hostprogs-y += $(host-progs)
-endif
-endif
-
 # Do not include host rules unless needed
 ifneq ($(hostprogs-y)$(hostprogs-m)$(hostlibs-y)$(hostlibs-m)$(hostcxxlibs-y)$(hostcxxlibs-m),)
 include scripts/Makefile.host
diff --git a/scripts/Makefile.clean b/scripts/Makefile.clean
index 17ef94c..0b80e320 100644
--- a/scripts/Makefile.clean
+++ b/scripts/Makefile.clean
@@ -38,7 +38,6 @@ subdir-ymn	:= $(addprefix $(obj)/,$(subdir-ymn))
 
 __clean-files	:= $(extra-y) $(extra-m) $(extra-) \
 		   $(always) $(targets) $(clean-files) \
-		   $(host-progs) \
 		   $(hostprogs-y) $(hostprogs-m) $(hostprogs-) \
 		   $(hostlibs-y) $(hostlibs-m) $(hostlibs-) \
 		   $(hostcxxlibs-y) $(hostcxxlibs-m)
-- 
2.7.4
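For any out-of-tree or arch Makefile still using the removed alias, the migration is a one-line rename; the program name below is illustrative:

```make
# Before (deprecated alias, removed by this patch):
#   host-progs := mytool
# After:
hostprogs-y := mytool
mytool-objs := mytool.o helper.o
```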
Re: [PATCH 1/1] kbuild: allow alternate src for target's implicit prerequisite
2018-08-06 23:37 GMT+09:00 Vasily Gorbik :
> With kbuild there is no easy way to re-compile source files from another
> directory, which is required for the decompressor on some platforms
> (files like lib/ctype.c, lib/cmdline.c, etc). Writing custom build
> rules for those files is not feasible since original rules are complex
> and kbuild functions and variables are not exposed.
>
> The simplest solution is to reverse include source files either into
> existing files or separate files. That eliminates the need to tackle
> with the kbuild rules, but is ugly.
>
> Here is another solution to that problem, utilizing secondary expansion.
> Build rules are in a form:
> $(obj)/%.o: $(src)/%.c ...
> $(obj)/%.o: $(src)/%.S ...
>
> "src" variable could be changed to cover the need of specifying alternate
> source file directory.
> src := $(if $(SRCDIR_$(@F)),$(SRCDIR_$(@F)),$(src))
>
> So, if there is SRCDIR_ set, it will be used, original "src" is
> used otherwise. But that wouldn't work as it is. To be able to utilize
> automatic variables in implicit prerequisite secondary expansion has to
> be used and src value has to be additionally escaped.
>
> Alternate src dir then could be specified in Makefile as:
> obj-y := file1.o file2.o
> SRCDIR_file1.o := file1/src/dir
> SRCDIR_file2.o := file2/src/dir
>
> Which is enough to build $(obj)/file1.o from file1/src/dir/file1.(c|S),
> and $(obj)/file2.o from file2/src/dir/file2.(c|S)

Is this a simple solution compared with existing ones? Kind of.
However, it is just moving the complexity around from Makefiles to the
core build scripts, after all.

I guess the increase of build time is not too big (1% or so for the
incremental build of allmodconfig), but I do not like performing the
second expansion for every directory to save a few Makefiles.

> Secondary expansion has been introduced with make 3.81, which is
> the minimal supported version by kbuild itself.
>
> Signed-off-by: Vasily Gorbik
> ---
>  scripts/Makefile.build | 26 +++---
>  1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index 514ed63ff571..97c6ece96cfb 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -70,6 +70,10 @@ $(warning kbuild: Makefile.build is included improperly)
> endif
>
> # ===
> +# Allow to specify alternate source directory of target's implicit
> prerequisite
> +# e.g. 'SRCDIR_cmdline.o := lib'

This does not work for single targets.
'make foo/bar/cmdline.o' works, but 'make foo/bar/cmdline.i' does not.

> +.SECONDEXPANSION:
> +srcdir := $$(if $$(SRCDIR_$$(@F)),$$(SRCDIR_$$(@F)),$(src))
>

--
Best Regards
Masahiro Yamada
ALT+CTRL+F2 - Switch Terminal - agetty process
I am trying to understand how the agetty process is called when ALT+CTRL+F2/ALT+CTRL+F3/ keys are pressed in CentOS or in any other Linux distro.

What I know so far:
- systemd runs agetty.service with the next tty[N] number
- agetty then runs the login process, and the other processes continue from there

What I want to know:
When the ALT+CTRL+F2 keys are pressed, what happens in the kernel? Looking at keyboard.c, it seems k_cons is called, which calls set_console:

static void k_cons(struct vc_data *vc, unsigned char value, char up_flag)
{
	if (up_flag)
		return;
	set_console(value);
}

I am not sure how set_console triggers agetty with tty2/tty3/tty4 as one of its parameters. Any information will be helpful.

Regards
Mohit
Re: [PATCH] kbuild: add Map option to save vmlinux linker map file(s)
2018-08-06 22:38 GMT+09:00 Vasily Gorbik : > Add "Map" kbuild option, so that "make Map=1" would save vmlinux linker > map into vmlinux.map, which is quite useful during making kernel changes > related to how the kernel is composed. > > KBUILD_SAVE_LINK_MAP flag is exported and architectures > supporting compressed kernel images might respect it and produce > arch/*/boot/compressed/vmlinux.map for the decompressor code as well. > > Signed-off-by: Vasily Gorbik > --- If the map file is quite useful, should it be generated all the time? Or, how about CONFIG option if you want this conditionally generated? I do not want to increase fine-grained command-line switches except _really_ useful ones. > .gitignore | 1 + > Makefile | 14 ++ > 2 files changed, 15 insertions(+) > > diff --git a/.gitignore b/.gitignore > index 97ba6b79834c..1d2308e597ad 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -44,6 +44,7 @@ > *.xz > Module.symvers > modules.builtin > +vmlinux.map > > # > # Top-level generic files > diff --git a/Makefile b/Makefile > index 7a3c4548162b..f1790deae03b 100644 > --- a/Makefile > +++ b/Makefile > @@ -176,6 +176,14 @@ ifndef KBUILD_CHECKSRC >KBUILD_CHECKSRC = 0 > endif > > +# Use 'make Map=1' to enable saving linker map file(s): > +# vmlinux.map for vmlinux, > +# (arch/*/boot/compressed/vmlinux.map for arch/*/boot/compressed/vmlinux) > + > +ifeq ("$(origin Map)", "command line") > + export KBUILD_SAVE_LINK_MAP := $(Map) > +endif > + > # Use make M=dir to specify directory of external module to build > # Old syntax make ... 
SUBDIRS=$PWD is still supported > # Setting the environment variable KBUILD_EXTMOD take precedence > @@ -838,6 +846,11 @@ ifeq ($(CONFIG_STRIP_ASM_SYMS),y) > LDFLAGS_vmlinux+= $(call ld-option, -X,) > endif > > +ifdef KBUILD_SAVE_LINK_MAP > +LDFLAGS_vmlinux+= -Map=vmlinux.map > +CLEAN_FILES+= vmlinux.map > +endif > + > # insure the checker run with the right endianness > CHECKFLAGS += $(if $(CONFIG_CPU_BIG_ENDIAN),-mbig-endian,-mlittle-endian) > > @@ -1434,6 +1447,7 @@ help: > @echo '2: warnings which occur quite often but may > still be relevant' > @echo '3: more obscure warnings, can most likely be > ignored' > @echo 'Multiple levels can be combined with W=12 or > W=123' > + @echo ' make Map=1 [targets] Save vmlinux linker map file(s)' > @echo '' > @echo 'Execute "make" or "make all" to build all targets marked with > [*] ' > @echo 'For further info see the ./README file' > -- > 2.18.0.13.gd42ae10 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Best Regards Masahiro Yamada
[PATCH v3] perf report: Create auxiliary trace data files for s390
Create auxiliary trace data log files when invoked with option --itrace=d as in [root@s35lp76 perf] ./perf report -i perf.data.aux1 --stdio --itrace=d perf report creates several data files in the current directory named aux.smp.## where ## is a 2 digit hex number with leading zeros representing the CPU number this trace data was recorded from. The file contents is binary and contains the CPU-Measurement Sampling Data Blocks (SDBs). The directory to save the auxiliary trace buffer can be changed using the perf config file and command. Specify section 'auxtrace' keyword 'dumpdir' and assign it a valid directory name. If the directory does not exist or has the wrong file type, the current directory is used. [root@p23lp27 perf]# ./perf config auxtrace.dumpdir=/tmp [root@p23lp27 perf]# ./perf config --user -l auxtrace.dumpdir=/tmp [root@p23lp27 perf]# ./perf report ... [root@p23lp27 perf]# ll /tmp/aux.smp.00 -rw-r--r-- 1 root root 204800 Aug 2 13:48 /tmp/aux.smp.00 [root@p23lp27 perf]# Signed-off-by: Thomas Richter Reviewed-by: Hendrik Brueckner --- tools/perf/util/s390-cpumsf.c | 94 +-- 1 file changed, 91 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/s390-cpumsf.c b/tools/perf/util/s390-cpumsf.c index d2c78ffd9fee..a2eeebbfb25f 100644 --- a/tools/perf/util/s390-cpumsf.c +++ b/tools/perf/util/s390-cpumsf.c @@ -147,6 +147,9 @@ #include #include +#include +#include + #include "cpumap.h" #include "color.h" #include "evsel.h" @@ -159,6 +162,7 @@ #include "auxtrace.h" #include "s390-cpumsf.h" #include "s390-cpumsf-kernel.h" +#include "config.h" struct s390_cpumsf { struct auxtrace auxtrace; @@ -170,6 +174,8 @@ struct s390_cpumsf { u32 pmu_type; u16 machine_type; booldata_queued; + booluse_logfile; + char*logdir; }; struct s390_cpumsf_queue { @@ -177,6 +183,7 @@ struct s390_cpumsf_queue { unsigned intqueue_nr; struct auxtrace_buffer *buffer; int cpu; + FILE*logfile; }; /* Display s390 CPU measurement facility basic-sampling data entry */ @@ -595,6 +602,12 @@ 
static int s390_cpumsf_run_decoder(struct s390_cpumsf_queue *sfq, buffer->use_size = buffer->size; buffer->use_data = buffer->data; } + if (sfq->logfile) { /* Write into log file */ + size_t rc = fwrite(buffer->data, buffer->size, 1, + sfq->logfile); + if (rc != 1) + pr_err("Failed to write auxiliary data\n"); + } } else buffer = sfq->buffer; @@ -606,6 +619,13 @@ static int s390_cpumsf_run_decoder(struct s390_cpumsf_queue *sfq, return -ENOMEM; buffer->use_size = buffer->size; buffer->use_data = buffer->data; + + if (sfq->logfile) { /* Write into log file */ + size_t rc = fwrite(buffer->data, buffer->size, 1, + sfq->logfile); + if (rc != 1) + pr_err("Failed to write auxiliary data\n"); + } } pr_debug4("%s queue_nr:%d buffer:%" PRId64 " offset:%#" PRIx64 " size:%#zx rest:%#zx\n", __func__, sfq->queue_nr, buffer->buffer_nr, buffer->offset, @@ -640,6 +660,23 @@ s390_cpumsf_alloc_queue(struct s390_cpumsf *sf, unsigned int queue_nr) sfq->sf = sf; sfq->queue_nr = queue_nr; sfq->cpu = -1; + if (sf->use_logfile) { + char *name; + int rc; + + rc = (sf->logdir) + ? asprintf(, "%s/aux.smp.%02x", +sf->logdir, queue_nr) + : asprintf(, "aux.smp.%02x", queue_nr); + if (rc > 0) + sfq->logfile = fopen(name, "w"); + if (sfq->logfile == NULL) { + pr_err("Failed to open auxiliary log file %s," + "continue...\n", name); + sf->use_logfile = false; + } + free(name); + } return sfq; } @@ -850,8 +887,16 @@ static void s390_cpumsf_free_queues(struct perf_session *session) struct auxtrace_queues *queues = >queues; unsigned int i; - for (i = 0; i < queues->nr_queues; i++) + for (i = 0; i < queues->nr_queues; i++) { + struct s390_cpumsf_queue *sfq = (struct s390_cpumsf_queue *) + queues->queue_array[i].priv; + + if (sfq != NULL && sfq->logfile) { +
Re: [PATCH] extcon: maxim: Add SPDX license identifiers
Hi Krzysztof,

On 2018-08-08 01:21, Krzysztof Kozlowski wrote:
> Replace GPL v2.0+ license statements with SPDX license identifiers.
>
> Signed-off-by: Krzysztof Kozlowski
> ---
>  drivers/extcon/extcon-max14577.c | 24 +++-
>  drivers/extcon/extcon-max77693.c | 22 ++
>  drivers/extcon/extcon-max77843.c | 19 +++
>  drivers/extcon/extcon-max8997.c  | 22 ++
>  4 files changed, 26 insertions(+), 61 deletions(-)
>

Thanks. Applied it for v4.20.

(snip)

--
Best Regards,
Chanwoo Choi
Samsung Electronics
Re: [PATCH 2/3] perf report: Add raw report support for s390 auxiliary trace
On 08/08/2018 06:42 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Aug 08, 2018 at 01:14:51PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Wed, Aug 08, 2018 at 01:08:43PM -0300, Arnaldo Carvalho de Melo escreveu:
>>> No need for __packed.
>>
>>> I'm removing that to avoid having this failing in compilers that have
>>> such a warning, since we default to Werror.
>>
>>> Holler if there is something I'm missing :-)
>>
>> In file included from util/cpumap.h:10,
>> from util/s390-cpumsf.c:150:
>> util/s390-cpumsf.c: In function 's390_cpumsf_diag_show':
>> util/s390-cpumsf.c:208:10: error: format '%lx' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
>>    pr_err("Invalid AUX trace diagnostic entry [%#08lx]\n", pos);
>>    ^~~
>> /git/linux/tools/perf/util/debug.h:20:21: note: in definition of macro 'pr_fmt'
>>  #define pr_fmt(fmt) fmt
>>  ^~~
>> util/s390-cpumsf.c:208:3: note: in expansion of macro 'pr_err'
>>    pr_err("Invalid AUX trace diagnostic entry [%#08lx]\n", pos);
>>    ^~
>> util/s390-cpumsf.c: In function 's390_cpumsf_trailer_show':
>> util/s390-cpumsf.c:233:10: error: format '%lx' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
>>    pr_err("Invalid AUX trace trailer entry [%#08lx]\n", pos);
>
> So for those I applied this, seems to pass the ones that were failing,
> restarting tests...
> > diff --git a/tools/perf/util/s390-cpumsf.c b/tools/perf/util/s390-cpumsf.c > index 2bcb160a08f0..d2c78ffd9fee 100644 > --- a/tools/perf/util/s390-cpumsf.c > +++ b/tools/perf/util/s390-cpumsf.c > @@ -187,7 +187,7 @@ static bool s390_cpumsf_basic_show(const char *color, > size_t pos, > pr_err("Invalid AUX trace basic entry [%#08zx]\n", pos); > return false; > } > - color_fprintf(stdout, color, "[%#08x] Basic Def:%04x Inst:%#04x" > + color_fprintf(stdout, color, "[%#08zx] Basic Def:%04x Inst:%#04x" > " %c%c%c%c AS:%d ASN:%#04x IA:%#018llx\n" > "\t\tCL:%d HPP:%#018llx GPP:%#018llx\n", > pos, basic->def, basic->U, > @@ -205,10 +205,10 @@ static bool s390_cpumsf_diag_show(const char *color, > size_t pos, > struct hws_diag_entry *diag) > { > if (diag->def < S390_CPUMSF_DIAG_DEF_FIRST) { > - pr_err("Invalid AUX trace diagnostic entry [%#08lx]\n", pos); > + pr_err("Invalid AUX trace diagnostic entry [%#08zx]\n", pos); > return false; > } > - color_fprintf(stdout, color, "[%#08x] DiagDef:%04x %c\n", > + color_fprintf(stdout, color, "[%#08zx] DiagDef:%04x %c\n", > pos, diag->def, diag->I ? 
'I' : ' '); > return true; > } > @@ -230,10 +230,10 @@ static bool s390_cpumsf_trailer_show(const char *color, > size_t pos, >struct hws_trailer_entry *te) > { > if (te->bsdes != sizeof(struct hws_basic_entry)) { > - pr_err("Invalid AUX trace trailer entry [%#08lx]\n", pos); > + pr_err("Invalid AUX trace trailer entry [%#08zx]\n", pos); > return false; > } > - color_fprintf(stdout, color, "[%#08x] Trailer %c%c%c bsdes:%d" > + color_fprintf(stdout, color, "[%#08zx] Trailer %c%c%c bsdes:%d" > " dsdes:%d Overflow:%lld Time:%#llx\n" > "\t\tC:%d TOD:%#lx 1:%#llx 2:%#llx\n", > pos, > @@ -418,7 +418,7 @@ static bool s390_cpumsf_make_event(size_t pos, > event.sample.header.misc = sample.cpumode; > event.sample.header.size = sizeof(struct perf_event_header); > > - pr_debug4("%s pos:%#zx ip:%#lx P:%d CL:%d pid:%d.%d cpumode:%d > cpu:%d\n", > + pr_debug4("%s pos:%#zx ip:%#" PRIx64 " P:%d CL:%d pid:%d.%d cpumode:%d > cpu:%d\n", >__func__, pos, sample.ip, basic->P, basic->CL, sample.pid, >sample.tid, sample.cpumode, sample.cpu); > if (perf_session__deliver_synth_event(sfq->sf->session, , > @@ -498,7 +498,7 @@ static int s390_cpumsf_samples(struct s390_cpumsf_queue > *sfq, u64 *ts) >*/ > aux_ts = get_trailer_time(buf); > if (!aux_ts) { > - pr_err("[%#08lx] Invalid AUX trailer entry TOD clock base\n", > + pr_err("[%#08" PRIx64 "] Invalid AUX trailer entry TOD clock > base\n", > sfq->buffer->data_offset); > aux_ts = ~0ULL; > goto out; > @@ -607,7 +607,7 @@ static int s390_cpumsf_run_decoder(struct > s390_cpumsf_queue *sfq, > buffer->use_size = buffer->size; > buffer->use_data = buffer->data; > } > - pr_debug4("%s queue_nr:%d buffer:%ld offset:%#lx size:%#zx rest:%#zx\n", > + pr_debug4("%s
[PATCH v8 0/6] Uprobes: Support SDT markers having reference count (semaphore)
v8 changes:
 - Call delayed_uprobe_remove(uprobe, NULL) from put_uprobe() as well.
 - Other minor optimizations.

v7: https://lkml.org/lkml/2018/7/31/20

Description:
Userspace Statically Defined Tracepoints[1] are dtrace-style markers inside
userspace applications. Applications like PostgreSQL, MySQL, Pthread, Perl,
Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc. have these markers
embedded in them. These markers are added by developers at important places
in the code. Each marker source expands to a single nop instruction in the
compiled code, but there may be additional overhead for computing the marker
arguments, which expands to a couple of instructions. If that overhead is
significant, it can be avoided with a runtime if() condition when no one is
tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to increment
the reference counter before tracing a marker and decrement it when done.

Currently, the perf tool has limited support for SDT markers, i.e. it cannot
trace markers surrounded by a reference counter. Also, it's not easy to add
reference counter logic in a userspace tool like perf, so the basic idea of
this patchset is to add the reference counter logic to the uprobe
infrastructure. Ex,[2]

  # cat tick.c
  ...
  for (i = 0; i < 100; i++) {
        DTRACE_PROBE1(tick, loop1, i);
        if (TICK_LOOP2_ENABLED()) {
                DTRACE_PROBE1(tick, loop2, i);
        }
        printf("hi: %d\n", i);
        sleep(1);
  }
  ...

Here tick:loop1 is a marker without a reference counter, whereas tick:loop2
is surrounded by a reference counter condition.

  # perf buildid-cache --add /tmp/tick
  # perf probe sdt_tick:loop1
  # perf probe sdt_tick:loop2
  # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
    hi: 0
    hi: 1
    hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop1
             0      sdt_tick:loop2
       2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2.
Same experiment with this patch series:

  # ./perf buildid-cache --add /tmp/tick
  # ./perf probe sdt_tick:loop2
  # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
  ^C
  Performance counter stats for '/tmp/tick':
             3      sdt_tick:loop2
       2.561851452 seconds time elapsed

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Ravi Bangoria (6):
  Uprobes: Simplify uprobe_register() body
  Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
  Uprobes: Support SDT markers having reference count (semaphore)
  Uprobes/sdt: Prevent multiple reference counter for same uprobe
  trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
  perf probe: Support SDT markers having reference counter (semaphore)

 arch/arm/probes/uprobes/core.c |   2 +-
 arch/mips/kernel/uprobes.c     |   2 +-
 include/linux/uprobes.h        |   7 +-
 kernel/events/uprobes.c        | 329 +++--
 kernel/trace/trace.c           |   2 +-
 kernel/trace/trace_uprobe.c    |  75 +-
 tools/perf/util/probe-event.c  |  39 -
 tools/perf/util/probe-event.h  |   1 +
 tools/perf/util/probe-file.c   |  34 -
 tools/perf/util/probe-file.h   |   1 +
 tools/perf/util/symbol-elf.c   |  46 --
 tools/perf/util/symbol.h       |   7 +
 12 files changed, 472 insertions(+), 73 deletions(-)

-- 
2.14.4
[PATCH v8 1/6] Uprobes: Simplify uprobe_register() body
Simplify uprobe_register() function body and let __uprobe_register() handle everything. Also move dependency functions around to fix build failures. Signed-off-by: Ravi Bangoria --- kernel/events/uprobes.c | 69 ++--- 1 file changed, 36 insertions(+), 33 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index ccc579a7d32e..471eac896635 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) return err; } -static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc) -{ - consumer_add(uprobe, uc); - return register_for_each_vma(uprobe, uc); -} - -static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) +static void +__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) { int err; @@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u } /* - * uprobe_register - register a probe + * uprobe_unregister - unregister a already registered probe. + * @inode: the file in which the probe has to be removed. + * @offset: offset from the start of the file. + * @uc: identify which probe if multiple probes are colocated. + */ +void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +{ + struct uprobe *uprobe; + + uprobe = find_uprobe(inode, offset); + if (WARN_ON(!uprobe)) + return; + + down_write(>register_rwsem); + __uprobe_unregister(uprobe, uc); + up_write(>register_rwsem); + put_uprobe(uprobe); +} +EXPORT_SYMBOL_GPL(uprobe_unregister); + +/* + * __uprobe_register - register a probe * @inode: the file in which the probe has to be placed. * @offset: offset from the start of the file. * @uc: information on howto handle the probe.. 
* - * Apart from the access refcount, uprobe_register() takes a creation + * Apart from the access refcount, __uprobe_register() takes a creation * refcount (thro alloc_uprobe) if and only if this @uprobe is getting * inserted into the rbtree (i.e first consumer for a @inode:@offset * tuple). Creation refcount stops uprobe_unregister from freeing the * @uprobe even before the register operation is complete. Creation * refcount is released when the last @uc for the @uprobe - * unregisters. Caller of uprobe_register() is required to keep @inode + * unregisters. Caller of __uprobe_register() is required to keep @inode * (and the containing mount) referenced. * * Return errno if it cannot successully install probes * else return 0 (success) */ -int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +static int __uprobe_register(struct inode *inode, loff_t offset, +struct uprobe_consumer *uc) { struct uprobe *uprobe; int ret; @@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * down_write(>register_rwsem); ret = -EAGAIN; if (likely(uprobe_is_active(uprobe))) { - ret = __uprobe_register(uprobe, uc); + consumer_add(uprobe, uc); + ret = register_for_each_vma(uprobe, uc); if (ret) __uprobe_unregister(uprobe, uc); } @@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * goto retry; return ret; } + +int uprobe_register(struct inode *inode, loff_t offset, + struct uprobe_consumer *uc) +{ + return __uprobe_register(inode, offset, uc); +} EXPORT_SYMBOL_GPL(uprobe_register); /* @@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset, return ret; } -/* - * uprobe_unregister - unregister a already registered probe. - * @inode: the file in which the probe has to be removed. - * @offset: offset from the start of the file. - * @uc: identify which probe if multiple probes are colocated. 
- */ -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) -{ - struct uprobe *uprobe; - - uprobe = find_uprobe(inode, offset); - if (WARN_ON(!uprobe)) - return; - - down_write(>register_rwsem); - __uprobe_unregister(uprobe, uc); - up_write(>register_rwsem); - put_uprobe(uprobe); -} -EXPORT_SYMBOL_GPL(uprobe_unregister); - static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm) { struct vm_area_struct *vma; -- 2.14.4
[PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
We assume to have only one reference counter for one uprobe. Don't allow user to add multiple trace_uprobe entries having same inode+offset but different reference counter. Ex, # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events bash: echo: write error: Invalid argument # dmesg trace_kprobe: Reference counter offset mismatch. There is one exception though: When user is trying to replace the old entry with the new one, we allow this if the new entry does not conflict with any other existing entries. Signed-off-by: Ravi Bangoria --- kernel/trace/trace_uprobe.c | 37 +++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index bf2be098eb08..be64d943d7ea 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu) return 0; } +/* + * Uprobe with multiple reference counter is not allowed. i.e. + * If inode and offset matches, reference counter offset *must* + * match as well. Though, there is one exception: If user is + * replacing old trace_uprobe with new one(same group/event), + * then we allow same uprobe with new reference counter as far + * as the new one does not conflict with any other existing + * ones. + */ +static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new) +{ + struct trace_uprobe *tmp, *old = NULL; + struct inode *new_inode = d_real_inode(new->path.dentry); + + old = find_probe_event(trace_event_name(>tp.call), + new->tp.call.class->system); + + list_for_each_entry(tmp, _list, list) { + if ((old ? 
old != tmp : true) && + new_inode == d_real_inode(tmp->path.dentry) && + new->offset == tmp->offset && + new->ref_ctr_offset != tmp->ref_ctr_offset) { + pr_warn("Reference counter offset mismatch."); + return ERR_PTR(-EINVAL); + } + } + return old; +} + /* Register a trace_uprobe and probe_event */ static int register_trace_uprobe(struct trace_uprobe *tu) { @@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu) mutex_lock(_lock); /* register as an event */ - old_tu = find_probe_event(trace_event_name(>tp.call), - tu->tp.call.class->system); + old_tu = find_old_trace_uprobe(tu); + if (IS_ERR(old_tu)) { + ret = PTR_ERR(old_tu); + goto end; + } + if (old_tu) { /* delete old event */ ret = unregister_trace_uprobe(old_tu); -- 2.14.4
[PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
Add addition argument 'arch_uprobe' to uprobe_write_opcode(). We need this in later set of patches. Signed-off-by: Ravi Bangoria --- arch/arm/probes/uprobes/core.c | 2 +- arch/mips/kernel/uprobes.c | 2 +- include/linux/uprobes.h| 2 +- kernel/events/uprobes.c| 9 + 4 files changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c index d1329f1ba4e4..bf992264060e 100644 --- a/arch/arm/probes/uprobes/core.c +++ b/arch/arm/probes/uprobes/core.c @@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn) int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, + return uprobe_write_opcode(auprobe, mm, vaddr, __opcode_to_mem_arm(auprobe->bpinsn)); } diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c index f7a0645ccb82..4aaff3b3175c 100644 --- a/arch/mips/kernel/uprobes.c +++ b/arch/mips/kernel/uprobes.c @@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr( int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN); + return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN); } void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr, diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 0a294e950df8..bb9d2084af03 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn); extern bool is_trap_insn(uprobe_opcode_t *insn); extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); -extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); +extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); extern int uprobe_register(struct inode *inode, loff_t 
offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 471eac896635..c0418ba52ba8 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t * Called with mm->mmap_sem held for write. * Return 0 (success) or a negative errno. */ -int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, - uprobe_opcode_t opcode) +int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, + unsigned long vaddr, uprobe_opcode_t opcode) { struct page *old_page, *new_page; struct vm_area_struct *vma; @@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, */ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN); + return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN); } /** @@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned int __weak set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)>insn); + return uprobe_write_opcode(auprobe, mm, vaddr, + *(uprobe_opcode_t *)>insn); } static struct uprobe *get_uprobe(struct uprobe *uprobe) -- 2.14.4
[PATCH v8 6/6] perf probe: Support SDT markers having reference counter (semaphore)
With this, perf buildid-cache will save SDT markers with reference counter in probe cache. Perf probe will be able to probe markers having reference counter. Ex, # readelf -n /tmp/tick | grep -A1 loop2 Name: loop2 ... Semaphore: 0x10020036 # ./perf buildid-cache --add /tmp/tick # ./perf probe sdt_tick:loop2 # ./perf stat -e sdt_tick:loop2 /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop2 2.561851452 seconds time elapsed Signed-off-by: Ravi Bangoria Acked-by: Masami Hiramatsu Acked-by: Srikar Dronamraju --- tools/perf/util/probe-event.c | 39 tools/perf/util/probe-event.h | 1 + tools/perf/util/probe-file.c | 34 ++-- tools/perf/util/probe-file.h | 1 + tools/perf/util/symbol-elf.c | 46 --- tools/perf/util/symbol.h | 7 +++ 6 files changed, 106 insertions(+), 22 deletions(-) diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index f119eb628dbb..e86f8be89157 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev) tp->offset = strtoul(fmt2_str, NULL, 10); } + if (tev->uprobes) { + fmt2_str = strchr(p, '('); + if (fmt2_str) + tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0); + } + tev->nargs = argc - 2; tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs); if (tev->args == NULL) { @@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg, return err; } +static int +synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf) +{ + struct probe_trace_point *tp = >point; + int err; + + err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address); + + if (err >= 0 && tp->ref_ctr_offset) { + if (!uprobe_ref_ctr_is_supported()) + return -1; + err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset); + } + return err >= 0 ? 
0 : -1; +} + char *synthesize_probe_trace_command(struct probe_trace_event *tev) { struct probe_trace_point *tp = >point; @@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev) } /* Use the tp->address for uprobes */ - if (tev->uprobes) - err = strbuf_addf(, "%s:0x%lx", tp->module, tp->address); - else if (!strncmp(tp->symbol, "0x", 2)) + if (tev->uprobes) { + err = synthesize_uprobe_trace_def(tev, ); + } else if (!strncmp(tp->symbol, "0x", 2)) { /* Absolute address. See try_to_find_absolute_address() */ err = strbuf_addf(, "%s%s0x%lx", tp->module ?: "", tp->module ? ":" : "", tp->address); - else + } else { err = strbuf_addf(, "%s%s%s+%lu", tp->module ?: "", tp->module ? ":" : "", tp->symbol, tp->offset); + } + if (err) goto error; @@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev) { int i; char *buf = synthesize_probe_trace_command(tev); + struct probe_trace_point *tp = >point; + + if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) { + pr_warning("A semaphore is associated with %s:%s and " + "seems your kernel doesn't support it.\n", + tev->group, tev->event); + } /* Old uprobe event doesn't support memory dereference */ if (!tev->uprobes || tev->nargs == 0 || !buf) diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h index 45b14f020558..15a98c3a2a2f 100644 --- a/tools/perf/util/probe-event.h +++ b/tools/perf/util/probe-event.h @@ -27,6 +27,7 @@ struct probe_trace_point { char*symbol;/* Base symbol */ char*module;/* Module name */ unsigned long offset; /* Offset from symbol */ + unsigned long ref_ctr_offset; /* SDT reference counter offset */ unsigned long address;/* Actual address of the trace point */ boolretprobe; /* Return probe flag */ }; diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c index b76088fadf3d..aac7817d9e14 100644 --- a/tools/perf/util/probe-file.c +++ b/tools/perf/util/probe-file.c @@ -696,8 +696,16 @@ int 
probe_cache__add_entry(struct probe_cache *pcache, #ifdef HAVE_GELF_GETNOTE_SUPPORT static unsigned long long sdt_note__get_addr(struct sdt_note *note) { - return note->bit32 ? (unsigned long
[PATCH v8 1/6] Uprobes: Simplify uprobe_register() body
Simplify uprobe_register() function body and let __uprobe_register() handle everything. Also move dependency functions around to fix build failures. Signed-off-by: Ravi Bangoria --- kernel/events/uprobes.c | 69 ++--- 1 file changed, 36 insertions(+), 33 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index ccc579a7d32e..471eac896635 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -840,13 +840,8 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) return err; } -static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc) -{ - consumer_add(uprobe, uc); - return register_for_each_vma(uprobe, uc); -} - -static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) +static void +__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc) { int err; @@ -860,24 +855,46 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u } /* - * uprobe_register - register a probe + * uprobe_unregister - unregister a already registered probe. + * @inode: the file in which the probe has to be removed. + * @offset: offset from the start of the file. + * @uc: identify which probe if multiple probes are colocated. + */ +void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +{ + struct uprobe *uprobe; + + uprobe = find_uprobe(inode, offset); + if (WARN_ON(!uprobe)) + return; + + down_write(>register_rwsem); + __uprobe_unregister(uprobe, uc); + up_write(>register_rwsem); + put_uprobe(uprobe); +} +EXPORT_SYMBOL_GPL(uprobe_unregister); + +/* + * __uprobe_register - register a probe * @inode: the file in which the probe has to be placed. * @offset: offset from the start of the file. * @uc: information on howto handle the probe.. 
* - * Apart from the access refcount, uprobe_register() takes a creation + * Apart from the access refcount, __uprobe_register() takes a creation * refcount (thro alloc_uprobe) if and only if this @uprobe is getting * inserted into the rbtree (i.e first consumer for a @inode:@offset * tuple). Creation refcount stops uprobe_unregister from freeing the * @uprobe even before the register operation is complete. Creation * refcount is released when the last @uc for the @uprobe - * unregisters. Caller of uprobe_register() is required to keep @inode + * unregisters. Caller of __uprobe_register() is required to keep @inode * (and the containing mount) referenced. * * Return errno if it cannot successully install probes * else return 0 (success) */ -int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) +static int __uprobe_register(struct inode *inode, loff_t offset, +struct uprobe_consumer *uc) { struct uprobe *uprobe; int ret; @@ -904,7 +921,8 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * down_write(>register_rwsem); ret = -EAGAIN; if (likely(uprobe_is_active(uprobe))) { - ret = __uprobe_register(uprobe, uc); + consumer_add(uprobe, uc); + ret = register_for_each_vma(uprobe, uc); if (ret) __uprobe_unregister(uprobe, uc); } @@ -915,6 +933,12 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * goto retry; return ret; } + +int uprobe_register(struct inode *inode, loff_t offset, + struct uprobe_consumer *uc) +{ + return __uprobe_register(inode, offset, uc); +} EXPORT_SYMBOL_GPL(uprobe_register); /* @@ -946,27 +970,6 @@ int uprobe_apply(struct inode *inode, loff_t offset, return ret; } -/* - * uprobe_unregister - unregister a already registered probe. - * @inode: the file in which the probe has to be removed. - * @offset: offset from the start of the file. - * @uc: identify which probe if multiple probes are colocated. 
- */ -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) -{ - struct uprobe *uprobe; - - uprobe = find_uprobe(inode, offset); - if (WARN_ON(!uprobe)) - return; - - down_write(>register_rwsem); - __uprobe_unregister(uprobe, uc); - up_write(>register_rwsem); - put_uprobe(uprobe); -} -EXPORT_SYMBOL_GPL(uprobe_unregister); - static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm) { struct vm_area_struct *vma; -- 2.14.4
[PATCH v8 5/6] trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
We assume to have only one reference counter for one uprobe. Don't allow user to add multiple trace_uprobe entries having same inode+offset but different reference counter. Ex, # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xf)" >> uprobe_events bash: echo: write error: Invalid argument # dmesg trace_kprobe: Reference counter offset mismatch. There is one exception though: When user is trying to replace the old entry with the new one, we allow this if the new entry does not conflict with any other existing entries. Signed-off-by: Ravi Bangoria --- kernel/trace/trace_uprobe.c | 37 +++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index bf2be098eb08..be64d943d7ea 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -324,6 +324,35 @@ static int unregister_trace_uprobe(struct trace_uprobe *tu) return 0; } +/* + * Uprobe with multiple reference counter is not allowed. i.e. + * If inode and offset matches, reference counter offset *must* + * match as well. Though, there is one exception: If user is + * replacing old trace_uprobe with new one(same group/event), + * then we allow same uprobe with new reference counter as far + * as the new one does not conflict with any other existing + * ones. + */ +static struct trace_uprobe *find_old_trace_uprobe(struct trace_uprobe *new) +{ + struct trace_uprobe *tmp, *old = NULL; + struct inode *new_inode = d_real_inode(new->path.dentry); + + old = find_probe_event(trace_event_name(>tp.call), + new->tp.call.class->system); + + list_for_each_entry(tmp, _list, list) { + if ((old ? 
old != tmp : true) && + new_inode == d_real_inode(tmp->path.dentry) && + new->offset == tmp->offset && + new->ref_ctr_offset != tmp->ref_ctr_offset) { + pr_warn("Reference counter offset mismatch."); + return ERR_PTR(-EINVAL); + } + } + return old; +} + /* Register a trace_uprobe and probe_event */ static int register_trace_uprobe(struct trace_uprobe *tu) { @@ -333,8 +362,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu) mutex_lock(_lock); /* register as an event */ - old_tu = find_probe_event(trace_event_name(>tp.call), - tu->tp.call.class->system); + old_tu = find_old_trace_uprobe(tu); + if (IS_ERR(old_tu)) { + ret = PTR_ERR(old_tu); + goto end; + } + if (old_tu) { /* delete old event */ ret = unregister_trace_uprobe(old_tu); -- 2.14.4
[PATCH v8 2/6] Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
Add addition argument 'arch_uprobe' to uprobe_write_opcode(). We need this in later set of patches. Signed-off-by: Ravi Bangoria --- arch/arm/probes/uprobes/core.c | 2 +- arch/mips/kernel/uprobes.c | 2 +- include/linux/uprobes.h| 2 +- kernel/events/uprobes.c| 9 + 4 files changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c index d1329f1ba4e4..bf992264060e 100644 --- a/arch/arm/probes/uprobes/core.c +++ b/arch/arm/probes/uprobes/core.c @@ -32,7 +32,7 @@ bool is_swbp_insn(uprobe_opcode_t *insn) int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, + return uprobe_write_opcode(auprobe, mm, vaddr, __opcode_to_mem_arm(auprobe->bpinsn)); } diff --git a/arch/mips/kernel/uprobes.c b/arch/mips/kernel/uprobes.c index f7a0645ccb82..4aaff3b3175c 100644 --- a/arch/mips/kernel/uprobes.c +++ b/arch/mips/kernel/uprobes.c @@ -224,7 +224,7 @@ unsigned long arch_uretprobe_hijack_return_addr( int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN); + return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN); } void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr, diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 0a294e950df8..bb9d2084af03 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -121,7 +121,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn); extern bool is_trap_insn(uprobe_opcode_t *insn); extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); -extern int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); +extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); extern int uprobe_register(struct inode *inode, loff_t 
offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 471eac896635..c0418ba52ba8 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -299,8 +299,8 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t * Called with mm->mmap_sem held for write. * Return 0 (success) or a negative errno. */ -int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, - uprobe_opcode_t opcode) +int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, + unsigned long vaddr, uprobe_opcode_t opcode) { struct page *old_page, *new_page; struct vm_area_struct *vma; @@ -351,7 +351,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr, */ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, UPROBE_SWBP_INSN); + return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN); } /** @@ -366,7 +366,8 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned int __weak set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { - return uprobe_write_opcode(mm, vaddr, *(uprobe_opcode_t *)>insn); + return uprobe_write_opcode(auprobe, mm, vaddr, + *(uprobe_opcode_t *)>insn); } static struct uprobe *get_uprobe(struct uprobe *uprobe) -- 2.14.4
[PATCH v8 6/6] perf probe: Support SDT markers having reference counter (semaphore)
With this, perf buildid-cache will save SDT markers with reference counter in probe cache. Perf probe will be able to probe markers having reference counter. Ex, # readelf -n /tmp/tick | grep -A1 loop2 Name: loop2 ... Semaphore: 0x10020036 # ./perf buildid-cache --add /tmp/tick # ./perf probe sdt_tick:loop2 # ./perf stat -e sdt_tick:loop2 /tmp/tick hi: 0 hi: 1 hi: 2 ^C Performance counter stats for '/tmp/tick': 3 sdt_tick:loop2 2.561851452 seconds time elapsed Signed-off-by: Ravi Bangoria Acked-by: Masami Hiramatsu Acked-by: Srikar Dronamraju --- tools/perf/util/probe-event.c | 39 tools/perf/util/probe-event.h | 1 + tools/perf/util/probe-file.c | 34 ++-- tools/perf/util/probe-file.h | 1 + tools/perf/util/symbol-elf.c | 46 --- tools/perf/util/symbol.h | 7 +++ 6 files changed, 106 insertions(+), 22 deletions(-) diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c index f119eb628dbb..e86f8be89157 100644 --- a/tools/perf/util/probe-event.c +++ b/tools/perf/util/probe-event.c @@ -1819,6 +1819,12 @@ int parse_probe_trace_command(const char *cmd, struct probe_trace_event *tev) tp->offset = strtoul(fmt2_str, NULL, 10); } + if (tev->uprobes) { + fmt2_str = strchr(p, '('); + if (fmt2_str) + tp->ref_ctr_offset = strtoul(fmt2_str + 1, NULL, 0); + } + tev->nargs = argc - 2; tev->args = zalloc(sizeof(struct probe_trace_arg) * tev->nargs); if (tev->args == NULL) { @@ -2012,6 +2018,22 @@ static int synthesize_probe_trace_arg(struct probe_trace_arg *arg, return err; } +static int +synthesize_uprobe_trace_def(struct probe_trace_event *tev, struct strbuf *buf) +{ + struct probe_trace_point *tp = >point; + int err; + + err = strbuf_addf(buf, "%s:0x%lx", tp->module, tp->address); + + if (err >= 0 && tp->ref_ctr_offset) { + if (!uprobe_ref_ctr_is_supported()) + return -1; + err = strbuf_addf(buf, "(0x%lx)", tp->ref_ctr_offset); + } + return err >= 0 ? 
0 : -1; +} + char *synthesize_probe_trace_command(struct probe_trace_event *tev) { struct probe_trace_point *tp = &tev->point; @@ -2041,15 +2063,17 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev) } /* Use the tp->address for uprobes */ - if (tev->uprobes) - err = strbuf_addf(&buf, "%s:0x%lx", tp->module, tp->address); - else if (!strncmp(tp->symbol, "0x", 2)) + if (tev->uprobes) { + err = synthesize_uprobe_trace_def(tev, &buf); + } else if (!strncmp(tp->symbol, "0x", 2)) { /* Absolute address. See try_to_find_absolute_address() */ err = strbuf_addf(&buf, "%s%s0x%lx", tp->module ?: "", tp->module ? ":" : "", tp->address); - else + } else { err = strbuf_addf(&buf, "%s%s%s+%lu", tp->module ?: "", tp->module ? ":" : "", tp->symbol, tp->offset); + } + if (err) goto error; @@ -2633,6 +2657,13 @@ static void warn_uprobe_event_compat(struct probe_trace_event *tev) { int i; char *buf = synthesize_probe_trace_command(tev); + struct probe_trace_point *tp = &tev->point; + + if (tp->ref_ctr_offset && !uprobe_ref_ctr_is_supported()) { + pr_warning("A semaphore is associated with %s:%s and " + "seems your kernel doesn't support it.\n", + tev->group, tev->event); + } /* Old uprobe event doesn't support memory dereference */ if (!tev->uprobes || tev->nargs == 0 || !buf) diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h index 45b14f020558..15a98c3a2a2f 100644 --- a/tools/perf/util/probe-event.h +++ b/tools/perf/util/probe-event.h @@ -27,6 +27,7 @@ struct probe_trace_point { char*symbol;/* Base symbol */ char*module;/* Module name */ unsigned long offset; /* Offset from symbol */ + unsigned long ref_ctr_offset; /* SDT reference counter offset */ unsigned long address;/* Actual address of the trace point */ boolretprobe; /* Return probe flag */ }; diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c index b76088fadf3d..aac7817d9e14 100644 --- a/tools/perf/util/probe-file.c +++ b/tools/perf/util/probe-file.c @@ -696,8 +696,16 @@ int 
probe_cache__add_entry(struct probe_cache *pcache, #ifdef HAVE_GELF_GETNOTE_SUPPORT static unsigned long long sdt_note__get_addr(struct sdt_note *note) { - return note->bit32 ? (unsigned long
[PATCH v8 4/6] Uprobes/sdt: Prevent multiple reference counter for same uprobe
We assume there is only one reference counter per uprobe. Don't allow users to register multiple uprobes having the same inode+offset but different reference counters. Signed-off-by: Ravi Bangoria --- kernel/events/uprobes.c | 9 + 1 file changed, 9 insertions(+) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 61b0481ef417..492a0e005b4d 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -689,6 +689,12 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset, cur_uprobe = insert_uprobe(uprobe); /* a uprobe exists for this inode:offset combination */ if (cur_uprobe) { + if (cur_uprobe->ref_ctr_offset != uprobe->ref_ctr_offset) { + pr_warn("Reference counter offset mismatch.\n"); + put_uprobe(cur_uprobe); + kfree(uprobe); + return ERR_PTR(-EINVAL); + } kfree(uprobe); uprobe = cur_uprobe; } @@ -1103,6 +1109,9 @@ static int __uprobe_register(struct inode *inode, loff_t offset, uprobe = alloc_uprobe(inode, offset, ref_ctr_offset); if (!uprobe) return -ENOMEM; + if (IS_ERR(uprobe)) + return PTR_ERR(uprobe); + /* * We can race with uprobe_unregister()->delete_uprobe(). * Check uprobe_is_active() and retry if it is false. -- 2.14.4
[PATCH v8 3/6] Uprobes: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace-style markers inside userspace applications. Applications like PostgreSQL, MySQL, Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc. have these markers embedded in them. These markers are added by developers at important places in the code. Each marker source expands to a single nop instruction in the compiled code, but there may be additional overhead for computing the marker arguments, which expands to a couple of instructions. When that overhead matters, execution can be omitted by a runtime if() condition when no one is tracing the marker: if (reference_counter > 0) { Execute marker instructions; } The default value of the reference counter is 0. A tracer has to increment the reference counter before tracing a marker and decrement it when done with the tracing. Implement the reference counter logic in the core uprobe code. Users will be able to use it from trace_uprobe as well as from kernel modules. The new trace_uprobe definition with a reference counter is: <path>:<offset>[(ref_ctr_offset)] where ref_ctr_offset is an optional field. For kernel modules, a new variant of uprobe_register() has been introduced: uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer) There is no new variant of uprobe_unregister() because we assume only one reference counter per uprobe. [1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation Note: the 'reference counter' is called a 'semaphore' in the original Dtrace (and Systemtap, bcc and even ELF) documentation and code. But the term 'semaphore' is misleading in this context: it is just a counter holding the number of tracers tracing a marker, and it is not used for any synchronization. So we refer to it as a 'reference counter' in kernel / perf code. 
Signed-off-by: Ravi Bangoria Reviewed-by: Masami Hiramatsu [Only trace_uprobe.c] --- include/linux/uprobes.h | 5 + kernel/events/uprobes.c | 246 ++-- kernel/trace/trace.c| 2 +- kernel/trace/trace_uprobe.c | 38 ++- 4 files changed, 280 insertions(+), 11 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index bb9d2084af03..103a48a48872 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -123,6 +123,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs); extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs); extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t); extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); +extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc); extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); @@ -160,6 +161,10 @@ uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { return -ENOSYS; } +static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc) +{ + return -ENOSYS; +} static inline int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add) { diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index c0418ba52ba8..61b0481ef417 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -73,6 +73,7 @@ struct uprobe { struct uprobe_consumer *consumers; struct inode*inode; /* Also hold a ref to inode */ loff_t offset; + loff_t ref_ctr_offset; unsigned long flags; /* @@ -88,6 +89,15 @@ struct uprobe { struct arch_uprobe arch; }; +struct delayed_uprobe { + struct list_head list; + 
struct uprobe *uprobe; + struct mm_struct *mm; +}; + +static DEFINE_MUTEX(delayed_uprobe_lock); +static LIST_HEAD(delayed_uprobe_list); + /* * Execute out of line area: anonymous executable mapping installed * by the probed task to execute the copy of the original instruction @@ -282,6 +292,157 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t return 1; } +static struct delayed_uprobe * +delayed_uprobe_check(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct delayed_uprobe *du; + + list_for_each_entry(du, &delayed_uprobe_list, list) + if (du->uprobe == uprobe && du->mm == mm) + return du; + return NULL; +} + +static int delayed_uprobe_add(struct uprobe *uprobe, struct mm_struct *mm) +{ + struct delayed_uprobe *du; + + if (delayed_uprobe_check(uprobe, mm)) + return 0; +
FUSE: write operations trigger balance_dirty_pages when using writeback cache
Dear Miklos, Recently I've been testing FUSE and the libfuse example passthrough_ll with writeback cache on, and found that the performance drops significantly compared to a local filesystem. As I can see from the trace, balance_dirty_pages is triggered very frequently even if there are not enough pages that should be sent to libfuse. I'm not sure if this is a known fact or whether the FUSE writeback feature requires some specific configuration. Trace log is attached. dd-19067 [001] 195295.568097: balance_dirty_pages: bdi 0:42: limit=3180390 setpoint=2782421 dirty=5 bdi_setpoint=0 bdi_dirty=32 dirty_ratelimit=32 task_ratelimit=0 dirtied=32 dirtied_pause=32 paused=0 pause=33 period=33 think=0 cgroup_ino=1 dd-19067 [001] 195295.602029: balance_dirty_pages: bdi 0:42: limit=3180390 setpoint=2782421 dirty=5 bdi_setpoint=0 bdi_dirty=33 dirty_ratelimit=32 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=34 period=34 think=1 cgroup_ino=1 dd-19067 [001] 195295.637026: balance_dirty_pages: bdi 0:42: limit=3180390 setpoint=2782421 dirty=5 bdi_setpoint=0 bdi_dirty=34 dirty_ratelimit=32 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=35 period=35 think=1 cgroup_ino=1 BTW, I'm using Linux kernel 4.17.12 and libfuse 3.2.5. I look forward to hearing from you. Thanks in advance. Regards, Shuoran Liu
Re: [PATCH] dmaengine: sprd: Support DMA link-list mode
Hi Vinod, (Comments from Eric Long) On 9 August 2018 at 10:32, Vinod wrote: > On 26-07-18, 16:00, Baolin Wang wrote: >> From: Eric Long >> >> The Spreadtrum DMA can support the link-list transaction mode, which means >> DMA controller can do transaction one by one automatically once we linked >> these transaction by link-list register. >> >> Signed-off-by: Eric Long >> Signed-off-by: Baolin Wang >> --- >> drivers/dma/sprd-dma.c | 82 >> ++ >> include/linux/dma/sprd-dma.h | 69 +++ >> 2 files changed, 144 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c >> index 55df0d4..649bd2c 100644 >> --- a/drivers/dma/sprd-dma.c >> +++ b/drivers/dma/sprd-dma.c >> @@ -68,6 +68,7 @@ >> >> /* SPRD_DMA_CHN_CFG register definition */ >> #define SPRD_DMA_CHN_EN BIT(0) >> +#define SPRD_DMA_LINKLIST_EN BIT(4) >> #define SPRD_DMA_WAIT_BDONE_OFFSET 24 >> #define SPRD_DMA_DONOT_WAIT_BDONE1 >> >> @@ -103,7 +104,7 @@ >> #define SPRD_DMA_REQ_MODE_MASK GENMASK(1, 0) >> #define SPRD_DMA_FIX_SEL_OFFSET 21 >> #define SPRD_DMA_FIX_EN_OFFSET 20 >> -#define SPRD_DMA_LLIST_END_OFFSET19 >> +#define SPRD_DMA_LLIST_END BIT(19) >> #define SPRD_DMA_FRG_LEN_MASKGENMASK(16, 0) >> >> /* SPRD_DMA_CHN_BLK_LEN register definition */ >> @@ -164,6 +165,7 @@ struct sprd_dma_desc { >> struct sprd_dma_chn { >> struct virt_dma_chanvc; >> void __iomem*chn_base; >> + struct sprd_dma_linklistlinklist; >> struct dma_slave_config slave_cfg; >> u32 chn_num; >> u32 dev_id; >> @@ -582,7 +584,8 @@ static int sprd_dma_get_step(enum dma_slave_buswidth >> buswidth) >> } >> >> static int sprd_dma_fill_desc(struct dma_chan *chan, >> - struct sprd_dma_desc *sdesc, >> + struct sprd_dma_chn_hw *hw, >> + unsigned int sglen, int sg_index, >> dma_addr_t src, dma_addr_t dst, u32 len, >> enum dma_transfer_direction dir, >> unsigned long flags, >> @@ -590,7 +593,6 @@ static int sprd_dma_fill_desc(struct dma_chan *chan, >> { >> struct sprd_dma_dev *sdev = to_sprd_dma_dev(chan); >> struct 
sprd_dma_chn *schan = to_sprd_dma_chan(chan); >> - struct sprd_dma_chn_hw *hw = &sdesc->chn_hw; >> u32 req_mode = (flags >> SPRD_DMA_REQ_SHIFT) & SPRD_DMA_REQ_MODE_MASK; >> u32 int_mode = flags & SPRD_DMA_INT_MASK; >> int src_datawidth, dst_datawidth, src_step, dst_step; >> @@ -670,12 +672,58 @@ static int sprd_dma_fill_desc(struct dma_chan *chan, >> temp |= (src_step & SPRD_DMA_TRSF_STEP_MASK) << >> SPRD_DMA_SRC_TRSF_STEP_OFFSET; >> hw->trsf_step = temp; >> >> + /* link-list configuration */ >> + if (schan->linklist.phy_addr) { >> + if (sg_index == sglen - 1) >> + hw->frg_len |= SPRD_DMA_LLIST_END; >> + >> + hw->cfg |= SPRD_DMA_LINKLIST_EN; >> + hw->llist_ptr = schan->linklist.phy_addr + >> + ((sg_index + 1) % sglen) * sizeof(*hw) + >> + SPRD_DMA_CHN_SRC_ADDR; >> + } else { >> + hw->llist_ptr = 0; >> + } >> + >> hw->frg_step = 0; >> hw->src_blk_step = 0; >> hw->des_blk_step = 0; >> return 0; >> } >> >> +static int sprd_dma_fill_linklist_desc(struct dma_chan *chan, >> +unsigned int sglen, int sg_index, >> +dma_addr_t src, dma_addr_t dst, u32 len, >> +enum dma_transfer_direction dir, >> +unsigned long flags, >> +struct dma_slave_config *slave_cfg) >> +{ >> + struct sprd_dma_chn *schan = to_sprd_dma_chan(chan); >> + struct sprd_dma_chn_hw *hw; >> + >> + if (sglen < 2 || !schan->linklist.virt_addr) >> + return -EINVAL; > > why the limitation of 2 here? Spreadtrum DMA link-list mode needs more than 2 link-list configurations. If there is only one sg, it doesn't need to fill the link-list configuration. Moreover, we have validated the sg length in sprd_dma_prep_slave_sg(), so will remove the redundant validation here. > >> + >> + hw = (struct sprd_dma_chn_hw *)(schan->linklist.virt_addr + >> + sg_index * sizeof(*hw)); >> + >> + return sprd_dma_fill_desc(chan, hw, sglen, sg_index, src, dst, len, >> dir, >> + flags, slave_cfg); >> +} >> + >> +static int sprd_dma_fill_chn_desc(struct dma_chan *chan, >> + struct sprd_dma_desc *sdesc, >> + dma_addr_t src, dma_addr_t dst, u32 len,
Re: [PATCH 2/2] spi: sprd: Add SPI driver for Spreadtrum SC9860
Hi Trent, On 9 August 2018 at 03:08, Trent Piepho wrote: > On Wed, 2018-08-08 at 11:19 +0800, Baolin Wang wrote: >> On 8 August 2018 at 01:10, Trent Piepho wrote: >> > On Tue, 2018-08-07 at 18:43 +0800, Baolin Wang wrote: >> > > >> > > +static u32 sprd_spi_transfer_max_timeout(struct sprd_spi *ss, >> > > + struct spi_transfer *t) >> > > +{ >> > > + /* >> > > + * The time spent on transmission of the full FIFO data is the >> > > maximum >> > > + * SPI transmission time. >> > > + */ >> > > + u32 size = t->bits_per_word * SPRD_SPI_FIFO_SIZE; >> > > + u32 bit_time_us = SPRD_SPI_HZ / t->speed_hz + 1; > > There's another flaw here in that the transfer speed, t->speed_hz, > might not be exactly what is used, due to limitations of the clock > divider. It would be better to find the actual SPI clock used, then > use that in the calculations. Right, will use the real speed to calculate the transfer time. > >> > > + u32 total_time_us = size * bit_time_us; >> > > + /* >> > > + * There is an interval between data and the data in our SPI >> > > hardware, >> > > + * so the total transmission time need add the interval time. >> > > + * >> > > + * The inteval calculation formula: >> > > + * interval time (source clock cycles) = interval * 4 + 10. >> > > + */ >> > > + u32 interval_cycle = SPRD_SPI_FIFO_SIZE * ((ss->interval << 2) + >> > > 10); >> > > + u32 interval_time_us = interval_cycle * SPRD_SPI_HZ / ss->src_clk >> > > + 1; >> > >> > If interval is greater than 31, this will overflow. >> >> Usually we will not set such a big value for interval, but this is a >> risk like you said. So we will check and limit the interval value to >> make sure the formula will not overflow. >> > > Better would be to limit the inter word delay to whatever maximum value > your hardware supports, and then write code that can calculate that > without error. Yes, will define the maximum word delay values to avoid overflow. 
> >> > > +static void sprd_spi_set_speed(struct sprd_spi *ss, u32 speed_hz) >> > > +{ >> > > + /* >> > > + * From SPI datasheet, the prescale calculation formula: >> > > + * prescale = SPI source clock / (2 * SPI_freq) - 1; >> > > + */ >> > > + u32 clk_div = ss->src_clk / (speed_hz << 1) - 1; >> > >> > You should probably round up here. The convention is to use the >> > closest speed less than or equal to the requested speed, but since this is >> > a divider, rounding it down will select the next highest speed. >> >> Per the datasheet, we do not need round up/down the speed. Since our >> hardware can handle the clock calculated by the above formula if the >> requested speed is in the normal region (less than ->max_speed_hz). > > That is not what I mean. Let me explain differently. > > An integer divider like this can not produce any frequency exactly. > Consider if src_clk is 10 MHz. A clk_div value of 0 produces a 5 MHz > SPI clock. A clk_div value of 1 produces a 2.5 MHz SPI clock. > > What if the transfer requests a SPI clock of 3 MHz? > > Your math above will produce a SPI clock of 5 MHz, faster than > requested. This is not the convention in Linux SPI masters. You > should instead have chosen a clk_div value of 1 to get a SPI clock > of 2.5 MHz, the closest clock possible that is not greater than the > requested value. > > To do this, round up. > > clk_div = DIV_ROUND_UP(ss->src_clk, speed_hz * 2) - 1; Thanks for your patient explanation. After talking with Lanqing who is the author of the SPI driver, we think you are definitely correct and will fix in next version according to your suggestion. Thanks a lot. -- Baolin Wang Best Regards
Re: [PATCH 1/2] dt-bindings: spi: Add Spreadtrum SPI controller documentation
Hi Trent, On 9 August 2018 at 02:57, Trent Piepho wrote: > On Wed, 2018-08-08 at 11:54 +0100, Mark Brown wrote: >> On Wed, Aug 08, 2018 at 06:35:28PM +0800, Baolin Wang wrote: >> > On 8 August 2018 at 17:50, Mark Brown wrote: >> > > Right, I don't think we added this yet (if we did I can't see >> > > it). I'd >> > > add a new field to spi_transfer for this, then other controllers >> > > with >> > > the same support can implement it as well and drivers can start >> > > using >> > > it too. >> > OK. So I will name the new field as 'word_delay', is it OK for you? >> >> Sounds good, yes. > > Should it be in µs like the existing inter-transfer delay? I think > perhaps units of cycles of the SPI clock make more sense? Since some SPI controllers just want some interval values (neither in µs nor in cycles) set into hardware, and the hardware will convert to the correct delay time automatically, I did not force the 'word_delay' unit to be µs or cycles, and instead let the slave devices decide the unit depending on the SPI hardware requirement. -- Baolin Wang Best Regards
[PATCH v2 2/2] ASoC: trace: track dapm type in snd_soc_dapm_widget
Track snd_soc_dapm_widget:id which is used as dapm sequence index in dapm_down_seq/dapm_up_seq. It's useful for checking dapm seq after tracking it. Signed-off-by: Liu Changcheng diff --git a/include/trace/events/asoc.h b/include/trace/events/asoc.h index bf17edc..7c4e8de 100644 --- a/include/trace/events/asoc.h +++ b/include/trace/events/asoc.h @@ -92,16 +92,18 @@ DECLARE_EVENT_CLASS(snd_soc_dapm_widget, TP_STRUCT__entry( __string( name, w->name ) + __field(int,id ) __field(int,val ) ), TP_fast_assign( __assign_str(name, w->name); + __entry->id = (int)w->id; __entry->val = val; ), - TP_printk("widget=%s val=%d", __get_str(name), - __entry->val) + TP_printk("widget=%s dapm_id=%d val=%d", __get_str(name), + __entry->id, __entry->val) ); DEFINE_EVENT(snd_soc_dapm_widget, snd_soc_dapm_widget_power, -- 2.7.4
[PATCH v2 1/2] ASoC: trace: remove unnecessary typecast
There's no need to do a typecast since the defined type matches the output format requirement. Signed-off-by: Liu Changcheng diff --git a/include/trace/events/asoc.h b/include/trace/events/asoc.h index 40c300f..bf17edc 100644 --- a/include/trace/events/asoc.h +++ b/include/trace/events/asoc.h @@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(snd_soc_card, __entry->val = val; ), - TP_printk("card=%s val=%d", __get_str(name), (int)__entry->val) + TP_printk("card=%s val=%d", __get_str(name), __entry->val) ); DEFINE_EVENT(snd_soc_card, snd_soc_bias_level_start, @@ -101,7 +101,7 @@ DECLARE_EVENT_CLASS(snd_soc_dapm_widget, ), TP_printk("widget=%s val=%d", __get_str(name), - (int)__entry->val) + __entry->val) ); DEFINE_EVENT(snd_soc_dapm_widget, snd_soc_dapm_widget_power, @@ -149,8 +149,8 @@ TRACE_EVENT(snd_soc_dapm_walk_done, ), TP_printk("%s: checks %d power, %d path, %d neighbour", - __get_str(name), (int)__entry->power_checks, - (int)__entry->path_checks, (int)__entry->neighbour_checks) + __get_str(name), __entry->power_checks, + __entry->path_checks, __entry->neighbour_checks) ); TRACE_EVENT(snd_soc_dapm_path, @@ -180,8 +180,7 @@ TRACE_EVENT(snd_soc_dapm_path, ), TP_printk("%c%s %s %s %s %s", - (int) __entry->path_node && - (int) __entry->path_connect ? '*' : ' ', + __entry->path_node && __entry->path_connect ? '*' : ' ', __get_str(wname), DAPM_ARROW(__entry->path_dir), __get_str(pname), DAPM_ARROW(__entry->path_dir), __get_str(pnname)) @@ -242,8 +241,8 @@ TRACE_EVENT(snd_soc_jack_report, __entry->val = val; ), - TP_printk("jack=%s %x/%x", __get_str(name), (int)__entry->val, - (int)__entry->mask) + TP_printk("jack=%s %x/%x", __get_str(name), __entry->val, + __entry->mask) ); TRACE_EVENT(snd_soc_jack_notify, @@ -262,7 +261,7 @@ TRACE_EVENT(snd_soc_jack_notify, __entry->val = val; ), - TP_printk("jack=%s %x", __get_str(name), (int)__entry->val) + TP_printk("jack=%s %x", __get_str(name), __entry->val) ); #endif /* _TRACE_ASOC_H */ -- 2.7.4
[PATCH v2 0/2] ASoC: trace: format file and add trace field
1. Some typecast are not needed 2. trace dapm type in snd_soc_dapm_widget class event Liu Changcheng (2): ASoC: trace: remove unnecessary typecast ASoC: trace: track dapm type in snd_soc_dapm_widget v1->v2 correct code style to meet kernel requirement. include/trace/events/asoc.h | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) -- 2.7.4
Re: [PATCH 0/4][RFC v2] Introduce the in-kernel hibernation encryption
Hi Pavel, Joey, Oliver Please let me describe the original requirement and my understanding about hibernation encryption here, thus help us sync on the same thread: On Wed, Aug 08, 2018 at 07:50:36PM +0200, Pavel Machek wrote: > Hi! > > > > > > User space doesn't need to involve. The EFI root key is generated by > > > > > EFI boot stub and be transfer to kernel. It's stored in EFI boot > > > > > service > > > > > variable that it can only be accessed by trusted EFI binary when > > > > > secure boot is enabled. > > > > > > > > > Okay, this apply to the 'suspend' phase, right? > > > > I'm still a little confused about the 'resume' phase. > > > > Taking encryption as example(not signature), > > > > the purpose of doing hibernation encryption is to prevent other users > > > > from stealing ram content. Say, user A uses a passphrase to generate > > > > the > > > > > > No, I don't think that's purpose here. > > > > > > Purpose here is to prevent user from reading/modifying kernel memory > > > content on machine he owns. > > > > > Say, A puts his laptop into hibernation and walks away, > > and B walks by, and opens A's laptop and wakes up the system and he > > can do what he wants. Although EFI key/TPM trusted key is enabled, > > currently there's no certification during resume, which sounds > > unsafe to me. Afterall, the original requirement is to probe > > Define unsafe. > > If you want security against bad people resuming your machines, please Yes, this is one of the requirements. > take a look at existing uswsusp solutions. It defends against that. > > If you want security against bad people tampering with your machines > physically, sorry, there's no way to defend against that. No, this is not the requirement. > > But I thought you were trying to do something for secure boot, and "bad > person resumes your machine" is out of scope there. > Not exactly, secure boot is one solution to meet the requirement. 
> So please always explain security against _what kind of attack_ you > are trying to improve; intelligent communication is not possible > without that. > User requirement: A is the user, B is the attacker. User A launches a STD (suspend-to-disk) and encrypts A's ram data, then writes the encrypted data onto the disk, so that: Even if user B has access to the disk, B could not know A's content. Which implies: 1. If B unplugs the disk from A's machine, and plugs the disk onto another machine, B could not decode the content without A's 'permission'. 2. If B is using the same machine as A, even if A has walked away leaving the system suspended, B could not resume to A's context without A's 'permission'. Previously, there were three proposals for this: a. Enhance the uswsusp (Pavel) b. Using a user-provided password to generate the key for encryption (Yu) c. Using secure boot (TPM or EFI key) for encryption (Joey) Since I was proposing solution b, I'll say a little more about it. The original idea was that the user provides a password, then this password is used to generate the key, which means that if user B has provided an incorrect password, the kernel will fail to decrypt the data and is likely to fail the resume process. That is to say, no matter which physical machine B is using, only if he has provided the password would he be able to resume. In the first version, the key derivation was first done in kernel space, which satisfies both the requirement and safety. Unfortunately it was rejected, and people would like to see the key generated in user space instead. However, using a user-provided key directly is not safe, according to the discussion in the thread. I don't have a good idea on how to improve this, only some workarounds, say, asking the kernel to use a TPM key to protect the user-provided 'key', etc. Then let's talk a little more about secure boot. 
According to my understanding, the situation secure boot tries to deal with is a little different from the use case we raised above - it is an enhancement for case 1, because it refuses to resume once the machine is changed. And it does not cover case 2. But if it is a requirement from the user, that's ok. uswsusp does all the work in user space, and the other two aim to do all the work in kernel space. I'm not arguing which one is better, but I'm not sure how often users are using it, as we haven't gotten uswsusp-related bug reports on the kernel bugzilla (or internally) in recent years. Another point is, why is the compression in the kernel rather than in uswsusp? It looks like the same case as encryption here. Best, Yu
[PATCH v3] resource: Merge resources on a node when hot-adding memory
When hot-removing memory release_mem_region_adjustable() splits iomem resources if they are not the exact size of the memory being hot-deleted. Adding this memory back to the kernel adds a new resource. Eg a node has memory 0x0 - 0xf. Offlining and hot-removing 1GB from 0xf4000 results in the single resource 0x0-0xf being split into two resources: 0x0-0xf3fff and 0xf8000-0xf. When we hot-add the memory back we now have three resources: 0x0-0xf3fff, 0xf4000-0xf7fff, and 0xf8000-0xf. Now if we try to remove some memory that overlaps these resources, like 2GB from 0xf4000, release_mem_region_adjustable() fails as it expects the chunk of memory to be within the boundaries of a single resource. This patch adds a function request_resource_and_merge(). This is called instead of request_resource_conflict() when registering a resource in add_memory(). It calls request_resource_conflict() and if hot-removing is enabled (if it isn't we won't get resource fragmentation) we attempt to merge contiguous resources on the node. Signed-off-by: Rashmica Gupta --- v2->v3: Update Xen balloon, make the commit msg and a comment clearer, and changed '>' to '>=' when comparing the end of a resource and the end of a node. v1->v2: Only attempt to merge resources if hot-remove is enabled. drivers/xen/balloon.c | 3 +- include/linux/ioport.h | 2 + include/linux/memory_hotplug.h | 2 +- kernel/resource.c | 120 + mm/memory_hotplug.c| 22 5 files changed, 136 insertions(+), 13 deletions(-) diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index 065f0b607373..9b972b37b0da 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -401,7 +401,8 @@ static enum bp_state reserve_additional_memory(void) * callers drop the mutex before trying again. 
*/ mutex_unlock(_mutex); - rc = add_memory_resource(nid, resource, memhp_auto_online); + rc = add_memory_resource(nid, resource->start, resource_size(resource), +memhp_auto_online); mutex_lock(_mutex); if (rc) { diff --git a/include/linux/ioport.h b/include/linux/ioport.h index da0ebaec25f0..f5b93a711e86 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -189,6 +189,8 @@ extern int allocate_resource(struct resource *root, struct resource *new, resource_size_t, resource_size_t), void *alignf_data); +extern struct resource *request_resource_and_merge(struct resource *parent, + struct resource *new, int nid); struct resource *lookup_resource(struct resource *root, resource_size_t start); int adjust_resource(struct resource *res, resource_size_t start, resource_size_t size); diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 4e9828cda7a2..9c00f97c8cc6 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -322,7 +322,7 @@ static inline void remove_memory(int nid, u64 start, u64 size) {} extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn, void *arg, int (*func)(struct memory_block *, void *)); extern int add_memory(int nid, u64 start, u64 size); -extern int add_memory_resource(int nid, struct resource *resource, bool online); +extern int add_memory_resource(int nid, u64 start, u64 size, bool online); extern int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap, bool want_memblock); extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, diff --git a/kernel/resource.c b/kernel/resource.c index 30e1bc68503b..a31d3f5bccb7 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1621,3 +1621,123 @@ static int __init strict_iomem(char *str) } __setup("iomem=", strict_iomem); + +#ifdef CONFIG_MEMORY_HOTPLUG +#ifdef CONFIG_MEMORY_HOTREMOVE +/* + * Attempt to merge resource and its sibling + */ +static int merge_resources(struct
resource *res) +{ + struct resource *next; + struct resource *tmp; + uint64_t size; + int ret = -EINVAL; + + next = res->sibling; + + /* +* Not sure how to handle two different children. So only attempt +* to merge two resources if neither have children, only one has a +* child or if both have the same child. +*/ + if ((res->child && next->child) && (res->child != next->child)) + return ret; + + if (res->end + 1 != next->start) + return ret; + + if (res->flags != next->flags) + return ret; + + /* Update sibling and child of resource */ + res->sibling = next->sibling; + tmp
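The core merge condition in merge_resources() — the two resources must be exactly adjacent (`res->end + 1 == next->start`) and share the same flags — can be exercised stand-alone. A simplified user-space sketch using a stripped-down stand-in for `struct resource` (the real structure in include/linux/ioport.h also carries parent/child/sibling pointers that the patch has to fix up):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stripped-down stand-in for struct resource (illustration only). */
struct res {
	uint64_t start;
	uint64_t end;		/* inclusive, as in struct resource */
	unsigned long flags;
};

/*
 * Two resources may be merged only if they are exactly adjacent and
 * have identical flags, mirroring the checks in merge_resources().
 */
static bool can_merge(const struct res *a, const struct res *b)
{
	return a->end + 1 == b->start && a->flags == b->flags;
}

/* Absorb b into a; the caller unlinks b from the sibling list. */
static void merge_into(struct res *a, const struct res *b)
{
	a->end = b->end;
}
```

This is why the inclusive `end` matters: a gap of even one byte (or differing flags, e.g. busy vs. free) must leave the resources split, exactly the situation the hot-remove path creates.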
RE: [PATCH 0/4] locks: avoid thundering-herd wake-ups
On Wed, Aug 08 2018, Frank Filz wrote: > > Now here's another question... How does this new logic play with Open > File Description Locks? Should still be ok since there's a thread > waiting on each of those. At this level Posix locks and OFD locks are almost identical - just a different owner. So this enhancement should work equally for posix, flock, ofd and even leases. Thanks, NeilBrown > > Frank signature.asc Description: PGP signature
Re: [PATCH v3 3/4] dmaengine: Add Actions Semi Owl family S900 DMA driver
On 26-07-18, 10:36, Manivannan Sadhasivam wrote: > Add Actions Semi Owl family S900 DMA driver. Applied, thanks -- ~Vinod
Re: [PATCH v3 1/4] dt-bindings: dmaengine: Add binding for Actions Semi Owl SoCs
On 26-07-18, 10:36, Manivannan Sadhasivam wrote: > Add devicetree binding for Actions Semi Owl SoCs DMA controller. Applied, thanks -- ~Vinod
[PATCH v1 2/2] ASoC: trace: track dapm type in snd_soc_dapm_widget
ASoC: trace: track dapm type in snd_soc_dapm_widget Track snd_soc_dapm_widget:id which is used as dapm sequence index in dapm_down_seq/dapm_up_seq. It's useful for checking dapm seq after tracking it. Signed-off-by: Liu Changcheng diff --git a/include/trace/events/asoc.h b/include/trace/events/asoc.h index bf17edc..aa08a0b 100644 --- a/include/trace/events/asoc.h +++ b/include/trace/events/asoc.h @@ -92,16 +92,18 @@ DECLARE_EVENT_CLASS(snd_soc_dapm_widget, TP_STRUCT__entry( __string( name, w->name ) + __field(int,id ) __field(int,val ) ), TP_fast_assign( __assign_str(name, w->name); + __entry->id= (int)w->id; __entry->val = val; ), - TP_printk("widget=%s val=%d", __get_str(name), - __entry->val) + TP_printk("widget=%s dapm_id=%d val=%d", __get_str(name), + __entry->id, __entry->val) ); DEFINE_EVENT(snd_soc_dapm_widget, snd_soc_dapm_widget_power, -- 2.7.4
[PATCH v1 1/2] ASoC: trace: remove unnecessary typecast
There's no need to do a typecast since the defined type matches the output format requirement. Signed-off-by: Liu Changcheng diff --git a/include/trace/events/asoc.h b/include/trace/events/asoc.h index 40c300f..bf17edc 100644 --- a/include/trace/events/asoc.h +++ b/include/trace/events/asoc.h @@ -32,7 +32,7 @@ DECLARE_EVENT_CLASS(snd_soc_card, __entry->val = val; ), - TP_printk("card=%s val=%d", __get_str(name), (int)__entry->val) + TP_printk("card=%s val=%d", __get_str(name), __entry->val) ); DEFINE_EVENT(snd_soc_card, snd_soc_bias_level_start, @@ -101,7 +101,7 @@ DECLARE_EVENT_CLASS(snd_soc_dapm_widget, ), TP_printk("widget=%s val=%d", __get_str(name), - (int)__entry->val) + __entry->val) ); DEFINE_EVENT(snd_soc_dapm_widget, snd_soc_dapm_widget_power, @@ -149,8 +149,8 @@ TRACE_EVENT(snd_soc_dapm_walk_done, ), TP_printk("%s: checks %d power, %d path, %d neighbour", - __get_str(name), (int)__entry->power_checks, - (int)__entry->path_checks, (int)__entry->neighbour_checks) + __get_str(name), __entry->power_checks, + __entry->path_checks, __entry->neighbour_checks) ); TRACE_EVENT(snd_soc_dapm_path, @@ -180,8 +180,7 @@ TRACE_EVENT(snd_soc_dapm_path, ), TP_printk("%c%s %s %s %s %s", - (int) __entry->path_node && - (int) __entry->path_connect ? '*' : ' ', + __entry->path_node && __entry->path_connect ? '*' : ' ', __get_str(wname), DAPM_ARROW(__entry->path_dir), __get_str(pname), DAPM_ARROW(__entry->path_dir), __get_str(pnname)) @@ -242,8 +241,8 @@ TRACE_EVENT(snd_soc_jack_report, __entry->val = val; ), - TP_printk("jack=%s %x/%x", __get_str(name), (int)__entry->val, - (int)__entry->mask) + TP_printk("jack=%s %x/%x", __get_str(name), __entry->val, + __entry->mask) ); TRACE_EVENT(snd_soc_jack_notify, @@ -262,7 +261,7 @@ TRACE_EVENT(snd_soc_jack_notify, __entry->val = val; ), - TP_printk("jack=%s %x", __get_str(name), (int)__entry->val) + TP_printk("jack=%s %x", __get_str(name), __entry->val) ); #endif /* _TRACE_ASOC_H */ -- 2.7.4
[PATCH v1 0/2] ASoC: trace: format file and add trace field
1. Some typecasts are not needed. 2. Trace the dapm type in the snd_soc_dapm_widget class event. Liu Changcheng (2): ASoC: trace: remove unnecessary typecast ASoC: trace: track dapm type in snd_soc_dapm_widget include/trace/events/asoc.h | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) -- 2.7.4
Re: [PATCH v3 2/2] clk: meson: add sub MMC clock controller driver
Hi Jerome On 07/30/18 16:57, Jerome Brunet wrote: > On Fri, 2018-07-27 at 09:45 -0700, Stephen Boyd wrote: >> Quoting Stephen Boyd (2018-07-27 09:41:40) >>> Quoting Yixun Lan (2018-07-27 07:52:23) HI Stephen: On 07/26/2018 11:20 PM, Stephen Boyd wrote: > Quoting Yixun Lan (2018-07-12 14:12:44) >> diff --git a/drivers/clk/meson/mmc-clkc.c b/drivers/clk/meson/mmc-clkc.c >> new file mode 100644 >> index ..36c4c7cd69a6 >> --- /dev/null >> +++ b/drivers/clk/meson/mmc-clkc.c >> @@ -0,0 +1,367 @@ >> +// SPDX-License-Identifier: (GPL-2.0+ OR MIT) >> +/* >> + * Amlogic Meson MMC Sub Clock Controller Driver >> + * >> + * Copyright (c) 2017 Baylibre SAS. >> + * Author: Jerome Brunet >> + * >> + * Copyright (c) 2018 Amlogic, inc. >> + * Author: Yixun Lan >> + */ >> + >> +#include > > Is this include used? > this is needed by clk_get_rate() see drivers/clk/meson/mmc-clkc.c:204 >>> >>> Hmm ok. That's unfortunate. >> >> You should be able to read the hardware to figure out the clk frequency? >> This may be a sign that the phase clk_ops are bad and should be passing >> in the frequency of the parent clk to the op so that phase can be >> calculated. Jerome? >> > > It could be a away to do it but: > a) if we modify the API, we would need to update every clock driver using it. >There is not that many users of the phase API but still, it is annoying > b) This particular driver need the parent rate, other might need something > else > I guess. (parent phase ??, duty cycle ??) > > I think the real problem here it that you are using the consumer API. You > should > be using the provider API like clk_hw_get_rate. Look at the clk-divider.c > which > use clk_hw_round_rate() on the parent clock. I will replace it with clk_hw_get_rate() > > Clock drivers should deal with 'struct clk_hw', not 'struct clk'. I think it > was > mentioned in the past that the 'clk' within 'struct clk_hw' might be removed > someday. > > Yixun, please don't put your clock driver within the controller driver. 
Please > implement your 'phase-delay' clock in its own file and export the ops, like > every other clock in the amlogic directory. Also, please review your list of > '#define', some of them are unnecessary copy/paste from the MMC driver. > I will implement a clk-phase-delay.c. I can move the extra CC list. Yixun
Re: [PATCH 07/46] dmaengine: omap-dma: use dmaenginem_async_device_register to simplify the code
On 03-08-18, 10:47, Peter Ujfalusi wrote: > > > On 2018-08-03 10:19, Huang Shijie wrote: > > Use dmaenginem_async_device_register to simplify the code: > >remove dma_async_device_unregister > > > > Signed-off-by: Huang Shijie > > --- > > drivers/dma/ti/omap-dma.c | 5 + > > 1 file changed, 1 insertion(+), 4 deletions(-) > > > > diff --git a/drivers/dma/ti/omap-dma.c b/drivers/dma/ti/omap-dma.c > > index a4a931ddf6f6..085748c6eb67 100644 > > --- a/drivers/dma/ti/omap-dma.c > > +++ b/drivers/dma/ti/omap-dma.c > > @@ -1566,7 +1566,7 @@ static int omap_dma_probe(struct platform_device > > *pdev) > > } > > } > > > > - rc = dma_async_device_register(>ddev); > > + rc = dmaenginem_async_device_register(>ddev); > > Why it is dmaenginem_async_device_register() and not aligned other > resource managed functions (devm_* dmam_*), like > devm_dma_async_device_register() > > and in dmaenginem_async_device_register() what is the 'm' in dmaenginem ? > DMAengine Managed? yup, as you rightly said we could have done devm_dmaengine... or like few others do dmaenginem_ Yes it makes a little odd API but am trying to move away from dma_ to dmaengine_ for everything new.. -- ~Vinod
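The "m" suffix discussed above means device-managed: the unregister call is attached to the device's resource list and runs automatically at teardown, which is why the patch can drop the explicit dma_async_device_unregister(). The pattern behind it can be sketched in plain C; all names here (devres, devm_add_action, device_teardown) are illustrative stand-ins, not the kernel API:

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal sketch of the devm/"managed" pattern behind
 * dmaenginem_async_device_register(): a release callback registered
 * against a device runs automatically when the device goes away, so
 * the driver's error and remove paths need no explicit unregister. */

struct devres {
	void (*release)(void *data);
	void *data;
	struct devres *next;
};

struct device {
	struct devres *resources;
};

static int devm_add_action(struct device *dev,
			   void (*release)(void *), void *data)
{
	struct devres *r = malloc(sizeof(*r));
	if (!r)
		return -1;
	r->release = release;
	r->data = data;
	r->next = dev->resources;
	dev->resources = r;	/* LIFO: released in reverse order */
	return 0;
}

static void device_teardown(struct device *dev)
{
	struct devres *r = dev->resources;
	while (r) {
		struct devres *next = r->next;
		r->release(r->data);	/* e.g. the unregister call */
		free(r);
		r = next;
	}
	dev->resources = NULL;
}

/* Demo: returns 1 if the managed release ran at teardown. */
static int cleanup_ran;
static void mark_released(void *data) { (void)data; cleanup_ran = 1; }

static int demo_managed_release(void)
{
	struct device dev = { 0 };

	cleanup_ran = 0;
	devm_add_action(&dev, mark_released, NULL);
	device_teardown(&dev);
	return cleanup_ran;
}
```

The LIFO release order matters: resources acquired later (which may depend on earlier ones) are torn down first, mirroring a driver's hand-written error-unwind path.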
RE: [PATCH 0/4] locks: avoid thundering-herd wake-ups
> On Wed, Aug 08, 2018 at 03:54:45PM -0400, J. Bruce Fields wrote: > > On Wed, Aug 08, 2018 at 11:51:07AM +1000, NeilBrown wrote: > > > If you have a many-core machine, and have many threads all wanting > > > to briefly lock a given file (udev is known to do this), you can get > > > quite poor performance. > > > > > > When one thread releases a lock, it wakes up all other threads that > > > are waiting (classic thundering-herd) - one will get the lock and > > > the others go to sleep. > > > When you have few cores, this is not very noticeable: by the time > > > the 4th or 5th thread gets enough CPU time to try to claim the lock, > > > the earlier threads have claimed it, done what was needed, and released. > > > With 50+ cores, the contention can easily be measured. > > > > > > This patchset creates a tree of pending lock requests in which > > > siblings don't conflict and each lock request does conflict with its parent. > > > When a lock is released, only requests which don't conflict with > > > each other are woken. > > > > Are you sure you aren't depending on the (incorrect) assumption that > > "X blocks Y" is a transitive relation? > > > > OK I should be able to answer that question myself, my patience for > > code-reading is at a real low this afternoon > > In other words, is there the possibility of a tree of, say, exclusive locks with > (offset, length) like: > > (0, 2) waiting on (1, 2) waiting on (2, 2) waiting on (0, 4) > > and when waking (0, 4) you could wake up (2, 2) but not (0, 2), leaving a process > waiting without there being an actual conflict. That implies that the order the locks were received in was: (0,4) (2,2) (1,2) (0,2) But couldn't (0,2) have been made only dependent on (0,4)? Of course then (1,2) is dependent on BOTH (2,2) and (0,2). Does this tree logic handle that case?
On the other hand, there might be a fairness reason to make (0,2) wait for (1,2) even though it could have been granted concurrently with (2,2) since this dependency tree also preserves some of the order of lock requests. Frank
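Bruce's worry — that "X blocks Y" is not a transitive relation — can be checked directly on the ranges from his example. A small sketch, treating each exclusive lock as a half-open (offset, length) byte range and conflict as overlap:

```c
#include <assert.h>

/* Overlap test for the thread's (offset, length) byte ranges; two
 * exclusive locks conflict exactly when their ranges overlap. */
static int ranges_overlap(unsigned long off1, unsigned long len1,
			  unsigned long off2, unsigned long len2)
{
	return off1 < off2 + len2 && off2 < off1 + len1;
}
```

With the example ranges, (0,2) conflicts with (1,2) and (1,2) conflicts with (2,2), yet (0,2) and (2,2) are disjoint — so a wait chain built purely on pairwise conflicts can indeed leave (0,2) asleep behind a lock it never conflicted with.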
Re: [PATCH] dmaengine: sprd: Support DMA link-list mode
On 26-07-18, 16:00, Baolin Wang wrote: > From: Eric Long > > The Spreadtrum DMA can support the link-list transaction mode, which means > DMA controller can do transaction one by one automatically once we linked > these transaction by link-list register. > > Signed-off-by: Eric Long > Signed-off-by: Baolin Wang > --- > drivers/dma/sprd-dma.c | 82 > ++ > include/linux/dma/sprd-dma.h | 69 +++ > 2 files changed, 144 insertions(+), 7 deletions(-) > > diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c > index 55df0d4..649bd2c 100644 > --- a/drivers/dma/sprd-dma.c > +++ b/drivers/dma/sprd-dma.c > @@ -68,6 +68,7 @@ > > /* SPRD_DMA_CHN_CFG register definition */ > #define SPRD_DMA_CHN_EN BIT(0) > +#define SPRD_DMA_LINKLIST_EN BIT(4) > #define SPRD_DMA_WAIT_BDONE_OFFSET 24 > #define SPRD_DMA_DONOT_WAIT_BDONE1 > > @@ -103,7 +104,7 @@ > #define SPRD_DMA_REQ_MODE_MASK GENMASK(1, 0) > #define SPRD_DMA_FIX_SEL_OFFSET 21 > #define SPRD_DMA_FIX_EN_OFFSET 20 > -#define SPRD_DMA_LLIST_END_OFFSET19 > +#define SPRD_DMA_LLIST_END BIT(19) > #define SPRD_DMA_FRG_LEN_MASKGENMASK(16, 0) > > /* SPRD_DMA_CHN_BLK_LEN register definition */ > @@ -164,6 +165,7 @@ struct sprd_dma_desc { > struct sprd_dma_chn { > struct virt_dma_chanvc; > void __iomem*chn_base; > + struct sprd_dma_linklistlinklist; > struct dma_slave_config slave_cfg; > u32 chn_num; > u32 dev_id; > @@ -582,7 +584,8 @@ static int sprd_dma_get_step(enum dma_slave_buswidth > buswidth) > } > > static int sprd_dma_fill_desc(struct dma_chan *chan, > - struct sprd_dma_desc *sdesc, > + struct sprd_dma_chn_hw *hw, > + unsigned int sglen, int sg_index, > dma_addr_t src, dma_addr_t dst, u32 len, > enum dma_transfer_direction dir, > unsigned long flags, > @@ -590,7 +593,6 @@ static int sprd_dma_fill_desc(struct dma_chan *chan, > { > struct sprd_dma_dev *sdev = to_sprd_dma_dev(chan); > struct sprd_dma_chn *schan = to_sprd_dma_chan(chan); > - struct sprd_dma_chn_hw *hw = >chn_hw; > u32 req_mode = (flags >> SPRD_DMA_REQ_SHIFT) & 
SPRD_DMA_REQ_MODE_MASK; > u32 int_mode = flags & SPRD_DMA_INT_MASK; > int src_datawidth, dst_datawidth, src_step, dst_step; > @@ -670,12 +672,58 @@ static int sprd_dma_fill_desc(struct dma_chan *chan, > temp |= (src_step & SPRD_DMA_TRSF_STEP_MASK) << > SPRD_DMA_SRC_TRSF_STEP_OFFSET; > hw->trsf_step = temp; > > + /* link-list configuration */ > + if (schan->linklist.phy_addr) { > + if (sg_index == sglen - 1) > + hw->frg_len |= SPRD_DMA_LLIST_END; > + > + hw->cfg |= SPRD_DMA_LINKLIST_EN; > + hw->llist_ptr = schan->linklist.phy_addr + > + ((sg_index + 1) % sglen) * sizeof(*hw) + > + SPRD_DMA_CHN_SRC_ADDR; > + } else { > + hw->llist_ptr = 0; > + } > + > hw->frg_step = 0; > hw->src_blk_step = 0; > hw->des_blk_step = 0; > return 0; > } > > +static int sprd_dma_fill_linklist_desc(struct dma_chan *chan, > +unsigned int sglen, int sg_index, > +dma_addr_t src, dma_addr_t dst, u32 len, > +enum dma_transfer_direction dir, > +unsigned long flags, > +struct dma_slave_config *slave_cfg) > +{ > + struct sprd_dma_chn *schan = to_sprd_dma_chan(chan); > + struct sprd_dma_chn_hw *hw; > + > + if (sglen < 2 || !schan->linklist.virt_addr) > + return -EINVAL; why the limitation of 2 here? > + > + hw = (struct sprd_dma_chn_hw *)(schan->linklist.virt_addr + > + sg_index * sizeof(*hw)); > + > + return sprd_dma_fill_desc(chan, hw, sglen, sg_index, src, dst, len, dir, > + flags, slave_cfg); > +} > + > +static int sprd_dma_fill_chn_desc(struct dma_chan *chan, > + struct sprd_dma_desc *sdesc, > + dma_addr_t src, dma_addr_t dst, u32 len, > + enum dma_transfer_direction dir, > + unsigned long flags, > + struct dma_slave_config *slave_cfg) > +{ > + struct sprd_dma_chn_hw *hw = >chn_hw; > + > + return sprd_dma_fill_desc(chan, hw, 0, 0, src, dst, len, dir, > + flags, slave_cfg); > +} this fn does not do much, so why not call sprd_dma_fill_desc() instead from
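The llist_ptr computation in sprd_dma_fill_desc() above links the hardware descriptors into a ring: entry i points at entry (i + 1) % sglen, so the tail wraps back to the head. The address arithmetic can be sketched in isolation; DESC_SIZE and CHN_SRC_ADDR_OFF below are illustrative placeholders for sizeof(struct sprd_dma_chn_hw) and SPRD_DMA_CHN_SRC_ADDR, whose real values live in the driver:

```c
#include <assert.h>

#define DESC_SIZE	 0x40u	/* stand-in for sizeof(struct sprd_dma_chn_hw) */
#define CHN_SRC_ADDR_OFF 0x40u	/* stand-in for SPRD_DMA_CHN_SRC_ADDR */

/* Sketch of the circular link-list pointer from sprd_dma_fill_desc():
 * descriptor sg_index points at descriptor (sg_index + 1) % sglen
 * within the physically contiguous descriptor array at phy_addr. */
static unsigned long next_desc_ptr(unsigned long phy_addr,
				   unsigned int sg_index,
				   unsigned int sglen)
{
	return phy_addr + ((sg_index + 1) % sglen) * DESC_SIZE
			+ CHN_SRC_ADDR_OFF;
}
```

The SPRD_DMA_LLIST_END flag on the last entry tells the engine to stop rather than loop, even though the pointer itself wraps.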
Re: [PATCH mmc-next v3 0/3] solve SDHCI DWC MSHC 128MB DMA boundary limitation
Hi Adrian, Ulf, could you please review these patches? Any comments are welcome. Thanks in advance, Jisheng On Mon, 30 Jul 2018 10:42:28 +0800 Jisheng Zhang wrote: > When using DMA, if the DMA addr spans a 128MB boundary, we have to split > the DMA transfer into two so that each one doesn't exceed the boundary. > > patch1 adds adma_table_num to struct sdhci_host so that driver can > control the ADMA table number. > patch2 introduces adma_write_desc() hook to struct sdhci_ops so that > driver can override it. > patch3 finally solves the 128MB boundary limitation. > > since v2: > - make use of "likely" to check (!len || BOUNDARY_OK(addr, len)) > - explicitly include for SZ_128M > > since v1: > - fix BOUNDARY_OK macro if addr+len is aligned to 128MB > - use DIV_ROUND_UP to calculate the extra desc num > - fix !len for dwcmshc_adma_write_desc() > > Jisheng Zhang (3): > mmc: sdhci: add adma_table_num member to struct sdhci_host > mmc: sdhci: introduce adma_write_desc() hook to struct sdhci_ops > mmc: sdhci-of-dwcmshc: solve 128MB DMA boundary limitation > > drivers/mmc/host/sdhci-of-dwcmshc.c | 43 ++ > drivers/mmc/host/sdhci.c| 48 +++-- > drivers/mmc/host/sdhci.h| 8 + > 3 files changed, 83 insertions(+), 16 deletions(-) >
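One plausible shape of the BOUNDARY_OK check named in the changelog (the exact kernel macro may differ): a transfer is safe when its first and last byte lie in the same 128MB window. Note the "- 1" on the last-byte address, which is the v1 fix for the case where addr + len lands exactly on a 128MB boundary:

```c
#include <assert.h>

#define SZ_128M 0x08000000ull

/* Sketch of a 128MB-window check: OR-ing with (SZ_128M - 1) maps an
 * address to the last byte of its window, so two addresses are in the
 * same window exactly when those values match.  Assumed form only;
 * the driver's real macro lives in sdhci-of-dwcmshc.c. */
static int boundary_ok(unsigned long long addr, unsigned long long len)
{
	return (addr | (SZ_128M - 1)) == ((addr + len - 1) | (SZ_128M - 1));
}
```

When the check fails, the driver would emit two ADMA descriptors split at the boundary — hence patch1's extra descriptor-table entries.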
[PATCH 3/5] fs/locks: change all *_conflict() functions to return a new enum.
In a future patch we will need to differentiate between conflicts that are "transitive" and those that aren't. A "transitive" conflict is defined as one where any lock that conflicts with the first (newly requested) lock would conflict with the existing lock. So change posix_locks_conflict(), flock_locks_conflict() (both currently returning int) and leases_conflict() (currently returning bool) to return "enum conflict". Add locks_transitive_overlap() to make it possible to compute the correct conflict for posix locks. The FL_NO_CONFLICT value is zero, so current code which only tests the truth value of these functions will still work the same way. And convert some return (foo); to return foo; Signed-off-by: NeilBrown --- fs/locks.c | 64 ++-- 1 file changed, 49 insertions(+), 15 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index b4812da2a374..d06658b2dc7a 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -139,6 +139,16 @@ #define IS_OFDLCK(fl) (fl->fl_flags & FL_OFDLCK) #define IS_REMOTELCK(fl) (fl->fl_pid <= 0) +/* A transitive conflict is one where the first lock conflicts with + * the second lock, and any other lock that conflicts with the + * first lock, also conflicts with the second lock. + */ +enum conflict { + FL_NO_CONFLICT = 0, + FL_CONFLICT, + FL_TRANSITIVE_CONFLICT, +}; + static inline bool is_remote_lock(struct file *filp) { return likely(!(filp->f_path.dentry->d_sb->s_flags & SB_NOREMOTELOCK)); @@ -612,6 +622,15 @@ static inline int locks_overlap(struct file_lock *fl1, struct file_lock *fl2) (fl2->fl_end >= fl1->fl_start)); } +/* Check for transitive-overlap - true if any lock that overlaps + * the first lock must overlap the seconds + */ +static inline bool locks_transitive_overlap(struct file_lock *fl1, + struct file_lock *fl2) +{ + return (fl1->fl_start >= fl2->fl_start) && + (fl1->fl_end <= fl2->fl_end); +} /* * Check whether two locks have the same owner. 
*/ @@ -793,47 +812,61 @@ locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose) /* Determine if lock sys_fl blocks lock caller_fl. Common functionality * checks for shared/exclusive status of overlapping locks. */ -static int locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl) +static enum conflict locks_conflict(struct file_lock *caller_fl, + struct file_lock *sys_fl) { if (sys_fl->fl_type == F_WRLCK) - return 1; + return FL_TRANSITIVE_CONFLICT; if (caller_fl->fl_type == F_WRLCK) - return 1; - return 0; + return FL_CONFLICT; + return FL_NO_CONFLICT; } /* Determine if lock sys_fl blocks lock caller_fl. POSIX specific * checking before calling the locks_conflict(). */ -static int posix_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl) +static enum conflict posix_locks_conflict(struct file_lock *caller_fl, + struct file_lock *sys_fl) { /* POSIX locks owned by the same process do not conflict with * each other. */ if (posix_same_owner(caller_fl, sys_fl)) - return (0); + return FL_NO_CONFLICT; /* Check whether they overlap */ if (!locks_overlap(caller_fl, sys_fl)) - return 0; + return FL_NO_CONFLICT; - return (locks_conflict(caller_fl, sys_fl)); + switch (locks_conflict(caller_fl, sys_fl)) { + default: + case FL_NO_CONFLICT: + return FL_NO_CONFLICT; + case FL_CONFLICT: + return FL_CONFLICT; + case FL_TRANSITIVE_CONFLICT: + if (locks_transitive_overlap(caller_fl, sys_fl)) + return FL_TRANSITIVE_CONFLICT; + else + return FL_CONFLICT; + } } /* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific * checking before calling the locks_conflict(). */ -static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl) +static enum conflict flock_locks_conflict(struct file_lock *caller_fl, + struct file_lock *sys_fl) { /* FLOCK locks referring to the same filp do not conflict with * each other. 
*/ if (caller_fl->fl_file == sys_fl->fl_file) - return (0); + return FL_NO_CONFLICT; if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND)) - return 0; + return FL_NO_CONFLICT; - return (locks_conflict(caller_fl, sys_fl)); + return locks_conflict(caller_fl, sys_fl); } void @@ -1435,12 +1468,13 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose) } }
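The two range predicates introduced by this patch are easy to exercise in userspace. This copy uses the same inclusive [fl_start, fl_end] convention as struct file_lock; the struct and parameter passing are simplified for illustration:

```c
#include <assert.h>

/* Userspace copy of the patch's two predicates, using inclusive
 * [start, end] byte ranges like struct file_lock. */
struct range { unsigned long start, end; };

static int locks_overlap(struct range a, struct range b)
{
	return a.end >= b.start && b.end >= a.start;
}

/* Transitive overlap: a lies entirely inside b, so any lock that
 * overlaps a necessarily overlaps b as well. */
static int locks_transitive_overlap(struct range a, struct range b)
{
	return a.start >= b.start && a.end <= b.end;
}
```

This containment test is what lets FL_TRANSITIVE_CONFLICT be returned safely: a write lock whose range contains the blocked request blocks everything the request would block.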
[PATCH 5/5] fs/locks: create a tree of dependent requests.
When we find an existing lock which conflicts with a request, and the request wants to wait, we currently add the request to a list. When the lock is removed, the whole list is woken. This can cause the thundering-herd problem. To reduce the problem, we make use of the (new) fact that a pending request can itself have a list of blocked requests. When we find a conflict, we look through the existing blocked requests. If any one of them blocks the new request, the new request is attached below that request. This way, when the lock is released, only a set of non-conflicting locks will be woken. The rest of the herd can stay asleep. Reported-and-tested-by: Martin Wilck Signed-off-by: NeilBrown --- fs/locks.c | 69 +++- 1 file changed, 63 insertions(+), 6 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index fc64016d01ee..17843feb6f5b 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -738,6 +738,39 @@ static void locks_delete_block(struct file_lock *waiter) spin_unlock(_lock_lock); } +static void wake_non_conflicts(struct file_lock *waiter, struct file_lock *blocker, + enum conflict conflict(struct file_lock *, + struct file_lock *)) +{ + struct file_lock *parent = waiter; + struct file_lock *fl; + struct file_lock *t; + + fl = list_entry(>fl_blocked, struct file_lock, fl_block); +restart: + list_for_each_entry_safe_continue(fl, t, >fl_blocked, fl_block) { + switch (conflict(fl, blocker)) { + default: + case FL_NO_CONFLICT: + __locks_wake_one(fl); + break; + case FL_CONFLICT: + /* Need to check children */ + parent = fl; + fl = list_entry(>fl_blocked, struct file_lock, fl_block); + goto restart; + case FL_TRANSITIVE_CONFLICT: + /* all children must also conflict, no need to check */ + continue; + } + } + if (parent != waiter) { + parent = parent->fl_blocker; + fl = parent; + goto restart; + } +} + /* Insert waiter into blocker's block list. * We use a circular list so that processes can be easily woken up in * the order they blocked. 
The documentation doesn't require this but @@ -747,11 +780,32 @@ static void locks_delete_block(struct file_lock *waiter) * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring * that the flc_lock is also held on insertions we can avoid taking the * blocked_lock_lock in some cases when we see that the fl_blocked list is empty. + * + * Rather than just adding to the list, we check for conflicts with any existing + * waiter, and add to that waiter instead. + * Thus wakeups don't happen until needed. */ static void __locks_insert_block(struct file_lock *blocker, - struct file_lock *waiter) +struct file_lock *waiter, +enum conflict conflict(struct file_lock *, + struct file_lock *)) { + struct file_lock *fl; BUG_ON(!list_empty(>fl_block)); + + /* Any request in waiter->fl_blocked is know to conflict with +* waiter, but it might not conflict with blocker. +* If it doesn't, it needs to be woken now so it can find +* somewhere else to wait, or possible it can get granted. +*/ + if (conflict(waiter, blocker) != FL_TRANSITIVE_CONFLICT) + wake_non_conflicts(waiter, blocker, conflict); +new_blocker: + list_for_each_entry(fl, >fl_blocked, fl_block) + if (conflict(fl, waiter)) { + blocker = fl; + goto new_blocker; + } waiter->fl_blocker = blocker; list_add_tail(>fl_block, >fl_blocked); if (IS_POSIX(blocker) && !IS_OFDLCK(blocker)) @@ -760,10 +814,12 @@ static void __locks_insert_block(struct file_lock *blocker, /* Must be called with flc_lock held. */ static void locks_insert_block(struct file_lock *blocker, - struct file_lock *waiter) + struct file_lock *waiter, + enum conflict conflict(struct file_lock *, + struct file_lock *)) { spin_lock(_lock_lock); - __locks_insert_block(blocker, waiter); + __locks_insert_block(blocker, waiter, conflict); spin_unlock(_lock_lock); } @@ -1033,7 +1089,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request) if (!(request->fl_flags & FL_SLEEP)) goto
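The insertion walk in __locks_insert_block() can be modelled in a few lines of userspace C, which also answers the ordering question raised earlier in the thread: with this descent, (0,2) ends up waiting only on (0,4), not on (1,2). A toy sketch under the simplification that conflict = range overlap (the real code uses the supplied conflict callback):

```c
#include <assert.h>

struct req {
	unsigned long start, len;
	int parent;	/* index of the request this one waits on; -1 = lock holder */
};

static int overlap(const struct req *a, const struct req *b)
{
	return a->start < b->start + b->len && b->start < a->start + a->len;
}

/* Toy model of __locks_insert_block(): starting at the lock holder,
 * descend into the first already-queued child that conflicts with the
 * new request, and attach the request once no child conflicts. */
static void insert_waiter(struct req *reqs, int idx)
{
	int blocker = -1;
again:
	for (int i = 0; i < idx; i++) {
		if (reqs[i].parent == blocker && overlap(&reqs[i], &reqs[idx])) {
			blocker = i;
			goto again;
		}
	}
	reqs[idx].parent = blocker;
}
```

Releasing the held lock then wakes only the top-level waiters (parent == -1 entries); each of those, once granted or requeued, wakes its own subtree in turn.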
[PATCH 4/5] fs/locks: split out __locks_wake_one()
This functionality will be useful in the next patch, so split it out from __locks_wake_up_blocks(). Signed-off-by: NeilBrown --- fs/locks.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index d06658b2dc7a..fc64016d01ee 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -710,6 +710,15 @@ static void __locks_delete_block(struct file_lock *waiter) waiter->fl_blocker = NULL; } +static void __locks_wake_one(struct file_lock *waiter) +{ + __locks_delete_block(waiter); + if (waiter->fl_lmops && waiter->fl_lmops->lm_notify) + waiter->fl_lmops->lm_notify(waiter); + else + wake_up(>fl_wait); +} + static void __locks_wake_up_blocks(struct file_lock *blocker) { while (!list_empty(>fl_blocked)) { @@ -717,11 +726,7 @@ static void __locks_wake_up_blocks(struct file_lock *blocker) waiter = list_first_entry(>fl_blocked, struct file_lock, fl_block); - __locks_delete_block(waiter); - if (waiter->fl_lmops && waiter->fl_lmops->lm_notify) - waiter->fl_lmops->lm_notify(waiter); - else - wake_up(>fl_wait); + __locks_wake_one(waiter); } }
[PATCH 5/5] fs/locks: create a tree of dependent requests.
When we find an existing lock which conflicts with a request, and the request wants to wait, we currently add the request to a list. When the lock is removed, the whole list is woken. This can cause the thundering-herd problem. To reduce the problem, we make use of the (new) fact that a pending request can itself have a list of blocked requests. When we find a conflict, we look through the existing blocked requests. If any one of them blocks the new request, the new request is attached below that request. This way, when the lock is released, only a set of non-conflicting locks will be woken. The rest of the herd can stay asleep.

Reported-and-tested-by: Martin Wilck
Signed-off-by: NeilBrown
---
 fs/locks.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 63 insertions(+), 6 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index fc64016d01ee..17843feb6f5b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -738,6 +738,39 @@ static void locks_delete_block(struct file_lock *waiter)
 	spin_unlock(&blocked_lock_lock);
 }
 
+static void wake_non_conflicts(struct file_lock *waiter, struct file_lock *blocker,
+			       enum conflict conflict(struct file_lock *,
+						      struct file_lock *))
+{
+	struct file_lock *parent = waiter;
+	struct file_lock *fl;
+	struct file_lock *t;
+
+	fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block);
+restart:
+	list_for_each_entry_safe_continue(fl, t, &parent->fl_blocked, fl_block) {
+		switch (conflict(fl, blocker)) {
+		default:
+		case FL_NO_CONFLICT:
+			__locks_wake_one(fl);
+			break;
+		case FL_CONFLICT:
+			/* Need to check children */
+			parent = fl;
+			fl = list_entry(&parent->fl_blocked, struct file_lock, fl_block);
+			goto restart;
+		case FL_TRANSITIVE_CONFLICT:
+			/* all children must also conflict, no need to check */
+			continue;
+		}
+	}
+	if (parent != waiter) {
+		fl = parent;
+		parent = parent->fl_blocker;
+		goto restart;
+	}
+}
+
 /* Insert waiter into blocker's block list.
  * We use a circular list so that processes can be easily woken up in
  * the order they blocked. The documentation doesn't require this but
@@ -747,11 +780,32 @@ static void locks_delete_block(struct file_lock *waiter)
  * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
  * that the flc_lock is also held on insertions we can avoid taking the
  * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
+ *
+ * Rather than just adding to the list, we check for conflicts with any existing
+ * waiter, and add to that waiter instead.
+ * Thus wakeups don't happen until needed.
  */
 static void __locks_insert_block(struct file_lock *blocker,
-				 struct file_lock *waiter)
+				 struct file_lock *waiter,
+				 enum conflict conflict(struct file_lock *,
+							struct file_lock *))
 {
+	struct file_lock *fl;
+
 	BUG_ON(!list_empty(&waiter->fl_block));
+
+	/* Any request in waiter->fl_blocked is known to conflict with
+	 * waiter, but it might not conflict with blocker.
+	 * If it doesn't, it needs to be woken now so it can find
+	 * somewhere else to wait, or possibly it can get granted.
+	 */
+	if (conflict(waiter, blocker) != FL_TRANSITIVE_CONFLICT)
+		wake_non_conflicts(waiter, blocker, conflict);
+new_blocker:
+	list_for_each_entry(fl, &blocker->fl_blocked, fl_block)
+		if (conflict(fl, waiter)) {
+			blocker = fl;
+			goto new_blocker;
+		}
 	waiter->fl_blocker = blocker;
 	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
 	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
@@ -760,10 +814,12 @@ static void __locks_insert_block(struct file_lock *blocker,
 
 /* Must be called with flc_lock held. */
 static void locks_insert_block(struct file_lock *blocker,
-			       struct file_lock *waiter)
+			       struct file_lock *waiter,
+			       enum conflict conflict(struct file_lock *,
+						      struct file_lock *))
 {
 	spin_lock(&blocked_lock_lock);
-	__locks_insert_block(blocker, waiter);
+	__locks_insert_block(blocker, waiter, conflict);
 	spin_unlock(&blocked_lock_lock);
 }
 
@@ -1033,7 +1089,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
 		if (!(request->fl_flags & FL_SLEEP))
 			goto
[PATCH 4/5] fs/locks: split out __locks_wake_one()
This functionality will be useful in the next patch, so split it out from __locks_wake_up_blocks().

Signed-off-by: NeilBrown
---
 fs/locks.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index d06658b2dc7a..fc64016d01ee 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -710,6 +710,15 @@ static void __locks_delete_block(struct file_lock *waiter)
 	waiter->fl_blocker = NULL;
 }
 
+static void __locks_wake_one(struct file_lock *waiter)
+{
+	__locks_delete_block(waiter);
+	if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
+		waiter->fl_lmops->lm_notify(waiter);
+	else
+		wake_up(&waiter->fl_wait);
+}
+
 static void __locks_wake_up_blocks(struct file_lock *blocker)
 {
 	while (!list_empty(&blocker->fl_blocked)) {
@@ -717,11 +726,7 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
 		waiter = list_first_entry(&blocker->fl_blocked,
 					  struct file_lock, fl_block);
-		__locks_delete_block(waiter);
-		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
-			waiter->fl_lmops->lm_notify(waiter);
-		else
-			wake_up(&waiter->fl_wait);
+		__locks_wake_one(waiter);
 	}
 }
[PATCH 3/5] fs/locks: change all *_conflict() functions to return a new enum.
In a future patch we will need to differentiate between conflicts that are "transitive" and those that aren't. A "transitive" conflict is defined as one where any lock that conflicts with the first (newly requested) lock would conflict with the existing lock.

So change posix_locks_conflict(), flock_locks_conflict() (both currently returning int) and leases_conflict() (currently returning bool) to return "enum conflict". Add locks_transitive_overlap() to make it possible to compute the correct conflict for posix locks.

The FL_NO_CONFLICT value is zero, so current code which only tests the truth value of these functions will still work the same way. Also convert some "return (foo);" to "return foo;".

Signed-off-by: NeilBrown
---
 fs/locks.c | 64 ++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 49 insertions(+), 15 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index b4812da2a374..d06658b2dc7a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -139,6 +139,16 @@
 #define IS_OFDLCK(fl)	(fl->fl_flags & FL_OFDLCK)
 #define IS_REMOTELCK(fl)	(fl->fl_pid <= 0)
 
+/* A transitive conflict is one where the first lock conflicts with
+ * the second lock, and any other lock that conflicts with the
+ * first lock, also conflicts with the second lock.
+ */
+enum conflict {
+	FL_NO_CONFLICT = 0,
+	FL_CONFLICT,
+	FL_TRANSITIVE_CONFLICT,
+};
+
 static inline bool is_remote_lock(struct file *filp)
 {
 	return likely(!(filp->f_path.dentry->d_sb->s_flags & SB_NOREMOTELOCK));
@@ -612,6 +622,15 @@ static inline int locks_overlap(struct file_lock *fl1, struct file_lock *fl2)
 		(fl2->fl_end >= fl1->fl_start));
 }
 
+/* Check for transitive-overlap - true if any lock that overlaps
+ * the first lock must overlap the second
+ */
+static inline bool locks_transitive_overlap(struct file_lock *fl1,
+					    struct file_lock *fl2)
+{
+	return (fl1->fl_start >= fl2->fl_start) &&
+		(fl1->fl_end <= fl2->fl_end);
+}
 
 /*
  * Check whether two locks have the same owner.
 */
@@ -793,47 +812,61 @@ locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose)
 /* Determine if lock sys_fl blocks lock caller_fl. Common functionality
  * checks for shared/exclusive status of overlapping locks.
  */
-static int locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
+static enum conflict locks_conflict(struct file_lock *caller_fl,
+				    struct file_lock *sys_fl)
 {
 	if (sys_fl->fl_type == F_WRLCK)
-		return 1;
+		return FL_TRANSITIVE_CONFLICT;
 	if (caller_fl->fl_type == F_WRLCK)
-		return 1;
-	return 0;
+		return FL_CONFLICT;
+	return FL_NO_CONFLICT;
 }
 
 /* Determine if lock sys_fl blocks lock caller_fl. POSIX specific
  * checking before calling the locks_conflict().
  */
-static int posix_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
+static enum conflict posix_locks_conflict(struct file_lock *caller_fl,
+					  struct file_lock *sys_fl)
 {
 	/* POSIX locks owned by the same process do not conflict with
 	 * each other.
 	 */
 	if (posix_same_owner(caller_fl, sys_fl))
-		return (0);
+		return FL_NO_CONFLICT;
 
 	/* Check whether they overlap */
 	if (!locks_overlap(caller_fl, sys_fl))
-		return 0;
+		return FL_NO_CONFLICT;
 
-	return (locks_conflict(caller_fl, sys_fl));
+	switch (locks_conflict(caller_fl, sys_fl)) {
+	default:
+	case FL_NO_CONFLICT:
+		return FL_NO_CONFLICT;
+	case FL_CONFLICT:
+		return FL_CONFLICT;
+	case FL_TRANSITIVE_CONFLICT:
+		if (locks_transitive_overlap(caller_fl, sys_fl))
+			return FL_TRANSITIVE_CONFLICT;
+		else
+			return FL_CONFLICT;
+	}
 }
 
 /* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific
  * checking before calling the locks_conflict().
  */
-static int flock_locks_conflict(struct file_lock *caller_fl, struct file_lock *sys_fl)
+static enum conflict flock_locks_conflict(struct file_lock *caller_fl,
+					  struct file_lock *sys_fl)
 {
 	/* FLOCK locks referring to the same filp do not conflict with
 	 * each other.
 	 */
 	if (caller_fl->fl_file == sys_fl->fl_file)
-		return (0);
+		return FL_NO_CONFLICT;
 	if ((caller_fl->fl_type & LOCK_MAND) || (sys_fl->fl_type & LOCK_MAND))
-		return 0;
+		return FL_NO_CONFLICT;
 
-	return (locks_conflict(caller_fl, sys_fl));
+	return locks_conflict(caller_fl, sys_fl);
 }
 
 void
@@ -1435,12 +1468,13 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
 		}
 	}
[PATCH 2/5] fs/locks: allow a lock request to block other requests.
Currently, a lock can block pending requests, but all pending requests are equal. If lots of pending requests are mutually exclusive, this means they will all be woken up and all but one will fail. This can hurt performance.

So we will allow pending requests to block other requests. Only the first request will be woken, and it will wake the others.

This patch doesn't implement this fully, but prepares the way.

- It acknowledges that a request might be blocking other requests, and when the request is converted to a lock, those blocked requests are moved across.
- When a request is removed, all blocked requests are woken.
- When deadlock-detection looks for the lock which blocks a given request, we follow the chain of ->fl_blocker all the way to the top.

Signed-off-by: NeilBrown
---
 fs/locks.c | 58 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 18 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 322491e70e41..b4812da2a374 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -408,9 +408,19 @@ void locks_copy_lock(struct file_lock *new, struct file_lock *fl)
 		fl->fl_ops->fl_copy_lock(new, fl);
 	}
 }
-
 EXPORT_SYMBOL(locks_copy_lock);
 
+static void locks_move_blocks(struct file_lock *new, struct file_lock *fl)
+{
+	struct file_lock *f;
+
+	spin_lock(&blocked_lock_lock);
+	list_splice_init(&fl->fl_blocked, &new->fl_blocked);
+	list_for_each_entry(f, &new->fl_blocked, fl_block)
+		f->fl_blocker = new;
+	spin_unlock(&blocked_lock_lock);
+}
+
 static inline int flock_translate_cmd(int cmd) {
 	if (cmd & LOCK_MAND)
 		return cmd & (LOCK_MAND | LOCK_RW);
@@ -681,9 +691,25 @@ static void __locks_delete_block(struct file_lock *waiter)
 	waiter->fl_blocker = NULL;
 }
 
+static void __locks_wake_up_blocks(struct file_lock *blocker)
+{
+	while (!list_empty(&blocker->fl_blocked)) {
+		struct file_lock *waiter;
+
+		waiter = list_first_entry(&blocker->fl_blocked,
+					  struct file_lock, fl_block);
+		__locks_delete_block(waiter);
+		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
+			waiter->fl_lmops->lm_notify(waiter);
+		else
+			wake_up(&waiter->fl_wait);
+	}
+}
+
 static
 void locks_delete_block(struct file_lock *waiter)
 {
 	spin_lock(&blocked_lock_lock);
+	__locks_wake_up_blocks(waiter);
 	__locks_delete_block(waiter);
 	spin_unlock(&blocked_lock_lock);
 }
@@ -735,17 +761,7 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 		return;
 
 	spin_lock(&blocked_lock_lock);
-	while (!list_empty(&blocker->fl_blocked)) {
-		struct file_lock *waiter;
-
-		waiter = list_first_entry(&blocker->fl_blocked,
-					  struct file_lock, fl_block);
-		__locks_delete_block(waiter);
-		if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
-			waiter->fl_lmops->lm_notify(waiter);
-		else
-			wake_up(&waiter->fl_wait);
-	}
+	__locks_wake_up_blocks(blocker);
 	spin_unlock(&blocked_lock_lock);
 }
 
@@ -888,8 +904,11 @@ static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
 	struct file_lock *fl;
 
 	hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) {
-		if (posix_same_owner(fl, block_fl))
-			return fl->fl_blocker;
+		if (posix_same_owner(fl, block_fl)) {
+			while (fl->fl_blocker)
+				fl = fl->fl_blocker;
+			return fl;
+		}
 	}
 	return NULL;
 }
@@ -982,6 +1001,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
 		if (request->fl_flags & FL_ACCESS)
 			goto out;
 		locks_copy_lock(new_fl, request);
+		locks_move_blocks(new_fl, request);
 		locks_insert_lock_ctx(new_fl, &ctx->flc_flock);
 		new_fl = NULL;
 		error = 0;
@@ -1174,6 +1194,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
 			goto out;
 		}
 		locks_copy_lock(new_fl, request);
+		locks_move_blocks(new_fl, request);
 		locks_insert_lock_ctx(new_fl, &fl->fl_list);
 		fl = new_fl;
 		new_fl = NULL;
@@ -2582,13 +2603,14 @@ void locks_remove_file(struct file *filp)
 int
 posix_unblock_lock(struct file_lock *waiter)
 {
-	int status = 0;
+	int status = -ENOENT;
 
 	spin_lock(&blocked_lock_lock);
-	if (waiter->fl_blocker)
+	if (waiter->fl_blocker) {
+		__locks_wake_up_blocks(waiter);
 		__locks_delete_block(waiter);
-	else
-		status = -ENOENT;
+		status = 0;
+	}
 	spin_unlock(&blocked_lock_lock);
 	return
[PATCH 1/5] fs/locks: rename some lists and pointers.
struct file_lock contains an 'fl_next' pointer which is used to point to the lock that this request is blocked waiting for. So rename it to fl_blocker.

The fl_blocked list_head in an active lock is the head of a list of blocked requests. In a request it is a node in that list. These are two distinct uses, so replace with two list_heads with different names:
fl_blocked is the head of a list of blocked requests;
fl_block is a node on that list.

This separation will be used in the next patch.

Note that a tracepoint is changed to report fl_blocker instead of fl_next. Might this be an ABI breakage?

Signed-off-by: NeilBrown
---
 fs/cifs/file.c                  |  2 +-
 fs/locks.c                      | 40 +++++++++++++++++++---------------
 include/linux/fs.h              |  5 +++--
 include/trace/events/filelock.h | 16 ++++++++--------
 4 files changed, 33 insertions(+), 30 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 8d41ca7bfcf1..066ed2e4ba96 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1092,7 +1092,7 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
 	rc = posix_lock_file(file, flock, NULL);
 	up_write(&cinode->lock_sem);
 	if (rc == FILE_LOCK_DEFERRED) {
-		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_next);
+		rc = wait_event_interruptible(flock->fl_wait, !flock->fl_blocker);
 		if (!rc)
 			goto try_again;
 		posix_unblock_lock(flock);
diff --git a/fs/locks.c b/fs/locks.c
index db7b6917d9c5..322491e70e41 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -194,7 +194,7 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
  * This lock protects the blocked_hash. Generally, if you're accessing it, you
  * want to be holding this lock.
  *
- * In addition, it also protects the fl->fl_block list, and the fl->fl_next
+ * In addition, it also protects the fl->fl_blocked list, and the fl->fl_blocker
  * pointer for file_lock structures that are acting as lock requests (in
  * contrast to those that are acting as records of acquired locks).
 *
@@ -203,7 +203,7 @@ static DEFINE_HASHTABLE(blocked_hash, BLOCKED_HASH_BITS);
  * protected by this lock, we can skip acquiring it iff we already hold the
  * flc_lock.
  *
- * In particular, adding an entry to the fl_block list requires that you hold
+ * In particular, adding an entry to the fl_blocked list requires that you hold
  * both the flc_lock and the blocked_lock_lock (acquired in that order).
  * Deleting an entry from the list however only requires the file_lock_lock.
  */
@@ -302,6 +302,7 @@ static void locks_init_lock_heads(struct file_lock *fl)
 {
 	INIT_HLIST_NODE(&fl->fl_link);
 	INIT_LIST_HEAD(&fl->fl_list);
+	INIT_LIST_HEAD(&fl->fl_blocked);
 	INIT_LIST_HEAD(&fl->fl_block);
 	init_waitqueue_head(&fl->fl_wait);
 }
@@ -341,6 +342,7 @@ void locks_free_lock(struct file_lock *fl)
 {
 	BUG_ON(waitqueue_active(&fl->fl_wait));
 	BUG_ON(!list_empty(&fl->fl_list));
+	BUG_ON(!list_empty(&fl->fl_blocked));
 	BUG_ON(!list_empty(&fl->fl_block));
 	BUG_ON(!hlist_unhashed(&fl->fl_link));
 
@@ -676,7 +678,7 @@ static void __locks_delete_block(struct file_lock *waiter)
 {
 	locks_delete_global_blocked(waiter);
 	list_del_init(&waiter->fl_block);
-	waiter->fl_next = NULL;
+	waiter->fl_blocker = NULL;
 }
 
 static void locks_delete_block(struct file_lock *waiter)
@@ -692,16 +694,16 @@ static void locks_delete_block(struct file_lock *waiter)
  * it seems like the reasonable thing to do.
  *
  * Must be called with both the flc_lock and blocked_lock_lock held. The
- * fl_block list itself is protected by the blocked_lock_lock, but by ensuring
+ * fl_blocked list itself is protected by the blocked_lock_lock, but by ensuring
  * that the flc_lock is also held on insertions we can avoid taking the
- * blocked_lock_lock in some cases when we see that the fl_block list is empty.
+ * blocked_lock_lock in some cases when we see that the fl_blocked list is empty.
  */
 static void __locks_insert_block(struct file_lock *blocker,
 				 struct file_lock *waiter)
 {
 	BUG_ON(!list_empty(&waiter->fl_block));
-	waiter->fl_next = blocker;
-	list_add_tail(&waiter->fl_block, &blocker->fl_block);
+	waiter->fl_blocker = blocker;
+	list_add_tail(&waiter->fl_block, &blocker->fl_blocked);
 	if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
 		locks_insert_global_blocked(waiter);
 }
@@ -725,18 +727,18 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 	/*
 	 * Avoid taking global lock if list is empty. This is safe since new
 	 * blocked requests are only added to the list under the flc_lock, and
-	 * the flc_lock is always held here. Note that removal from the fl_block
+	 * the flc_lock is always held here. Note that removal from the fl_blocked
 	 * list does not require the flc_lock, so we must recheck
[PATCH 0/5 - V2] locks: avoid thundering-herd wake-ups
This series adds "wake_non_conflicts()" to wake up any waiters that are being added beneath a lock that they don't actually conflict with, even though they conflict with a parent which conflicts with the top-level blocker. Thanks to Bruce for highlighting this issue.

This series hasn't been tested beyond compile-test.

Original description:

If you have a many-core machine, and have many threads all wanting to briefly lock a given file (udev is known to do this), you can get quite poor performance. When one thread releases a lock, it wakes up all other threads that are waiting (classic thundering-herd) - one will get the lock and the others go to sleep. When you have few cores, this is not very noticeable: by the time the 4th or 5th thread gets enough CPU time to try to claim the lock, the earlier threads have claimed it, done what was needed, and released. With 50+ cores, the contention can easily be measured.

This patchset creates a tree of pending lock requests in which siblings don't conflict and each lock request does conflict with its parent. When a lock is released, only requests which don't conflict with each other are woken.

Testing shows that lock-acquisitions-per-second is now fairly stable even as the number of contending processes goes to 1000. Without this patch, locks-per-second drops off steeply after a few tens of processes.

There is a small cost to this extra complexity. At 20 processes running a particular test on 72 cores, the lock acquisitions per second drops from 1.8 million to 1.4 million with this patch. For 100 processes, this patch still provides 1.4 million while without this patch there are about 700,000.

NeilBrown

---

NeilBrown (5):
      fs/locks: rename some lists and pointers.
      fs/locks: allow a lock request to block other requests.
      fs/locks: change all *_conflict() functions to return a new enum.
      fs/locks: split out __locks_wake_one()
      fs/locks: create a tree of dependent requests.
 fs/cifs/file.c                  |    2 
 fs/locks.c                      |  228 +++++++++++++++++++++++++++++-------
 include/linux/fs.h              |    5 +
 include/trace/events/filelock.h |   16 +--
 4 files changed, 186 insertions(+), 65 deletions(-)

--
Signature
Re: [PATCH] net/9p/trans_virtio.c: decrease the refcount of 9p virtio device when removing it
piaojun wrote on Thu, Aug 09, 2018:
> > What exact kernel commit are you running?
>
> My kernel commit id 6edf1d4cb0acde, and I replace the 9p code with
> 9p-next. And I wonder if this will work well?

That is somewhere on top of 4.18-rc1 and got merged in 4.18-rc4, which are close enough, so while I can question the practice I don't see why not.

I've just tried the following:

$ git checkout 6edf1d4cb0acde
$ git checkout martinetd/9p-next net/9p fs/9p include/net/9p
(martinetd/9p-next is 9f961802a7 as of this mail)
$ uname -r
4.18.0-rc1+
$ lsmod | grep -E '^9pnet_virtio' || echo "not loaded"
9pnet_virtio           32768  0
$ sudo modprobe -r 9pnet_virtio
$ lsmod | grep -E '^9pnet_virtio' || echo "not loaded"
not loaded
$ sudo modprobe 9pnet_virtio
$ sudo mount -t 9p -o debug=1,trans=virtio shm /mnt
$ ls /mnt
$ cat /sys/module/9pnet_virtio/drivers/virtio\:9pnet_virtio/*/mount_tag
tmpshm
(these could use a new line...)
$ sudo umount /mnt
$ sudo modprobe -r 9pnet_virtio
$ lsmod | grep -E '^9pnet_virtio' || echo "not loaded"
not loaded

The /sys/devices/pci*/*/virtio*/mount_tag files are also removed properly; I don't see any problem.

Not being able to reproduce is fine in general, but I also get problems when applying the patch and unloading the module multiple times, so I can't help but question this patch and think your problem lies somewhere else.
--
Dominique
Re: [PATCH RFC ftrace/core] tracing: Directly call tracer and lockdep probes
On Wed, 8 Aug 2018 15:20:22 -0700, Joel Fernandes wrote:
> On Wed, Aug 8, 2018 at 3:00 PM, Joel Fernandes (Google) wrote:
> > From: Steven Rostedt
> >
>
> I realize I kept your authorship since I started working based on your
> patch ;-)
>
> Well sorry about that, and you could change it to mine or keep it as is
> if you want ;-)

Actually I already started testing a second patch that basically does
your extension ;-) I'll post it shortly.

-- Steve
Re: [PATCH 0/4] locks: avoid thundering-herd wake-ups
On Wed, Aug 08 2018, J. Bruce Fields wrote:
> On Wed, Aug 08, 2018 at 12:47:22PM -0400, Jeff Layton wrote:
>> On Wed, 2018-08-08 at 11:51 +1000, NeilBrown wrote:
>> > If you have a many-core machine, and have many threads all wanting to
>> > briefly lock a given file (udev is known to do this), you can get quite
>> > poor performance.
>> >
>> > When one thread releases a lock, it wakes up all other threads that
>> > are waiting (classic thundering-herd) - one will get the lock and the
>> > others go to sleep.
>> > When you have few cores, this is not very noticeable: by the time the
>> > 4th or 5th thread gets enough CPU time to try to claim the lock, the
>> > earlier threads have claimed it, done what was needed, and released.
>> > With 50+ cores, the contention can easily be measured.
>> >
>> > This patchset creates a tree of pending lock requests in which siblings
>> > don't conflict and each lock request does conflict with its parent.
>> > When a lock is released, only requests which don't conflict with each
>> > other are woken.
>> >
>> > Testing shows that lock-acquisitions-per-second is now fairly stable even
>> > as the number of contending processes goes to 1000. Without this patch,
>> > locks-per-second drops off steeply after a few tens of processes.
>> >
>> > There is a small cost to this extra complexity.
>> > At 20 processes running a particular test on 72 cores, the lock
>> > acquisitions per second drop from 1.8 million to 1.4 million with
>> > this patch. For 100 processes, this patch still provides 1.4 million
>> > while without this patch there are about 700,000.
>> >
>> > NeilBrown
>> >
>> > ---
>> >
>> > NeilBrown (4):
>> >       fs/locks: rename some lists and pointers.
>> >       fs/locks: allow a lock request to block other requests.
>> >       fs/locks: change all *_conflict() functions to return bool.
>> >       fs/locks: create a tree of dependent requests.
>> >
>> >
>> >  fs/cifs/file.c                  |    2 -
>> >  fs/locks.c                      |  142 +--
>> >  include/linux/fs.h              |    5 +
>> >  include/trace/events/filelock.h |   16 ++-
>> >  4 files changed, 103 insertions(+), 62 deletions(-)
>> >
>>
>> Nice work! I looked over this and I think it looks good.
>>
>> I made an attempt to fix this issue several years ago, but my method
>> sucked as it ended up penalizing the unlocking task too much. This is
>> much cleaner and should scale well overall, I think.
>
> I think I also took a crack at this at one point while I was at UM/CITI
> and never got anything I was happy with. Looks like good work!
>
> I remember one main obstacle: I felt like I never had a good benchmark.
>
> How did you choose this workload and hardware? Was it in fact udev
> (booting a large machine?), or was there some other motivation?

I'm hoping Martin will chime in here - he identified the problem and
did most of the testing...

NeilBrown

> Not that I'm likely to do it any time soon, but could you share
> sufficient details for someone else to reproduce your results?
>
> --b.
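The tree of dependent requests the patchset describes can be modelled
in a few lines. This is a rough sketch of the idea only - the class,
function names, and the byte-range conflict rule are invented here, and
this is not the kernel implementation: each new waiter is queued beneath
an existing waiter it conflicts with, so siblings never conflict, and on
unlock only the top-level waiters (which can all proceed at once) are
woken.

```python
class Request:
    """A pending lock request over a byte range."""
    def __init__(self, rng):
        self.range = rng       # (start, end) byte range requested
        self.children = []     # requests that conflict with this one

def conflicts(a, b):
    # Two byte ranges conflict when they overlap.
    return a[0] <= b[1] and b[0] <= a[1]

def enqueue(roots, req):
    # Walk down the tree: if req conflicts with an existing waiter,
    # queue it beneath that waiter; otherwise it becomes a new root.
    # Roots (and siblings at any level) are mutually non-conflicting.
    for r in roots:
        if conflicts(r.range, req.range):
            enqueue(r.children, req)
            return
    roots.append(req)

def wake_on_unlock(roots):
    # Wake only the top-level waiters: since they don't conflict with
    # each other, all of them can make progress, and nobody else is
    # needlessly scheduled just to go back to sleep.
    woken = list(roots)
    roots.clear()
    return woken
```

In this model a thundering herd of N overlapping requests forms a chain,
so an unlock wakes one waiter instead of all N.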