Re: [RFC PATCH 2/7] ALSA: ac97: add an ac97 bus

2016-05-15 Thread Takashi Iwai
On Sun, 15 May 2016 23:29:27 +0200,
Robert Jarzmik wrote:
> 
> Takashi Iwai  writes:
> 
> > On Sat, 14 May 2016 11:50:50 +0200,
> > Robert Jarzmik wrote:
> >> >> +unsigned int ac97_bus_scan_one(struct ac97_controller *ac97,
> >> >> + int codec_num)
> >> >> +{
> >> >> +   struct ac97_codec_device codec;
> >> >> +   unsigned short vid1, vid2;
> >> >> +   int ret;
> >> >> +
> >> >> +   codec.dev = *ac97->dev;
> >> >> +   codec.num = codec_num;
> >> >> +   ret = ac97->ops->read(&codec, AC97_VENDOR_ID1);
> >> >> +   vid1 = (ret & 0xffff);
> >> >> +   if (ret < 0)
> >> >> +   return 0;
> >> >
> >> > Hmm.  This looks pretty hackish and dangerous.
> >> You mean returning 0 even if the read failed, right ?
> >
> > No, my concern is that it's creating a dummy codec object temporarily
> > on the stack just by copying some fields and calling the ops with it.
> > (And actually the current code may work wrongly because lack of
> >  zero-clear of the object.)
> Ah yes, I remember now, the on-stack generated device, indeed ugly.
> 
> > IMO, a cleaner way would be to define the ops passed with both
> > controller and codec objects as arguments, and pass NULL codec here.
> It's rather unusual to need both the device and its controller in bus
> operations. I must admit I have no better idea so far, so I'll try that
> just to see how it looks, and let's see next ...

Thinking of this again, I wonder now why we need to pass the codec
object at all.  It's the read/write ops via ac97, so we just need the
ac97_controller object and the address slot of the accessed codec?
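
Something like this rough sketch, perhaps (hypothetical signatures, not
a final API), where a codec is identified purely by its address slot:

    /* Sketch only: read/write keyed by controller + codec slot. */
    struct ac97_controller;

    struct ac97_controller_ops_sketch {
            int (*read)(struct ac97_controller *ac97, int slot,
                        unsigned short reg);
            int (*write)(struct ac97_controller *ac97, int slot,
                         unsigned short reg, unsigned short val);
    };

Then the bus scan wouldn't need a temporary device object at all, e.g.

    ret = ac97->ops->read(ac97, codec_num, AC97_VENDOR_ID1);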


Takashi


Re: [PATCH] ixgbe: take online CPU number as MQ max limit when alloc_etherdev_mq()

2016-05-15 Thread ethan zhao

Thanks for your review.

Ethan

On 2016/5/13 20:52, Sergei Shtylyov wrote:

Hello.

On 5/13/2016 8:56 AM, Ethan Zhao wrote:


Allocating 64 Tx/Rx as default doesn't benefit perfomrnace when less


   Performance.


CPUs were assigned. especially when DCB is enabled, so we should take
num_online_cpus() as top limit, and aslo to make sure every TC has


   Also.


at least one queue, take the MAX_TRAFFIC_CLASS as bottom limit of queues
number.

Signed-off-by: Ethan Zhao 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c

index 7df3fe2..1f9769c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9105,6 +9105,10 @@ static int ixgbe_probe(struct pci_dev *pdev, 
const struct pci_device_id *ent)

 indices = IXGBE_MAX_RSS_INDICES;
 #endif
 }
+/* Don't allocate too more queues than online cpus number */


   "Too" not needed here. CPUs.


+indices = min_t(int, indices, num_online_cpus());
+/* To make sure TC works, allocate at least 1 queue per TC */
+indices = max_t(int, indices, MAX_TRAFFIC_CLASS);

 netdev = alloc_etherdev_mq(sizeof(struct ixgbe_adapter), indices);
 if (!netdev) {


MBR, Sergei
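
For reference, a standalone sketch of the clamping the patch introduces
(plain userspace C; MAX_TRAFFIC_CLASS assumed to be 8 as on ixgbe, CPU
count is a made-up example):

    #include <stdio.h>

    #define MAX_TRAFFIC_CLASS 8

    static int clamp_indices(int indices, int online_cpus)
    {
            /* min_t(int, indices, num_online_cpus()) */
            if (indices > online_cpus)
                    indices = online_cpus;
            /* max_t(int, indices, MAX_TRAFFIC_CLASS): applied second,
             * so the one-queue-per-TC floor wins even when fewer CPUs
             * than traffic classes are online */
            if (indices < MAX_TRAFFIC_CLASS)
                    indices = MAX_TRAFFIC_CLASS;
            return indices;
    }

    int main(void)
    {
            /* 64 default queues on a 4-CPU box -> clamped to 8 */
            printf("%d\n", clamp_indices(64, 4));
            return 0;
    }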







RE: [PATCH 2/2] serial: 8250_mid: Read RX buffer on RX DMA timeout for DNV

2016-05-15 Thread Chuah, Kim Tatt


>-Original Message-
>From: Andy Shevchenko [mailto:andriy.shevche...@linux.intel.com]
>Sent: Friday, May 13, 2016 7:28 PM
>To: lkp ; Chuah, Kim Tatt ; Peter
>Hurley 
>Cc: kbuild-...@01.org; gre...@linuxfoundation.org; Koul, Vinod
>; heikki.kroge...@linux.intel.com;
>mika.westerb...@linux.intel.com; linux-kernel@vger.kernel.org; Tan, Jui Nee
>
>Subject: Re: [PATCH 2/2] serial: 8250_mid: Read RX buffer on RX DMA timeout for
>DNV
>
>On Fri, 2016-05-13 at 18:15 +0800, kbuild test robot wrote:
>> Hi,
>>
>> [auto build test ERROR on next-20160513] [cannot apply to
>> tty/tty-testing usb/usb-testing v4.6-rc7 v4.6-rc6
>> v4.6-rc5 v4.6-rc7]
>> [if your patch is applied to the wrong git tree, please drop us a note
>> to help improving the system]
>>
>> url:https://github.com/0day-ci/linux/commits/Chuah-Kim-Tatt/Fix-DN
>> V-HSUART-RX-DMA-timeout-interrupt-issue/20160513-162046
>> config: i386-randconfig-s0-201619 (attached as .config)
>> compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
>> reproduce:
>> # save the attached .config to linux build tree
>> make ARCH=i386
>>
>> Note: the linux-review/Chuah-Kim-Tatt/Fix-DNV-HSUART-RX-DMA-timeout-
>> interrupt-issue/20160513-162046 HEAD
>> 0354112aa9821bec8d278ad06b3d543724f5291d builds fine.
>>   It only hurts bisectibility.
>>
>> All errors (new ones prefixed by >>):
>
>Peter, what happened to your DMA series in the linux-next? Did I miss any
>discussion related?
>
Hi Andy,
The error occurs when CONFIG_8250_MID is set to "m", because 
serial8250_rx_dma_flush() was not exported. Please advise.
>>
>> >
>> > >
>> > > ERROR: "serial8250_rx_dma_flush"
>> > > [drivers/tty/serial/8250/8250_mid.ko] undefined!
>> ---
>> 0-DAY kernel test infrastructureOpen Source Technology
>> Center https://lists.01.org/pipermail/kbuild-all
>> Intel Corporation
>
>--
>Andy Shevchenko 
>Intel Finland Oy
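
The usual fix for this class of link error is to export the symbol where
it is defined; a sketch, assuming serial8250_rx_dma_flush() lives in
8250_dma.c as the error suggests:

    /* drivers/tty/serial/8250/8250_dma.c (sketch) */
    #include <linux/export.h>
    #include <linux/serial_8250.h>

    void serial8250_rx_dma_flush(struct uart_8250_port *p)
    {
            /* existing body unchanged */
    }
    EXPORT_SYMBOL_GPL(serial8250_rx_dma_flush);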




[PATCH] ARM: DRA7: hwmod: Remove QSPI address space entry from hwmod

2016-05-15 Thread Vignesh R
QSPI address space information is passed from device tree. Therefore
remove legacy way of passing address space via hwmod data.

Signed-off-by: Vignesh R 
---
 arch/arm/mach-omap2/omap_hwmod_7xx_data.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c 
b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
index d0e7e5259ec3..fac0a2044da4 100644
--- a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
+++ b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c
@@ -3410,21 +3410,11 @@ static struct omap_hwmod_ocp_if dra7xx_l4_cfg__pciess2 
= {
.user   = OCP_USER_MPU | OCP_USER_SDMA,
 };
 
-static struct omap_hwmod_addr_space dra7xx_qspi_addrs[] = {
-   {
-   .pa_start   = 0x4b300000,
-   .pa_end = 0x4b30007f,
-   .flags  = ADDR_TYPE_RT
-   },
-   { }
-};
-
 /* l3_main_1 -> qspi */
 static struct omap_hwmod_ocp_if dra7xx_l3_main_1__qspi = {
.master = _l3_main_1_hwmod,
.slave  = _qspi_hwmod,
.clk= "l3_iclk_div",
-   .addr   = dra7xx_qspi_addrs,
.user   = OCP_USER_MPU | OCP_USER_SDMA,
 };
 
-- 
2.8.2
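
With the hwmod entry gone, the driver picks the address range up from
the device tree through the normal platform-resource path; roughly (a
generic sketch, not the actual ti-qspi probe code):

    #include <linux/platform_device.h>
    #include <linux/io.h>
    #include <linux/err.h>

    static int qspi_probe_sketch(struct platform_device *pdev)
    {
            struct resource *res;
            void __iomem *base;

            /* the DT node's 'reg' property becomes a MEM resource */
            res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
            base = devm_ioremap_resource(&pdev->dev, res);
            if (IS_ERR(base))
                    return PTR_ERR(base);

            return 0;
    }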



RE: [PATCH V2 1/2] drivers: i2c: qup: Fix broken dma when CONFIG_DEBUG_SG is enabled

2016-05-15 Thread Sricharan
Hi,
> > With CONFIG_DEBUG_SG is enabled and when dma mode is used, below
> dump
> > is seen,
> >
> > [ cut here ]
> > kernel BUG at include/linux/scatterlist.h:140!
> > Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.0-00459-g9f087b9-dirty
> > #7 Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > task: ffc036868000 ti: ffc03687 task.ti: ffc03687
> > PC is at qup_sg_set_buf.isra.13+0x138/0x154
> > LR is at qup_sg_set_buf.isra.13+0x50/0x154 pc : []
> > lr : [] pstate: 6145 sp : ffc0368735c0
> > x29: ffc0368735c0 x28: ffc036873752
> > x27: ffc035233018 x26: ffc000c4e000
> > x25:  x24: 0004
> > x23:  x22: ffc035233668
> > x21: ff80004e3000 x20: ffc0352e0018
> > x19: 0040 x18: 0028
> > x17: 0004 x16: ffc0017a39c8
> > x15: 1cdf x14: ffc0019929d8
> > x13: ffc0352e0018 x12: 
> > x11: 0001 x10: 0001
> > x9 : ffc0012b2d70 x8 : ff80004e3000
> > x7 : 0018 x6 : 3000
> > x5 : ffc00199f018 x4 : ffc035233018
> > x3 : 0004 x2 : c000
> > x1 : 0003 x0 : 
> >
> > Process swapper/0 (pid: 1, stack limit = 0xffc036870020)
> > Stack: (0xffc0368735c0 to 0xffc036874000)
> >
> > Change allocation of sg buffers from dma_coherent memory to kzalloc to
> > fix the issue.
> 
> This description describes what you do, but not why it is the correct
> solution to the OOPS. The OOPS doesn't describe it either. Please add
> some more explanation.
> 
   Ok, will describe it more. The reason it oopses is that sg_set_buf()
   expects the buf parameter passed in to be from lowmem and a valid
   page frame. This is not true for pages from dma_alloc_coherent(),
   which can be carveouts; hence the check fails. Allocating buffers
   using kzalloc fixes the issue.

Regards,
 Sricharan
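
A sketch of the safe pattern (function name made up, error handling
trimmed): sg_init_one()/sg_set_buf() end up doing virt_to_page() on the
buffer, which is only valid for lowmem addresses backed by a real page
frame, so the buffer must come from the slab allocator rather than from
dma_alloc_coherent():

    #include <linux/slab.h>
    #include <linux/scatterlist.h>

    static int qup_sg_buf_sketch(struct scatterlist *sg, size_t len)
    {
            /* lowmem allocation with a valid backing page frame */
            void *buf = kzalloc(len, GFP_KERNEL);

            if (!buf)
                    return -ENOMEM;

            sg_init_one(sg, buf, len);  /* virt_addr_valid(buf) holds */
            return 0;
    }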
  




RE: [PATCH V2 0/2] drivers: i2c: qup: Some misc fixes

2016-05-15 Thread Sricharan
Hi,

> > One for fixing the bug with CONFIG_DEBUG_SG enabled and another to
> > suspend the transfer for all errors instead of just for nack.
> 
> You haven't stated what was changed in V2.
ah, sorry, will resend..

Regards,
 Sricharan




[PATCH] gpu:drm:radeon:fix array out of bounds

2016-05-15 Thread Heloise NH
From: tom will 

When the initial value of i is greater than zero,
incrementing i may cause an endless loop, resulting
in an array out-of-bounds access; fix it by
decrementing instead.

Signed-off-by: tom will 
---
 drivers/gpu/drm/radeon/kv_dpm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/kv_dpm.c b/drivers/gpu/drm/radeon/kv_dpm.c
index d024074..a7e9786 100644
--- a/drivers/gpu/drm/radeon/kv_dpm.c
+++ b/drivers/gpu/drm/radeon/kv_dpm.c
@@ -2164,7 +2164,7 @@ static void kv_apply_state_adjust_rules(struct 
radeon_device *rdev,
if (pi->caps_stable_p_state) {
stable_p_state_sclk = (max_limits->sclk * 75) / 100;
 
-   for (i = table->count - 1; i >= 0; i++) {
+   for (i = table->count - 1; i >= 0; i--) {
if (stable_p_state_sclk >= table->entries[i].clk) {
stable_p_state_sclk = table->entries[i].clk;
break;
-- 
2.1.0





Re: [Patch v5 5/8] firmware: qcom: scm: Convert to streaming DMA APIS

2016-05-15 Thread Andy Gross
On Fri, May 13, 2016 at 04:48:52PM -0700, Bjorn Andersson wrote:
> > +   cmd->len = cpu_to_le32(alloc_len);
> > +   cmd->buf_offset = cpu_to_le32(sizeof(*cmd));
> > +   cmd->resp_hdr_offset = cpu_to_le32(sizeof(*cmd) + cmd_len);
> > +
> > cmd->id = cpu_to_le32((svc_id << 10) | cmd_id);
> > if (cmd_buf)
> > -   memcpy(qcom_scm_get_command_buffer(cmd), cmd_buf, cmd_len);
> > +   memcpy(cmd->buf, cmd_buf, cmd_len);
> > +
> > +   rsp = (void *)cmd->buf + le32_to_cpu(cmd->resp_hdr_offset);
> 
> I believe resp_hdr_offset counts from the beginning of the buffer and
> that this therefor is supposed to be:
> 
>   rsp = (void *)cmd + le32_to_cpu(cmd->resp_hdr_offset);
> 
> With that corrected, feel free to add:
> 
> Reviewed-by: Bjorn Andersson 

I'll fix that up.  Thanks for the review.


Andy
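
For reference, the layout implied by the offsets in the quoted hunk
(field names taken from the patch; a sketch, not the full structure):

    #include <linux/types.h>

    struct qcom_scm_command_sketch {
            __le32 len;             /* total allocation length */
            __le32 buf_offset;      /* sizeof(*cmd): payload follows */
            __le32 resp_hdr_offset; /* sizeof(*cmd) + cmd_len, counted
                                     * from the start of the command */
            __le32 id;
            u8 buf[];               /* payload, then response header */
    };

    /* hence: rsp = (void *)cmd + le32_to_cpu(cmd->resp_hdr_offset); */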


Re: linux-next: Tree for May 16

2016-05-15 Thread Stephen Rothwell
Hi all,

I forgot to say:  Please do not add any v4.8 destined material to your
linux-next included branches until after v4.7-rc1 has been released.

-- 
Cheers,
Stephen Rothwell


linux-next: Tree for May 16

2016-05-15 Thread Stephen Rothwell
Hi all,

Changes since 20160513:

Undropped tree: rdma-leon

The wireless-drivers-next tree gained a conflict against the net-next
tree.

The rdma-leon tree lost its build failure.

The spi tree gained a build failure so I used the version from next-20160513.

The clk tree gained a conflict against the mips tree.

Non-merge commits (relative to Linus' tree): 10066
 8572 files changed, 442704 insertions(+), 185902 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 235 trees (counting Linus' and 35 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (5f95063c686a Merge branch 'x86-urgent-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging fixes/master (b507146bb6b9 Merge branch 'linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on 
module install)
Merging arc-current/for-curr (44549e8f5eea Linux 4.6-rc7)
Merging arm-current/fixes (ec953b70f368 ARM: 8573/1: domain: move 
{set,get}_domain under config guard)
Merging m68k-current/for-linus (7b8ba82ad4ad m68k/defconfig: Update defconfigs 
for v4.6-rc2)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached 
build errors)
Merging powerpc-fixes/fixes (b4c112114aab powerpc: Fix bad inline asm 
constraint in create_zero_mask())
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (33656a1f2ee5 Merge branch 'for_linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs)
Merging net/master (272911b889f4 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging ipsec/master (d6af1a31cc72 vti: Add pmtu handling to vti_xmit.)
Merging ipvs/master (f28f20da704d Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging wireless-drivers/master (cbbba30f1ac9 Merge tag 
'iwlwifi-for-kalle-2016-05-04' of 
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (e6436be21e77 mac80211: fix statistics leak if 
dev_alloc_name() fails)
Merging sound-current/for-linus (84add303ef95 ALSA: usb-audio: Yet another 
Phoneix Audio device quirk)
Merging pci-current/for-linus (9a2a5a638f8e PCI: Do not treat EPROBE_DEFER as 
device attach failure)
Merging driver-core.current/driver-core-linus (c3b46c73264b Linux 4.6-rc4)
Merging tty.current/tty-linus (44549e8f5eea Linux 4.6-rc7)
Merging usb.current/usb-linus (44549e8f5eea Linux 4.6-rc7)
Merging usb-gadget-fixes/fixes (38740a5b87d5 usb: gadget: f_fs: Fix 
use-after-free)
Merging usb-serial-fixes/usb-linus (74d2a91aec97 USB: serial: option: add even 
more ZTE device ids)
Merging usb-chipidea-fixes/ci-for-usb-stable (d144dfea8af7 usb: chipidea: otg: 
change workqueue ci_otg as freezable)
Merging staging.current/staging-linus (44549e8f5eea Linux 4.6-rc7)
Merging char-misc.current/char-misc-linus (44549e8f5eea Linux 4.6-rc7)
Merging input-current/for-linus (c52c545ead97 Input: twl6040-vibra - fix DT 
node memory management)
Merging crypto-current/master (df27b26f04ed crypto: testmgr - Use kmalloc 
memory for RSA input)
Merging ide/master (1993b176a822 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide)
Merging devicetree-current/devicetree/merge (f76502aa9140 of/dynamic: Fix test 
for PPC_PSERIES)
Merging rr-fixes/fixes (8244062ef1e5 modules: fix longstanding 

Re: [PATCH 0/5] cpufreq: governor: Rework API to use callbacks instead of events

2016-05-15 Thread Viresh Kumar
On 14-05-16, 00:58, Rafael J. Wysocki wrote:
> Hi,
> 
> This series is on top of the current linux-next with the following two patches
> applied:
> 
> https://patchwork.kernel.org/patch/9080801/
> https://patchwork.kernel.org/patch/9080791/
> 
> It cleans up a few things and then reworks the governor API to get rid of
> governor events and use callbacks representing individual governor operations
> (init, exit, start, stop, limits update) instead.
> 
> I'm regarding it as v4.8 material, but I'd like to put it into linux-next as
> soon as 4.7-rc1 is out so subsequent cpufreq development happens on top of it.
> 
> It has been lightly tested without any problems showing up so far.

Acked-by: Viresh Kumar 

-- 
viresh
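
The reworked interface described in the cover letter would look roughly
like this (a sketch with approximate names; see the series itself for
the final form), replacing the single ->governor(policy, event) entry
point that is switched on CPUFREQ_GOV_* events:

    #include <linux/cpufreq.h>

    struct cpufreq_governor_sketch {
            int  (*init)(struct cpufreq_policy *policy);
            void (*exit)(struct cpufreq_policy *policy);
            int  (*start)(struct cpufreq_policy *policy);
            void (*stop)(struct cpufreq_policy *policy);
            void (*limits)(struct cpufreq_policy *policy);
    };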


Re: [PATCH 5/5] cpufreq: governor: Get rid of governor events

2016-05-15 Thread Viresh Kumar
On 14-05-16, 01:02, Rafael J. Wysocki wrote:
> --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
> -int cpufreq_governor_dbs(struct cpufreq_policy *policy, unsigned int event)
> -{
> - if (event == CPUFREQ_GOV_POLICY_INIT) {
> - return cpufreq_governor_init(policy);
> - } else if (policy->governor_data) {

So, we aren't checking this anymore.

I am not sure (right now) if this will be a problem.

-- 
viresh


[PATCH] perf script: Fix display inconsistency when call-graph config is used

2016-05-15 Thread He Kuang
There's a display inconsistency when 'call-graph' config event appears
in different position. The problem can be reproduced like this:

We record signal_deliver with call-graph and signal_generate without it.

  $ perf record -g -a -e signal:signal_deliver -e 
signal:signal_generate/call-graph=no/

  [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]

  $ perf script

  kworker/u2:113 [000]  6563.875949: signal:signal_generate: sig=2 errno=0 
code=128 comm=perf pid=1313 grp=1 res=0 ff61cc __send_signal+0x3ec 
([kernel.kallsyms])
  perf  1313 [000]  6563.877584:  signal:signal_deliver: sig=2 errno=0 code=128 
sa_handler=43115e sa_flags=1400
  7314 get_signal+0x80007f0023a4 ([kernel.kallsyms])
  7fffe358 do_signal+0x80007f002028 ([kernel.kallsyms])
  7fffa5e8 exit_to_usermode_loop+0x80007f002053 ([kernel.kallsyms])
  ...

Then we exchange the order of these two events in commandline, and keep
signal_generate without call-graph.

  $ perf record -g -a -e signal:signal_generate/call-graph=no/ -e 
signal:signal_deliver

  [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]

  $ perf script

kworker/u2:2  1314 [000]  6933.353060: signal:signal_generate: sig=2 
errno=0 code=128 comm=perf pid=1321 grp=1 res=0
perf  1321 [000]  6933.353872:  signal:signal_deliver: sig=2 
errno=0 code=128 sa_handler=43115e sa_flags=1400

This time, the callchain of the event signal_deliver disappeared. The
problem is that perf only checks the first evsel in evlist to decide
whether the callchain should be printed.

This patch traverses all evsels in evlist to see if any of them have
callchains, and shows the right result:

  $ perf script

  kworker/u2:2  1314 [000]  6933.353060: signal:signal_generate: sig=2 errno=0 
code=128 comm=perf pid=1321 grp=1 res=0 ff61cc __send_signal+0x3ec 
([kernel.kallsyms])
  perf  1321 [000]  6933.353872:  signal:signal_deliver: sig=2 errno=0 code=128 
sa_handler=43115e sa_flags=1400
  7314 get_signal+0x80007f0023a4 ([kernel.kallsyms])
  7fffe358 do_signal+0x80007f002028 ([kernel.kallsyms])
  7fffa5e8 exit_to_usermode_loop+0x80007f002053 ([kernel.kallsyms])
  ...

Signed-off-by: He Kuang 
---
 tools/perf/builtin-script.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index efca816..7a18b92 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -339,7 +339,7 @@ static void set_print_ip_opts(struct perf_event_attr *attr)
  */
 static int perf_session__check_output_opt(struct perf_session *session)
 {
-   int j;
+   unsigned int j;
struct perf_evsel *evsel;
 
for (j = 0; j < PERF_TYPE_MAX; ++j) {
@@ -388,17 +388,20 @@ static int perf_session__check_output_opt(struct 
perf_session *session)
struct perf_event_attr *attr;
 
j = PERF_TYPE_TRACEPOINT;
-   evsel = perf_session__find_first_evtype(session, j);
-   if (evsel == NULL)
-   goto out;
 
-   attr = &evsel->attr;
+   evlist__for_each(session->evlist, evsel) {
+   if (evsel->attr.type != j)
+   continue;
+
+   attr = &evsel->attr;
 
-   if (attr->sample_type & PERF_SAMPLE_CALLCHAIN) {
-   output[j].fields |= PERF_OUTPUT_IP;
-   output[j].fields |= PERF_OUTPUT_SYM;
-   output[j].fields |= PERF_OUTPUT_DSO;
-   set_print_ip_opts(attr);
+   if (attr->sample_type & PERF_SAMPLE_CALLCHAIN) {
+   output[j].fields |= PERF_OUTPUT_IP;
+   output[j].fields |= PERF_OUTPUT_SYM;
+   output[j].fields |= PERF_OUTPUT_DSO;
+   set_print_ip_opts(attr);
+   goto out;
+   }
}
}
 
-- 
1.8.5.2



Re: [rcu_sched stall] regression/miss-config ?

2016-05-15 Thread santosh.shilim...@oracle.com

On 5/15/16 2:18 PM, Santosh Shilimkar wrote:

Hi Paul,

I was asking Sasha about [1] since other folks in Oracle
also stumbled upon similar RCU stalls with v4.1 kernel in
different workloads. A similar issue was reported to me with
RDS as well, and looking at [1], [2], [3] and [4], I thought
of reaching out to see if you can help us to understand
this issue better.

I have also included the RCU-specific config used in these
test(s). It's very hard to reproduce the issue, but one of
the data points is that it reproduces on systems with more
CPUs (64+). The same workload with fewer than 64 CPUs
doesn't show the issue. Someone also told me that using the
SLAB instead of the SLUB allocator makes a difference, but I
haven't verified that part for RDS.

Let me know your thoughts. Thanks in advance !!


One of my colleagues told me the pastebin server I used
is Oracle-internal only, so I am adding the relevant logs
along with this email.

Regards,
Santosh


[1] https://lkml.org/lkml/2014/12/14/304



[2]  Log 1 snippet:
-
 INFO: rcu_sched self-detected stall on CPU
 INFO: rcu_sched self-detected stall on CPU { 54}  (t=6 jiffies 
g=66023 c=66022 q=0)

 Task dump for CPU 54:
 ksoftirqd/54R  running task0   389  2 0x0008
  0007 88ff7f403d38 810a8621 0036
  81ab6540 88ff7f403d58 810a86cf 0086
  81ab6940 88ff7f403d88 810e3ad3 81ab6540
 Call Trace:
[] sched_show_task+0xb1/0x120
  [] dump_cpu_task+0x3f/0x50
  [] rcu_dump_cpu_stacks+0x83/0xc0
  [] print_cpu_stall+0xfc/0x170
  [] __rcu_pending+0x2bb/0x2c0
  [] rcu_check_callbacks+0x9d/0x170
  [] update_process_times+0x42/0x70
  [] tick_sched_handle+0x39/0x80
  [] tick_sched_timer+0x44/0x80
  [] __run_hrtimer+0x74/0x1d0
  [] ? tick_nohz_handler+0xa0/0xa0
  [] hrtimer_interrupt+0x102/0x240
  [] local_apic_timer_interrupt+0x39/0x60
  [] smp_apic_timer_interrupt+0x45/0x59
  [] apic_timer_interrupt+0x6e/0x80
[] ? free_one_page+0x164/0x380
  [] ? __free_pages_ok+0xc3/0xe0
  [] __free_pages+0x25/0x40
  [] rds_message_purge+0x60/0x150 [rds]
  [] rds_message_put+0x44/0x80 [rds]
  [] rds_ib_send_cqe_handler+0x134/0x2d0 [rds_rdma]
  [] ? _raw_spin_unlock_irqrestore+0x1b/0x50
  [] ? mlx4_ib_poll_cq+0xb3/0x2a0 [mlx4_ib]
  [] poll_cq+0xa1/0xe0 [rds_rdma]
  [] rds_ib_tasklet_fn_send+0x79/0xf0 [rds_rdma]
  [] tasklet_action+0xb1/0xc0
  [] __do_softirq+0xf2/0x2f0
  [] ? smpboot_create_threads+0x80/0x80
  [] run_ksoftirqd+0x2d/0x50
  [] smpboot_thread_fn+0x116/0x170
  [] kthread+0xce/0xf0
  [] ? kthread_freezable_should_stop+0x70/0x70
  [] ret_from_fork+0x42/0x70
  [] ? kthread_freezable_should_stop+0x70/0x70
 Task dump for CPU 66:
 ksoftirqd/66R  running task0   474  2 0x0008
  04208040664afe58 88ff664ac008 00010005d97a 000a39b2e200
  0018 0006 88ff664a1c00 88ff669a5260
  88ff664a1c00 81aa4ec0 810a4930 
 Call Trace:
  [] ? smpboot_create_threads+0x80/0x80
  [] ? run_ksoftirqd+0x2d/0x50
  [] ? smpboot_thread_fn+0x116/0x170
  [] ? kthread+0xce/0xf0
  [] ? kthread_freezable_should_stop+0x70/0x70
  [] ? ret_from_fork+0x42/0x70
  [] ? kthread_freezable_should_stop+0x70/0x70
  { 66}  (t=60011 jiffies g=66023 c=66022 q=0)
 Task dump for CPU 54:
 ksoftirqd/54R  running task0   389  2 0x0008
  04208040669e3e58 88ff669e0008 00010005d981 000a39b28e00
   0006 88ff669d1c00 88ff669a5020
  88ff669d1c00 81aa4ec0 810a4930 
 Call Trace:
  [] ? smpboot_create_threads+0x80/0x80
  [] ? run_ksoftirqd+0x2d/0x50
  [] ? smpboot_thread_fn+0x116/0x170
  [] ? kthread+0xce/0xf0
  [] ? kthread_freezable_should_stop+0x70/0x70
  [] ? ret_from_fork+0x42/0x70
  [] ? kthread_freezable_should_stop+0x70/0x70
 Task dump for CPU 66:
 ksoftirqd/66R  running task0   474  2 0x0008
  0003 88ff7f703d38 810a8621 0042
  81ab6540 88ff7f703d58 810a86cf 0086
  81ab6a40 88ff7f703d88 810e3ad3 81ab6540
 Call Trace:
[] sched_show_task+0xb1/0x120
  [] dump_cpu_task+0x3f/0x50
  [] rcu_dump_cpu_stacks+0x83/0xc0
  [] print_cpu_stall+0xfc/0x170
  [] __rcu_pending+0x2bb/0x2c0
  [] rcu_check_callbacks+0x9d/0x170
  [] update_process_times+0x42/0x70
  [] tick_sched_handle+0x39/0x80
  [] tick_sched_timer+0x44/0x80
  [] __run_hrtimer+0x74/0x1d0
  [] ? tick_nohz_handler+0xa0/0xa0
  [] hrtimer_interrupt+0x102/0x240
  [] local_apic_timer_interrupt+0x39/0x60
  [] smp_apic_timer_interrupt+0x45/0x59
  [] apic_timer_interrupt+0x6e/0x80
[] ? free_one_page+0x164/0x380
  [] ? __free_pages_ok+0xc3/0xe0
  [] __free_pages+0x25/0x40
  [] rds_message_purge+0x60/0x150 [rds]
  [] rds_message_put+0x44/0x80 [rds]
  [] rds_ib_send_cqe_handler+0x134/0x2d0 [rds_rdma]
  [] ? 

Re: [PATCH net-next] tuntap: introduce tx skb ring

2016-05-15 Thread Michael S. Tsirkin
On Mon, May 16, 2016 at 09:17:01AM +0800, Jason Wang wrote:
> We used to queue tx packets in sk_receive_queue, this is less
> efficient since it requires spinlocks to synchronize between producer
> and consumer.
> 
> This patch tries to address this by using circular buffer which allows
> lockless synchronization. This is done by switching from
> sk_receive_queue to a tx skb ring with a new flag IFF_TX_RING and when
> this is set:

Why do we need a new flag? Is there a userspace-visible
behaviour change?

> 
> - store pointer to skb in circular buffer in tun_net_xmit(), and read
>   it from the circular buffer in tun_do_read().
> - introduce a new proto_ops peek which could be implemented by
>   specific socket which does not use sk_receive_queue.
> - store skb length in circular buffer too, and implement a lockless
>   peek for tuntap.
> - change vhost_net to use proto_ops->peek() instead
> - new spinlocks were introduced to synchronize among producers (and so
>   did for consumers).
> 
> Pktgen test shows about 9% improvement on guest receiving pps:
> 
> Before: ~148pps
> After : ~161pps
> 
> (I'm not sure noblocking read is still needed, so it was not included
>  in this patch)

How do you mean? Of course we must support blocking and non-blocking
read - userspace uses it.

> Signed-off-by: Jason Wang 
> ---
> ---
>  drivers/net/tun.c   | 157 
> +---
>  drivers/vhost/net.c |  16 -
>  include/linux/net.h |   1 +
>  include/uapi/linux/if_tun.h |   1 +
>  4 files changed, 165 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 425e983..6001ece 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -71,6 +71,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/circ_buf.h>
>  
>  #include 
>  
> @@ -130,6 +131,8 @@ struct tap_filter {
>  #define MAX_TAP_FLOWS  4096
>  
>  #define TUN_FLOW_EXPIRE (3 * HZ)
> +#define TUN_RING_SIZE 256

Can we resize this according to tx_queue_len set by user?

> +#define TUN_RING_MASK (TUN_RING_SIZE - 1)
>  
>  struct tun_pcpu_stats {
>   u64 rx_packets;
> @@ -142,6 +145,11 @@ struct tun_pcpu_stats {
>   u32 rx_frame_errors;
>  };
>  
> +struct tun_desc {
> + struct sk_buff *skb;
> + int len; /* Cached skb len for peeking */
> +};
> +
>  /* A tun_file connects an open character device to a tuntap netdevice. It
>   * also contains all socket related structures (except sock_fprog and 
> tap_filter)
>   * to serve as one transmit queue for tuntap device. The sock_fprog and
> @@ -167,6 +175,13 @@ struct tun_file {
>   };
>   struct list_head next;
>   struct tun_struct *detached;
> + /* reader lock */
> + spinlock_t rlock;
> + unsigned long tail;
> + struct tun_desc tx_descs[TUN_RING_SIZE];
> + /* writer lock */
> + spinlock_t wlock;
> + unsigned long head;
>  };
>  
>  struct tun_flow_entry {
> @@ -515,7 +530,27 @@ static struct tun_struct *tun_enable_queue(struct 
> tun_file *tfile)
>  
>  static void tun_queue_purge(struct tun_file *tfile)
>  {
> + unsigned long head, tail;
> + struct tun_desc *desc;
> + struct sk_buff *skb;
>   skb_queue_purge(&tfile->sk.sk_receive_queue);
> + spin_lock(&tfile->rlock);
> +
> + head = ACCESS_ONCE(tfile->head);
> + tail = tfile->tail;
> +
> + /* read tail before reading descriptor at tail */
> + smp_rmb();

I think you mean read *head* here


> +
> + while (CIRC_CNT(head, tail, TUN_RING_SIZE) >= 1) {
> + desc = &tfile->tx_descs[tail];
> + skb = desc->skb;
> + kfree_skb(skb);
> + tail = (tail + 1) & TUN_RING_MASK;
> + /* read descriptor before incrementing tail. */
> + smp_store_release(&tfile->tail, tail & TUN_RING_MASK);
> + }
> + spin_unlock(&tfile->rlock);
>   skb_queue_purge(&tfile->sk.sk_error_queue);
>  }
>

Barrier pairing seems messed up. Could you tag
each barrier with its pair pls?
E.g. add /* Barrier A for pairing */ Before barrier and
its pair.
  
> @@ -824,6 +859,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   int txq = skb->queue_mapping;
>   struct tun_file *tfile;
>   u32 numqueues = 0;
> + unsigned long flags;
>  
>   rcu_read_lock();
>   tfile = rcu_dereference(tun->tfiles[txq]);
> @@ -888,8 +924,35 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>  
>   nf_reset(skb);
>  
> - /* Enqueue packet */
> - skb_queue_tail(&tfile->socket.sk->sk_receive_queue, skb);
> + if (tun->flags & IFF_TX_RING) {
> + unsigned long head, tail;
> +
> + spin_lock_irqsave(&tfile->wlock, flags);
> +
> + head = tfile->head;
> + tail = ACCESS_ONCE(tfile->tail);

this should be acquire

> +
> + if (CIRC_SPACE(head, tail, TUN_RING_SIZE) >= 1) {
> + struct tun_desc *desc = &tfile->tx_descs[head];
> +
> + 
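
Picking up the barrier-pairing request above, a minimal single-producer/
single-consumer ring sketch with the pairs tagged (userspace C11 atomics
standing in for the kernel's smp_load_acquire()/smp_store_release(); all
names are hypothetical, not the driver code):

    #include <stdatomic.h>
    #include <stddef.h>

    #define RING_SIZE 256
    #define RING_MASK (RING_SIZE - 1)

    struct ring {
            void *slot[RING_SIZE];
            _Atomic unsigned long head;     /* written by producer */
            _Atomic unsigned long tail;     /* written by consumer */
    };

    /* Producer: release on head (A) pairs with the consumer's acquire
     * load of head; the acquire load of tail pairs with B. */
    static int ring_put(struct ring *r, void *p)
    {
            unsigned long head = atomic_load_explicit(&r->head,
                                            memory_order_relaxed);
            unsigned long tail = atomic_load_explicit(&r->tail,
                                            memory_order_acquire); /* pairs with B */

            if (head - tail >= RING_SIZE)
                    return -1;                      /* full */
            r->slot[head & RING_MASK] = p;
            atomic_store_explicit(&r->head, head + 1,
                                  memory_order_release);        /* A */
            return 0;
    }

    /* Consumer: release on tail (B) pairs with the producer's acquire
     * load of tail; the acquire load of head pairs with A. */
    static void *ring_get(struct ring *r)
    {
            unsigned long tail = atomic_load_explicit(&r->tail,
                                            memory_order_relaxed);
            unsigned long head = atomic_load_explicit(&r->head,
                                            memory_order_acquire); /* pairs with A */
            void *p;

            if (head == tail)
                    return NULL;                    /* empty */
            p = r->slot[tail & RING_MASK];
            atomic_store_explicit(&r->tail, tail + 1,
                                  memory_order_release);        /* B */
            return p;
    }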

Re: [PATCH 3/4] phy: rockchip-emmc: configure default output tap delay

2016-05-15 Thread Shawn Lin

Hi Doug,

On 2016/5/14 6:25, Doug Anderson wrote:

Hi,

On Thu, May 12, 2016 at 3:43 PM, Brian Norris  wrote:

The output tap delay control helps maintain the hold requirements for
eMMC. The exact value is dependent on the SoC and other factors, though
it isn't really an exact science. But the default of 0 is not very good,
as it doesn't give the eMMC much hold time, so let's bump it up to 4
(approx 90 degree phase?). If we need to configure this any further
(e.g., based on board or speed factors), we may need to consider a
device tree representation.


As I understand it, this solves much the same problem as my patch in
, but for the eMMC port
on rk3399 (which doesn't use dw_mmc).  As argued in that patch and
also in the discussion from
, if we eventually end up
needing to put something in the device tree we need to be really
careful.  Specifically to get the exact right value here I think you
need to consider the input clock, speed mode, and any SoC-specific
delay differences between the clock and the data lines.  That would
imply that, if anything, the device tree data would only contain
information about the SoC-specific delay differences and all other
work to set this value would involve coordination between the PHY and
the SDHCI controller.


However, as also discussed previously, we don't appear to need to be
very exact about the value here.  It seems like setting this to 4 (~90
degrees?) is a much better starting point than leaving it at the
default of 0.


The value, 4, is based on real silicon testing observed on the
oscilloscope, and of course it meets the requirements of the speed modes.
For Arasan's PHY, the phase is very accurate, so the real timing of
the value you set won't vary much across different SoCs.

So explicitly assigning 4 here looks sane currently, except for crazy
PCB layouts...

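To make the 90-degree estimate concrete (my arithmetic, not from the patch):
if the PHY spreads one card-clock cycle over 16 equal taps, then tap 4 at a
200 MHz clock buys 4/16 * 5 ns = 1.25 ns of extra hold margin.  A
hypothetical helper, with both the names and the 16-tap assumption mine:

/*
 * Extra hold delay in picoseconds for a given output tap, assuming
 * taps_per_cycle equal steps across one clock period.  None of these
 * names come from the driver.
 */
static unsigned long long tap_to_hold_ps(unsigned int tap,
					 unsigned int taps_per_cycle,
					 unsigned long long clk_hz)
{
	unsigned long long period_ps = 1000000000000ULL / clk_hz;

	return (unsigned long long)tap * period_ps / taps_per_cycle;
}

/* tap_to_hold_ps(4, 16, 200000000ULL) == 1250, i.e. ~90 degrees */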




...so I'd be all for landing this patch.  Perhaps Shawn can chime in
and confirm that our understanding is correct and possibly we can
update the commit message.  Then presumably someone at Rockchip can
keep working to find a better way to set this long term.

Sound good?


-Doug






--
Best Regards
Shawn Lin






Re: [PATCH net-next] tuntap: introduce tx skb ring

2016-05-15 Thread Eric Dumazet
On Mon, 2016-05-16 at 09:17 +0800, Jason Wang wrote:
> We used to queue tx packets in sk_receive_queue, this is less
> efficient since it requires spinlocks to synchronize between producer
> and consumer.

...

>   struct tun_struct *detached;
> + /* reader lock */
> + spinlock_t rlock;
> + unsigned long tail;
> + struct tun_desc tx_descs[TUN_RING_SIZE];
> + /* writer lock */
> + spinlock_t wlock;
> + unsigned long head;
>  };
>  

Ok, we have had these kinds of ideas floating around for many other cases,
like qdisc, UDP or af_packet sockets...

I believe we should have a common set of helpers, not hidden in
drivers/net/tun.c but in net/core/skb_ring.c or something, with more
flexibility (like the number of slots)

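Something like the following could be the shape of those helpers;
net/core/skb_ring.c does not exist today, so every name below is made up.
The point is only the caller-chosen slot count and the split
producer/consumer locks:

/* Hypothetical common ring for queued skbs. */
struct skb_ring {
	struct sk_buff **queue;		/* kmalloc'ed array of 'size' slots */
	unsigned int size;		/* power of two, chosen at init time */
	unsigned int head;		/* producer index */
	unsigned int tail;		/* consumer index */
	spinlock_t producer_lock;
	spinlock_t consumer_lock;
};

int skb_ring_init(struct skb_ring *r, unsigned int size, gfp_t gfp);
void skb_ring_cleanup(struct skb_ring *r);
int skb_ring_produce(struct skb_ring *r, struct sk_buff *skb);
struct sk_buff *skb_ring_consume(struct skb_ring *r);
struct sk_buff *skb_ring_peek(struct skb_ring *r);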

BTW, why are you using spin_lock_irqsave() in tun_net_xmit() and
tun_peek() ?

BH should be disabled already (in tun_net_xmit()), and we cannot
transmit from hard irq.

Thanks.










Linux v4.6 does not work with Centos 7 (Radeon) but works with Linux v4.5

2016-05-15 Thread Jeff Merkey
A lot of breakage when building v4.6 on CentOS 7.

The following error occurs after booting Linux v4.6 on Centos 7 with a
Radeon Adapter:

R600_cp:  Failed to load Firmware "Radeon/R600_rlc.bin
FATAL Error during GPU init.

Linux v4.5 works fine.  Also of note, someone moved some scripts
around, resulting in an error during kernel install.  This build was
with "make localmodconfig":

scripts/kconfig/conf  --silentoldconfig Kconfig
  CHK include/config/kernel.release
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
  CHK include/generated/bounds.h
  CHK include/generated/timeconst.h
  CHK include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CHK include/generated/compile.h
  CHK include/generated/uapi/linux/version.h
  TEST    posttest
Succeed: decoded and checked 1795748 instructions
  TEST    posttest
arch/x86/tools/insn_sanity: Success: decoded and checked 100
random instructions with 0 errors (seed:0x399955f4)
Kernel: arch/x86/boot/bzImage is ready  (#7)
  CHK include/config/kernel.release
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
  CHK include/generated/bounds.h
  CHK include/generated/timeconst.h
  CHK include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  GZIP    kernel/config_data.gz
  CHK kernel/config_data.h
  UPD kernel/config_data.h
  CC [M]  kernel/configs.o
  Building modules, stage 2.
  MODPOST 32 modules
  CC  kernel/configs.mod.o
  LD [M]  kernel/configs.ko
  INSTALL arch/x86/kernel/debug/mdb/mdb.ko
  INSTALL drivers/ata/acard-ahci.ko
  INSTALL drivers/ata/ahci.ko
  INSTALL drivers/ata/ahci_platform.ko
  INSTALL drivers/ata/ata_generic.ko
  INSTALL drivers/ata/libahci.ko
  INSTALL drivers/ata/libahci_platform.ko
  INSTALL drivers/ata/libata.ko
  INSTALL drivers/ata/pata_acpi.ko
  INSTALL drivers/ata/pata_atiixp.ko
  INSTALL drivers/base/regmap/regmap-i2c.ko
  INSTALL drivers/gpu/drm/drm.ko
  INSTALL drivers/gpu/drm/drm_kms_helper.ko
  INSTALL drivers/gpu/drm/radeon/radeon.ko
  INSTALL drivers/gpu/drm/ttm/ttm.ko
  INSTALL drivers/i2c/algos/i2c-algo-bit.ko
  INSTALL drivers/i2c/i2c-core.ko
  INSTALL drivers/input/serio/serio_raw.ko
  INSTALL drivers/net/ethernet/realtek/r8169.ko
  INSTALL drivers/net/mii.ko
  INSTALL drivers/scsi/sd_mod.ko
  INSTALL drivers/usb/storage/usb-storage.ko
  INSTALL drivers/video/fbdev/core/fb_sys_fops.ko
  INSTALL drivers/video/fbdev/core/syscopyarea.ko
  INSTALL drivers/video/fbdev/core/sysfillrect.ko
  INSTALL drivers/video/fbdev/core/sysimgblt.ko
  INSTALL drivers/xen/tmem.ko
  INSTALL drivers/xen/xen-privcmd.ko
  INSTALL fs/ext4/ext4.ko
  INSTALL fs/jbd2/jbd2.ko
  INSTALL fs/mbcache.ko
  INSTALL kernel/configs.ko
  DEPMOD  4.6.0
sh ./arch/x86/boot/install.sh 4.6.0 arch/x86/boot/bzImage \
System.map "/boot"
/usr/lib/dracut/modules.d/99kdumpbase/module-setup.sh: line 4:
/lib/kdump/kdump-lib.sh: No such file or directory
/usr/lib/dracut/modules.d/99kdumpbase/module-setup.sh: line 4:
/lib/kdump/kdump-lib.sh: No such file or directory


Jeff




linux-next: manual merge of the clk tree with the mips tree

2016-05-15 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the clk tree got a conflict in:

  drivers/clk/Kconfig

between commit:

  ce6e11884659 ("CLK: microchip: Add Microchip PIC32 clock driver.")

from the mips tree and commit:

  0bbd72b4c64f ("clk: Add Oxford Semiconductor OXNAS Standard Clocks")

from the clk tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/clk/Kconfig
index 90518cd7fc9c,2dd371deb23b..
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@@ -197,9 -197,12 +197,15 @@@ config COMMON_CLK_PX
---help---
  Support for the Marvell PXA SoC.
  
 +config COMMON_CLK_PIC32
 +  def_bool COMMON_CLK && MACH_PIC32
 +
+ config COMMON_CLK_OXNAS
+   bool "Clock driver for the OXNAS SoC Family"
+   select MFD_SYSCON
+   ---help---
+ Support for the OXNAS SoC Family clocks.
+ 
  source "drivers/clk/bcm/Kconfig"
  source "drivers/clk/hisilicon/Kconfig"
  source "drivers/clk/mvebu/Kconfig"




[git pull] vfs.git

2016-05-15 Thread Al Viro
FWIW, I considered sending that pile in several pull requests, but for some
reason git request-pull v4.6 vfs work.lookups spews something very odd into
diffstat - files that have never been touched by it and, in fact, doing
merge with mainline does *not* end up with those files anywhere in the
diff.  Full pile doesn't produce any oddities of that sort, so...

Several series here:
* constified struct path * in LSM arguments (me)
* acl and xattr cleanups (some from me, most from Andreas)
* parallel lookups/readdir/atomic_open (me).  ->i_mutex replaced with
rwsem, pure lookups take it shared.  Exclusion is per-name - no parallel
lookups on the same name in the same parent at the same time.  ->atomic_open()
without O_CREAT is also called with parent locked shared.  ->iterate() is
being replaced by a new method (->iterate_shared()), which is called with
directory being locked only shared.  Most of the filesystems switched to it.
All of them (switched or not) get per-struct file exclusion for readdir
and lseek.  Incidentally, do_last()/lookup_open()/atomic_open() got cleaned
up quite a bit.
* preadv2 updates (Christoph)
* rlimit vs coredumping stuff (Omar Sandoval)
* cifs finally getting rid of copying iovecs, manually draining them,
etc. - sock_sendmsg() and sock_recvmsg() allow to simplify things quite a bit
(me).
* assorted cleanups and fixes

If you prefer that stuff to go in separate pulls, please say so.  I've no
idea what's triggering the junk in git request-pull for work.lookups -
looks like a merge from -rc1-based branch into -rc3-based one has caused
that somehow (commit 84695ffee).  Affected files are the ones changed in
mainline between -rc1 and -rc3 and they *are* identical to their -rc3 state
after that merge commit...

Anyway, sane-looking git request-pull for the whole pile follows:

The following changes since commit 38b78a5f18584db6fa7441e0f4531b283b0e6725:

  ovl: ignore permissions on underlying lookup (2016-05-10 23:58:18 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-next

for you to fetch changes up to 16a9fad4ef80ba18d84615eec1a6286b9d71e909:

  Merge branch 'sendmsg.cifs' into for-next (2016-05-12 22:31:59 -0400)


Al Viro (113):
  [apparmor] constify struct path * in a bunch of helpers
  mtd: switch open_mtd_by_chdev() to use of vfs_stat()
  mtd: switch ubi_open_volume_path() to vfs_stat()
  __d_alloc(): treat NULL name as QSTR("/", 1)
  bpf: reject invalid names right in ->lookup()
  constify security_path_truncate()
  constify vfs_truncate()
  apparmor_path_truncate(): path->mnt is never NULL
  tomoyo: constify assorted struct path *
  constify chown_common/security_path_chown
  constify security_sb_mount()
  constify chmod_common/security_path_chmod
  apparmor: new helper - common_path_perm()
  apparmor: constify aa_path_link()
  apparmor: constify common_perm_...()
  constify security_path_{unlink,rmdir}
  constify security_path_{mkdir,mknod,symlink}
  apparmor: remove useless checks for NULL ->mnt
  constify security_path_{link,rename}
  constify security_path_chroot()
  constify security_sb_pivotroot()
  constify ima_d_path()
  reiserfs_cache_default_acl(): use get_acl()
  [net] drop 'size' argument of sock_recvmsg()
  cifs: merge the hash calculation helpers
  cifs: quit playing games with draining iovecs
  cifs: no need to wank with copying and advancing iovec on recvmsg side 
either
  cifs_readv_receive: use cifs_read_from_socket()
  cifs: don't bother with kmap on read_pages side
  ecryptfs: avoid multiple aliases for directories
  ecryptfs_lookup(): try either only encrypted or plaintext name
  aio: remove a pointless assignment
  rw_verify_area(): saner calling conventions
  Merge branch 'for-linus' into work.iov_iter
  don't bother with ->d_inode->i_sb - it's always equal to ->d_sb
  cifs: kill more bogus checks in ->...xattr() methods
  reiserfs: switch to generic_{get,set,remove}xattr()
  xattr_handler: pass dentry and inode as separate arguments of ->get()
  ->getxattr(): pass dentry and inode as separate arguments
  Merge getxattr prototype change into work.lookups
  security_d_instantiate(): move to the point prior to attaching dentry to 
inode
  kernfs: use lookup_one_len_unlocked()
  configfs_detach_prep(): make sure that wait_mutex won't go away
  ocfs2: don't open-code inode_lock/inode_unlock
  orangefs: don't open-code inode_lock/inode_unlock
  reiserfs: open-code reiserfs_mutex_lock_safe() in reiserfs_unpack()
  reconnect_one(): use lookup_one_len_unlocked()
  ovl_lookup_real(): use lookup_one_len_unlocked()
  make ext2_get_page() and friends work without external serialization
  


Re: [PATCH 08/17] perf record: Don't poll on overwrite channel

2016-05-15 Thread Wangnan (F)



On 2016/5/13 21:12, Arnaldo Carvalho de Melo wrote:

Em Fri, May 13, 2016 at 07:56:05AM +, Wang Nan escreveu:

There's no need to receive events from an overwritable ring buffer. Instead,
perf should let them run in the background until something happens. This patch
makes perf ignore normal events from an overwrite ring buffer.

Signed-off-by: Wang Nan 
Signed-off-by: He Kuang 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Namhyung Kim 
Cc: Zefan Li 
Cc: pi3or...@163.com
---
  tools/perf/util/evlist.c | 23 +++
  1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index abce588..f0b0457 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -461,9 +461,9 @@ int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
return 0;
  }
  
-static int __perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd, int idx)

+static int __perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd, int 
idx, short revent)
  {
> -   int pos = fdarray__add(&evlist->pollfd, fd, POLLIN | POLLERR | POLLHUP);
> +   int pos = fdarray__add(&evlist->pollfd, fd, revent | POLLERR | POLLHUP);
/*
 * Save the idx so that when we filter out fds POLLHUP'ed we can
 * close the associated evlist->mmap[] entry.
@@ -479,7 +479,7 @@ static int __perf_evlist__add_pollfd(struct perf_evlist 
*evlist, int fd, int idx
  
  int perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd)

  {
-   return __perf_evlist__add_pollfd(evlist, fd, -1);
+   return __perf_evlist__add_pollfd(evlist, fd, -1, POLLIN);
  }
  
  static void perf_evlist__munmap_filtered(struct fdarray *fda, int fd)

@@ -1077,6 +1077,18 @@ perf_evlist__channel_complete(struct perf_evlist *evlist)
return 0;
  }
  
+static bool

+perf_evlist__should_poll(struct perf_evlist *evlist,
+struct perf_evsel *evsel,
+int channel)
+{
+   if (evsel->system_wide)
+   return false;

So, what is the above doing in this patch? If we should not poll when in
syswide mode, then this should be in a separate patch, unrelated to
'channels'.  No?


I think the name 'system_wide' is more or less misleading. It does not mean
an event in 'perf record -a', but rather "a selected event to be opened
always without a pid when configured by perf_evsel__config()". See bf8e8f4b8.

Here we use similar logic to the existing perf_evlist__mmap_per_evsel(). It
never polls a system_wide evsel:

/*
 * The system_wide flag causes a selected event to be opened
 * always without a pid.  Consequently it will never get a
 * POLLHUP, but it is used for tracking in combination with
 * other events, so it should not need to be polled anyway.
 * Therefore don't add it for polling.
 */
if (!evsel->system_wide &&
    __perf_evlist__add_pollfd(evlist, fd, idx) < 0) {
        perf_evlist__mmap_put(evlist, idx);
        return -1;
}

Thank you.

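Put together, the check in the new helper presumably reduces to something
like this sketch; the channel lookup is collapsed into a bool here for
brevity, so this is not the exact patch:

static bool perf_evlist__should_poll(struct perf_evsel *evsel,
				     bool overwrite_channel)
{
	if (evsel->system_wide)
		return false;	/* tracking-only fd: never gets POLLHUP */
	if (overwrite_channel)
		return false;	/* backward ring: reader ignores POLLIN */
	return true;
}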





Re: [PATCH v5 09/12] zsmalloc: separate free_zspage from putback_zspage

2016-05-15 Thread Sergey Senozhatsky
On (05/09/16 11:20), Minchan Kim wrote:
> Currently, putback_zspage frees the zspage under class->lock
> if fullness becomes ZS_EMPTY, but that makes it hard to implement
> the locking scheme for the new zspage migration.
> So, this patch separates free_zspage from putback_zspage
> and frees the zspage outside class->lock, as preparation for
> zspage migration.
> 
> Cc: Sergey Senozhatsky 
> Signed-off-by: Minchan Kim 


Reviewed-by: Sergey Senozhatsky 

-ss

> ---
>  mm/zsmalloc.c | 27 +++
>  1 file changed, 11 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 162a598a417a..5ccd83732a14 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1685,14 +1685,12 @@ static struct zspage *isolate_zspage(struct 
> size_class *class, bool source)
>  
>  /*
>   * putback_zspage - add @zspage into right class's fullness list
> - * @pool: target pool
>   * @class: destination class
>   * @zspage: target page
>   *
>   * Return @zspage's fullness_group
>   */
> -static enum fullness_group putback_zspage(struct zs_pool *pool,
> - struct size_class *class,
> +static enum fullness_group putback_zspage(struct size_class *class,
>   struct zspage *zspage)
>  {
>   enum fullness_group fullness;
> @@ -1701,15 +1699,6 @@ static enum fullness_group putback_zspage(struct 
> zs_pool *pool,
>   insert_zspage(class, zspage, fullness);
>   set_zspage_mapping(zspage, class->index, fullness);
>  
> - if (fullness == ZS_EMPTY) {
> - zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
> - class->size, class->pages_per_zspage));
> - atomic_long_sub(class->pages_per_zspage,
> - &pool->pages_allocated);
> -
> - free_zspage(pool, zspage);
> - }
> -
>   return fullness;
>  }
>  
> @@ -1755,23 +1744,29 @@ static void __zs_compact(struct zs_pool *pool, struct 
> size_class *class)
> - if (!migrate_zspage(pool, class, &cc))
>   break;
>  
> - putback_zspage(pool, class, dst_zspage);
> + putback_zspage(class, dst_zspage);
>   }
>  
>   /* Stop if we couldn't find slot */
>   if (dst_zspage == NULL)
>   break;
>  
> - putback_zspage(pool, class, dst_zspage);
> - if (putback_zspage(pool, class, src_zspage) == ZS_EMPTY)
> + putback_zspage(class, dst_zspage);
> + if (putback_zspage(class, src_zspage) == ZS_EMPTY) {
> + zs_stat_dec(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
> + class->size, class->pages_per_zspage));
> + atomic_long_sub(class->pages_per_zspage,
> + &pool->pages_allocated);
> + free_zspage(pool, src_zspage);
>   pool->stats.pages_compacted += class->pages_per_zspage;
> + }
> spin_unlock(&class->lock);
> cond_resched();
> spin_lock(&class->lock);
>   }
>  
>   if (src_zspage)
> - putback_zspage(pool, class, src_zspage);
> + putback_zspage(class, src_zspage);
>  
> spin_unlock(&class->lock);
>  }
> -- 
> 1.9.1
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org
> 




Re: [PATCH v5 08/12] zsmalloc: introduce zspage structure

2016-05-15 Thread Sergey Senozhatsky
On (05/09/16 11:20), Minchan Kim wrote:
> We have squeezed the metadata of a zspage into the first page's descriptor.
> So, to get metadata from a subpage, we must fetch the first page first
> of all. But that makes it hard to implement the page migration feature
> of zsmalloc, because any place that gets the first page from a subpage
> can race with first page migration. IOW, the first page it got
> could be stale. To prevent that, I tried several approaches,
> but they made the code complicated, so finally I concluded to separate
> the metadata from the first page. Of course, it consumes more memory. IOW,
> 16 bytes per zspage on 32-bit at the moment. It means we lose 1%
> in the *worst case* (40B/4096B), which is not bad I think at the cost of
> maintenance.
> 
> Cc: Sergey Senozhatsky 
> Signed-off-by: Minchan Kim 
[..]
> @@ -153,8 +138,6 @@
>  enum fullness_group {
>   ZS_ALMOST_FULL,
>   ZS_ALMOST_EMPTY,
> - _ZS_NR_FULLNESS_GROUPS,
> -
>   ZS_EMPTY,
>   ZS_FULL
>  };
> @@ -203,7 +186,7 @@ static const int fullness_threshold_frac = 4;
>  
>  struct size_class {
>   spinlock_t lock;
> - struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
> + struct list_head fullness_list[2];

seems that it also has some cleanup bits in it.

[..]
> -static int create_handle_cache(struct zs_pool *pool)
> +static int create_cache(struct zs_pool *pool)
>  {
>   pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
>   0, 0, NULL);
> - return pool->handle_cachep ? 0 : 1;
> + if (!pool->handle_cachep)
> + return 1;
> +
> + pool->zspage_cachep = kmem_cache_create("zspage", sizeof(struct zspage),
> + 0, 0, NULL);
> + if (!pool->zspage_cachep) {
> + kmem_cache_destroy(pool->handle_cachep);
^

do you need to NULL a pool->handle_cachep here?

zs_create_pool()
if (create_cache() == 1) {
pool->zspage_cachep NULL
pool->handle_cachep !NULL   already freed -> 
kmem_cache_destroy()
return 1;
goto err
}
err:
zs_destroy_pool()
destroy_cache() {
kmem_cache_destroy(pool->handle_cachep);  !NULL and 
freed
kmem_cache_destroy(pool->zspage_cachep);  NULL ok
}


can we also switch create_cache() to errnos? I just like a bit
better
return -ENOMEM;
else
return 0;

than

return 1;
else
return 0;

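A minimal sketch folding both points together -- an errno-style return plus
resetting handle_cachep after destroying it, so that a later destroy_cache()
cannot run kmem_cache_destroy() on a dangling pointer.  Names follow the
patch; the exact fix is of course up to the author:

static int create_cache(struct zs_pool *pool)
{
	pool->handle_cachep = kmem_cache_create("zs_handle", ZS_HANDLE_SIZE,
						0, 0, NULL);
	if (!pool->handle_cachep)
		return -ENOMEM;

	pool->zspage_cachep = kmem_cache_create("zspage",
						sizeof(struct zspage),
						0, 0, NULL);
	if (!pool->zspage_cachep) {
		kmem_cache_destroy(pool->handle_cachep);
		pool->handle_cachep = NULL;	/* avoid a double destroy */
		return -ENOMEM;
	}

	return 0;
}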

> @@ -997,44 +951,38 @@ static void init_zspage(struct size_class *class, 
> struct page *first_page)
>   off %= PAGE_SIZE;
>   }
>  
> - set_freeobj(first_page, (unsigned long)location_to_obj(first_page, 0));
> + set_freeobj(zspage,
> + (unsigned long)location_to_obj(zspage->first_page, 0));

static unsigned long location_to_obj()

it's already returning "(unsigned long)", so here and in several other places
this cast can be dropped.

[..]
> +static struct zspage *isolate_zspage(struct size_class *class, bool source)
>  {
> + struct zspage *zspage;
> + enum fullness_group fg[2] = {ZS_ALMOST_EMPTY, ZS_ALMOST_FULL};
> + if (!source) {
> + fg[0] = ZS_ALMOST_FULL;
> + fg[1] = ZS_ALMOST_EMPTY;
> + }
> +
> + for (i = 0; i < 2; i++) {

sorry, why not "for (i = ZS_ALMOST_EMPTY; i <= ZS_ALMOST_FULL ..." ?

> + zspage = list_first_entry_or_null(&class->fullness_list[fg[i]],
> + struct zspage, list);
> + if (zspage) {
> + remove_zspage(class, zspage, fg[i]);
> + return zspage;
>   }
>   }
>  
> - return page;
> + return zspage;
>  }

-ss




RE: [PATCH] ACPI: Execute the _PTS method when system reboot

2016-05-15 Thread Ocean HY1 He
To whom may concern,

A Lenovo feature depends on _PTS method execution on reboot. After checking
the ACPI spec, I think _PTS should be executed on reboot. This patch could fix
the problem.

Any comments on this patch? Many thanks!

Ocean He / 何海洋
SW Development Dept. 
Beijing Design Center
Enterprise Product Group
Mobile: 18911778926
E-mail: he...@lenovo.com
No.6 Chuang Ye Road, Haidian District, Beijing, China 100085

-Original Message-
From: Ocean HY1 He 
Sent: Monday, May 09, 2016 1:50 PM
To: r...@rjwysocki.net; l...@kernel.org
Cc: linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org; David Tanaka; 
Ocean HY1 He; Nagananda Chumbalkar
Subject: [PATCH] ACPI: Execute the _PTS method when system reboot

The _PTS control method is defined in the section 7.4.1 of acpi 6.0
spec. The _PTS control method is executed by the OS during the sleep
transition process for S1, S2, S3, S4, and for orderly S5 shutdown.
The sleeping state value (For example, 1, 2, 3, 4 or 5 for the S5
soft-off state) is passed to the _PTS control method. This method
is called after OSPM has notified native device drivers of the sleep
state transition and before the OSPM has had a chance to fully
prepare the system for a sleep state transition.

The _PTS control method provides the BIOS a mechanism for performing
some housekeeping, such as writing the sleep type value to the embedded
controller, before entering the system sleeping state.

According to section 7.5 of acpi 6.0 spec, _PTS should run after _TTS.

Thus, a _PTS block notifier is added to the reboot notifier list so that
the _PTS object will also be evaluated when the system reboots.

Signed-off-by: Ocean He 
Signed-off-by: Nagananda Chumbalkar 
---
 drivers/acpi/sleep.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index 2a8b596..8b290fb 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -55,6 +55,26 @@ static struct notifier_block tts_notifier = {
.priority   = 0,
 };
 
+static int pts_notify_reboot(struct notifier_block *this,
+   unsigned long code, void *x)
+{
+   acpi_status status;
+
+   status = acpi_execute_simple_method(NULL, "\\_PTS", ACPI_STATE_S5);
+   if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) {
+   /* It won't break anything. */
+   printk(KERN_NOTICE "Failure in evaluating _PTS object\n");
+   }
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block pts_notifier = {
+   .notifier_call  = pts_notify_reboot,
+   .next   = NULL,
+   .priority   = 0,
+};
+
 static int acpi_sleep_prepare(u32 acpi_state)
 {
 #ifdef CONFIG_ACPI_SLEEP
@@ -896,5 +916,12 @@ int __init acpi_sleep_init(void)
 * object can also be evaluated when the system enters S5.
 */
register_reboot_notifier(&tts_notifier);
+
+   /*
+* According to section 7.5 of acpi 6.0 spec, _PTS should run after
+* _TTS when the system enters S5.
+*/
+   register_reboot_notifier(&pts_notifier);
+
return 0;
 }
-- 
1.8.3.1

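One related detail worth checking: both tts_notifier and pts_notifier
register with priority 0, so the '_PTS after _TTS' ordering relies purely on
registration order in acpi_sleep_init().  A sketch of making the ordering
explicit instead; the priority values are my own, not the patch's (a higher
priority runs earlier on the reboot chain):

static struct notifier_block tts_notifier = {
	.notifier_call	= tts_notify_reboot,
	.next		= NULL,
	.priority	= 1,	/* run _TTS first ... */
};

static struct notifier_block pts_notifier = {
	.notifier_call	= pts_notify_reboot,
	.next		= NULL,
	.priority	= 0,	/* ... then _PTS, per ACPI 6.0 section 7.5 */
};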



Re: [PATCH] ixgbe: take online CPU number as MQ max limit when alloc_etherdev_mq()

2016-05-15 Thread ethan zhao

Alexander,

On 2016/5/14 0:46, Alexander Duyck wrote:

On Thu, May 12, 2016 at 10:56 PM, Ethan Zhao  wrote:

Allocating 64 Tx/Rx queues by default doesn't benefit performance when fewer
CPUs are assigned, especially when DCB is enabled, so we should take
num_online_cpus() as the top limit, and also, to make sure every TC has
at least one queue, take MAX_TRAFFIC_CLASS as the bottom limit of the queue
count.

Signed-off-by: Ethan Zhao 

What is the harm in allowing the user to specify up to 64 queues if
they want to?  Also what is your opinion based on?  In the case of RSS


 There is no module parameter to specify the queue number in this upstream
 ixgbe driver.  Why specify more queues than num_online_cpus() via
 ethtool?

 I couldn't figure out the benefit of doing that.

 But if DCB is turned on after loading, the queues would be 64/64, which
 doesn't make sense if only 16 CPUs are assigned.

traffic the upper limit is only 16 on older NICs, but last I knew the
latest X550 can support more queues for RSS.  Have you only been
testing on older NICs or did you test on the latest hardware as well?
  Could more RSS queues than num_online_cpus() bring better performance?
  Test results show otherwise.  Even if memory cost is not an issue for
  most of the expensive servers, it is not so for all.



If you want to control the number of queues allocated in a given
configuration you should look at the code over in the ixgbe_lib.c, not
  Yes, RSS, RSS with SRIOV, FCoE, DCB etc. use different queue
  calculation algorithms.
  But they all take the dev queues allocated in alloc_etherdev_mq() as the
  upper limit.

 If we set 64 as the default here, DCB would say "oh, there are 64 there,
 I could use them".

ixgbe_main.c.  All you are doing with this patch is denying the user
choice with this change as they then are not allowed to set more

  Yes, it is intended to deny configurations that don't bring any benefit.

queues.  Even if they find your decision was wrong for their
configuration.

- Alex


 Thanks,
 Ethan

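For reference, the clamp being argued over reduces to something like this
sketch in the probe path; MAX_TX_QUEUES as the hardware cap is my reading of
the driver, not quoted from the patch:

/* Bound the default queue count by the online CPUs, but keep enough
 * queues for one per traffic class so DCB still works.
 */
int indices = num_online_cpus();

indices = min_t(int, indices, MAX_TX_QUEUES);	   /* hardware cap */
indices = max_t(int, indices, MAX_TRAFFIC_CLASS); /* one queue per TC */

netdev = alloc_etherdev_mq(sizeof(struct ixgbe_adapter), indices);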



Building older mips kernels with different versions of binutils; possible patch for 3.2 and 3.4

2016-05-15 Thread Guenter Roeck

Hi folks,

building mips images with a consistent infrastructure is becoming more and more 
difficult.
Current state is as follows.

Binutils/Kernel     2.22    2.24    2.25
3.2                 X       -       -
3.4                 X       -       -
3.10                X       X       -
3.14                X       X       -
3.16                X       X       -
3.18                X       X       (X) [1]
4.1                 X       X       (X)
4.4                 X       X       (X)
4.5                 X       X       (X)
4.6                 X       X       (X)
next                -       X       (X)

[1] (at least) allnoconfig fails to build with binutils 2.25 (2.25.1, more 
specifically).

I used the following toolchains for the above tests:
- Poky 1.3 (binutils 2.22)
- Poky 2.0 (binutils 2.25.1)
- gcc-4.6.3-nolibc from kernel.org (binutils 2.22)
- gcc-4.9.0-nolibc from kernel.org (binutils 2.24)

For 3.4 and 3.2 kernels to build with binutils v2.24, it would be necessary to
apply patch c02263063362 ("MIPS: Refactor 'clear_page' and 'copy_page' 
functions").
It applies cleanly to 3.4, but has a Makefile conflict in 3.2. It might
make sense to apply this patch to both releases. Would this be possible?
This way, we would have at least one toolchain which can build all 3.2+ kernels.

Thanks,
Guenter


Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-15 Thread David Ahern

On 5/15/16 7:30 PM, Hekuang wrote:

In the previous patch, I used 'perf buildid-cache -a' to add the vdso
binary into the HOST buildid dir.


So 'perf buildid-cache' needs the symfs option?



[PATCH] audit: fixup: log on errors from filter user rules

2016-05-15 Thread Richard Guy Briggs
In commit 724e4fcc the intention was to pass any errors back from
audit_filter_user_rules() to audit_filter_user().  Add that code.

Signed-off-by: Richard Guy Briggs 
---
 kernel/auditfilter.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index b8ff9e1..96c9a1b 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -1339,8 +1339,8 @@ static int audit_filter_user_rules(struct audit_krule 
*rule, int type,
break;
}
 
-   if (!result)
-   return 0;
+   if (result <= 0)
+   return result;
}
switch (rule->action) {
case AUDIT_NEVER:*state = AUDIT_DISABLED;   break;
-- 
1.7.1
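
[Editor's note: for context, here is a sketch of the caller side this
feeds into. It is simplified and hypothetical; the real
audit_filter_user() loop may differ in detail, only audit_filter_list
and AUDIT_FILTER_USER are taken from the kernel as-is.]

	list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_USER], list) {
		rc = audit_filter_user_rules(&e->rule, type, &state);
		if (rc) {
			if (rc > 0 && state == AUDIT_DISABLED)
				ret = 0;	/* a matching rule disabled logging */
			break;	/* with this fix, rc < 0 (an error) also stops here */
		}
	}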



[RFC PATCH 7/9] sched/fair: Optimize __update_sched_avg()

2016-05-15 Thread Yuyang Du
__update_sched_avg() has these steps:

  1. add the remainder of the last incomplete period
  2. decay old sum
  3. accumulate new sum in full periods since last_update_time
  4. add the current incomplete period
  5. update averages

Previously, we separately computed steps 1, 3, and 4, which made each
of them ugly in code and costly in overhead.

Illustration:

            c1          c3            c4
            ^           ^             ^
            |           |             |
          |<->|<----------------->|<--->|
  ... |---x---|------| ... |------|-----x (now)

c1, c3, and c4 are the accumulated (meanwhile decayed) time
contributions of steps 1, 3, and 4 respectively.

With them, the accumulated contribution to load_sum, for example, is:

contrib = c1 * weight * freq_scaled;
contrib += c3 * weight * freq_scaled;
contrib += c4 * weight * freq_scaled;

Obviously, we can optimize the above by:

contrib = c1 + c3 + c4;
contrib *= weight * freq_scaled;

One issue is that c1 must be decayed separately, as opposed to being
decayed as part of step 2. Performance-wise, however, the two approaches
should be on par with each other if you compare __decay_sum() with a
round of contrib accumulation. Overall, we definitely save tens of
instructions, although the gain may not be observable in benchmarks;
code-wise, this approach is obviously much simpler.
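
In code form, the factoring amounts to roughly the following sketch. It
uses the changelog's c1/c3/c4 names rather than the exact patch hunks;
__decay_sum() and cap_scale() are as defined elsewhere in this series.

	/* decay c1 across the elapsed periods so that all three pieces
	 * can share a single weight/frequency scaling at the end */
	u64 contrib = __decay_sum(c1, periods);

	contrib += c3 + c4;			/* full periods + current remainder */
	contrib = cap_scale(contrib, scale_freq);	/* scale once */
	sa->load_sum += weight * contrib;	/* weight once */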

Code size (allyesconfig):

Before: kernel/sched/built-in.o 1404840
After : kernel/sched/built-in.o 1404016  (-824B)

Signed-off-by: Yuyang Du 
---
 kernel/sched/fair.c |  180 +--
 1 file changed, 89 insertions(+), 91 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1bbac7e..e1cde19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -668,7 +668,7 @@ static unsigned long task_h_load(struct task_struct *p);
  */
 #define SCHED_AVG_HALFLIFE 32  /* number of periods as a half-life */
 #define SCHED_AVG_MAX 47742/* maximum possible sched avg */
-#define SCHED_AVG_MAX_N 345/* number of full periods to produce 
SCHED_AVG_MAX */
+#define SCHED_AVG_MAX_N 347/* number of full periods to produce 
SCHED_AVG_MAX */
 
 /* Give new sched_entity start runnable values to heavy its load in infant 
time */
 void init_entity_runnable_average(struct sched_entity *se)
@@ -2652,24 +2652,83 @@ static __always_inline u64 __decay_sum(u64 val, u32 n)
  * We can compute this efficiently by combining:
  * y^32 = 1/2 with precomputed \Sum 1024*y^n   (where n < 32)
  */
-static __always_inline u32 __accumulate_sum(u32 n)
+static __always_inline u32
+__accumulate_sum(u32 periods, u32 period_contrib, u32 remainder)
 {
-   u32 contrib = 0;
+   u32 contrib;
 
-   if (likely(n <= SCHED_AVG_HALFLIFE))
-   return __accumulated_sum_N[n];
-   else if (unlikely(n >= SCHED_AVG_MAX_N))
+   if (!periods)
+   return remainder - period_contrib;
+
+   if (unlikely(periods >= SCHED_AVG_MAX_N))
return SCHED_AVG_MAX;
 
-   /* Since n < SCHED_AVG_MAX_N, n/SCHED_AVG_HALFLIFE < 11 */
-   contrib = __accumulated_sum_N32[n/SCHED_AVG_HALFLIFE];
-   n %= SCHED_AVG_HALFLIFE;
-   contrib = __decay_sum(contrib, n);
-   return contrib + __accumulated_sum_N[n];
+   remainder += __decay_sum((u64)(1024 - period_contrib), periods);
+
+   periods -= 1;
+   if (likely(periods <= SCHED_AVG_HALFLIFE))
+   contrib = __accumulated_sum_N[periods];
+   else {
+   contrib = __accumulated_sum_N32[periods/SCHED_AVG_HALFLIFE];
+   periods %= SCHED_AVG_HALFLIFE;
+   contrib = __decay_sum(contrib, periods);
+   contrib += __accumulated_sum_N[periods];
+   }
+
+   return contrib + remainder;
 }
 
 #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
 
+static __always_inline u32 accumulate_sum(u64 delta, struct sched_avg *sa,
+   struct cfs_rq *cfs_rq, int cpu, unsigned long weight, int running)
+{
+   u32 contrib, periods;
+   unsigned long scale_freq, scale_cpu;
+
+   scale_freq = arch_scale_freq_capacity(NULL, cpu);
+   scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+
+   delta += sa->period_contrib;
+   periods = delta >> 10; /* A period is 1024us (~1ms) */
+
+   /*
+* Accumulating *_sum has two steps.
+*
+* Step 1: decay old *_sum if we crossed period boundaries.
+*/
+   if (periods) {
+   sa->load_sum = __decay_sum(sa->load_sum, periods);
+   if (cfs_rq) {
+   cfs_rq->runnable_load_sum =
+   __decay_sum(cfs_rq->runnable_load_sum, periods);
+   }
+   sa->util_sum = __decay_sum((u64)(sa->util_sum), periods);
+   }
+
+   /*
+* Step 2: accumulate new *_sum since last_update_time. This at most has
+* three parts (at least one part): (1) remainder of the last incomplete
+* period, (2) full periods since last_update_time, and (3) the current
+* incomplete period.
+*/

[RFC PATCH 2/9] documentation: Add scheduler/sched-avg.txt

2016-05-15 Thread Yuyang Du
This doc file contains the program used to generate the constants for
computing sched averages.

Signed-off-by: Yuyang Du 
---
 Documentation/scheduler/sched-avg.txt |   94 +
 1 file changed, 94 insertions(+)
 create mode 100644 Documentation/scheduler/sched-avg.txt

diff --git a/Documentation/scheduler/sched-avg.txt 
b/Documentation/scheduler/sched-avg.txt
new file mode 100644
index 000..45be4bd
--- /dev/null
+++ b/Documentation/scheduler/sched-avg.txt
@@ -0,0 +1,94 @@
+The following program is used to generate the constants for
+computing sched averages.
+
+==
+   C program (compile with -lm)
+==
+
+#include <math.h>
+#include <stdio.h>
+
+#define HALFLIFE 32
+#define SHIFT 32
+
+double y;
+
+void calc_decay_inv_multiply() {
+   int i;
+   unsigned int x;
+
+   printf("static const u32 __decay_inv_multiply_N[] = {");
+   for(i = 0; i < HALFLIFE; i++) {
+   x = ((1UL<<32)-1)*pow(y, i);
+
+   if (i % 6 == 0) printf("\n\t");
+   printf("0x%8x, ", x);
+   }
+   printf("\n};\n\n");
+}
+
+int sum = 1024;
+void calc_accumulated_sum() {
+   int i;
+
+   printf("static const u32 __accumulated_sum_N[] = {\n\t0,");
+   for(i = 1; i <= HALFLIFE; i++) {
+   if (i == 1)
+   sum *= y;
+   else
+   sum = sum*y + 1024*y;
+
+   if (i % 11 == 0) printf("\n\t");
+   printf("%5d,", sum);
+   }
+   printf("\n};\n\n");
+}
+
+int n = 1;
+/* first period */
+long max = 1024;
+
+void calc_converged_max() {
+   long last = 0, y_inv = ((1UL<<32)-1)*y;
+
+   for (; ; n++) {
+   if (n > 1)
+   max = ((max*y_inv)>>SHIFT) + 1024;
+   /*
+* This is the same as:
+* max = max*y + 1024;
+*/
+
+   if (last == max)
+   break;
+
+   last = max;
+   }
+   n--;
+   printf("#define SCHED_AVG_HALFLIFE %d\n", HALFLIFE);
+   printf("#define SCHED_AVG_MAX %ld\n", max);
+   printf("#define SCHED_AVG_MAX_N %d\n\n", n);
+}
+
+void calc_accumulated_sum_32() {
+   int i, x = sum;
+
+   printf("static const u32 __accumulated_sum_N32[] = {\n\t 0,");
+   for(i = 1; i <= n/HALFLIFE+1; i++) {
+   if (i > 1)
+   x = x/2 + sum;
+
+   if (i % 6 == 0) printf("\n\t");
+   printf("%6d,", x);
+   }
+   printf("\n};\n\n");
+}
+
+void main() {
+   y = pow(0.5, 1/(double)HALFLIFE);
+
+   calc_decay_inv_multiply();
+   calc_accumulated_sum();
+   calc_converged_max();
+   calc_accumulated_sum_32();
+}
-- 
1.7.9.5
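
[Editor's note: for reference, saving the program as e.g. sched-avg.c and
running "gcc sched-avg.c -lm && ./a.out" should reproduce the constants
used elsewhere in this series: the __decay_inv_multiply_N[],
__accumulated_sum_N[], and __accumulated_sum_N32[] tables, plus

	#define SCHED_AVG_HALFLIFE 32
	#define SCHED_AVG_MAX 47742
	#define SCHED_AVG_MAX_N 347
]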



[RFC PATCH 3/9] sched/fair: Add static to remove_entity_load_avg()

2016-05-15 Thread Yuyang Du
remove_entity_load_avg() is only called in fair.c, so add static to it.

Signed-off-by: Yuyang Du 
---
 kernel/sched/fair.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2635561..66fba3f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3066,7 +3066,7 @@ static inline u64 cfs_rq_last_update_time(struct cfs_rq 
*cfs_rq)
  * Task first catches up with cfs_rq, and then subtract
  * itself from the cfs_rq (task must be off the queue now).
  */
-void remove_entity_load_avg(struct sched_entity *se)
+static void remove_entity_load_avg(struct sched_entity *se)
 {
struct cfs_rq *cfs_rq = cfs_rq_of(se);
u64 last_update_time;
-- 
1.7.9.5



[RFC PATCH 5/9] sched/fair: Change the variable to hold the number of periods to 32-bit

2016-05-15 Thread Yuyang Du
In the sched average update, a period is about 1ms, so a 32-bit unsigned
integer can hold approximately a maximum of 49 days
(2^32 periods * ~1ms/period ≈ 4.29e6 s ≈ 49.7 days).

For usual cases, 32-bit is big enough and 64-bit is needless. But if
a task sleeps longer than that, there are two possible outcomes.

Consider a task that sleeps m milliseconds (m > U32_MAX), and let n = (u32)m:

1. If n >= 32*64, the task's sched avgs will surely be decayed
   to 0. In this case, it really doesn't matter that 32 bits cannot
   hold m. In other words, whether a task sleeps 2 secs or 50 days
   is the same from the sched average point of view.

2. If n < 32*64, the chance of getting here is very low in the first
   place, about 0.5 in a million (=32*64/2^32). Even then, the task's
   sched avgs MAY NOT be decayed to 0, depending on how big its sums
   are, but the chance of reaching 0 is still good, as load_sum is way
   less than ~0ULL and util_sum way less than ~0U.

Nevertheless, what really matters is the worst-case scenario, which is
when (u32)m == 0. In that case, after so long a sleep, we would treat
the task as if it never slept, and it would keep the same sched
averages as before. At any rate, this hurts nothing, and there is
nothing to worry about.
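
As a minimal sketch of that worst case (the value of m is hypothetical):

	u64 m = 0x100000000ULL;	/* exactly 2^32 periods slept: (u32)m == 0 */
	u32 n = (u32)m;		/* n == 0, so the sums are not decayed at all */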

Signed-off-by: Yuyang Du 
---
 kernel/sched/fair.c |   31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fddaa61..1fac2bf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2617,21 +2617,18 @@ static const u32 __accumulated_sum_N32[] = {
 /*
  * val * y^n, where y^m ~= 0.5
  *
- * n is the number of periods past; a period is ~1ms
+ * n is the number of periods past. A period is ~1ms, so a 32bit
+ * integer can hold approximately a maximum of 49 (=2^32/1000/3600/24) days.
+ *
  * m is half-life in exponential decay; here it is SCHED_AVG_HALFLIFE=32.
  */
-static __always_inline u64 __decay_sum(u64 val, u64 n)
+static __always_inline u64 __decay_sum(u64 val, u32 n)
 {
-   unsigned int local_n;
-
if (!n)
return val;
else if (unlikely(n > SCHED_AVG_HALFLIFE * 63))
return 0;
 
-   /* after bounds checking we can collapse to 32-bit */
-   local_n = n;
-
/*
 * As y^HALFLIFE = 1/2, we can combine
 *y^n = 1/2^(n/HALFLIFE) * y^(n%HALFLIFE)
@@ -2639,12 +2636,12 @@ static __always_inline u64 __decay_sum(u64 val, u64 n)
 *
 * To achieve constant time __decay_load.
 */
-   if (unlikely(local_n >= SCHED_AVG_HALFLIFE)) {
-   val >>= local_n / SCHED_AVG_HALFLIFE;
-   local_n %= SCHED_AVG_HALFLIFE;
+   if (unlikely(n >= SCHED_AVG_HALFLIFE)) {
+   val >>= n / SCHED_AVG_HALFLIFE;
+   n %= SCHED_AVG_HALFLIFE;
}
 
-   val = mul_u64_u32_shr(val, __decay_inv_multiply_N[local_n], 32);
+   val = mul_u64_u32_shr(val, __decay_inv_multiply_N[n], 32);
return val;
 }
 
@@ -2655,7 +2652,7 @@ static __always_inline u64 __decay_sum(u64 val, u64 n)
  * We can compute this efficiently by combining:
  * y^32 = 1/2 with precomputed \Sum 1024*y^n   (where n < 32)
  */
-static u32 __accumulate_sum(u64 n)
+static u32 __accumulate_sum(u32 n)
 {
u32 contrib = 0;
 
@@ -2705,8 +2702,8 @@ static __always_inline int
 __update_sched_avg(u64 now, int cpu, struct sched_avg *sa,
   unsigned long weight, int running, struct cfs_rq *cfs_rq)
 {
-   u64 delta, scaled_delta, periods;
-   u32 contrib;
+   u64 delta, scaled_delta;
+   u32 contrib, periods;
unsigned int delta_w, scaled_delta_w, decayed = 0;
unsigned long scale_freq, scale_cpu;
 
@@ -2759,7 +2756,11 @@ __update_sched_avg(u64 now, int cpu, struct sched_avg 
*sa,
 
delta -= delta_w;
 
-   /* Figure out how many additional periods this update spans */
+   /*
+* Figure out how many additional periods this update spans.
+* A period is 1024*1024ns or ~1ms, so a 32bit integer can hold
+* approximately a maximum of 49 (=2^32/1000/3600/24) days.
+*/
periods = delta / 1024;
delta %= 1024;
 
-- 
1.7.9.5



[RFC PATCH 9/9] sched/fair: Rename scale_load() and scale_load_down()

2016-05-15 Thread Yuyang Du
Rename scale_load() and scale_load_down() to user_to_kernel_load()
and kernel_to_user_load() respectively. This helps us tag them
clearly and avoid confusion.

Signed-off-by: Yuyang Du 
---
 kernel/sched/core.c  |8 
 kernel/sched/fair.c  |   11 ---
 kernel/sched/sched.h |   16 
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 404c078..c4c84d4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -735,12 +735,12 @@ static void set_load_weight(struct task_struct *p)
 * SCHED_IDLE tasks get minimal weight:
 */
if (idle_policy(p->policy)) {
-   load->weight = scale_load(WEIGHT_IDLEPRIO);
+   load->weight = user_to_kernel_load(WEIGHT_IDLEPRIO);
load->inv_weight = WMULT_IDLEPRIO;
return;
}
 
-   load->weight = scale_load(sched_prio_to_weight[prio]);
+   load->weight = user_to_kernel_load(sched_prio_to_weight[prio]);
load->inv_weight = sched_prio_to_wmult[prio];
 }
 
@@ -8216,7 +8216,7 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 static int cpu_shares_write_u64(struct cgroup_subsys_state *css,
struct cftype *cftype, u64 shareval)
 {
-   return sched_group_set_shares(css_tg(css), scale_load(shareval));
+   return sched_group_set_shares(css_tg(css), 
user_to_kernel_load(shareval));
 }
 
 static u64 cpu_shares_read_u64(struct cgroup_subsys_state *css,
@@ -8224,7 +8224,7 @@ static u64 cpu_shares_read_u64(struct cgroup_subsys_state 
*css,
 {
struct task_group *tg = css_tg(css);
 
-   return (u64) scale_load_down(tg->shares);
+   return (u64) kernel_to_user_load(tg->shares);
 }
 
 #ifdef CONFIG_CFS_BANDWIDTH
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 88913b8..fbf6220 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -189,7 +189,7 @@ static void __update_inv_weight(struct load_weight *lw)
if (likely(lw->inv_weight))
return;
 
-   w = scale_load_down(lw->weight);
+   w = kernel_to_user_load(lw->weight);
 
if (BITS_PER_LONG > 32 && unlikely(w >= WMULT_CONST))
lw->inv_weight = 1;
@@ -210,10 +210,14 @@ static void __update_inv_weight(struct load_weight *lw)
  *
  * Or, weight =< lw.weight (because lw.weight is the runqueue weight), thus
  * weight/lw.weight <= 1, and therefore our shift will also be positive.
+ *
+ * Note load.weight falls back to user load scale (i.e., NICE_0's load is
+ * 1024), instead of possibly increased kernel load scale (i.e., NICE_0's
+ * load is NICE_0_LOAD) due to multiplication and division efficiency.
  */
 static u64 __calc_delta(u64 delta_exec, unsigned long weight, struct 
load_weight *lw)
 {
-   u64 fact = scale_load_down(weight);
+   u64 fact = kernel_to_user_load(weight);
int shift = WMULT_SHIFT;
 
__update_inv_weight(lw);
@@ -8608,7 +8612,8 @@ int sched_group_set_shares(struct task_group *tg, 
unsigned long shares)
if (!tg->se[0])
return -EINVAL;
 
-   shares = clamp(shares, scale_load(MIN_SHARES), scale_load(MAX_SHARES));
+   shares = clamp(shares, user_to_kernel_load(MIN_SHARES),
+  user_to_kernel_load(MAX_SHARES));
 
mutex_lock(_mutex);
if (tg->shares == shares)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3432985..0b8b3f3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -57,22 +57,22 @@ static inline void cpu_load_update_active(struct rq 
*this_rq) { }
  */
 #ifdef CONFIG_64BIT
 # define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT + 
SCHED_FIXEDPOINT_SHIFT)
-# define scale_load(w) ((w) << SCHED_FIXEDPOINT_SHIFT)
-# define scale_load_down(w)((w) >> SCHED_FIXEDPOINT_SHIFT)
+# define user_to_kernel_load(w)((w) << SCHED_FIXEDPOINT_SHIFT)
+# define kernel_to_user_load(w)((w) >> SCHED_FIXEDPOINT_SHIFT)
 #else
 # define NICE_0_LOAD_SHIFT (SCHED_FIXEDPOINT_SHIFT)
-# define scale_load(w) (w)
-# define scale_load_down(w)(w)
+# define user_to_kernel_load(w)(w)
+# define kernel_to_user_load(w)(w)
 #endif
 
 /*
  * Task weight (visible to users) and its load (invisible to users) have
  * independent resolution, but they should be well calibrated. We use
- * scale_load() and scale_load_down(w) to convert between them. The
- * following must be true:
- *
- *  scale_load(sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
+ * user_to_kernel_load() and kernel_to_user_load(w) to convert between
+ * them. The following must be true:
  *
+ * user_to_kernel_load(sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == 
NICE_0_LOAD
+ * kernel_to_user_load(NICE_0_LOAD) == 
sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]
  */
 #define NICE_0_LOAD(1L << NICE_0_LOAD_SHIFT)
 
-- 
1.7.9.5
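
[Editor's note: as a quick sanity check on the calibration identity above
(a sketch; SCHED_FIXEDPOINT_SHIFT is 10, and NICE_0's user-visible weight
is 1024 in sched_prio_to_weight[]):

	/* on 64bit, NICE_0_LOAD_SHIFT = 10 + 10 = 20, so:              */
	user_to_kernel_load(1024)    == 1024 << 10 == 1 << 20 == NICE_0_LOAD
	kernel_to_user_load(1 << 20) == (1 << 20) >> 10 == 1024
]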



[RFC PATCH 4/9] sched/fair: Rename variable names for sched averages

2016-05-15 Thread Yuyang Du
The names of the sched averages (including load_avg and util_avg) have
been changed and added to over the past couple of years; some of the
names are a bit confusing, especially to people reading them for the
first time. This patch attempts to make the names more self-explanatory.
Some comments are updated too.

The renames are listed as follows:

 - update_load_avg() to update_sched_avg()

 - enqueue_entity_load_avg() to enqueue_entity_sched_avg()

 - dequeue_entity_load_avg() to dequeue_entity_sched_avg()

 - detach_entity_load_avg() to detach_entity_sched_avg()

 - attach_entity_load_avg() to attach_entity_sched_avg()

 - remove_entity_load_avg() to remove_entity_sched_avg()

 - LOAD_AVG_PERIOD to SCHED_AVG_HALFLIFE

 - LOAD_AVG_MAX_N to SCHED_AVG_MAX_N

 - LOAD_AVG_MAX to SCHED_AVG_MAX

 - runnable_avg_yN_sum[] to __accumulated_sum_N[]

 - runnable_avg_yN_inv[] to __decay_inv_multiply_N[]

 - __compute_runnable_contrib() to __accumulate_sum()

 - decay_load() to __decay_sum()

Signed-off-by: Yuyang Du 
---
 include/linux/sched.h |2 +-
 kernel/sched/fair.c   |  219 +
 kernel/sched/sched.h  |2 +-
 3 files changed, 114 insertions(+), 109 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1b43b45..9710e2b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1221,7 +1221,7 @@ struct load_weight {
 
 /*
  * The load_avg/util_avg accumulates an infinite geometric series
- * (see __update_load_avg() in kernel/sched/fair.c).
+ * (see __update_sched_avg() in kernel/sched/fair.c).
  *
  * [load_avg definition]
  *
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 66fba3f..fddaa61 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -660,13 +660,15 @@ static int select_idle_sibling(struct task_struct *p, int 
cpu);
 static unsigned long task_h_load(struct task_struct *p);
 
 /*
- * We choose a half-life close to 1 scheduling period.
- * Note: The tables runnable_avg_yN_inv and runnable_avg_yN_sum are
- * dependent on this value.
+ * Note: everything in sched average calculation, including
+ * __decay_inv_multiply_N, __accumulated_sum_N, __accumulated_sum_N32,
+ * SCHED_AVG_MAX, and SCHED_AVG_MAX_N, is dependent on and only on
+ * (1) exponential decay, (2) a period of 1024*1024ns (~1ms), and (3)
+ * a half-life of 32 periods.
  */
-#define LOAD_AVG_PERIOD 32
-#define LOAD_AVG_MAX 47742 /* maximum possible load avg */
-#define LOAD_AVG_MAX_N 347 /* number of full periods to produce LOAD_AVG_MAX */
+#define SCHED_AVG_HALFLIFE 32  /* number of periods as a half-life */
+#define SCHED_AVG_MAX 47742/* maximum possible sched avg */
+#define SCHED_AVG_MAX_N 345/* number of full periods to produce 
SCHED_AVG_MAX */
 
 /* Give new sched_entity start runnable values to heavy its load in infant 
time */
 void init_entity_runnable_average(struct sched_entity *se)
@@ -681,7 +683,7 @@ void init_entity_runnable_average(struct sched_entity *se)
 */
sa->period_contrib = 1023;
sa->load_avg = scale_load_down(se->load.weight);
-   sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
+   sa->load_sum = sa->load_avg * SCHED_AVG_MAX;
/*
 * At this point, util_avg won't be used in select_task_rq_fair anyway
 */
@@ -731,7 +733,7 @@ void post_init_entity_util_avg(struct sched_entity *se)
} else {
sa->util_avg = cap;
}
-   sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
+   sa->util_sum = sa->util_avg * SCHED_AVG_MAX;
}
 }
 
@@ -1834,7 +1836,7 @@ static u64 numa_get_avg_runtime(struct task_struct *p, 
u64 *period)
*period = now - p->last_task_numa_placement;
} else {
delta = p->se.avg.load_sum / p->se.load.weight;
-   *period = LOAD_AVG_MAX;
+   *period = SCHED_AVG_MAX;
}
 
p->last_sum_exec_runtime = runtime;
@@ -2583,7 +2585,7 @@ static inline void update_cfs_shares(struct cfs_rq 
*cfs_rq)
 
 #ifdef CONFIG_SMP
 /* Precomputed fixed inverse multiplies for multiplication by y^n */
-static const u32 runnable_avg_yN_inv[] = {
+static const u32 __decay_inv_multiply_N[] = {
0x, 0xfa83b2da, 0xf5257d14, 0xefe4b99a, 0xeac0c6e6, 0xe5b906e6,
0xe0ccdeeb, 0xdbfbb796, 0xd744fcc9, 0xd2a81d91, 0xce248c14, 0xc9b9bd85,
0xc5672a10, 0xc12c4cc9, 0xbd08a39e, 0xb8fbaf46, 0xb504f333, 0xb123f581,
@@ -2596,7 +2598,7 @@ static const u32 runnable_avg_yN_inv[] = {
  * Precomputed \Sum y^k { 1<=k<=n }.  These are floor(true_value) to prevent
  * over-estimates when re-combining.
  */
-static const u32 runnable_avg_yN_sum[] = {
+static const u32 __accumulated_sum_N[] = {
0, 1002, 1982, 2941, 3880, 4798, 5697, 6576, 7437, 8279, 9103,
 9909,10698,11470,12226,12966,13690,14398,15091,15769,16433,17082,
17718,18340,18949,19545,20128,20698,21256,21802,22336,22859,23371,
@@ -2613,93 +2615,95 @@ static 

[RFC PATCH 8/9] sched/fair: Remove scale_load_down() for load_avg

2016-05-15 Thread Yuyang Du
Currently, load_avg = scale_load_down(load) * runnable%. The extra scaling
down of load does not make much sense, because load_avg is primarily THE
load and on top of that, we take runnable time into account.

We therefore remove scale_load_down() for load_avg. But we need to
carefully consider the overflow risk if load has the higher fixed point
range (2*SCHED_FIXEDPOINT_SHIFT). The only case where an overflow may
occur because of this change is a 64bit kernel with the increased fixed
point range. In that case, the 64bit load_sum can afford 4251057
(=2^64/47742/88761/1024) entities with the highest load (=88761*1024)
always runnable on one single cfs_rq, which should be fine. Even if an
overflow does occur, the load average will not be useful anyway under
the conditions that produce it. And afterwards, if the machine survives,
the load will correct itself very quickly, in no more than ~2 seconds
(=32ms*64, i.e. 64 half-lives of 32 periods each, enough to decay even
a 64bit sum to zero).

Signed-off-by: Yuyang Du 
---
 include/linux/sched.h |   19 ++-
 kernel/sched/fair.c   |   11 +--
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9710e2b..aca6b6f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1225,7 +1225,7 @@ struct load_weight {
  *
  * [load_avg definition]
  *
- *   load_avg = runnable% * scale_load_down(load)
+ *   load_avg = runnable% * load
  *
  * where runnable% is the time ratio that a sched_entity is runnable.
  * For cfs_rq, it is the aggregated load_avg of all runnable and
@@ -1233,7 +1233,7 @@ struct load_weight {
  *
  * load_avg may also take frequency scaling into account:
  *
- *   load_avg = runnable% * scale_load_down(load) * freq%
+ *   load_avg = runnable% * load * freq%
  *
  * where freq% is the CPU frequency normalized to the highest frequency.
  *
@@ -1259,9 +1259,18 @@ struct load_weight {
  *
  * [Overflow issue]
  *
- * The 64-bit load_sum can have 4353082796 (=2^64/47742/88761) entities
- * with the highest load (=88761), always runnable on a single cfs_rq,
- * and should not overflow as the number already hits PID_MAX_LIMIT.
+ * On 64bit kernel:
+ *
+ * When load has small fixed point range (SCHED_FIXEDPOINT_SHIFT), the
+ * 64bit load_sum can have 4353082796 (=2^64/47742/88761) tasks with
+ * the highest load (=88761) always runnable on a cfs_rq, we should
+ * not overflow as the number already hits PID_MAX_LIMIT.
+ *
+ * When load has large fixed point range (2*SCHED_FIXEDPOINT_SHIFT),
+ * the 64bit load_sum can have 4251057 (=2^64/47742/88761/1024) tasks
+ * with the highest load (=88761*1024) always runnable on ONE cfs_rq,
+ * we should be fine. Even if the overflow occurs at the end of day,
+ * at the time the load_avg won't be useful anyway in that situation.
  *
  * For all other cases (including 32-bit kernels), struct load_weight's
  * weight will overflow first before we do, because:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e1cde19..88913b8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -682,7 +682,7 @@ void init_entity_runnable_average(struct sched_entity *se)
 * will definitely be update (after enqueue).
 */
sa->period_contrib = 1023;
-   sa->load_avg = scale_load_down(se->load.weight);
+   sa->load_avg = se->load.weight;
sa->load_sum = sa->load_avg * SCHED_AVG_MAX;
/*
 * At this point, util_avg won't be used in select_task_rq_fair anyway
@@ -2929,7 +2929,7 @@ update_cfs_rq_sched_avg(u64 now, struct cfs_rq *cfs_rq, 
bool update_freq)
}
 
decayed = __update_sched_avg(now, cpu_of(rq_of(cfs_rq)), sa,
-   scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, 
cfs_rq);
+cfs_rq->load.weight, cfs_rq->curr != NULL, 
cfs_rq);
 
 #ifndef CONFIG_64BIT
smp_wmb();
@@ -2954,8 +2954,7 @@ static inline void update_sched_avg(struct sched_entity 
*se, int update_tg)
 * Track task load average for carrying it to new CPU after migrated, 
and
 * track group sched_entity load average for task_h_load calc in 
migration
 */
-   __update_sched_avg(now, cpu, >avg,
-  se->on_rq * scale_load_down(se->load.weight),
+   __update_sched_avg(now, cpu, >avg, se->on_rq * se->load.weight,
   cfs_rq->curr == se, NULL);
 
if (update_cfs_rq_sched_avg(now, cfs_rq, true) && update_tg)
@@ -2994,7 +2993,7 @@ skip_aging:
 static void detach_entity_sched_avg(struct cfs_rq *cfs_rq, struct sched_entity 
*se)
 {
__update_sched_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
-  >avg, se->on_rq * 
scale_load_down(se->load.weight),
+  >avg, se->on_rq * se->load.weight,
   cfs_rq->curr == se, NULL);
 
cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - 
se->avg.load_avg, 0);
@@ -3016,7 +3015,7 @@ 

[RFC PATCH 1/9] sched/fair: Change LOAD_AVG_MAX_N from 345 to 347

2016-05-15 Thread Yuyang Du
In commit 5b51f2f80b3b906ce59bd4dce6eca3c7f34cb1b9
Author: Paul Turner 
Date:   Thu Oct 4 13:18:32 2012 +0200

sched: Make __update_entity_runnable_avg() fast

Paul has a program to compute LOAD_AVG_MAX_N, which basically means
how many periods (at least) are needed for LOAD_AVG_MAX, and the result
of calc_conv(1024) is 345:

  long mult_inv(long c, int n) {
return (c * runnable_avg_yN_inv[n]) >>  WMULT_SHIFT;
  }

  void calc_conv(long n) {
long old_n;
int i = -1;

printf("convergence (LOAD_AVG_MAX, LOAD_AVG_MAX_N)\n");
do {
old_n = n;
n = mult_inv(n, 1) + 1024;
i++;
} while (n != old_n);
printf("%d> %ld\n", i - 1, n);
printf("\n");
  }

The initial value of i is -1, which should be 1 as far as I can tell:
the program prints i - 1 after the loop, so starting i at -1 makes the
printed count exactly 2 less than when starting at 1. Accordingly, the
final result of LOAD_AVG_MAX_N should be changed from 345 to 347.

Signed-off-by: Yuyang Du 
---
 kernel/sched/fair.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e8..2635561 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -666,7 +666,7 @@ static unsigned long task_h_load(struct task_struct *p);
  */
 #define LOAD_AVG_PERIOD 32
 #define LOAD_AVG_MAX 47742 /* maximum possible load avg */
-#define LOAD_AVG_MAX_N 345 /* number of full periods to produce LOAD_AVG_MAX */
+#define LOAD_AVG_MAX_N 347 /* number of full periods to produce LOAD_AVG_MAX */
 
 /* Give new sched_entity start runnable values to heavy its load in infant 
time */
 void init_entity_runnable_average(struct sched_entity *se)
-- 
1.7.9.5
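
[Editor's note: for clarity, the corrected counting would be as in the
sketch below, i.e. the same calc_conv() with the fix the changelog argues
for; mult_inv() is as quoted above, and per the changelog's argument the
last line should now print "347> 47742".]

	void calc_conv(long n) {
		long old_n;
		int i = 1;	/* was -1; the first pass already covers one period */

		printf("convergence (LOAD_AVG_MAX, LOAD_AVG_MAX_N)\n");
		do {
			old_n = n;
			n = mult_inv(n, 1) + 1024;
			i++;
		} while (n != old_n);
		printf("%d> %ld\n", i - 1, n);
	}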



[RFC PATCH 0/9] Clean up and optimize sched averages

2016-05-15 Thread Yuyang Du
Hi Peter,

Continuing with the remaining patches in this series. I realized some patches
need thorough discussion (finally), so this post is marked RFC.

 - For LOAD_AVG_MAX_N, I am OK with sticking to the old value, but it is
   worthwhile to correct it to the true value.

 - About the renames, I noticed there is an existing sched_avg_update(), but
   anyway, please NAK the renames you don't want, hopefully not all, ;)

 - Removing scale_load_down() for load_avg may have some unknown ramifications;
   is it worth trying?

The previous post is at: 
http://thread.gmane.org/gmane.linux.kernel/2214387/focus=2218488

Thanks,
Yuyang

--

Yuyang Du (9):
  sched/fair: Change LOAD_AVG_MAX_N from 345 to 347
  documentation: Add scheduler/sched-avg.txt
  sched/fair: Add static to remove_entity_load_avg()
  sched/fair: Rename variable names for sched averages
  sched/fair: Change the variable to hold the number of periods to
32-bit
  sched/fair: Add __always_inline compiler attribute to
__accumulate_sum()
  sched/fair: Optimize __update_sched_avg()
  sched/fair: Remove scale_load_down() for load_avg
  sched/fair: Rename scale_load() and scale_load_down()

 Documentation/scheduler/sched-avg.txt |   94 
 include/linux/sched.h |   21 +-
 kernel/sched/core.c   |8 +-
 kernel/sched/fair.c   |  382 +
 kernel/sched/sched.h  |   18 +-
 5 files changed, 317 insertions(+), 206 deletions(-)
 create mode 100644 Documentation/scheduler/sched-avg.txt

-- 
1.7.9.5



[RFC PATCH 6/9] sched/fair: Add __always_inline compiler attribute to __accumulate_sum()

2016-05-15 Thread Yuyang Du
__accumulate_sum()'s caller and sibling functions are all inlined, and it
is called almost every time its caller runs. It does not make sense for it
alone to remain uninlined.

Signed-off-by: Yuyang Du 
---
 kernel/sched/fair.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1fac2bf..1bbac7e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2652,7 +2652,7 @@ static __always_inline u64 __decay_sum(u64 val, u32 n)
  * We can compute this efficiently by combining:
  * y^32 = 1/2 with precomputed \Sum 1024*y^n   (where n < 32)
  */
-static u32 __accumulate_sum(u32 n)
+static __always_inline u32 __accumulate_sum(u32 n)
 {
u32 contrib = 0;
 
-- 
1.7.9.5



Re: [PATCH v7 10/14] usb: otg: add hcd companion support

2016-05-15 Thread Peter Chen
On Thu, May 12, 2016 at 03:13:48PM +0300, Roger Quadros wrote:
> Hi,
> 
> On 12/05/16 13:31, Yoshihiro Shimoda wrote:
> > Hi,
> > 
> >> From: Roger Quadros
> >> Sent: Thursday, May 12, 2016 6:32 PM
> >>
> >> Hi,
> >>
> >> On 12/05/16 11:34, Roger Quadros wrote:
> >>> On 12/05/16 07:00, Yoshihiro Shimoda wrote:
>  Hi,
> 
> > From: Alan Stern
> > Sent: Wednesday, May 11, 2016 11:47 PM
> >
> > On Wed, 11 May 2016, Roger Quadros wrote:
> >
> >>> What I mean is if you have 2 EHCI controllers with 2 companion
> >>> controllers, don't you need to know which companion goes with which 
> >>> EHCI
> >>> controller? Just like you do for the otg-controller property.
> >>>
> >>
> >> That is a very good point. I'm not very sure and it seems that current 
> >> code won't work
> >> with multiple EHCI + companion instances.
> 
>  I may misunderstand this topic, but if I use the following environment, 
>  it works correctly.
> 
>  < My environment >
>  - an otg controller: Sets hcd-needs-companion.
>  - ehci0 and ohci0 and a function: They connect to the otg controller 
>  using "otg-controller" property.
>  - ehci1 and ohci1: No "otg-controller" property.
>  - ehci2 and ohci2: No "otg-controller" property.
> 
>  In this environment, all hosts work correctly.
>  Also I think if we have 2 otg controllers, it should work because the
>  otg_dev instances differ.
> >>>
> >>> The topic is about more than one otg controller and how to tie the right
> >>> ehci and ohci to the correct otg_dev instance, especially in cases where
> >>> we can't depend on probe order.
> >>>
>  Or, does this topic assume an otg controller handles 2 EHCI controllers?
>  I'm not sure such environment actually exists.
> >>>
> >>> No it is not about that.
> > 
> > Thank you for the reply. I understood it.
> > 
> >> Alan, does USB core even know which EHCI and OHCI are linked to the 
> >> same port
> >> or the handoff is software transparent?
> >
> > The core knows.  It doesn't use the information for a whole lot of
> > things, but it does use it in a couple of places.  Search for
> > "companion" in core/hcd-pci.c and you'll see.
> 
>  Thank you for the information. I didn't know this code.
>  If my understanding is correct, the core/hcd-pci.c code will not be used 
>  by non-PCI devices.
> >>>
> >>> That is correct.
> >>>
>  In other words, nobody sets "hcd->self.hs_companion" if we use such a 
>  device.
>  So, I will try to add such a code if needed.
> >>>
> >>> I think OTG core would have to rely on USB core in providing the right 
> >>> companion device,
> >>> just like we rely on it for the primary vs shared HCD case.
> >>>
> >>
> >> OK, it is not so simple.
> >>
> >> EHCI and companion port handoff is really meant to be software transparent.
> >>
> >> non-PCI devices really don't have knowledge of which OHCI instance is 
> >> companion to the EHCI.
> >> With device tree we could provide this mapping but for non-device tree 
> >> case we can't do
> >> anything.
> >>
> >> So my suggestion would be to keep dual role implementation limited to one 
> >> instance for
> >> EHCI + companion case for non-DT.
> >> For PCI case I don't see how dual role can be implemented. I don't think 
> >> we have any
> >> dual-role PCI cards.
> > 
> > R-Car Gen2 SoCs (r8a779[0134] / arm32) have USB 2.0 host controllers via the
> > PCI bus and one high-speed function controller via the AXI bus.
> > One of the channels can be used as host or function.
> > 
> >> For DT case we could have a DT binding to tie the EHCI and companion and 
> >> use that
> >> in the OTG framework.
> 
> After looking at the code it seems we don't need this special binding as we 
> are already
> linking the EHCI controller and companion controller to the single otg 
> controller instance
> using the otg-controller property.
> 

Then how do you know about this EHCI + companion controller special case when
the OTG core adds the hcd? It needs special handling, right?

Peter

> So all is good as of now.
> 
> For the non-DT case, it is the responsibility of the platform support code to
> ensure that it calls usb_otg_add_hcd() with the correct otg controller
> instance for both the EHCI and the companion controller, and things should
> work fine there as well.
> 
> --
> cheers,
> -roger
> 
> > 
> > R-Car Gen3 SoC (r8a7795 / arm64) will be this type.
> > (Both USB 2.0 host/function controllers connect to AXI bus.)
> > 
> >> Any objections?
> > 
> > I don't have any objections because I'm just focus on R-Car Gen3 SoC for 
> > now.
> > If someone needs for PCI case, I think it is possible to add such a code 
> > somehow later.
> > 
> > Best regards,
> > Yoshihiro Shimoda
> > 
> >> cheers,
> >> -roger
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  

[PATCH V5 3/6] vfio: platform: determine reset capability

2016-05-15 Thread Sinan Kaya
Create a new function to determine whether this driver supports a reset
function. This is an attempt to abstract the device tree calls away from
the rest of the code.

Signed-off-by: Sinan Kaya 
---
 drivers/vfio/platform/vfio_platform_common.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
index cb91dd3..25378bd 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -49,6 +49,11 @@ static vfio_platform_reset_fn_t 
vfio_platform_lookup_reset(const char *compat,
return reset_fn;
 }
 
+static bool vfio_platform_has_reset(struct vfio_platform_device *vdev)
+{
+   return vdev->of_reset ? true : false;
+}
+
 static void vfio_platform_get_reset(struct vfio_platform_device *vdev)
 {
	vdev->of_reset = vfio_platform_lookup_reset(vdev->compat,
						    &vdev->reset_module);
@@ -215,7 +220,7 @@ static long vfio_platform_ioctl(void *device_data,
if (info.argsz < minsz)
return -EINVAL;
 
-   if (vdev->of_reset)
+   if (vfio_platform_has_reset(vdev))
vdev->flags |= VFIO_DEVICE_FLAGS_RESET;
info.flags = vdev->flags;
info.num_regions = vdev->num_regions;
-- 
1.8.2.1



[PATCH V5 1/6] vfio: platform: rename reset function

2016-05-15 Thread Sinan Kaya
Rename the reset function to of_reset, as it is only used by the device
tree based platforms.

Signed-off-by: Sinan Kaya 
---
 drivers/vfio/platform/vfio_platform_common.c  | 30 +--
 drivers/vfio/platform/vfio_platform_private.h |  6 +++---
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
index e65b142..08fd7c2 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -41,7 +41,7 @@ static vfio_platform_reset_fn_t 
vfio_platform_lookup_reset(const char *compat,
if (!strcmp(iter->compat, compat) &&
try_module_get(iter->owner)) {
*module = iter->owner;
-   reset_fn = iter->reset;
+   reset_fn = iter->of_reset;
break;
}
}
@@ -51,18 +51,18 @@ static vfio_platform_reset_fn_t 
vfio_platform_lookup_reset(const char *compat,
 
 static void vfio_platform_get_reset(struct vfio_platform_device *vdev)
 {
-	vdev->reset = vfio_platform_lookup_reset(vdev->compat,
-						 &vdev->reset_module);
-	if (!vdev->reset) {
+	vdev->of_reset = vfio_platform_lookup_reset(vdev->compat,
+						    &vdev->reset_module);
+	if (!vdev->of_reset) {
 		request_module("vfio-reset:%s", vdev->compat);
-		vdev->reset = vfio_platform_lookup_reset(vdev->compat,
-							 &vdev->reset_module);
+		vdev->of_reset = vfio_platform_lookup_reset(vdev->compat,
+							    &vdev->reset_module);
 	}
 }
 
 static void vfio_platform_put_reset(struct vfio_platform_device *vdev)
 {
-   if (vdev->reset)
+   if (vdev->of_reset)
module_put(vdev->reset_module);
 }
 
@@ -141,9 +141,9 @@ static void vfio_platform_release(void *device_data)
	mutex_lock(&driver_lock);
 
if (!(--vdev->refcnt)) {
-   if (vdev->reset) {
+   if (vdev->of_reset) {
dev_info(vdev->device, "reset\n");
-   vdev->reset(vdev);
+   vdev->of_reset(vdev);
} else {
dev_warn(vdev->device, "no reset function found!\n");
}
@@ -175,9 +175,9 @@ static int vfio_platform_open(void *device_data)
if (ret)
goto err_irq;
 
-   if (vdev->reset) {
+   if (vdev->of_reset) {
dev_info(vdev->device, "reset\n");
-   vdev->reset(vdev);
+   vdev->of_reset(vdev);
} else {
dev_warn(vdev->device, "no reset function found!\n");
}
@@ -213,7 +213,7 @@ static long vfio_platform_ioctl(void *device_data,
if (info.argsz < minsz)
return -EINVAL;
 
-   if (vdev->reset)
+   if (vdev->of_reset)
vdev->flags |= VFIO_DEVICE_FLAGS_RESET;
info.flags = vdev->flags;
info.num_regions = vdev->num_regions;
@@ -312,8 +312,8 @@ static long vfio_platform_ioctl(void *device_data,
return ret;
 
} else if (cmd == VFIO_DEVICE_RESET) {
-   if (vdev->reset)
-   return vdev->reset(vdev);
+   if (vdev->of_reset)
+   return vdev->of_reset(vdev);
else
return -EINVAL;
}
@@ -611,7 +611,7 @@ void vfio_platform_unregister_reset(const char *compat,
 
	mutex_lock(&driver_lock);
	list_for_each_entry_safe(iter, temp, &reset_list, link) {
-		if (!strcmp(iter->compat, compat) && (iter->reset == fn)) {
+		if (!strcmp(iter->compat, compat) && (iter->of_reset == fn)) {
			list_del(&iter->link);
			break;
		}
diff --git a/drivers/vfio/platform/vfio_platform_private.h 
b/drivers/vfio/platform/vfio_platform_private.h
index 42816dd..71ed7d1 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -71,7 +71,7 @@ struct vfio_platform_device {
struct resource*
(*get_resource)(struct vfio_platform_device *vdev, int i);
int (*get_irq)(struct vfio_platform_device *vdev, int i);
-   int (*reset)(struct vfio_platform_device *vdev);
+   int (*of_reset)(struct vfio_platform_device *vdev);
 };
 
 typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device *vdev);
@@ -80,7 +80,7 @@ struct vfio_platform_reset_node {
struct list_head link;
char *compat;
struct module *owner;
-	vfio_platform_reset_fn_t reset;
+	vfio_platform_reset_fn_t of_reset;
 };
-- 
1.8.2.1


[PATCH V5 2/6] vfio: platform: move reset call to a common function

2016-05-15 Thread Sinan Kaya
The reset call sequence is replicated multiple times across the file. Group
the calls into a single helper for easier maintenance.

Signed-off-by: Sinan Kaya 
---
 drivers/vfio/platform/vfio_platform_common.c | 31 ++--
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
index 08fd7c2..cb91dd3 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -134,6 +134,18 @@ static void vfio_platform_regions_cleanup(struct 
vfio_platform_device *vdev)
kfree(vdev->regions);
 }
 
+static int vfio_platform_call_reset(struct vfio_platform_device *vdev)
+{
+   if (vdev->of_reset) {
+   dev_info(vdev->device, "reset\n");
+   vdev->of_reset(vdev);
+   return 0;
+   }
+
+   dev_warn(vdev->device, "no reset function found!\n");
+   return -EINVAL;
+}
+
 static void vfio_platform_release(void *device_data)
 {
struct vfio_platform_device *vdev = device_data;
@@ -141,12 +153,7 @@ static void vfio_platform_release(void *device_data)
	mutex_lock(&driver_lock);
 
if (!(--vdev->refcnt)) {
-   if (vdev->of_reset) {
-   dev_info(vdev->device, "reset\n");
-   vdev->of_reset(vdev);
-   } else {
-   dev_warn(vdev->device, "no reset function found!\n");
-   }
+   vfio_platform_call_reset(vdev);
vfio_platform_regions_cleanup(vdev);
vfio_platform_irq_cleanup(vdev);
}
@@ -175,12 +182,7 @@ static int vfio_platform_open(void *device_data)
if (ret)
goto err_irq;
 
-   if (vdev->of_reset) {
-   dev_info(vdev->device, "reset\n");
-   vdev->of_reset(vdev);
-   } else {
-   dev_warn(vdev->device, "no reset function found!\n");
-   }
+   vfio_platform_call_reset(vdev);
}
 
vdev->refcnt++;
@@ -312,10 +314,7 @@ static long vfio_platform_ioctl(void *device_data,
return ret;
 
} else if (cmd == VFIO_DEVICE_RESET) {
-   if (vdev->of_reset)
-   return vdev->of_reset(vdev);
-   else
-   return -EINVAL;
+   return vfio_platform_call_reset(vdev);
}
 
return -ENOTTY;
-- 
1.8.2.1



[PATCH V5 4/6] vfio: platform: add support for ACPI probe

2016-05-15 Thread Sinan Kaya
The code uses the DT compatible string to associate a reset driver with the
actual device itself. The compatible string does not exist on ACPI based
systems; there, the HID is the unique identifier for a device driver instead.

Signed-off-by: Sinan Kaya 
---
 drivers/vfio/platform/vfio_platform_common.c  | 58 ---
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
index 25378bd..d859d3b 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -13,6 +13,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -49,6 +50,37 @@ static vfio_platform_reset_fn_t 
vfio_platform_lookup_reset(const char *compat,
return reset_fn;
 }
 
+#ifdef CONFIG_ACPI
+int vfio_platform_acpi_probe(struct vfio_platform_device *vdev,
+struct device *dev)
+{
+   struct acpi_device *adev = ACPI_COMPANION(dev);
+
+   if (acpi_disabled)
+   return -ENODEV;
+
+   if (!adev) {
+   pr_err("VFIO: ACPI companion device not found for %s\n",
+   vdev->name);
+   return -ENODEV;
+   }
+
+   vdev->acpihid = acpi_device_hid(adev);
+   if (!vdev->acpihid) {
+   pr_err("VFIO: cannot find ACPI HID for %s\n",
+  vdev->name);
+   return -ENODEV;
+   }
+   return 0;
+}
+#else
+int vfio_platform_acpi_probe(struct vfio_platform_device *vdev,
+struct device *dev)
+{
+   return -EINVAL;
+}
+#endif
+
 static bool vfio_platform_has_reset(struct vfio_platform_device *vdev)
 {
return vdev->of_reset ? true : false;
@@ -548,6 +580,21 @@ static const struct vfio_device_ops vfio_platform_ops = {
.mmap   = vfio_platform_mmap,
 };
 
+int vfio_platform_of_probe(struct vfio_platform_device *vdev,
+  struct device *dev)
+{
+   int ret;
+
+   ret = device_property_read_string(dev, "compatible",
+					  &vdev->compat);
+   if (ret) {
+   pr_err("VFIO: cannot retrieve compat for %s\n",
+   vdev->name);
+   return ret;
+   }
+   return 0;
+}
+
 int vfio_platform_probe_common(struct vfio_platform_device *vdev,
   struct device *dev)
 {
@@ -557,11 +604,12 @@ int vfio_platform_probe_common(struct 
vfio_platform_device *vdev,
if (!vdev)
return -EINVAL;
 
-	ret = device_property_read_string(dev, "compatible", &vdev->compat);
-   if (ret) {
-   pr_err("VFIO: cannot retrieve compat for %s\n", vdev->name);
-   return -EINVAL;
-   }
+   ret = vfio_platform_acpi_probe(vdev, dev);
+   if (ret)
+   ret = vfio_platform_of_probe(vdev, dev);
+
+   if (ret)
+   return ret;
 
vdev->device = dev;
 
diff --git a/drivers/vfio/platform/vfio_platform_private.h 
b/drivers/vfio/platform/vfio_platform_private.h
index 71ed7d1..ba9e4f8 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -58,6 +58,7 @@ struct vfio_platform_device {
struct mutexigate;
struct module   *parent_module;
const char  *compat;
+   const char  *acpihid;
struct module   *reset_module;
struct device   *device;
 
-- 
1.8.2.1



[PATCH V5 5/6] vfio: platform: call _RST method when using ACPI

2016-05-15 Thread Sinan Kaya
The device tree code checks for the presence of a reset driver and calls
the of_reset function pointer by looking up the reset driver as a module.

ACPI defines the _RST method to perform a device-level reset. After the
_RST method is executed, the OS can resume using the device.

This patch checks for the presence of the _RST method and calls it when a
reset is requested.

Signed-off-by: Sinan Kaya 
---
 drivers/vfio/platform/vfio_platform_common.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
index d859d3b..095d5b7 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -73,21 +73,59 @@ int vfio_platform_acpi_probe(struct vfio_platform_device 
*vdev,
}
return 0;
 }
+
+static int vfio_platform_acpi_call_reset(struct vfio_platform_device *vdev)
+{
+   struct device *dev = vdev->device;
+   acpi_handle handle = ACPI_HANDLE(dev);
+   acpi_status acpi_ret;
+   unsigned long long val;
+
+	acpi_ret = acpi_evaluate_integer(handle, "_RST", NULL, &val);
+   if (ACPI_FAILURE(acpi_ret))
+   return -EINVAL;
+
+   return 0;
+}
+
+static bool vfio_platform_acpi_has_reset(struct vfio_platform_device *vdev)
+{
+   struct device *dev = vdev->device;
+   acpi_handle handle = ACPI_HANDLE(dev);
+
+   return acpi_has_method(handle, "_RST");
+}
 #else
 int vfio_platform_acpi_probe(struct vfio_platform_device *vdev,
 struct device *dev)
 {
return -EINVAL;
 }
+
+static int vfio_platform_acpi_call_reset(struct vfio_platform_device *vdev)
+{
+   return -EINVAL;
+}
+
+static bool vfio_platform_acpi_has_reset(struct vfio_platform_device *vdev)
+{
+   return false;
+}
 #endif
 
 static bool vfio_platform_has_reset(struct vfio_platform_device *vdev)
 {
+   if (vdev->acpihid)
+   return vfio_platform_acpi_has_reset(vdev);
+
return vdev->of_reset ? true : false;
 }
 
 static void vfio_platform_get_reset(struct vfio_platform_device *vdev)
 {
+   if (vdev->acpihid)
+   return;
+
vdev->of_reset = vfio_platform_lookup_reset(vdev->compat,
						    &vdev->reset_module);
if (!vdev->of_reset) {
@@ -99,6 +137,9 @@ static void vfio_platform_get_reset(struct 
vfio_platform_device *vdev)
 
 static void vfio_platform_put_reset(struct vfio_platform_device *vdev)
 {
+   if (vdev->acpihid)
+   return;
+
if (vdev->of_reset)
module_put(vdev->reset_module);
 }
@@ -177,6 +218,9 @@ static int vfio_platform_call_reset(struct 
vfio_platform_device *vdev)
dev_info(vdev->device, "reset\n");
vdev->of_reset(vdev);
return 0;
+   } else if (vdev->acpihid) {
+   dev_info(vdev->device, "reset\n");
+   return vfio_platform_acpi_call_reset(vdev);
}
 
dev_warn(vdev->device, "no reset function found!\n");
-- 
1.8.2.1



Re: [PATCH v5 07/12] zsmalloc: factor page chain functionality out

2016-05-15 Thread Sergey Senozhatsky
On (05/09/16 11:20), Minchan Kim wrote:
> For page migration, we need to create the page chain of a zspage dynamically,
> so this patch factors it out of alloc_zspage.
> 
> Cc: Sergey Senozhatsky 
> Signed-off-by: Minchan Kim 

Reviewed-by: Sergey Senozhatsky 

[..]
> + page = alloc_page(flags);
> + if (!page) {
> + while (--i >= 0)
> + __free_page(pages[i]);

put_page() ?

a minor nit: put_page() here would be in alignment
with __free_zspage(), which does put_page(); a sketch follows.
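
For illustration, the allocation loop with that suggestion applied might look
like this (an editorial sketch only, not the actual patch; the variable names
are taken from the quoted fragment):

  for (i = 0; i < class->pages_per_zspage; i++) {
          struct page *page = alloc_page(flags);

          if (!page) {
                  while (--i >= 0)
                          put_page(pages[i]); /* was __free_page() */
                  return NULL;
          }
          pages[i] = page;
  }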

-ss

> + return NULL;
> + }
> + pages[i] = page;
>   }
>  
> + create_page_chain(pages, class->pages_per_zspage);
> + first_page = pages[0];
> + init_zspage(class, first_page);
> +
>   return first_page;
>  }


[PATCH V5 6/6] vfio, platform: make reset driver a requirement by default

2016-05-15 Thread Sinan Kaya
The code was allowing platform devices to be used without a supporting
VFIO reset driver. The hardware can be left in an inconsistent state
after a guest machine aborts.

The reset driver puts the hardware back into a safe state and disables
interrupts before returning control to the host machine.

Signed-off-by: Sinan Kaya 
---
 drivers/vfio/platform/vfio_amba.c |  5 +
 drivers/vfio/platform/vfio_platform.c |  5 +
 drivers/vfio/platform/vfio_platform_common.c  | 18 ++
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/platform/vfio_amba.c 
b/drivers/vfio/platform/vfio_amba.c
index a66479b..7585902 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -23,6 +23,10 @@
 #define DRIVER_AUTHOR   "Antonios Motakis "
 #define DRIVER_DESC "VFIO for AMBA devices - User Level meta-driver"
 
+static bool reset_required = true;
+module_param(reset_required, bool, 0644);
+MODULE_PARM_DESC(reset_required, "override reset requirement (default: 1)");
+
 /* probing devices from the AMBA bus */
 
 static struct resource *get_amba_resource(struct vfio_platform_device *vdev,
@@ -68,6 +72,7 @@ static int vfio_amba_probe(struct amba_device *adev, const 
struct amba_id *id)
vdev->get_resource = get_amba_resource;
vdev->get_irq = get_amba_irq;
vdev->parent_module = THIS_MODULE;
+   vdev->reset_required = reset_required;
 
ret = vfio_platform_probe_common(vdev, >dev);
if (ret) {
diff --git a/drivers/vfio/platform/vfio_platform.c 
b/drivers/vfio/platform/vfio_platform.c
index b1cc3a7..ef89146 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -23,6 +23,10 @@
 #define DRIVER_AUTHOR   "Antonios Motakis "
 #define DRIVER_DESC "VFIO for platform devices - User Level meta-driver"
 
+static bool reset_required = true;
+module_param(reset_required, bool, 0644);
+MODULE_PARM_DESC(reset_required, "override reset requirement (default: 1)");
+
 /* probing devices from the linux platform bus */
 
 static struct resource *get_platform_resource(struct vfio_platform_device 
*vdev,
@@ -66,6 +70,7 @@ static int vfio_platform_probe(struct platform_device *pdev)
vdev->get_resource = get_platform_resource;
vdev->get_irq = get_platform_irq;
vdev->parent_module = THIS_MODULE;
+   vdev->reset_required = reset_required;
 
ret = vfio_platform_probe_common(vdev, >dev);
if (ret)
diff --git a/drivers/vfio/platform/vfio_platform_common.c 
b/drivers/vfio/platform/vfio_platform_common.c
index 095d5b7..89fb18f 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -121,10 +121,10 @@ static bool vfio_platform_has_reset(struct 
vfio_platform_device *vdev)
return vdev->of_reset ? true : false;
 }
 
-static void vfio_platform_get_reset(struct vfio_platform_device *vdev)
+static int vfio_platform_get_reset(struct vfio_platform_device *vdev)
 {
if (vdev->acpihid)
-   return;
+   return vfio_platform_acpi_has_reset(vdev) ? 0 : -EINVAL;
 
vdev->of_reset = vfio_platform_lookup_reset(vdev->compat,
						    &vdev->reset_module);
@@ -133,6 +133,8 @@ static void vfio_platform_get_reset(struct 
vfio_platform_device *vdev)
vdev->of_reset = vfio_platform_lookup_reset(vdev->compat,
						    &vdev->reset_module);
}
+
+   return vdev->of_reset ? 0 : -EINVAL;
 }
 
 static void vfio_platform_put_reset(struct vfio_platform_device *vdev)
@@ -263,7 +265,9 @@ static int vfio_platform_open(void *device_data)
if (ret)
goto err_irq;
 
-   vfio_platform_call_reset(vdev);
+   ret = vfio_platform_call_reset(vdev);
+   if (ret && vdev->reset_required)
+   goto err_irq;
}
 
vdev->refcnt++;
@@ -669,7 +673,13 @@ int vfio_platform_probe_common(struct vfio_platform_device 
*vdev,
return ret;
}
 
-   vfio_platform_get_reset(vdev);
+   ret = vfio_platform_get_reset(vdev);
+   if (ret && vdev->reset_required) {
+   pr_err("vfio: no reset function found for device %s\n",
+  vdev->name);
+   iommu_group_put(group);
+   return ret;
+   }
 
	mutex_init(&vdev->igate);
 
diff --git a/drivers/vfio/platform/vfio_platform_private.h 
b/drivers/vfio/platform/vfio_platform_private.h
index ba9e4f8..68fbc00 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -50,6 +50,7 @@ struct vfio_platform_region {
 };
 
 struct vfio_platform_device {
+	bool				reset_required;
 	struct vfio_platform_region	*regions;
-- 
1.8.2.1


Re: [patch net-next 07/11] net: hns: dsaf adds support of acpi

2016-05-15 Thread Yankejian (Hackim Yim)


On 2016/5/13 21:12, Andy Shevchenko wrote:
> On Fri, 2016-05-13 at 16:19 +0800, Yisen Zhuang wrote:
>> From: Kejian Yan 
>>
>> Dsaf needs to get its configuration parameters via ACPI, so this patch adds
>> support for ACPI.
>>
> Looks like at some point it would be better to split the driver into a core
> part, plus PCI and ACPI/DT/platform code.
>
> Too many changes involving IS_ENABLED() suggest, as far as I can see, a bad
> architecture / split of the driver.

Hi Andy,
Actually, we use a unified function wherever possible. The DT and ACPI
routines may differ: some setup is handled by the BIOS in the ACPI case but
must be done by the OS in the DT case, so we need to distinguish the two
(see the sketch below).
And we will try to reduce the use of IS_ENABLED().
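
A minimal editorial sketch of the kind of runtime DT-vs-ACPI distinction
described above; the dsaf_* helper names are hypothetical, and only
dev_of_node() is a real kernel API:

  #include <linux/device.h>
  #include <linux/of.h>

  static int dsaf_init_hw_by_os(struct device *dev)
  {
          /* DT-specific hardware setup would go here */
          return 0;
  }

  static int dsaf_init_hw(struct device *dev)
  {
          /* DT case: the OS has to do the hardware setup itself. */
          if (dev_of_node(dev))
                  return dsaf_init_hw_by_os(dev);

          /* ACPI case: the BIOS/firmware has already done the setup. */
          return 0;
  }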

Thanks very much for your suggestions, Andy

Kejian




linux-next: build failure after merge of the spi tree

2016-05-15 Thread Stephen Rothwell
Hi Mark,

After merging the spi tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

drivers/spi/spi-orion.c: In function 'orion_spi_write_read':
drivers/spi/spi-orion.c:407:3: error: implicit declaration of function 
'writesl' [-Werror=implicit-function-declaration]
   writesl(orion_spi->direct_access[cs].vaddr, xfer->tx_buf, cnt);
   ^
drivers/spi/spi-orion.c:411:4: error: implicit declaration of function 
'writesb' [-Werror=implicit-function-declaration]
writesb(orion_spi->direct_access[cs].vaddr, [cnt],
^

Caused by commit

  dbf24b7c0d94 ("spi: orion: Add direct access mode")

I have merged the version of the spi tree from next-20160513 for today.

-- 
Cheers,
Stephen Rothwell


Re: [patch net-next 06/11] ACPI: bus: move acpi_match_device_ids() to linux/acpi.h

2016-05-15 Thread Yankejian (Hackim Yim)


On 2016/5/13 21:15, Andy Shevchenko wrote:
> On Fri, 2016-05-13 at 16:19 +0800, Yisen Zhuang wrote:
>> From: Hanjun Guo 
>>
>> acpi_match_device_ids() will be used by drivers to match different
>> hardware versions and will be compiled in the non-ACPI case, but
>> acpi_match_device_ids() lives in acpi_bus.h and can only be used in the
>> ACPI case, so move it to linux/acpi.h and introduce a stub function for it.
> I somehow doubt this is the right move.
>
> Like I said in the previous comment the architectural split might make
> this a bit better.
>
> You might use
>
> #if IS_ENABLED(CONFIG_ACPI)
> #else
> #endif
>
> only once around some big part of the code. If the kernel is built without
> ACPI support, you will not even have this in your driver at all.

Hi Andy,

Thanks for your suggestions. I will add a stub function instead in the next
submission, along the lines of the sketch below.
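
An editorial sketch of the stub pattern being discussed: one CONFIG_ACPI
guard with a stub for !ACPI builds. The hns_match_acpi_ids() wrapper is a
hypothetical example; acpi_match_device_ids() and ACPI_COMPANION() are the
real kernel APIs involved:

  #include <linux/acpi.h>
  #include <linux/device.h>

  #if IS_ENABLED(CONFIG_ACPI)
  static int hns_match_acpi_ids(struct device *dev,
                                const struct acpi_device_id *ids)
  {
          return acpi_match_device_ids(ACPI_COMPANION(dev), ids);
  }
  #else
  static inline int hns_match_acpi_ids(struct device *dev,
                                       const struct acpi_device_id *ids)
  {
          return -ENODEV; /* stub: no ACPI support in this build */
  }
  #endif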


> -- 
> Andy Shevchenko 
> Intel Finland Oy
>
>
> .
>




Re: [PATCH v5 06/12] zsmalloc: use accessor

2016-05-15 Thread Sergey Senozhatsky
On (05/09/16 11:20), Minchan Kim wrote:
> An upcoming patch will change how the zspage metadata is encoded, so for
> easy review this patch wraps the metadata-accessing code in accessors.
> 
> Cc: Sergey Senozhatsky 
> Signed-off-by: Minchan Kim 

Reviewed-by: Sergey Senozhatsky 

-ss

