Re: [PATCH RFC v0 00/12] Cyclic Scheduler Against RTC
On Mon, 2016-04-11 at 22:29 -0700, Bill Huey (hui) wrote:
> Hi,
>
> This is a crude cyclic scheduler implementation. It uses SCHED_FIFO tasks
> and runs them according to a map pattern specified by a 64-bit mask. Each
> bit corresponds to an entry in a 64-entry array of 'struct task_struct'.
> This works on CPU 0 only for now.
>
> Threads are 'admitted' to this map by an extension to ioctl() via the
> (rtc) real-time clock interface. The bit pattern then determines when
> the task will run or activate next.
>
> The /dev/rtc interface is chosen for this purpose because of its
> accessibility to userspace. For example, the mplayer program already uses
> it as a timer source and could possibly benefit from being synced to a
> vertical retrace interrupt during decoding. It could be an OpenGL program
> needing precise scheduler support for handling vertical retrace
> interrupts, low-latency audio, and timely handling of touch events,
> amongst other uses.

Sounds like you want SGI's frame rate scheduler.

	-Mike
Re: [PATCH 1/3] ARM: dts: vf-colibri: alias the primary FEC as ethernet0
On Fri, Apr 01, 2016 at 11:13:39PM -0700, Stefan Agner wrote:
> The Vybrid based Colibri modules provide an on-module PHY which is
> connected to the second FEC instance, FEC1. Since the on-module
> Ethernet port is considered the primary Ethernet interface, alias
> fec1 as ethernet0. This also makes sure that the first MAC address
> provided by the boot loader gets assigned to the FEC instance used
> for the on-module PHY.
>
> Signed-off-by: Stefan Agner

Applied all, thanks.
[PATCH] watchdog: kempld_wdt: don't build for avr32
The build of avr32 allmodconfig fails with the error:

ERROR: "__avr32_udiv64" [drivers/watchdog/kempld_wdt.ko] undefined!

Exclude this driver from the avr32 build.

Signed-off-by: Sudip Mukherjee
---
avr32 build log is at:
https://travis-ci.org/sudipm-mukherjee/parport/jobs/122158665

 drivers/watchdog/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index fb94765..61041ba 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -981,7 +981,7 @@ config HP_WATCHDOG
 
 config KEMPLD_WDT
 	tristate "Kontron COM Watchdog Timer"
-	depends on MFD_KEMPLD
+	depends on MFD_KEMPLD && !AVR32
 	select WATCHDOG_CORE
 	help
 	  Support for the PLD watchdog on some Kontron ETX and COMexpress
-- 
1.9.1
Re: [PATCH v6 10/10] clocksource: arm_arch_timer: Remove arch_timer_get_timecounter
On Mon, Apr 11, 2016 at 04:33:00PM +0100, Julien Grall wrote:
> The only caller of arch_timer_get_timecounter (in KVM) has been removed.
>
> Signed-off-by: Julien Grall
> Acked-by: Christoffer Dall
>
> ---
> Cc: Daniel Lezcano
> Cc: Thomas Gleixner
>
> Changes in v4:
>     - Add Christoffer's acked-by
>
> Changes in v3:
>     - Patch added
> ---

Acked-by: Daniel Lezcano
Re: [PATCH v6 02/10] clocksource: arm_arch_timer: Extend arch_timer_kvm_info to get the virtual IRQ
On Mon, Apr 11, 2016 at 04:32:52PM +0100, Julien Grall wrote:
> Currently, the firmware table is parsed by the virtual timer code in
> order to retrieve the virtual timer interrupt. However, this is already
> done by the arch timer driver.
>
> To avoid code duplication, extend arch_timer_kvm_info to get the virtual
> IRQ.
>
> Note that the KVM code will be modified in a subsequent patch.
>
> Signed-off-by: Julien Grall
> Acked-by: Christoffer Dall
>
> ---

Acked-by: Daniel Lezcano
Re: [PATCH v6 01/10] clocksource: arm_arch_timer: Gather KVM specific information in a structure
On Mon, Apr 11, 2016 at 04:32:51PM +0100, Julien Grall wrote:
> Introduce a structure which is filled in by the arch timer driver and
> used by the virtual timer in KVM.
>
> The first member of this structure will be the timecounter. More members
> will be added later.
>
> A stub for the new helper isn't introduced because KVM requires the arch
> timer for both ARM64 and ARM32.
>
> The function arch_timer_get_timecounter is kept for the time being and
> will be dropped in a subsequent patch.
>
> Signed-off-by: Julien Grall
> Acked-by: Christoffer Dall

Acked-by: Daniel Lezcano
Re: [PATCH v6 0/3] thermal: mediatek: Add cpu dynamic power cooling model.
On 12-04-16, 13:24, dawei chien wrote:
> Please refer to following for my resending, thank you.
>
> https://lkml.org/lkml/2016/3/15/101
> https://patchwork.kernel.org/patch/8586131/
> https://patchwork.kernel.org/patch/8586111/
> https://patchwork.kernel.org/patch/8586081/

Oh, you were continuously sending new ping requests on the old thread.
You should have used the new thread instead :)

Anyway, I have pinged Rafael over the new thread now.

-- 
viresh
Re: [RESEND][PATCH 1/3] thermal: mediatek: Add cpu dynamic power cooling model.
Hi Rafael,

On 15-03-16, 16:10, Dawei Chien wrote:
> The MT8173 cpufreq driver selects of_cpufreq_power_cooling_register,
> registering cooling devices with a dynamic power coefficient.
>
> Signed-off-by: Dawei Chien
> Acked-by: Viresh Kumar

Can you please apply this patch from Dawei?

-- 
viresh
[PATCH RFC v0 03/12] Add cyclic support to rtc-dev.c
Wait-queue changes to rtc_dev_read() so that it can support overrun count
reporting when multiple threads are blocked on a single wait object.
ioctl() additions allow callers to admit the thread to the cyclic
scheduler.

Signed-off-by: Bill Huey (hui)
---
 drivers/rtc/rtc-dev.c | 161 ++
 1 file changed, 161 insertions(+)

diff --git a/drivers/rtc/rtc-dev.c b/drivers/rtc/rtc-dev.c
index a6d9434..0fc9a8c 100644
--- a/drivers/rtc/rtc-dev.c
+++ b/drivers/rtc/rtc-dev.c
@@ -18,6 +18,15 @@
 #include
 #include "rtc-core.h"
 
+#ifdef CONFIG_RTC_CYCLIC
+#include
+#include
+
+#include <../kernel/sched/sched.h>
+#include <../kernel/sched/cyclic.h>
+//#include <../kernel/sched/cyclic_rt.h>
+#endif
+
 static dev_t rtc_devt;
 
 #define RTC_DEV_MAX 16 /* 16 RTCs should be enough for everyone... */
@@ -29,6 +38,10 @@ static int rtc_dev_open(struct inode *inode, struct file *file)
 					struct rtc_device, char_dev);
 	const struct rtc_class_ops *ops = rtc->ops;
 
+#ifdef CONFIG_RTC_CYCLIC
+	reset_rt_overrun();
+#endif
+
 	if (test_and_set_bit_lock(RTC_DEV_BUSY, &rtc->flags))
 		return -EBUSY;
 
@@ -153,13 +166,26 @@ rtc_dev_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 {
 	struct rtc_device *rtc = file->private_data;
 
+#ifdef CONFIG_RTC_CYCLIC
+	DEFINE_WAIT_FUNC(wait, single_default_wake_function);
+#else
 	DECLARE_WAITQUEUE(wait, current);
+#endif
 	unsigned long data;
+	unsigned long flags;
+#ifdef CONFIG_RTC_CYCLIC
+	int wake = 0, block = 0;
+#endif
 	ssize_t ret;
 
 	if (count != sizeof(unsigned int) && count < sizeof(unsigned long))
 		return -EINVAL;
 
+#ifdef CONFIG_RTC_CYCLIC
+	if (rt_overrun_task_yield(current))
+		goto yield;
+#endif
+printk("%s: 0 color = %d \n", __func__, current->rt.rt_overrun.color);
 	add_wait_queue(&rtc->irq_queue, &wait);
 	do {
 		__set_current_state(TASK_INTERRUPTIBLE);
@@ -169,23 +195,59 @@ rtc_dev_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 		rtc->irq_data = 0;
 		spin_unlock_irq(&rtc->irq_lock);
 
+if (block) {
+	block = 0;
+	if (wake) {
+		printk("%s: wake \n", __func__);
+		wake = 0;
+	} else {
+		printk("%s: ~wake \n", __func__);
+	}
+}
 		if (data != 0) {
+#ifdef CONFIG_RTC_CYCLIC
+			/* overrun reporting */
+			raw_spin_lock_irqsave(&rt_overrun_lock, flags);
+			if (_on_rt_overrun_admitted(current)) {
+				/* pass back to userspace */
+				data = rt_task_count(current);
+				rt_task_count(current) = 0;
+			}
+			raw_spin_unlock_irqrestore(&rt_overrun_lock, flags);
+			ret = 0;
+printk("%s: 1 color = %d \n", __func__, current->rt.rt_overrun.color);
+			break;
+		}
+#else
 			ret = 0;
 			break;
 		}
+#endif
 		if (file->f_flags & O_NONBLOCK) {
 			ret = -EAGAIN;
+printk("%s: 2 color = %d \n", __func__, current->rt.rt_overrun.color);
 			break;
 		}
 		if (signal_pending(current)) {
+printk("%s: 3 color = %d \n", __func__, current->rt.rt_overrun.color);
 			ret = -ERESTARTSYS;
 			break;
 		}
+#ifdef CONFIG_RTC_CYCLIC
+		block = 1;
+#endif
 		schedule();
+#ifdef CONFIG_RTC_CYCLIC
+		/* debugging */
+		wake = 1;
+#endif
 	} while (1);
 	set_current_state(TASK_RUNNING);
 	remove_wait_queue(&rtc->irq_queue, &wait);
 
+#ifdef CONFIG_RTC_CYCLIC
+ret:
+#endif
 	if (ret == 0) {
 		/* Check for any data updates */
 		if (rtc->ops->read_callback)
@@ -201,6 +263,29 @@ rtc_dev_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 			sizeof(unsigned long);
 	}
 	return ret;
+
+#ifdef CONFIG_RTC_CYCLIC
+yield:
+
+	spin_lock_irq(&rtc->irq_lock);
+	data = rtc->irq_data;
+	rtc->irq_data = 0;
+	spin_unlock_irq(&rtc->irq_lock);
+
+	raw_spin_lock_irqsave(&rt_overrun_lock, flags);
+	if (_on_rt_overrun_admitted(current)) {
+		/* pass back to userspace */
+		data = rt_task_count(current);
+		rt_task_count(current) = 0;
+	}
+	else {
+	}
+
+	raw_spin_unlock_irqrestore(&rt_overrun_lock, flags);
+	ret = 0;
+
+	goto ret;
+#endif
 }
 
 static unsigned int rtc_dev_poll(struct file *file, poll_table *wait)
@@ -215,6 +300,56 @@ static unsigned int rtc_dev_poll(struct file *file, poll_table
[PATCH RFC v0 05/12] Task tracking per file descriptor
Task tracking per file descriptor for thread-death cleanup.

Signed-off-by: Bill Huey (hui)
---
 drivers/rtc/class.c | 3 +++
 include/linux/rtc.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index 74fd974..ad570b9 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -201,6 +201,9 @@ struct rtc_device *rtc_device_register(const char *name, struct device *dev,
 	rtc->irq_freq = 1;
 	rtc->max_user_freq = 64;
 	rtc->dev.parent = dev;
+#ifdef CONFIG_RTC_CYCLIC
+	INIT_LIST_HEAD(&rtc->rt_overrun_tasks); //struct list_head
+#endif
 	rtc->dev.class = rtc_class;
 	rtc->dev.groups = rtc_get_dev_attribute_groups();
 	rtc->dev.release = rtc_device_release;
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index b693ada..1424550 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -114,6 +114,9 @@ struct rtc_timer {
 struct rtc_device {
 	struct device dev;
 	struct module *owner;
+#ifdef CONFIG_RTC_CYCLIC
+	struct list_head rt_overrun_tasks;
+#endif
 
 	int id;
 	char name[RTC_DEVICE_NAME_SIZE];
-- 
2.5.0
[PATCH RFC v0 10/12] Export SCHED_FIFO/RT requeuing functions
SCHED_FIFO/RT tail/head runqueue insertion support, and initial thread-death
support via a hook to the scheduler class. Thread death must include
additional semantics to remove/discharge an admitted task properly.

Signed-off-by: Bill Huey (hui)
---
 kernel/sched/rt.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index c41ea7a..1d77adc 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -8,6 +8,11 @@
 #include
 #include
 
+#ifdef CONFIG_RTC_CYCLIC
+#include "cyclic.h"
+extern int rt_overrun_task_admitted1(struct rq *rq, struct task_struct *p);
+#endif
+
 int sched_rr_timeslice = RR_TIMESLICE;
 
 static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
@@ -1321,8 +1326,18 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 	if (flags & ENQUEUE_WAKEUP)
 		rt_se->timeout = 0;
 
+#ifdef CONFIG_RTC_CYCLIC
+	/* if admitted and the current slot then head, otherwise tail */
+	if (rt_overrun_task_admitted1(rq, p)) {
+		if (rt_overrun_task_active(p)) {
+			flags |= ENQUEUE_HEAD;
+		}
+	}
+	enqueue_rt_entity(rt_se, flags);
+#else
 	enqueue_rt_entity(rt_se, flags & ENQUEUE_HEAD);
+#endif
 
 	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
 		enqueue_pushable_task(rq, p);
@@ -1367,6 +1382,18 @@ static void requeue_task_rt(struct rq *rq, struct task_struct *p, int head)
 	}
 }
 
+#ifdef CONFIG_RTC_CYCLIC
+void dequeue_task_rt2(struct rq *rq, struct task_struct *p, int flags)
+{
+	dequeue_task_rt(rq, p, flags);
+}
+
+void requeue_task_rt2(struct rq *rq, struct task_struct *p, int head)
+{
+	requeue_task_rt(rq, p, head);
+}
+#endif
+
 static void yield_task_rt(struct rq *rq)
 {
 	requeue_task_rt(rq, rq->curr, 0);
@@ -2177,6 +2204,10 @@ void __init init_sched_rt_class(void)
 		zalloc_cpumask_var_node(&per_cpu(local_cpu_mask, i),
 					GFP_KERNEL, cpu_to_node(i));
 	}
+
+#ifdef CONFIG_RTC_CYCLIC
+	init_rt_overrun();
+#endif
 }
 #endif /* CONFIG_SMP */
@@ -2322,6 +2353,13 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 	return 0;
 }
 
+#ifdef CONFIG_RTC_CYCLIC
+static void task_dead_rt(struct task_struct *p)
+{
+	rt_overrun_entry_delete(p);
+}
+#endif
+
 const struct sched_class rt_sched_class = {
 	.next			= &fair_sched_class,
 	.enqueue_task		= enqueue_task_rt,
@@ -2344,6 +2382,9 @@ const struct sched_class rt_sched_class = {
 #endif
 
 	.set_curr_task		= set_curr_task_rt,
+#ifdef CONFIG_RTC_CYCLIC
+	.task_dead		= task_dead_rt,
+#endif
 	.task_tick		= task_tick_rt,
 	.get_rr_interval	= get_rr_interval_rt,
-- 
2.5.0
[PATCH RFC v0 06/12] Add anonymous struct to sched_rt_entity
Add an anonymous struct to support admittance using a red-black tree,
overrun tracking, state for whether or not to yield or block, debugging
support, and the execution slot pattern for the scheduler.

Signed-off-by: Bill Huey (hui)
---
 include/linux/sched.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 084ed9f..cff56c6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1305,6 +1305,21 @@ struct sched_rt_entity {
 	/* rq "owned" by this entity/group: */
 	struct rt_rq		*my_q;
 #endif
+#ifdef CONFIG_RTC_CYCLIC
+	struct {
+		struct rb_node node;	/* admittance structure */
+		struct list_head task_list;
+		unsigned long count;	/* overrun count per slot */
+		int type, color, yield;
+		u64 slots;
+
+		/* debug */
+		unsigned long last_task_state;
+
+		/* instrumentation */
+		unsigned int machine_state, last_machine_state;
+	} rt_overrun;
+#endif
 };
 
 struct sched_dl_entity {
-- 
2.5.0
[PATCH RFC v0 02/12] Reroute rtc update irqs to the cyclic scheduler handler
Redirect rtc update irqs so that they drive the cyclic scheduler timer
handler instead. Let the handler determine which slot to activate next.
Similar to scheduler tick handling, but just for the cyclic scheduler.

Signed-off-by: Bill Huey (hui)
---
 drivers/rtc/interface.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
index 9ef5f6f..6d39d40 100644
--- a/drivers/rtc/interface.c
+++ b/drivers/rtc/interface.c
@@ -17,6 +17,10 @@
 #include
 #include
 
+#ifdef CONFIG_RTC_CYCLIC
+#include "../kernel/sched/cyclic.h"
+#endif
+
 static int rtc_timer_enqueue(struct rtc_device *rtc, struct rtc_timer *timer);
 static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer);
 
@@ -488,6 +492,9 @@ EXPORT_SYMBOL_GPL(rtc_update_irq_enable);
 void rtc_handle_legacy_irq(struct rtc_device *rtc, int num, int mode)
 {
 	unsigned long flags;
+#ifdef CONFIG_RTC_CYCLIC
+	int handled = 0;
+#endif
 
 	/* mark one irq of the appropriate mode */
 	spin_lock_irqsave(&rtc->irq_lock, flags);
@@ -500,7 +507,23 @@ void rtc_handle_legacy_irq(struct rtc_device *rtc, int num, int mode)
 		rtc->irq_task->func(rtc->irq_task->private_data);
 	spin_unlock_irqrestore(&rtc->irq_task_lock, flags);
 
+#ifdef CONFIG_RTC_CYCLIC
+	/* wake up slot_curr if overrun task */
+	if (RTC_PF) {
+		if (rt_overrun_rq_admitted()) {
+			/* advance the cursor, overrun report */
+			rt_overrun_timer_handler(rtc);
+			handled = 1;
+		}
+	}
+
+	if (!handled) {
+		wake_up_interruptible(&rtc->irq_queue);
+	}
+#else
 	wake_up_interruptible(&rtc->irq_queue);
+#endif
+
 	kill_fasync(&rtc->async_queue, SIGIO, POLL_IN);
}
-- 
2.5.0
Re: [PATCH] MAINTAINERS: correct entry for LVM
On Tuesday 12 April 2016 12:20 AM, Wols Lists wrote: On 11/04/16 17:39, Sudip Mukherjee wrote: On Monday 11 April 2016 09:53 PM, Alasdair G Kergon wrote: On Mon, Apr 11, 2016 at 09:45:01PM +0530, Sudip Mukherjee wrote: L stands for "Mailing list that is relevant to this area", and this is a mailing list. :) Your proposed patch isn't changing the L entry, so this is of no relevance. Sorry, I am not understanding. The current entry in MAINTAINERS is: DEVICE-MAPPER (LVM) M: Alasdair Kergon M: Mike Snitzer M: dm-de...@redhat.com L: dm-de...@redhat.com ... So my patch just removed the line : "M: dm-de...@redhat.com" So now the entry becomes : DEVICE-MAPPER (LVM) M: Alasdair Kergon M: Mike Snitzer L: dm-de...@redhat.com ... So, now it correctly shows dm-de...@redhat.com as a mailing list which should be cc'ed on all the patches related to LVM. Or am I understanding this wrong? Yes. Because (I guess M stands for maintainer) this list has maintainer status. As all patches should be sent to the maintainers, all patches should therefore be sent to this list. The same person can appear twice in a phone book, once under their name and once under their job title. This is exactly the same situation - this list should appear once as a list to tell people that it's a list, AND ALSO as a maintainer to tell people that patches must be sent to the list. I guess English is not your first language, but the important point is that M and L are not mutually exclusive. Don't worry, English is my first language. Have you tried with get_maintainer.pl and seen the result? It only shows dm-de...@redhat.com as a Maintainer and not as a list. (I noticed because I was sending a patch, and hence this patch again). But I believe a mailing list cannot be a Maintainer ( have you seen any patch with a Signed-off-by: from a mailing list? ). Anyway, I think this thread has become too long for an unimportant patch. regards sudip
[PATCH RFC v0 01/12] Kconfig change
Add the selection options for the cyclic scheduler Signed-off-by: Bill Huey (hui) --- drivers/rtc/Kconfig | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig index 544bd34..8a1b704 100644 --- a/drivers/rtc/Kconfig +++ b/drivers/rtc/Kconfig @@ -73,6 +73,11 @@ config RTC_DEBUG Say yes here to enable debugging support in the RTC framework and individual RTC drivers. +config RTC_CYCLIC + bool "RTC cyclic executive scheduler support" + help + Frame/Cyclic executive scheduler support through the RTC interface + comment "RTC interfaces" config RTC_INTF_SYSFS -- 2.5.0
[PATCH RFC v0 08/12] Compilation support
Makefile changes to support the menuconfig option Signed-off-by: Bill Huey (hui) --- kernel/sched/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile index 302d6eb..df8e131 100644 --- a/kernel/sched/Makefile +++ b/kernel/sched/Makefile @@ -19,4 +19,5 @@ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o obj-$(CONFIG_SCHEDSTATS) += stats.o obj-$(CONFIG_SCHED_DEBUG) += debug.o obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o +obj-$(CONFIG_RTC_CYCLIC) += cyclic.o obj-$(CONFIG_CPU_FREQ) += cpufreq.o -- 2.5.0
[PATCH RFC v0 07/12] kernel/userspace additions for additional ioctl() support for rtc
Add additional ioctl() values to rtc so that it can 'admit' the calling thread into a red-black tree for tracking, set the execution slot pattern, and set whether read() will yield or block. Signed-off-by: Bill Huey (hui) --- include/uapi/linux/rtc.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/include/uapi/linux/rtc.h b/include/uapi/linux/rtc.h index f8c82e6..76c9254 100644 --- a/include/uapi/linux/rtc.h +++ b/include/uapi/linux/rtc.h @@ -94,6 +94,10 @@ struct rtc_pll_info { #define RTC_VL_READ _IOR('p', 0x13, int) /* Voltage low detector */ #define RTC_VL_CLR _IO('p', 0x14) /* Clear voltage low information */ +#define RTC_OV_ADMIT _IOW('p', 0x15, unsigned long) /* Set test */ +#define RTC_OV_REPLEN _IOW('p', 0x16, unsigned long) /* Set test */ +#define RTC_OV_YIELD _IOW('p', 0x17, unsigned long) /* Set test */ + /* interrupt flags */ #define RTC_IRQF 0x80 /* Any of the following is active */ #define RTC_PF 0x40 /* Periodic interrupt */ -- 2.5.0
[REGRESSION, bisect] pci: cxgb4 probe fails after commit 104daa71b3961434 ("PCI: Determine actual VPD size on first access")
Hi All, The following patch introduced a regression, causing cxgb4 driver to fail in PCIe probe. commit 104daa71b39614343929e1982170d5fcb0569bb5 Author: Hannes Reinecke Date: Mon Feb 15 09:42:01 2016 +0100 PCI: Determine actual VPD size on first access PCI-2.2 VPD entries have a maximum size of 32k, but might actually be smaller than that. To figure out the actual size one has to read the VPD area until the 'end marker' is reached. Per spec, reading outside of the VPD space is "not allowed." In practice, it may cause simple read errors or even crash the card. To make matters worse not every PCI card implements this properly, leaving us with no 'end' marker or even completely invalid data. Try to determine the size of the VPD data when it's first accessed. If no valid data can be read an I/O error will be returned when reading or writing the sysfs attribute. As the amount of VPD data is unknown initially the size of the sysfs attribute will always be set to '0'. [bhelgaas: changelog, use 0/1 (not false/true) for bitfield, tweak pci_vpd_pci22_read() error checking] Tested-by: Shane Seymour Tested-by: Babu Moger Signed-off-by: Hannes Reinecke Signed-off-by: Bjorn Helgaas Cc: Alexander Duyck The problem stems from the fact that the Chelsio adapters actually have two VPD structures stored in the VPD. An abbreviated one at Offset 0x0 and the complete VPD at Offset 0x400. The abbreviated one only contains the PN, SN and EC Keywords, while the complete VPD contains those plus various adapter constants contained in V0, V1, etc. And it also contains the Base Ethernet MAC Address in the "NA" Keyword which the cxgb4 driver needs when it can't contact the adapter firmware. (We don't have the "NA" Keyword in the VPD Structure at Offset 0x0 because that's not an allowed VPD Keyword in the PCI-E 3.0 specification.) With the new code, the computed size of the VPD is 0x200 and so our efforts to read the VPD at Offset 0x400 silently fail. 
We check the result of the read looking for a signature 0x82 byte, but we're checking against random stack garbage. The end result is that the cxgb4 driver now fails the PCIe probe. Thanks, Hari
Re: [PATCH v4 1/2] scsi: Add intermediate STARGET_REMOVE state to scsi_target_state
Hi, On Tue, Apr 5, 2016 at 5:50 PM, Johannes Thumshirn <jthumsh...@suse.de> wrote: > Add intermediate STARGET_REMOVE state to scsi_target_state to avoid > running into the BUG_ON() in scsi_target_reap(). The STARGET_REMOVE > state is only valid in the path from scsi_remove_target() to > scsi_target_destroy() indicating this target is going to be removed. > > This re-fixes the problem introduced in commits > bc3f02a795d3b4faa99d37390174be2a75d091bd and > 40998193560dab6c3ce8d25f4fa58a23e252ef38 in a more comprehensive way. > > Signed-off-by: Johannes Thumshirn <jthumsh...@suse.de> > Fixes: 40998193560dab6c3ce8d25f4fa58a23e252ef38 > Cc: sta...@vger.kernel.org > Reviewed-by: Ewan D. Milne <emi...@redhat.com> > Reviewed-by: Hannes Reinecke <h...@suse.com> > Reviewed-by: James Bottomley <j...@linux.vnet.ibm.com> > --- > drivers/scsi/scsi_scan.c | 2 ++ > drivers/scsi/scsi_sysfs.c | 2 ++ > include/scsi/scsi_device.h | 1 + > 3 files changed, 5 insertions(+) > > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c > index 6a82066..63b8bca 100644 > --- a/drivers/scsi/scsi_scan.c > +++ b/drivers/scsi/scsi_scan.c > @@ -315,6 +315,8 @@ static void scsi_target_destroy(struct scsi_target > *starget) > struct Scsi_Host *shost = dev_to_shost(dev->parent); > unsigned long flags; > > + BUG_ON(starget->state != STARGET_REMOVE && > + starget->state != STARGET_CREATED); #modprobe scsi_debug #modprobe -r scsi_debug always triggers this BUG_ON in linux-next-20160411 printk says starget->state is _RUNNING > starget->state = STARGET_DEL; > transport_destroy_device(dev); > spin_lock_irqsave(shost->host_lock, flags); > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c > index 00bc721..0df82e8 100644 > --- a/drivers/scsi/scsi_sysfs.c > +++ b/drivers/scsi/scsi_sysfs.c > @@ -1279,11 +1279,13 @@ restart: > spin_lock_irqsave(shost->host_lock, flags); > list_for_each_entry(starget, &shost->__targets, siblings) { > if (starget->state == STARGET_DEL || > + starget->state == STARGET_REMOVE ||
> starget == last_target) > continue; > if (starget->dev.parent == dev || &starget->dev == dev) { > kref_get(&starget->reap_ref); > last_target = starget; > + starget->state = STARGET_REMOVE; > spin_unlock_irqrestore(shost->host_lock, flags); > __scsi_remove_target(starget); > scsi_target_reap(starget); > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h > index f63a167..2bffaa6 100644 > --- a/include/scsi/scsi_device.h > +++ b/include/scsi/scsi_device.h > @@ -240,6 +240,7 @@ scmd_printk(const char *, const struct scsi_cmnd *, const > char *, ...); > enum scsi_target_state { > STARGET_CREATED = 1, > STARGET_RUNNING, > + STARGET_REMOVE, > STARGET_DEL, > }; > > -- > 1.8.5.6 >
[PATCH RFC v0 12/12] Cyclic/rtc documentation
Initial attempt at documentation with a test program Signed-off-by: Bill Huey (hui) --- Documentation/scheduler/sched-cyclic-rtc.txt | 468 +++ 1 file changed, 468 insertions(+) create mode 100644 Documentation/scheduler/sched-cyclic-rtc.txt diff --git a/Documentation/scheduler/sched-cyclic-rtc.txt b/Documentation/scheduler/sched-cyclic-rtc.txt new file mode 100644 index 000..4d22381 --- /dev/null +++ b/Documentation/scheduler/sched-cyclic-rtc.txt @@ -0,0 +1,468 @@ +[in progress] + +"Work Conserving" + +When a task is active and calls read(), it will block or yield depending on +what is requested from the cyclic scheduler. An RT_OV_YIELD call to ioctl() +specifies the behavior for the calling thread. + +In the case where read() is called before the time slice is over, it will +allow other tasks to run with the leftover time. + +"Overrun Reporting/Apps" + +Calls to read() will return the overrun count and zero the counter. This +can be used to adjust the execution time of the thread so that it can run +within that slot and meet some deadline constraint. + +[no decision has been made to return a more meaningful set of numbers as +you can just get time stamps and do the math in userspace but it could +be changed to do so] + +The behavior of read() depends on whether the thread has been admitted or not +via an ioctl() using RTC_OV_ADMIT. If it has been admitted, read() will return +the overrun count. If it has not been admitted, read() returns a value +corresponding to the default read() behavior for rtc. + +See the sample test sources for details. + +Using a video game as an example, having a rendering engine overrun its +slot, driven by a vertical retrace interrupt, can cause visual skipping and +hurt interactivity. Adapting the computation from the read() result can +allow for the frame buffer swap at the frame interrupt. If read() reports +an overrun, the program can simplify calculations and adapt to fit within that slot. 
+It would then allow the program to respond to events (touches, buttons), +minimizing the possibility of perceived pauses. + +The slot allocation scheme for the video game must have some inherent +definition of interactivity. That determines appropriate slot allocation +amongst a mixture of soft/hard real-time. A general policy must be created +for the system, and all programs, to meet real-time criteria. + +"Admittance" + +Admittance of a task is done through an ioctl() call using RTC_OV_ADMIT. +This passes a 64-bit wide bitmap that maps onto entries in the slot map. + +(slot map of two threads) +execution direction -> + +1000 1000 1000 1000... +0100 0100 0100 0100... + +(bit pattern of two threads) +0001 0001 0001 0001... +0010 0010 0010 0010... + +(hex) +0x1111111111111111 +0x2222222222222222 + +The slot map is an array of 64 entries of threads. An index is incremented +through it to determine what the next active thread-slot will be. The end of the +index is set in /proc/rt_overrun_proc + +"Slot/slice activation" + +Move the task to the front of the SCHED_FIFO list when active, the tail when +inactive. + +"RTC Infrastructure and Interrupt Routing" + +The cyclic scheduler is driven by the update interrupt in the RTC +infrastructure but can be rerouted to any periodic interrupt source. + +One of those applications could be when interrupts from a display refresh +happen, or some interval driven by an external controller such as a drum pad +or touch panel, amongst other uses. + +"Embedded Environments" + +This is single run queue only, targeting embedded scenarios where not all +cores are guaranteed to be available. Older Qualcomm MSM kernels have a very +aggressive cpu hotplug as a means of fully powering off cores. The only +CPU guaranteed to run is CPU 0. + +"Project History" + +This was originally created when I was at HP/Palm to solve issues related +to touch event handling and lag working with the real-time media subsystem. 
+The typical workaround used to prevent skipping is to use large buffers to +prevent data underruns. Programs running at SCHED_FIFO can +starve the system from handling external events in a timely manner, like +buttons or touch events. The lack of a globally defined policy on how to +use real-time resources can cause long pauses between handling touch +events and other kinds of implicit deadline misses. + +By choosing some kind of slot execution pattern, it was hoped that it +can be controlled globally across the system so that some basic interactive +guarantees can be met. Whether the tasks are some combination of soft or +hard real-time, a mechanism like this can help guide how SCHED_FIFO tasks +are run versus letting SCHED_FIFO tasks run wildly. + +"Future work" + +Possible integration with the deadline scheduler. Power management +awareness, CPU clock governor. Turning off the scheduler tick when there +are no runnable tasks, other things... + +"Power management" + +Governor awareness... + +[more] + + + +/* + * Based on the: + * + *
[PATCH RFC v0 09/12] Add priority support for the cyclic scheduler
Initial bits to prevent priority changing of cyclic scheduler tasks by only allowing them to be SCHED_FIFO. Fairly hacky at this time and will need revisiting because of security concerns. Affects task death handling since it uses an additional scheduler class hook for clean up at death. Must be SCHED_FIFO. Signed-off-by: Bill Huey (hui) --- kernel/sched/core.c | 13 + 1 file changed, 13 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 44db0ff..cf6cf57 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -87,6 +87,10 @@ #include "../workqueue_internal.h" #include "../smpboot.h" +#ifdef CONFIG_RTC_CYCLIC +#include "cyclic.h" +#endif + #define CREATE_TRACE_POINTS #include @@ -2074,6 +2078,10 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p) memset(&p->se.statistics, 0, sizeof(p->se.statistics)); #endif +#ifdef CONFIG_RTC_CYCLIC + RB_CLEAR_NODE(&p->rt.rt_overrun.node); +#endif + RB_CLEAR_NODE(&p->dl.rb_node); init_dl_task_timer(&p->dl); __dl_clear_params(p); @@ -3881,6 +3889,11 @@ recheck: if (dl_policy(policy)) return -EPERM; +#ifdef CONFIG_RTC_CYCLIC + if (rt_overrun_policy(p, policy)) + return -EPERM; +#endif + /* * Treat SCHED_IDLE as nice 20. Only allow a switch to * SCHED_NORMAL if the RLIMIT_NICE would normally permit it. -- 2.5.0
[PATCH RFC v0 11/12] Cyclic scheduler support
Core implementation of the cyclic scheduler that includes admittance handling, thread death support, cyclic timer tick handler, primitive proc debugging interface, wait-queue modifications. Signed-off-by: Bill Huey (hui) --- kernel/sched/cyclic.c | 620 +++ kernel/sched/cyclic.h | 86 +++ kernel/sched/cyclic_rt.h | 7 + 3 files changed, 713 insertions(+) create mode 100644 kernel/sched/cyclic.c create mode 100644 kernel/sched/cyclic.h create mode 100644 kernel/sched/cyclic_rt.h diff --git a/kernel/sched/cyclic.c b/kernel/sched/cyclic.c new file mode 100644 index 000..8ce34bd --- /dev/null +++ b/kernel/sched/cyclic.c @@ -0,0 +1,620 @@ +/* + * cyclic scheduler for rtc support + * + * Copyright (C) Bill Huey + * Author: Bill Huey + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. +*/ + +#include +#include +#include +#include "sched.h" +#include "cyclic.h" +#include "cyclic_rt.h" + +#include +#include + +DEFINE_RAW_SPINLOCK(rt_overrun_lock); +struct rb_root rt_overrun_tree = RB_ROOT; + +#define MASK2 0xfff0 + +/* must revisit again when I get more time to fix the possibility of + * overflow here and 32 bit portability */ +static int cmp_ptr_unsigned_long(long *p, long *q) +{ + int result = ((unsigned long)p & MASK2) - ((unsigned long)q & MASK2); + + WARN_ON(sizeof(long *) != 8); + + if (!result) + return 0; + else if (result > 0) + return 1; + else + return -1; +} + +static int eq_ptr_unsigned_long(long *p, long *q) +{ + return (((long)p & MASK2) == ((long)q & MASK2)); +} + +#define CMP_PTR_LONG(p,q) cmp_ptr_unsigned_long((long *)p, (long *)q) + +static +struct task_struct *_rt_overrun_entry_find(struct rb_root *root, + struct task_struct *p) +{ + struct task_struct *ret = NULL; + struct rb_node *node = root->rb_node; + + while (node) { // double_rq_lock(struct rq *, struct rq *) cpu_rq + struct task_struct *task = container_of(node, + 
struct task_struct, rt.rt_overrun.node); + + int result = CMP_PTR_LONG(p, task); + + if (result < 0) + node = node->rb_left; + else if (result > 0) + node = node->rb_right; + else { + ret = task; + goto exit; + } + } +exit: + return ret; +} + +static int rt_overrun_task_runnable(struct task_struct *p) +{ + return task_on_rq_queued(p); +} + +/* avoiding excessive debug printing, splitting the entry point */ +static +struct task_struct *rt_overrun_entry_find(struct rb_root *root, + struct task_struct *p) +{ +printk("%s: \n", __func__); + return _rt_overrun_entry_find(root, p); +} + +static int _rt_overrun_entry_insert(struct rb_root *root, struct task_struct *p) +{ + struct rb_node **new = &(root->rb_node), *parent = NULL; + +printk("%s: \n", __func__); + while (*new) { + struct task_struct *task = container_of(*new, + struct task_struct, rt.rt_overrun.node); + + int result = CMP_PTR_LONG(p, task); + + parent = *new; + if (result < 0) + new = &((*new)->rb_left); + else if (result > 0) + new = &((*new)->rb_right); + else + return 0; + } + + /* Add new node and rebalance tree. */ + rb_link_node(&p->rt.rt_overrun.node, parent, new); + rb_insert_color(&p->rt.rt_overrun.node, root); + + return 1; +} + +static void _rt_overrun_entry_delete(struct task_struct *p) +{ + struct task_struct *task; + int i; + + task = rt_overrun_entry_find(&rt_overrun_tree, p); + + if (task) { + printk("%s: p color %d - comm %s - slots 0x%016llx\n", + __func__, task->rt.rt_overrun.color, task->comm, + task->rt.rt_overrun.slots); + + rb_erase(&task->rt.rt_overrun.node, &rt_overrun_tree); + list_del(&task->rt.rt_overrun.task_list); + for (i = 0; i < SLOTS; ++i) { + if (rt_admit_rq.curr[i] == p) + rt_admit_rq.curr[i] = NULL; + } + + if (rt_admit_curr == p) + rt_admit_curr = NULL; + } +} + +void rt_overrun_entry_delete(struct task_struct *p) +{ + unsigned long flags; + + raw_spin_lock_irqsave(&rt_overrun_lock, flags); + _rt_overrun_entry_delete(p); +
[PATCH RFC v0 00/12] Cyclic Scheduler Against RTC
Hi, This is a crude cyclic scheduler implementation. It uses SCHED_FIFO tasks and runs them according to a map pattern specified by a 64 bit mask. Each bit corresponds to an entry into a 64 entry array of 'struct task_struct'. This works on single core CPU 0 only for now. Threads are 'admitted' to this map by an extension to the ioctl() of the (rtc) real-time clock interface. The bit pattern then determines when the task will run or activate next. The /dev/rtc interface is chosen for this purpose because of its accessibility to userspace. For example, the mplayer program already uses it as a timer source and could possibly benefit from being synced to a vertical retrace interrupt during decoding. It could be an OpenGL program needing precise scheduler support for handling those same vertical retrace interrupts, low latency audio and timely handling of touch events amongst other uses. There is also a need for some kind of blocking/yielding interface that can return an overrun count for when the thread utilizes more time than allocated for that frame. The read() function in rtc is overloaded for this purpose and reports overrun events. Yield functionality has yet to be fully tested. I apologize for any informal or misused terminology as I haven't fully reviewed all of the academic literature regarding these kinds of schedulers. I welcome suggestions, corrections, etc. Special thanks to includes... Peter Zijlstra (Intel), Steve Rostedt (Red Hat), Rik van Riel (Red Hat) for encouraging me to continue working in the Linux kernel community and being generally positive and supportive. KY Srinivasan (formerly Novell now Microsoft) for discussion of real-time schedulers and pointers to specifics on that topic. It was just a single discussion but was basically the inspiration for this kind of work. Amir Frenkel (Palm), Kenneth Albanowski (Palm), Bdale Garbee (HP) for the amazing place that was Palm, Kenneth for being a co-conspirator with this scheduler. 
This scheduler was inspired by performance work that I did at Palm's kernel group along with discussions with the multimedia team before HP killed webOS off. A sad and infuriating moment. Maybe, in a short while, the community will understand the value of these patches for -rt and start solving the general phenomenon of high-performance multimedia and user-interactivity problems more properly, with both a scheduler like this and -rt shipped as default in the near future. [Also, I'd love some kind of sponsorship to continue what I think is critical work versus heading back into the valley] --- Bill Huey (hui) (12): Kconfig change Reroute rtc update irqs to the cyclic scheduler handler Add cyclic support to rtc-dev.c Anonymous struct initialization Task tracking per file descriptor Add anonymous struct to sched_rt_entity kernel/userspace additions for additional ioctl() support for rtc Compilation support Add priority support for the cyclic scheduler Export SCHED_FIFO/RT requeuing functions Cyclic scheduler support Cyclic/rtc documentation Documentation/scheduler/sched-cyclic-rtc.txt | 468 drivers/rtc/Kconfig | 5 + drivers/rtc/class.c | 3 + drivers/rtc/interface.c | 23 + drivers/rtc/rtc-dev.c| 161 +++ include/linux/init_task.h| 18 + include/linux/rtc.h | 3 + include/linux/sched.h| 15 + include/uapi/linux/rtc.h | 4 + kernel/sched/Makefile| 1 + kernel/sched/core.c | 13 + kernel/sched/cyclic.c| 620 +++ kernel/sched/cyclic.h| 86 kernel/sched/cyclic_rt.h | 7 + kernel/sched/rt.c| 41 ++ 15 files changed, 1468 insertions(+) create mode 100644 Documentation/scheduler/sched-cyclic-rtc.txt create mode 100644 kernel/sched/cyclic.c create mode 100644 kernel/sched/cyclic.h create mode 100644 kernel/sched/cyclic_rt.h -- 2.5.0
[PATCH RFC v0 04/12] Anonymous struct initialization
Anonymous struct initialization Signed-off-by: Bill Huey (hui)--- include/linux/init_task.h | 18 ++ 1 file changed, 18 insertions(+) diff --git a/include/linux/init_task.h b/include/linux/init_task.h index f2cb8d4..308caf6 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -183,6 +183,23 @@ extern struct task_group root_task_group; # define INIT_KASAN(tsk) #endif +#ifdef CONFIG_RTC_CYCLIC +# define INIT_RT_OVERRUN(tsk) \ + .rt_overrun = { \ + .count = 0, \ + .task_list = LIST_HEAD_INIT(tsk.rt.rt_overrun.task_list), \ + .type = 0, \ + .color = 0, \ + .slots = 0, \ + .yield = 0, \ + .machine_state = 0, \ + .last_machine_state = 0,\ + .last_task_state = 0, \ + }, +#else +# define INIT_RT_OVERRUN +#endif + /* * INIT_TASK is used to set up the first task table, touch at * your own risk!. Base=0, limit=0x1f (=2MB) @@ -210,6 +227,7 @@ extern struct task_group root_task_group; .rt = { \ .run_list = LIST_HEAD_INIT(tsk.rt.run_list), \ .time_slice = RR_TIMESLICE, \ + INIT_RT_OVERRUN(tsk)\ }, \ .tasks = LIST_HEAD_INIT(tsk.tasks),\ INIT_PUSHABLE_TASKS(tsk)\ -- 2.5.0
[PATCH RFC v0 09/12] Add priority support for the cyclic scheduler
Initial bits to prevent priority changing of cyclic scheduler tasks by only allowing them to be SCHED_FIFO. Fairly hacky at this time and will need revisiting because of the security concerns. Affects task death handling since it uses an additional scheduler class hook for clean up at death. Must be SCHED_FIFO. Signed-off-by: Bill Huey (hui) --- kernel/sched/core.c | 13 + 1 file changed, 13 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 44db0ff..cf6cf57 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -87,6 +87,10 @@ #include "../workqueue_internal.h" #include "../smpboot.h" +#ifdef CONFIG_RTC_CYCLIC +#include "cyclic.h" +#endif + #define CREATE_TRACE_POINTS #include @@ -2074,6 +2078,10 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p) memset(&p->se.statistics, 0, sizeof(p->se.statistics)); #endif +#ifdef CONFIG_RTC_CYCLIC + RB_CLEAR_NODE(&p->rt.rt_overrun.node); +#endif + RB_CLEAR_NODE(&p->dl.rb_node); init_dl_task_timer(&p->dl); __dl_clear_params(p); @@ -3881,6 +3889,11 @@ recheck: if (dl_policy(policy)) return -EPERM; +#ifdef CONFIG_RTC_CYCLIC + if (rt_overrun_policy(p, policy)) + return -EPERM; +#endif + /* * Treat SCHED_IDLE as nice 20. Only allow a switch to * SCHED_NORMAL if the RLIMIT_NICE would normally permit it. -- 2.5.0
[PATCH RFC v0 11/12] Cyclic scheduler support
Core implementation of the cyclic scheduler that includes admittance handling, thread death support, cyclic timer tick handler, primitive proc debugging interface, wait-queue modifications. Signed-off-by: Bill Huey (hui) --- kernel/sched/cyclic.c| 620 +++ kernel/sched/cyclic.h| 86 +++ kernel/sched/cyclic_rt.h | 7 + 3 files changed, 713 insertions(+) create mode 100644 kernel/sched/cyclic.c create mode 100644 kernel/sched/cyclic.h create mode 100644 kernel/sched/cyclic_rt.h diff --git a/kernel/sched/cyclic.c b/kernel/sched/cyclic.c new file mode 100644 index 000..8ce34bd --- /dev/null +++ b/kernel/sched/cyclic.c @@ -0,0 +1,620 @@ +/* + * cyclic scheduler for rtc support + * + * Copyright (C) Bill Huey + * Author: Bill Huey + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. +*/ + +#include +#include +#include +#include "sched.h" +#include "cyclic.h" +#include "cyclic_rt.h" + +#include +#include + +DEFINE_RAW_SPINLOCK(rt_overrun_lock); +struct rb_root rt_overrun_tree = RB_ROOT; + +#define MASK2 0xfff0 + +/* must revisit again when I get more time to fix the possibility of + * overflow here and 32 bit portability */ +static int cmp_ptr_unsigned_long(long *p, long *q) +{ + int result = ((unsigned long)p & MASK2) - ((unsigned long)q & MASK2); + + WARN_ON(sizeof(long *) != 8); + + if (!result) + return 0; + else if (result > 0) + return 1; + else + return -1; +} + +static int eq_ptr_unsigned_long(long *p, long *q) +{ + return (((long)p & MASK2) == ((long)q & MASK2)); +} + +#define CMP_PTR_LONG(p,q) cmp_ptr_unsigned_long((long *)p, (long *)q) + +static +struct task_struct *_rt_overrun_entry_find(struct rb_root *root, + struct task_struct *p) +{ + struct task_struct *ret = NULL; + struct rb_node *node = root->rb_node; + + while (node) { // double_rq_lock(struct rq *, struct rq *) cpu_rq + struct task_struct *task = container_of(node, 
+ struct task_struct, rt.rt_overrun.node); + + int result = CMP_PTR_LONG(p, task); + + if (result < 0) + node = node->rb_left; + else if (result > 0) + node = node->rb_right; + else { + ret = task; + goto exit; + } + } +exit: + return ret; +} + +static int rt_overrun_task_runnable(struct task_struct *p) +{ + return task_on_rq_queued(p); +} + +/* avoiding excessive debug printing, splitting the entry point */ +static +struct task_struct *rt_overrun_entry_find(struct rb_root *root, + struct task_struct *p) +{ +printk("%s: \n", __func__); + return _rt_overrun_entry_find(root, p); +} + +static int _rt_overrun_entry_insert(struct rb_root *root, struct task_struct *p) +{ + struct rb_node **new = &(root->rb_node), *parent = NULL; + +printk("%s: \n", __func__); + while (*new) { + struct task_struct *task = container_of(*new, + struct task_struct, rt.rt_overrun.node); + + int result = CMP_PTR_LONG(p, task); + + parent = *new; + if (result < 0) + new = &((*new)->rb_left); + else if (result > 0) + new = &((*new)->rb_right); + else + return 0; + } + + /* Add new node and rebalance tree. */ + rb_link_node(&p->rt.rt_overrun.node, parent, new); + rb_insert_color(&p->rt.rt_overrun.node, root); + + return 1; +} + +static void _rt_overrun_entry_delete(struct task_struct *p) +{ + struct task_struct *task; + int i; + + task = rt_overrun_entry_find(&rt_overrun_tree, p); + + if (task) { + printk("%s: p color %d - comm %s - slots 0x%016llx\n", + __func__, task->rt.rt_overrun.color, task->comm, + task->rt.rt_overrun.slots); + + rb_erase(&task->rt.rt_overrun.node, &rt_overrun_tree); + list_del(&task->rt.rt_overrun.task_list); + for (i = 0; i < SLOTS; ++i) { + if (rt_admit_rq.curr[i] == p) + rt_admit_rq.curr[i] = NULL; + } + + if (rt_admit_curr == p) + rt_admit_curr = NULL; + } +} + +void rt_overrun_entry_delete(struct task_struct *p) +{ + unsigned long flags; + + raw_spin_lock_irqsave(&rt_overrun_lock, flags); + _rt_overrun_entry_delete(p); + raw_spin_unlock_irqrestore(&rt_overrun_lock, flags); +} + +/* forward
Re: [PATCH v6 0/3] thermal: mediatek: Add cpu dynamic power cooling model.
On Tue, 2016-04-12 at 10:41 +0530, Viresh Kumar wrote: > On 12-04-16, 10:32, dawei chien wrote: > > On Tue, 2016-03-22 at 13:13 +0800, dawei chien wrote: > > > On Tue, 2016-03-15 at 13:17 +0700, Viresh Kumar wrote: > > > > Its Rafael, who is going to apply this one. > > > > > > > > Can you please resend it as he may not have it in patchworks? > > > > > > > > > > Hi Rafael, > > > Would you merge this patch to your tree, thank you. > > > > > > BR, > > > Dawei > > > > Hi Rafael, > > Would you please merge this patch, or please kindly let me know for any > > problem, thank you. > > Didn't I ask you earlier to resend this patch as Rafael wouldn't have > it in his queue now ? > > Please resend it and that will make it easier for Rafael to get it > applied. > Hi Viresh, Please refer to following for my resending, thank you. https://lkml.org/lkml/2016/3/15/101 https://patchwork.kernel.org/patch/8586131/ https://patchwork.kernel.org/patch/8586111/ https://patchwork.kernel.org/patch/8586081/ BR, Dawei
Re: [PATCH v4 1/8] arm64: dts: rockchip: Clean up /memory nodes
On Thursday, 31 March 2016, 22:45:52, Heiko Stuebner wrote: > On Thursday, 31 March 2016, 19:15:43, Heiko Stuebner wrote: > > On Saturday, 19 March 2016, 09:04:08, Heiko Stuebner wrote: > > > On Wednesday, 16 March 2016, 14:58:39, Andreas Färber wrote: > > > > A dtc update results in warnings for nodes with reg property but > > > > without > > > > unit address in the node name, so rename /memory to /memory@0. > > > > > > > > Signed-off-by: Andreas Färber> > > > > > applied to a dts64-fixes branch for 4.6, after changing the commit > > > message to > > > A dtc update results in warnings for nodes with reg property but > > > without > > > unit address in the node name, so rename /memory to > > > /memory@startaddress > > > (memory starts at 0 in the case of the rk3368). > > > > > > > > > To clarify that the @0 is not arbitrarily chosen. > > > > This dtc update in question hasn't landed in v4.6-rc1 and from what I > > gathered will need some changes. The patch is obviously still correct, > > but I have now moved it from v4.6-fixes to the regular v4.7 64bit dts > > changes. > > also it seems "memory" is special and memory without unitname will stay > allowed [0], especially as uboot or other bootloaders may expect such a > node to insert the actual amount of memory into it. > > Looking at uboot, fdt_fixup_memory_banks seems to look explicitly for a > "memory" node, so I'm actually not sure, if this is safe to keep at all. so after pondering this some more, I decided to drop this change again. /memory will stay allowed and might produce fewer issues with bootloaders touching the memory values. > [0] http://www.spinics.net/lists/arm-kernel/msg494038.html
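For readers unfamiliar with the rename under discussion, a sketch of the two forms follows. The reg values and cell sizes here are placeholders (a typical arm64 layout with two address and two size cells), not the rk3368's actual memory map; U-Boot's fdt_fixup_memory_banks concern is about it matching only the plain "memory" name.

```dts
/ {
	/* before: plain node name, which newer dtc warns about
	 * when the node carries a reg property */
	memory {
		device_type = "memory";
		reg = <0x0 0x0 0x0 0x40000000>;
	};

	/* after: unit address matches the first address in reg
	 * (memory starts at 0 in the rk3368 case) */
	memory@0 {
		device_type = "memory";
		reg = <0x0 0x0 0x0 0x40000000>;
	};
};
```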
Re: [PATCH 5/5] cpufreq: Loongson1: Replace goto out with return in ls1x_cpufreq_probe()
On 11-04-16, 19:55, Keguang Zhang wrote: > From: Kelvin Cheung> > This patch replaces goto out with return in ls1x_cpufreq_probe(), > and also includes some minor fixes. > > Signed-off-by: Kelvin Cheung > --- > drivers/cpufreq/loongson1-cpufreq.c | 37 > - > 1 file changed, 16 insertions(+), 21 deletions(-) > > diff --git a/drivers/cpufreq/loongson1-cpufreq.c > b/drivers/cpufreq/loongson1-cpufreq.c > index 5074f5e..1bc90af 100644 > --- a/drivers/cpufreq/loongson1-cpufreq.c > +++ b/drivers/cpufreq/loongson1-cpufreq.c > @@ -1,7 +1,7 @@ > /* > * CPU Frequency Scaling for Loongson 1 SoC > * > - * Copyright (C) 2014 Zhang, Keguang > + * Copyright (C) 2014-2016 Zhang, Keguang Actually you should fold above into the first patch of the series, that renames this file. It makes much sense that way. > * > * This file is licensed under the terms of the GNU General Public > * License version 2. This program is licensed "as is" without any > @@ -141,7 +141,8 @@ static int ls1x_cpufreq_probe(struct platform_device > *pdev) > struct clk *clk; > int ret; > > - if (!pdata || !pdata->clk_name || !pdata->osc_clk_name) > + if (!pdata || !pdata->clk_name || !pdata->osc_clk_name) { You added a '{' here, but the closing '}' is added way down.. Something is wrong here I feel.. > + dev_err(>dev, "platform data missing\n"); > return -EINVAL; > > cpufreq = > @@ -155,8 +156,7 @@ static int ls1x_cpufreq_probe(struct platform_device > *pdev) > if (IS_ERR(clk)) { > dev_err(>dev, "unable to get %s clock\n", > pdata->clk_name); > - ret = PTR_ERR(clk); > - goto out; > + return PTR_ERR(clk); > } > static struct platform_driver ls1x_cpufreq_platdrv = { > - .driver = { > + .probe = ls1x_cpufreq_probe, > + .remove = ls1x_cpufreq_remove, > + .driver = { > .name = "ls1x-cpufreq", > }, > - .probe = ls1x_cpufreq_probe, > - .remove = ls1x_cpufreq_remove, Why do this change at all? Do it in the first patch if you really want to. 
> }; > > module_platform_driver(ls1x_cpufreq_platdrv); > > MODULE_AUTHOR("Kelvin Cheung "); > -MODULE_DESCRIPTION("Loongson 1 CPUFreq driver"); > +MODULE_DESCRIPTION("Loongson1 CPUFreq driver"); This one as well, move it to the first patch. > MODULE_LICENSE("GPL"); > -- > 1.9.1 -- viresh
Re: [PATCH 4/5] cpufreq: Loongson1: Use devm_kzalloc() instead of global structure
On 11-04-16, 19:55, Keguang Zhang wrote: > From: Kelvin Cheung> > This patch uses devm_kzalloc() instead of global structure. > Why are you doing this? The commit log should contain that. > Signed-off-by: Kelvin Cheung > --- > drivers/cpufreq/loongson1-cpufreq.c | 63 > - > 1 file changed, 35 insertions(+), 28 deletions(-) I don't have any issues with you doing this, but I don't think that's necessary to do. Acked-by: Viresh Kumar -- viresh
Re: [PATCH V2 1/3] acpi,pci,irq: reduce resource requirements
Hi Sinan, I was hoping we could *simplify* this, but I think it's just getting even more complicated (it's a net addition of 100 lines), which is due to feature creep that I'm afraid is my fault. IIRC, the main thing you need is to get rid of some early memory allocation. I don't think all the trigger mode/level checking is worth it. The current code doesn't do it, and it's not fixing a problem for you. It's conceivable that it could even make us trip over a new problem, e.g., some broken BIOS that we currently tolerate. I think you could make this a little easier to review if you split things like the acpi_irq_penalty[] -> acpi_isa_irq_penalty[] rename into their own patches. Little patches like that are trivial to review because a simple rename is pretty safe, and then the patches that actually *do* interesting things are smaller and easier to review, too. On Fri, Apr 08, 2016 at 09:26:30PM -0400, Sinan Kaya wrote: > Code has been redesigned to calculate penalty requirements on the fly. This > significantly simplifies the implementation and removes some of the init > calls from x86 architecture. Command line penalty assignment has been > limited to ISA interrupts only. > > Signed-off-by: Sinan Kaya> --- > drivers/acpi/pci_link.c | 176 > ++-- > 1 file changed, 140 insertions(+), 36 deletions(-) > > diff --git a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c > index ededa90..25695ea 100644 > --- a/drivers/acpi/pci_link.c > +++ b/drivers/acpi/pci_link.c > @@ -437,17 +437,15 @@ static int acpi_pci_link_set(struct acpi_pci_link > *link, int irq) > * enabled system. > */ > > -#define ACPI_MAX_IRQS256 > #define ACPI_MAX_ISA_IRQ 16 ACPI_MAX_ISA_IRQ is a bit of a misnomer. The maximum ISA IRQ is 15, not 16, so I think this should be named ACPI_MAX_ISA_IRQS. 
> -#define PIRQ_PENALTY_PCI_AVAILABLE (0) > #define PIRQ_PENALTY_PCI_POSSIBLE(16*16) > #define PIRQ_PENALTY_PCI_USING (16*16*16) > #define PIRQ_PENALTY_ISA_TYPICAL (16*16*16*16) > #define PIRQ_PENALTY_ISA_USED(16*16*16*16*16) > #define PIRQ_PENALTY_ISA_ALWAYS (16*16*16*16*16*16) > > -static int acpi_irq_penalty[ACPI_MAX_IRQS] = { > +static int acpi_isa_irq_penalty[ACPI_MAX_ISA_IRQ] = { > PIRQ_PENALTY_ISA_ALWAYS,/* IRQ0 timer */ > PIRQ_PENALTY_ISA_ALWAYS,/* IRQ1 keyboard */ > PIRQ_PENALTY_ISA_ALWAYS,/* IRQ2 cascade */ > @@ -457,9 +455,9 @@ static int acpi_irq_penalty[ACPI_MAX_IRQS] = { > PIRQ_PENALTY_ISA_TYPICAL, /* IRQ6 */ > PIRQ_PENALTY_ISA_TYPICAL, /* IRQ7 parallel, spurious */ > PIRQ_PENALTY_ISA_TYPICAL, /* IRQ8 rtc, sometimes */ > - PIRQ_PENALTY_PCI_AVAILABLE, /* IRQ9 PCI, often acpi */ > - PIRQ_PENALTY_PCI_AVAILABLE, /* IRQ10 PCI */ > - PIRQ_PENALTY_PCI_AVAILABLE, /* IRQ11 PCI */ > + 0, /* IRQ9 PCI, often acpi */ > + 0, /* IRQ10 PCI */ > + 0, /* IRQ11 PCI */ > PIRQ_PENALTY_ISA_USED, /* IRQ12 mouse */ > PIRQ_PENALTY_ISA_USED, /* IRQ13 fpe, sometimes */ > PIRQ_PENALTY_ISA_USED, /* IRQ14 ide0 */ > @@ -467,6 +465,121 @@ static int acpi_irq_penalty[ACPI_MAX_IRQS] = { > /* >IRQ15 */ > }; > > +static int acpi_link_trigger(int irq, u8 *polarity, u8 *triggering) > +{ > + struct acpi_pci_link *link; > + bool found = false; > + > + *polarity = ~0; > + *triggering = ~0; > + > + list_for_each_entry(link, _link_list, list) { > + int i; > + > + if (link->irq.active && link->irq.active == irq) { > + if (*polarity == ~0) > + *polarity = link->irq.polarity; > + > + if (*triggering == ~0) > + *triggering = link->irq.triggering; > + > + if (*polarity != link->irq.polarity) > + return -EINVAL; > + > + if (*triggering != link->irq.triggering) > + return -EINVAL; > + > + found = true; > + } > + > + for (i = 0; i < link->irq.possible_count; i++) > + if (link->irq.possible[i] == irq) { > + if (*polarity == ~0) > + *polarity = link->irq.polarity; > + > + if (*triggering == ~0) > + 
*triggering = link->irq.triggering; > + > + if (*polarity != link->irq.polarity) > + return -EINVAL; > + > + if (*triggering != link->irq.triggering) > + return -EINVAL; > + > + found = true; > +
Re: [PATCH 2/5] cpufreq: Loongson1: Replace kzalloc() with kcalloc()
On 11-04-16, 19:55, Keguang Zhang wrote: > From: Kelvin Cheung > > This patch replaces kzalloc() with kcalloc() when allocating > the frequency table, and removes the unnecessary 'out of memory' message. > > Signed-off-by: Kelvin Cheung > --- > drivers/cpufreq/loongson1-cpufreq.c | 12 > 1 file changed, 4 insertions(+), 8 deletions(-) Acked-by: Viresh Kumar -- viresh
Re: [PATCH 3/5] cpufreq: Loongson1: Use dev_get_platdata() to get platform_data
On 11-04-16, 19:55, Keguang Zhang wrote: > From: Kelvin Cheung > > This patch uses dev_get_platdata() to get the platform_data > instead of referencing it directly. > > Signed-off-by: Kelvin Cheung > --- > drivers/cpufreq/loongson1-cpufreq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/cpufreq/loongson1-cpufreq.c > b/drivers/cpufreq/loongson1-cpufreq.c > index 2d83744..f0d40fd 100644 > --- a/drivers/cpufreq/loongson1-cpufreq.c > +++ b/drivers/cpufreq/loongson1-cpufreq.c > @@ -134,7 +134,7 @@ static int ls1x_cpufreq_remove(struct platform_device > *pdev) > > static int ls1x_cpufreq_probe(struct platform_device *pdev) > { > - struct plat_ls1x_cpufreq *pdata = pdev->dev.platform_data; > + struct plat_ls1x_cpufreq *pdata = dev_get_platdata(&pdev->dev); > struct clk *clk; > int ret; Acked-by: Viresh Kumar -- viresh
Re: [PATCH] arm64: CONFIG_DEVPORT should not be used when PCI is being used
On 04/07/2016 11:56 AM, Al Stone wrote: > config DEVPORT > bool > depends on ISA && PCI > default y > > That makes more sense. Thanks. I think Itanium does IO ports but not ISA. Probably best to just turn on IO ports on the three architectures that use them in that case. Jon. -- Computer Architect
Re: [PATCH 1/5] cpufreq: Loongson1: Rename the file to loongson1-cpufreq.c
On 11-04-16, 19:55, Keguang Zhang wrote: > From: Kelvin Cheung > > This patch renames the file to loongson1-cpufreq.c > > Signed-off-by: Kelvin Cheung > --- > drivers/cpufreq/Makefile| 2 +- > drivers/cpufreq/{ls1x-cpufreq.c => loongson1-cpufreq.c} | 0 > 2 files changed, 1 insertion(+), 1 deletion(-) > rename drivers/cpufreq/{ls1x-cpufreq.c => loongson1-cpufreq.c} (100%) > > diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile > index 9e63fb1..bebe9c8 100644 > --- a/drivers/cpufreq/Makefile > +++ b/drivers/cpufreq/Makefile > @@ -100,7 +100,7 @@ obj-$(CONFIG_CRIS_MACH_ARTPEC3) += > cris-artpec3-cpufreq.o > obj-$(CONFIG_ETRAXFS)+= cris-etraxfs-cpufreq.o > obj-$(CONFIG_IA64_ACPI_CPUFREQ) += ia64-acpi-cpufreq.o > obj-$(CONFIG_LOONGSON2_CPUFREQ) += loongson2_cpufreq.o > -obj-$(CONFIG_LOONGSON1_CPUFREQ) += ls1x-cpufreq.o > +obj-$(CONFIG_LOONGSON1_CPUFREQ) += loongson1-cpufreq.o > obj-$(CONFIG_SH_CPU_FREQ)+= sh-cpufreq.o > obj-$(CONFIG_SPARC_US2E_CPUFREQ) += sparc-us2e-cpufreq.o > obj-$(CONFIG_SPARC_US3_CPUFREQ) += sparc-us3-cpufreq.o > diff --git a/drivers/cpufreq/ls1x-cpufreq.c > b/drivers/cpufreq/loongson1-cpufreq.c > similarity index 100% > rename from drivers/cpufreq/ls1x-cpufreq.c > rename to drivers/cpufreq/loongson1-cpufreq.c Acked-by: Viresh Kumar -- viresh
Re: [RESEND] fence: add missing descriptions for fence
Hi Luis, On 12 April 2016 at 04:03, Luis de Bethencourt wrote: > On 11/04/16 21:09, Gustavo Padovan wrote: >> Hi Luis, >> >> 2016-04-11 Luis de Bethencourt : >> >>> The members child_list and active_list were added to the fence struct >>> without descriptions for the Documentation. Adding these. >>> Thanks for the patch; will get it queued for for-next. >>> Fixes: b55b54b5db33 ("staging/android: remove struct sync_pt") >>> Signed-off-by: Luis de Bethencourt >>> Reviewed-by: Javier Martinez Canillas >>> --- >>> Hi, >>> >>> Just resending this patch since it hasn't had any reviews since >>> March 21st. >>> >>> Thanks, >>> Luis >>> >>> include/linux/fence.h | 2 ++ >>> 1 file changed, 2 insertions(+) >> >> Reviewed-by: Gustavo Padovan >> >> Gustavo >> > > Thank you Gustavo. > > Nice seeing you around here :) > > Luis BR, Sumit.
Re: [PATCH v6 0/3] thermal: mediatek: Add cpu dynamic power cooling model.
On 12-04-16, 10:32, dawei chien wrote: > On Tue, 2016-03-22 at 13:13 +0800, dawei chien wrote: > > On Tue, 2016-03-15 at 13:17 +0700, Viresh Kumar wrote: > > > Its Rafael, who is going to apply this one. > > > > > > Can you please resend it as he may not have it in patchworks? > > > > > > > Hi Rafael, > > Would you merge this patch to your tree, thank you. > > > > BR, > > Dawei > > Hi Rafael, > Would you please merge this patch, or please kindly let me know for any > problem, thank you. Didn't I ask you earlier to resend this patch, as Rafael wouldn't have it in his queue now? Please resend it and that will make it easier for Rafael to get it applied. -- viresh
Re: [PATCH v1] clk: Add clk_composite_set_rate_and_parent
Hi Finley, On Monday, 11 April 2016 at 09:54:12, Finlye Xiao wrote: > From: Finley Xiao > > Some of Rockchip's clocks should consider the priority of .set_parent > and .set_rate to prevent a too large temporary clock rate. > > For example, the gpu clock can be parented to cpll(750MHz) and > usbphy_480m(480MHz), 375MHz comes from cpll and the div is set > to 2, 480MHz comes from usbphy_480m and the div is set to 1. > > From the code, when change rate from 480MHz to 375MHz, it changes > the gpu's parent from USBPHY_480M to cpll first(.set_parent), but the > div value is still 1 and the gpu's rate will be 750MHz at the moment, > then it changes the div value from 1 to 2(.set_rate) and the gpu's > rate will be changed to 375MHz(480MHZ->750MHz->375MHz), here temporary > rate is 750MHz, the voltage which supply for 480MHz certainly can not > supply for 750MHz, so the gpu will crash. We did talk about this internally and while we refined the actual code change, it seems I forgot to look at the commit message itself. This behaviour (and the wish to not overflow a target clock rate) should be the same on all socs, so the commit message is quite a bit too Rockchip-specific. I think I would go with something like: --- 8< When changing the clock-rate, currently a new parent is set first and a divider adapted thereafter. This may result in the clock-rate overflowing its target rate for a short time if the new parent has a higher rate than the old parent. While this often doesn't produce negative effects, it can affect components in a voltage-scaling environment, like the GPU on the rk3399 socs, where the voltage then is simply too low for the temporarily too-high clock rate. For general clock hierarchies this may need more extensive adaptations to the common clock-framework, but at least for composite clocks having both parent and rate settings it is easy to create a short-term solution to make sure the clock-rate does not overflow the target.
--- 8< But of course feel free to extend or change that as you wish ;-). > > Signed-off-by: Finley Xiao I remember that having clocks not overflow their target rate came up in some ELC talk last week (probably in Stephen's Qualcomm kernel talk) and a general solution might need some changes closer to the core. But at least for composite clocks where we can control the rate+parent process easily, Finley's change is a nice low-hanging fruit which at least improves behaviour for those clock-types in the short term, so Reviewed-by: Heiko Stuebner Heiko
Re: [PATCH RESEND] sst: fix missing breaks that would cause the wrong operation to execute
On Thu, Apr 07, 2016 at 09:30:44AM +0800, Yang Jie wrote: > > > On 2016-04-06 21:44, Alan wrote: > >From: Alan > > > >Now we correctly error an attempt to execute an unsupported operation. The > >current code does something else random. Hi Alan, Can you either send me this patch or send this to the ALSA mailing list. You might want to add the subsystem name which here should be: ASoC: Intel: atom: fix missing breaks that would cause the wrong operation to execute And you can add my Ack :) Acked-by: Vinod Koul Thanks -- ~Vinod > > > >Signed-off-by: Alan Cox > > I think this is a nice fix. + Vinod who is the owner of the atom sound driver. > > Thanks, > ~Keyon > > >--- > > sound/soc/intel/atom/sst-mfld-platform-compress.c |9 +++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > >diff --git a/sound/soc/intel/atom/sst-mfld-platform-compress.c > >b/sound/soc/intel/atom/sst-mfld-platform-compress.c > >index 3951689..1bead81 100644 > >--- a/sound/soc/intel/atom/sst-mfld-platform-compress.c > >+++ b/sound/soc/intel/atom/sst-mfld-platform-compress.c > >@@ -182,24 +182,29 @@ static int sst_platform_compr_trigger(struct > >snd_compr_stream *cstream, int cmd) > > case SNDRV_PCM_TRIGGER_START: > > if (stream->compr_ops->stream_start) > > return stream->compr_ops->stream_start(sst->dev, > > stream->id); > >+break; > > case SNDRV_PCM_TRIGGER_STOP: > > if (stream->compr_ops->stream_drop) > > return stream->compr_ops->stream_drop(sst->dev, > > stream->id); > >+break; > > case SND_COMPR_TRIGGER_DRAIN: > > if (stream->compr_ops->stream_drain) > > return stream->compr_ops->stream_drain(sst->dev, > > stream->id); > >+break; > > case SND_COMPR_TRIGGER_PARTIAL_DRAIN: > > if (stream->compr_ops->stream_partial_drain) > > return > > stream->compr_ops->stream_partial_drain(sst->dev, stream->id); > >+break; > > case SNDRV_PCM_TRIGGER_PAUSE_PUSH: > > if (stream->compr_ops->stream_pause) > > return stream->compr_ops->stream_pause(sst->dev, > > stream->id); > >+break; > > case 
SNDRV_PCM_TRIGGER_PAUSE_RELEASE: > > if (stream->compr_ops->stream_pause_release) > > return > > stream->compr_ops->stream_pause_release(sst->dev, stream->id); > >-default: > >-return -EINVAL; > >+break; > > } > >+return -EINVAL; > > } > > > > static int sst_platform_compr_pointer(struct snd_compr_stream *cstream, > > > >
Re: [RFC][PATCH 2/3] locking/qrwlock: Use smp_cond_load_acquire()
On Mon, 04 Apr 2016, Peter Zijlstra wrote: Use smp_cond_load_acquire() to make better use of the hardware assisted 'spin' wait on arm64. Arguably the second hunk is the more horrid abuse possible, but avoids having to use cmpwait (see next patch) directly. Also, this makes 'clever' (ab)use of the cond+rmb acquire to omit the acquire from cmpxchg(). Signed-off-by: Peter Zijlstra (Intel) --- kernel/locking/qrwlock.c | 18 -- 1 file changed, 4 insertions(+), 14 deletions(-) --- a/kernel/locking/qrwlock.c +++ b/kernel/locking/qrwlock.c @@ -53,10 +53,7 @@ struct __qrwlock { static __always_inline void rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts) { - while ((cnts & _QW_WMASK) == _QW_LOCKED) { - cpu_relax_lowlatency(); - cnts = atomic_read_acquire(&lock->cnts); - } + smp_cond_load_acquire(&lock->cnts.counter, (VAL & _QW_WMASK) != _QW_LOCKED); } /** @@ -109,8 +106,6 @@ EXPORT_SYMBOL(queued_read_lock_slowpath) */ void queued_write_lock_slowpath(struct qrwlock *lock) { - u32 cnts; - /* Put the writer into the wait queue */ arch_spin_lock(&lock->wait_lock); @@ -134,15 +129,10 @@ void queued_write_lock_slowpath(struct q } /* When no more readers, set the locked flag */ - for (;;) { - cnts = atomic_read(&lock->cnts); - if ((cnts == _QW_WAITING) && - (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING, - _QW_LOCKED) == _QW_WAITING)) - break; + smp_cond_load_acquire(&lock->cnts.counter, + (VAL == _QW_WAITING) && + atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING, _QW_LOCKED) == _QW_WAITING); - cpu_relax_lowlatency(); You would need some variant for cpu_relax_lowlatency otherwise you'll be hurting s390, no? fwiw back when I was looking at this, I recall thinking about possibly introducing smp_cond_acquire_lowlatency but never got around to it. Thanks, Davidlohr
[PATCH v2 03/11] mm/slab: drain the free slab as much as possible
From: Joonsoo Kim slabs_tofree() implies freeing all free slabs. We can do it by just providing INT_MAX. Acked-by: Christoph Lameter Signed-off-by: Joonsoo Kim --- mm/slab.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index 373b8be..5451929 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -888,12 +888,6 @@ static int init_cache_node_node(int node) return 0; } -static inline int slabs_tofree(struct kmem_cache *cachep, - struct kmem_cache_node *n) -{ - return (n->free_objects + cachep->num - 1) / cachep->num; -} - static void cpuup_canceled(long cpu) { struct kmem_cache *cachep; @@ -958,7 +952,7 @@ free_slab: n = get_node(cachep, node); if (!n) continue; - drain_freelist(cachep, n, slabs_tofree(cachep, n)); + drain_freelist(cachep, n, INT_MAX); } } @@ -1110,7 +1104,7 @@ static int __meminit drain_cache_node_node(int node) if (!n) continue; - drain_freelist(cachep, n, slabs_tofree(cachep, n)); + drain_freelist(cachep, n, INT_MAX); if (!list_empty(&n->slabs_full) || !list_empty(&n->slabs_partial)) { @@ -2304,7 +2298,7 @@ int __kmem_cache_shrink(struct kmem_cache *cachep, bool deactivate) check_irq_on(); for_each_kmem_cache_node(cachep, node, n) { - drain_freelist(cachep, n, slabs_tofree(cachep, n)); + drain_freelist(cachep, n, INT_MAX); ret += !list_empty(&n->slabs_full) || !list_empty(&n->slabs_partial); -- 1.9.1
[PATCH v2 04/11] mm/slab: factor out kmem_cache_node initialization code
From: Joonsoo Kim It can be reused in other places, so factor it out. A following patch will use it. Signed-off-by: Joonsoo Kim --- mm/slab.c | 68 --- 1 file changed, 39 insertions(+), 29 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index 5451929..49af685 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -841,6 +841,40 @@ static inline gfp_t gfp_exact_node(gfp_t flags) } #endif +static int init_cache_node(struct kmem_cache *cachep, int node, gfp_t gfp) +{ + struct kmem_cache_node *n; + + /* +* Set up the kmem_cache_node for cpu before we can +* begin anything. Make sure some other cpu on this +* node has not already allocated this +*/ + n = get_node(cachep, node); + if (n) + return 0; + + n = kmalloc_node(sizeof(struct kmem_cache_node), gfp, node); + if (!n) + return -ENOMEM; + + kmem_cache_node_init(n); + n->next_reap = jiffies + REAPTIMEOUT_NODE + + ((unsigned long)cachep) % REAPTIMEOUT_NODE; + + n->free_limit = + (1 + nr_cpus_node(node)) * cachep->batchcount + cachep->num; + + /* +* The kmem_cache_nodes don't come and go as CPUs +* come and go. slab_mutex is sufficient +* protection here. +*/ + cachep->node[node] = n; + + return 0; +} + /* * Allocates and initializes node for a node on each slab cache, used for * either memory or cpu hotplug. If memory is being hot-added, the kmem_cache_node @@ -852,39 +886,15 @@ static inline gfp_t gfp_exact_node(gfp_t flags) */ static int init_cache_node_node(int node) { + int ret; struct kmem_cache *cachep; - struct kmem_cache_node *n; - const size_t memsize = sizeof(struct kmem_cache_node); list_for_each_entry(cachep, &slab_caches, list) { - /* -* Set up the kmem_cache_node for cpu before we can -* begin anything. 
Make sure some other cpu on this -* node has not already allocated this -*/ - n = get_node(cachep, node); - if (!n) { - n = kmalloc_node(memsize, GFP_KERNEL, node); - if (!n) - return -ENOMEM; - kmem_cache_node_init(n); - n->next_reap = jiffies + REAPTIMEOUT_NODE + - ((unsigned long)cachep) % REAPTIMEOUT_NODE; - - /* -* The kmem_cache_nodes don't come and go as CPUs -* come and go. slab_mutex is sufficient -* protection here. -*/ - cachep->node[node] = n; - } - - spin_lock_irq(&n->list_lock); - n->free_limit = - (1 + nr_cpus_node(node)) * - cachep->batchcount + cachep->num; - spin_unlock_irq(&n->list_lock); + ret = init_cache_node(cachep, node, GFP_KERNEL); + if (ret) + return ret; } + return 0; } -- 1.9.1
[PATCH v2 10/11] mm/slab: refill cpu cache through a new slab without holding a node lock
From: Joonsoo Kim Until now, cache growing makes a free slab on the node's slab list and then we can allocate free objects from it. This necessarily requires holding a node lock which is very contended. If we refill the cpu cache before attaching it to the node's slab list, we can avoid holding a node lock as much as possible because this newly allocated slab is only visible to the current task. This will reduce lock contention. Below is the result of concurrent allocation/free in the slab allocation benchmark made by Christoph a long time ago. I make the output simpler. The number shows cycle count during alloc/free respectively so less is better. * Before Kmalloc N*alloc N*free(32): Average=355/750 Kmalloc N*alloc N*free(64): Average=452/812 Kmalloc N*alloc N*free(128): Average=559/1070 Kmalloc N*alloc N*free(256): Average=1176/980 Kmalloc N*alloc N*free(512): Average=1939/1189 Kmalloc N*alloc N*free(1024): Average=3521/1278 Kmalloc N*alloc N*free(2048): Average=7152/1838 Kmalloc N*alloc N*free(4096): Average=13438/2013 * After Kmalloc N*alloc N*free(32): Average=248/966 Kmalloc N*alloc N*free(64): Average=261/949 Kmalloc N*alloc N*free(128): Average=314/1016 Kmalloc N*alloc N*free(256): Average=741/1061 Kmalloc N*alloc N*free(512): Average=1246/1152 Kmalloc N*alloc N*free(1024): Average=2437/1259 Kmalloc N*alloc N*free(2048): Average=4980/1800 Kmalloc N*alloc N*free(4096): Average=9000/2078 It shows that contention is reduced for all the object sizes and performance increases by 30 ~ 40%. 
Signed-off-by: Joonsoo Kim
---
 mm/slab.c | 68 +--
 1 file changed, 36 insertions(+), 32 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 2c28ad5..cf12fbd 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2852,6 +2852,30 @@ static noinline void *cache_alloc_pfmemalloc(struct kmem_cache *cachep,
 	return obj;
 }
 
+/*
+ * Slab list should be fixed up by fixup_slab_list() for existing slab
+ * or cache_grow_end() for new slab
+ */
+static __always_inline int alloc_block(struct kmem_cache *cachep,
+		struct array_cache *ac, struct page *page, int batchcount)
+{
+	/*
+	 * There must be at least one object available for
+	 * allocation.
+	 */
+	BUG_ON(page->active >= cachep->num);
+
+	while (page->active < cachep->num && batchcount--) {
+		STATS_INC_ALLOCED(cachep);
+		STATS_INC_ACTIVE(cachep);
+		STATS_SET_HIGH(cachep);
+
+		ac->entry[ac->avail++] = slab_get_obj(cachep, page);
+	}
+
+	return batchcount;
+}
+
 static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
 {
 	int batchcount;
@@ -2864,7 +2888,6 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
 	check_irq_off();
 	node = numa_mem_id();
 
-retry:
 	ac = cpu_cache_get(cachep);
 	batchcount = ac->batchcount;
 	if (!ac->touched && batchcount > BATCHREFILL_LIMIT) {
@@ -2894,21 +2917,7 @@ retry:
 
 		check_spinlock_acquired(cachep);
 
-		/*
-		 * The slab was either on partial or free list so
-		 * there must be at least one object available for
-		 * allocation.
-		 */
-		BUG_ON(page->active >= cachep->num);
-
-		while (page->active < cachep->num && batchcount--) {
-			STATS_INC_ALLOCED(cachep);
-			STATS_INC_ACTIVE(cachep);
-			STATS_SET_HIGH(cachep);
-
-			ac->entry[ac->avail++] = slab_get_obj(cachep, page);
-		}
-
+		batchcount = alloc_block(cachep, ac, page, batchcount);
 		fixup_slab_list(cachep, n, page, &list);
 	}
 
@@ -2928,21 +2937,18 @@ alloc_done:
 		}
 
 		page = cache_grow_begin(cachep, gfp_exact_node(flags), node);
-		cache_grow_end(cachep, page);
 
 		/*
 		 * cache_grow_begin() can reenable interrupts,
 		 * then ac could change.
 		 */
 		ac = cpu_cache_get(cachep);
-		node = numa_mem_id();
+		if (!ac->avail && page)
+			alloc_block(cachep, ac, page, batchcount);
+		cache_grow_end(cachep, page);
 
-		/* no objects in sight? abort */
-		if (!page && ac->avail == 0)
+		if (!ac->avail)
 			return NULL;
-
-		if (!ac->avail)	/* objects refilled by interrupt? */
-			goto retry;
 	}
 	ac->touched = 1;
@@ -3136,14 +3142,13 @@ static void *cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
 {
 	struct page *page;
 	struct kmem_cache_node *n;
-	void *obj;
+	void *obj = NULL;
 	void *list = NULL;
 
 	VM_BUG_ON(nodeid < 0 ||
[PATCH v2 11/11] mm/slab: lockless decision to grow cache
From: Joonsoo Kim

To check precisely whether free objects exist or not, we need to grab
a lock. But accuracy isn't that important here: the race window is
small, and if there are too many free objects the cache reaper will
reap them. So this patch makes the check for free-object existence run
without holding a lock. This reduces lock contention in
allocation-heavy cases.

Note that until now, n->shared could be freed while it was being
processed, by a write to slabinfo; with a small trick in this patch we
can access it safely within an interrupt-disabled period.

Below is the result of concurrent allocation/free in the slab
allocation benchmark Christoph made a long time ago. I have simplified
the output. The numbers are cycle counts for alloc/free respectively,
so lower is better.

* Before
Kmalloc N*alloc N*free(32): Average=248/966
Kmalloc N*alloc N*free(64): Average=261/949
Kmalloc N*alloc N*free(128): Average=314/1016
Kmalloc N*alloc N*free(256): Average=741/1061
Kmalloc N*alloc N*free(512): Average=1246/1152
Kmalloc N*alloc N*free(1024): Average=2437/1259
Kmalloc N*alloc N*free(2048): Average=4980/1800
Kmalloc N*alloc N*free(4096): Average=9000/2078

* After
Kmalloc N*alloc N*free(32): Average=344/792
Kmalloc N*alloc N*free(64): Average=347/882
Kmalloc N*alloc N*free(128): Average=390/959
Kmalloc N*alloc N*free(256): Average=393/1067
Kmalloc N*alloc N*free(512): Average=683/1229
Kmalloc N*alloc N*free(1024): Average=1295/1325
Kmalloc N*alloc N*free(2048): Average=2513/1664
Kmalloc N*alloc N*free(4096): Average=4742/2172

It shows that allocation performance decreases for object sizes up to
128, possibly due to the extra checks in cache_alloc_refill(). But
considering the improvement in free performance, the net result looks
about the same. Results for the other size classes look very
promising: roughly a 50% performance improvement.

v2: replace kick_all_cpus_sync() with synchronize_sched().
Signed-off-by: Joonsoo Kim
---
 mm/slab.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index cf12fbd..13e74aa 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -952,6 +952,15 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
 	spin_unlock_irq(&n->list_lock);
 	slabs_destroy(cachep, &list);
 
+	/*
+	 * To protect lockless access to n->shared during irq disabled context.
+	 * If n->shared isn't NULL in irq disabled context, accessing to it is
+	 * guaranteed to be valid until irq is re-enabled, because it will be
+	 * freed after synchronize_sched().
+	 */
+	if (force_change)
+		synchronize_sched();
+
 fail:
 	kfree(old_shared);
 	kfree(new_shared);
@@ -2880,7 +2889,7 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
 {
 	int batchcount;
 	struct kmem_cache_node *n;
-	struct array_cache *ac;
+	struct array_cache *ac, *shared;
 	int node;
 	void *list = NULL;
 	struct page *page;
@@ -2901,11 +2910,16 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
 	n = get_node(cachep, node);
 
 	BUG_ON(ac->avail > 0 || !n);
+	shared = READ_ONCE(n->shared);
+	if (!n->free_objects && (!shared || !shared->avail))
+		goto direct_grow;
+
 	spin_lock(&n->list_lock);
+	shared = READ_ONCE(n->shared);
 
 	/* See if we can refill from the shared array */
-	if (n->shared && transfer_objects(ac, n->shared, batchcount)) {
-		n->shared->touched = 1;
+	if (shared && transfer_objects(ac, shared, batchcount)) {
+		shared->touched = 1;
 		goto alloc_done;
 	}
 
@@ -2927,6 +2941,7 @@ alloc_done:
 	spin_unlock(&n->list_lock);
 	fixup_objfreelist_debug(cachep, &list);
 
+direct_grow:
 	if (unlikely(!ac->avail)) {
 		/* Check if we can use obj in pfmemalloc slab */
 		if (sk_memalloc_socks()) {
--
1.9.1
[PATCH v2 08/11] mm/slab: make cache_grow() handle the page allocated on arbitrary node
From: Joonsoo Kim

Currently, cache_grow() assumes that the allocated page's nodeid is
the same as the nodeid parameter used for the allocation request. If
we discard this assumption, we can handle the fallback_alloc() case
gracefully. So this patch makes cache_grow() handle a page allocated
on an arbitrary node, and cleans up the relevant code.

Signed-off-by: Joonsoo Kim
---
 mm/slab.c | 60 +---
 1 file changed, 21 insertions(+), 39 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index a3422bc..1910589 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2543,13 +2543,14 @@ static void slab_map_pages(struct kmem_cache *cache, struct page *page,
  * Grow (by 1) the number of slabs within a cache.  This is called by
  * kmem_cache_alloc() when there are no active objs left in a cache.
  */
-static int cache_grow(struct kmem_cache *cachep,
-		gfp_t flags, int nodeid, struct page *page)
+static int cache_grow(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 {
 	void *freelist;
 	size_t offset;
 	gfp_t local_flags;
+	int page_node;
 	struct kmem_cache_node *n;
+	struct page *page;
 
 	/*
 	 * Be lazy and only check for valid flags here, keeping it out of the
@@ -2577,12 +2578,12 @@ static int cache_grow(struct kmem_cache *cachep,
 	 * Get mem for the objs. Attempt to allocate a physical page from
 	 * 'nodeid'.
 	 */
-	if (!page)
-		page = kmem_getpages(cachep, local_flags, nodeid);
+	page = kmem_getpages(cachep, local_flags, nodeid);
 	if (!page)
 		goto failed;
 
-	n = get_node(cachep, nodeid);
+	page_node = page_to_nid(page);
+	n = get_node(cachep, page_node);
 
 	/* Get colour for the slab, and cal the next value. */
 	n->colour_next++;
@@ -2597,7 +2598,7 @@ static int cache_grow(struct kmem_cache *cachep,
 
 	/* Get slab management. */
 	freelist = alloc_slabmgmt(cachep, page, offset,
-			local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
+			local_flags & ~GFP_CONSTRAINT_MASK, page_node);
 	if (OFF_SLAB(cachep) && !freelist)
 		goto opps1;
@@ -2616,13 +2617,13 @@ static int cache_grow(struct kmem_cache *cachep,
 	STATS_INC_GROWN(cachep);
 	n->free_objects += cachep->num;
 	spin_unlock(&n->list_lock);
-	return 1;
+	return page_node;
 
 opps1:
 	kmem_freepages(cachep, page);
 failed:
 	if (gfpflags_allow_blocking(local_flags))
 		local_irq_disable();
-	return 0;
+	return -1;
 }
 
 #if DEBUG
@@ -2903,14 +2904,14 @@ alloc_done:
 			return obj;
 		}
 
-		x = cache_grow(cachep, gfp_exact_node(flags), node, NULL);
+		x = cache_grow(cachep, gfp_exact_node(flags), node);
 
 		/* cache_grow can reenable interrupts, then ac could change. */
 		ac = cpu_cache_get(cachep);
 		node = numa_mem_id();
 
 		/* no objects in sight? abort */
-		if (!x && ac->avail == 0)
+		if (x < 0 && ac->avail == 0)
 			return NULL;
 
 		if (!ac->avail)	/* objects refilled by interrupt? */
@@ -3039,7 +3040,6 @@ static void *alternate_node_alloc(struct kmem_cache *cachep, gfp_t flags)
 static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
 {
 	struct zonelist *zonelist;
-	gfp_t local_flags;
 	struct zoneref *z;
 	struct zone *zone;
 	enum zone_type high_zoneidx = gfp_zone(flags);
@@ -3050,8 +3050,6 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
 	if (flags & __GFP_THISNODE)
 		return NULL;
 
-	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
-
 retry_cpuset:
 	cpuset_mems_cookie = read_mems_allowed_begin();
 	zonelist = node_zonelist(mempolicy_slab_node(), flags);
@@ -3081,33 +3079,17 @@ retry:
 		 * We may trigger various forms of reclaim on the allowed
 		 * set and go into memory reserves if necessary.
 		 */
-		struct page *page;
+		nid = cache_grow(cache, flags, numa_mem_id());
+		if (nid >= 0) {
+			obj = cache_alloc_node(cache,
+				gfp_exact_node(flags), nid);
 
-		if (gfpflags_allow_blocking(local_flags))
-			local_irq_enable();
-		kmem_flagcheck(cache, flags);
-		page = kmem_getpages(cache, local_flags, numa_mem_id());
-		if (gfpflags_allow_blocking(local_flags))
-			local_irq_disable();
-		if (page) {
 			/*
-			 * Insert into the appropriate per node queues
+			 *
[PATCH v2 07/11] mm/slab: racy access/modify the slab color
From: Joonsoo Kim

Slab colour doesn't need to be maintained strictly, and the locking
used to change it can itself add lock contention, so this patch makes
access to and modification of the slab colour racy. This is a
preparation step for implementing a lockless allocation path for when
there are no free objects in the kmem_cache.

Below is the result of concurrent allocation/free in the slab
allocation benchmark Christoph made a long time ago. I have simplified
the output. The numbers are cycle counts for alloc/free respectively,
so lower is better.

* Before
Kmalloc N*alloc N*free(32): Average=365/806
Kmalloc N*alloc N*free(64): Average=452/690
Kmalloc N*alloc N*free(128): Average=736/886
Kmalloc N*alloc N*free(256): Average=1167/985
Kmalloc N*alloc N*free(512): Average=2088/1125
Kmalloc N*alloc N*free(1024): Average=4115/1184
Kmalloc N*alloc N*free(2048): Average=8451/1748
Kmalloc N*alloc N*free(4096): Average=16024/2048

* After
Kmalloc N*alloc N*free(32): Average=355/750
Kmalloc N*alloc N*free(64): Average=452/812
Kmalloc N*alloc N*free(128): Average=559/1070
Kmalloc N*alloc N*free(256): Average=1176/980
Kmalloc N*alloc N*free(512): Average=1939/1189
Kmalloc N*alloc N*free(1024): Average=3521/1278
Kmalloc N*alloc N*free(2048): Average=7152/1838
Kmalloc N*alloc N*free(4096): Average=13438/2013

It shows that contention is reduced for object sizes >= 1024 and
performance increases by roughly 15%.

Acked-by: Christoph Lameter
Signed-off-by: Joonsoo Kim
---
 mm/slab.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 6e61461..a3422bc 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2561,20 +2561,7 @@ static int cache_grow(struct kmem_cache *cachep,
 	}
 	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
 
-	/* Take the node list lock to change the colour_next on this node */
 	check_irq_off();
-	n = get_node(cachep, nodeid);
-	spin_lock(&n->list_lock);
-
-	/* Get colour for the slab, and cal the next value. */
-	offset = n->colour_next;
-	n->colour_next++;
-	if (n->colour_next >= cachep->colour)
-		n->colour_next = 0;
-	spin_unlock(&n->list_lock);
-
-	offset *= cachep->colour_off;
-
 	if (gfpflags_allow_blocking(local_flags))
 		local_irq_enable();
@@ -2595,6 +2582,19 @@ static int cache_grow(struct kmem_cache *cachep,
 	if (!page)
 		goto failed;
 
+	n = get_node(cachep, nodeid);
+
+	/* Get colour for the slab, and cal the next value. */
+	n->colour_next++;
+	if (n->colour_next >= cachep->colour)
+		n->colour_next = 0;
+
+	offset = n->colour_next;
+	if (offset >= cachep->colour)
+		offset = 0;
+
+	offset *= cachep->colour_off;
+
 	/* Get slab management. */
 	freelist = alloc_slabmgmt(cachep, page, offset,
 			local_flags & ~GFP_CONSTRAINT_MASK, nodeid);
--
1.9.1
[PATCH v2 09/11] mm/slab: separate cache_grow() to two parts
From: Joonsoo Kim

This is a preparation step for implementing a lockless allocation path
for when there are no free objects in the kmem_cache. What we'd like
to do here is refill the cpu cache without holding the node lock. To
accomplish this, the refill should happen after the new slab is
allocated but before it is attached to the management list. So this
patch separates cache_grow() into two parts, allocation and attachment
to the list, in order to add some code in between them in the
following patch.

Signed-off-by: Joonsoo Kim
---
 mm/slab.c | 74 ---
 1 file changed, 52 insertions(+), 22 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 1910589..2c28ad5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -213,6 +213,11 @@ static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list);
 static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp);
 static void cache_reap(struct work_struct *unused);
 
+static inline void fixup_objfreelist_debug(struct kmem_cache *cachep,
+						void **list);
+static inline void fixup_slab_list(struct kmem_cache *cachep,
+				struct kmem_cache_node *n, struct page *page,
+				void **list);
 static int slab_early_init = 1;
 
 #define INDEX_NODE kmalloc_index(sizeof(struct kmem_cache_node))
@@ -1797,7 +1802,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 
 			/*
 			 * Needed to avoid possible looping condition
-			 * in cache_grow()
+			 * in cache_grow_begin()
 			 */
 			if (OFF_SLAB(freelist_cache))
 				continue;
@@ -2543,7 +2548,8 @@ static void slab_map_pages(struct kmem_cache *cache, struct page *page,
  * Grow (by 1) the number of slabs within a cache.  This is called by
  * kmem_cache_alloc() when there are no active objs left in a cache.
  */
-static int cache_grow(struct kmem_cache *cachep, gfp_t flags, int nodeid)
+static struct page *cache_grow_begin(struct kmem_cache *cachep,
+				gfp_t flags, int nodeid)
 {
 	void *freelist;
 	size_t offset;
@@ -2609,21 +2615,40 @@ static int cache_grow(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 	if (gfpflags_allow_blocking(local_flags))
 		local_irq_disable();
 
-	check_irq_off();
-	spin_lock(&n->list_lock);
-
-	/* Make slab active. */
-	list_add_tail(&page->lru, &(n->slabs_free));
-	STATS_INC_GROWN(cachep);
-	n->free_objects += cachep->num;
-	spin_unlock(&n->list_lock);
-	return page_node;
+	return page;
+
 opps1:
 	kmem_freepages(cachep, page);
 failed:
 	if (gfpflags_allow_blocking(local_flags))
 		local_irq_disable();
-	return -1;
+	return NULL;
+}
+
+static void cache_grow_end(struct kmem_cache *cachep, struct page *page)
+{
+	struct kmem_cache_node *n;
+	void *list = NULL;
+
+	check_irq_off();
+
+	if (!page)
+		return;
+
+	INIT_LIST_HEAD(&page->lru);
+	n = get_node(cachep, page_to_nid(page));
+
+	spin_lock(&n->list_lock);
+	if (!page->active)
+		list_add_tail(&page->lru, &(n->slabs_free));
+	else
+		fixup_slab_list(cachep, n, page, &list);
+	STATS_INC_GROWN(cachep);
+	n->free_objects += cachep->num - page->active;
+	spin_unlock(&n->list_lock);
+
+	fixup_objfreelist_debug(cachep, &list);
 }
 
 #if DEBUG
@@ -2834,6 +2859,7 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
 	struct array_cache *ac;
 	int node;
 	void *list = NULL;
+	struct page *page;
 
 	check_irq_off();
 	node = numa_mem_id();
@@ -2861,7 +2887,6 @@ retry:
 	}
 
 	while (batchcount > 0) {
-		struct page *page;
 		/* Get slab alloc is to come from. */
 		page = get_first_slab(n, false);
 		if (!page)
@@ -2894,8 +2919,6 @@ alloc_done:
 	fixup_objfreelist_debug(cachep, &list);
 
 	if (unlikely(!ac->avail)) {
-		int x;
-
 		/* Check if we can use obj in pfmemalloc slab */
 		if (sk_memalloc_socks()) {
 			void *obj = cache_alloc_pfmemalloc(cachep, n, flags);
@@ -2904,14 +2927,18 @@ alloc_done:
 			return obj;
 		}
 
-		x = cache_grow(cachep, gfp_exact_node(flags), node);
+		page = cache_grow_begin(cachep, gfp_exact_node(flags), node);
+		cache_grow_end(cachep, page);
 
-		/* cache_grow can reenable interrupts, then ac could change. */
+		/*
+		 * cache_grow_begin() can reenable interrupts,
+		 * then ac could change.
+		 */
[PATCH v2 11/11] mm/slab: lockless decision to grow cache
From: Joonsoo Kim To check whther free objects exist or not precisely, we need to grab a lock. But, accuracy isn't that important because race window would be even small and if there is too much free object, cache reaper would reap it. So, this patch makes the check for free object exisistence not to hold a lock. This will reduce lock contention in heavily allocation case. Note that until now, n->shared can be freed during the processing by writing slabinfo, but, with some trick in this patch, we can access it freely within interrupt disabled period. Below is the result of concurrent allocation/free in slab allocation benchmark made by Christoph a long time ago. I make the output simpler. The number shows cycle count during alloc/free respectively so less is better. * Before Kmalloc N*alloc N*free(32): Average=248/966 Kmalloc N*alloc N*free(64): Average=261/949 Kmalloc N*alloc N*free(128): Average=314/1016 Kmalloc N*alloc N*free(256): Average=741/1061 Kmalloc N*alloc N*free(512): Average=1246/1152 Kmalloc N*alloc N*free(1024): Average=2437/1259 Kmalloc N*alloc N*free(2048): Average=4980/1800 Kmalloc N*alloc N*free(4096): Average=9000/2078 * After Kmalloc N*alloc N*free(32): Average=344/792 Kmalloc N*alloc N*free(64): Average=347/882 Kmalloc N*alloc N*free(128): Average=390/959 Kmalloc N*alloc N*free(256): Average=393/1067 Kmalloc N*alloc N*free(512): Average=683/1229 Kmalloc N*alloc N*free(1024): Average=1295/1325 Kmalloc N*alloc N*free(2048): Average=2513/1664 Kmalloc N*alloc N*free(4096): Average=4742/2172 It shows that allocation performance decreases for the object size up to 128 and it may be due to extra checks in cache_alloc_refill(). But, with considering improvement of free performance, net result looks the same. Result for other size class looks very promising, roughly, 50% performance improvement. v2: replace kick_all_cpus_sync() with synchronize_sched(). 
Signed-off-by: Joonsoo Kim --- mm/slab.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index cf12fbd..13e74aa 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -952,6 +952,15 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep, spin_unlock_irq(>list_lock); slabs_destroy(cachep, ); + /* +* To protect lockless access to n->shared during irq disabled context. +* If n->shared isn't NULL in irq disabled context, accessing to it is +* guaranteed to be valid until irq is re-enabled, because it will be +* freed after synchronize_sched(). +*/ + if (force_change) + synchronize_sched(); + fail: kfree(old_shared); kfree(new_shared); @@ -2880,7 +2889,7 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags) { int batchcount; struct kmem_cache_node *n; - struct array_cache *ac; + struct array_cache *ac, *shared; int node; void *list = NULL; struct page *page; @@ -2901,11 +2910,16 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags) n = get_node(cachep, node); BUG_ON(ac->avail > 0 || !n); + shared = READ_ONCE(n->shared); + if (!n->free_objects && (!shared || !shared->avail)) + goto direct_grow; + spin_lock(>list_lock); + shared = READ_ONCE(n->shared); /* See if we can refill from the shared array */ - if (n->shared && transfer_objects(ac, n->shared, batchcount)) { - n->shared->touched = 1; + if (shared && transfer_objects(ac, shared, batchcount)) { + shared->touched = 1; goto alloc_done; } @@ -2927,6 +2941,7 @@ alloc_done: spin_unlock(>list_lock); fixup_objfreelist_debug(cachep, ); +direct_grow: if (unlikely(!ac->avail)) { /* Check if we can use obj in pfmemalloc slab */ if (sk_memalloc_socks()) { -- 1.9.1
[PATCH v2 08/11] mm/slab: make cache_grow() handle the page allocated on arbitrary node
From: Joonsoo Kim Currently, cache_grow() assumes that allocated page's nodeid would be same with parameter nodeid which is used for allocation request. If we discard this assumption, we can handle fallback_alloc() case gracefully. So, this patch makes cache_grow() handle the page allocated on arbitrary node and clean-up relevant code. Signed-off-by: Joonsoo Kim --- mm/slab.c | 60 +--- 1 file changed, 21 insertions(+), 39 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index a3422bc..1910589 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2543,13 +2543,14 @@ static void slab_map_pages(struct kmem_cache *cache, struct page *page, * Grow (by 1) the number of slabs within a cache. This is called by * kmem_cache_alloc() when there are no active objs left in a cache. */ -static int cache_grow(struct kmem_cache *cachep, - gfp_t flags, int nodeid, struct page *page) +static int cache_grow(struct kmem_cache *cachep, gfp_t flags, int nodeid) { void *freelist; size_t offset; gfp_t local_flags; + int page_node; struct kmem_cache_node *n; + struct page *page; /* * Be lazy and only check for valid flags here, keeping it out of the @@ -2577,12 +2578,12 @@ static int cache_grow(struct kmem_cache *cachep, * Get mem for the objs. Attempt to allocate a physical page from * 'nodeid'. */ - if (!page) - page = kmem_getpages(cachep, local_flags, nodeid); + page = kmem_getpages(cachep, local_flags, nodeid); if (!page) goto failed; - n = get_node(cachep, nodeid); + page_node = page_to_nid(page); + n = get_node(cachep, page_node); /* Get colour for the slab, and cal the next value. */ n->colour_next++; @@ -2597,7 +2598,7 @@ static int cache_grow(struct kmem_cache *cachep, /* Get slab management. 
*/ freelist = alloc_slabmgmt(cachep, page, offset, - local_flags & ~GFP_CONSTRAINT_MASK, nodeid); + local_flags & ~GFP_CONSTRAINT_MASK, page_node); if (OFF_SLAB(cachep) && !freelist) goto opps1; @@ -2616,13 +2617,13 @@ static int cache_grow(struct kmem_cache *cachep, STATS_INC_GROWN(cachep); n->free_objects += cachep->num; spin_unlock(>list_lock); - return 1; + return page_node; opps1: kmem_freepages(cachep, page); failed: if (gfpflags_allow_blocking(local_flags)) local_irq_disable(); - return 0; + return -1; } #if DEBUG @@ -2903,14 +2904,14 @@ alloc_done: return obj; } - x = cache_grow(cachep, gfp_exact_node(flags), node, NULL); + x = cache_grow(cachep, gfp_exact_node(flags), node); /* cache_grow can reenable interrupts, then ac could change. */ ac = cpu_cache_get(cachep); node = numa_mem_id(); /* no objects in sight? abort */ - if (!x && ac->avail == 0) + if (x < 0 && ac->avail == 0) return NULL; if (!ac->avail) /* objects refilled by interrupt? */ @@ -3039,7 +3040,6 @@ static void *alternate_node_alloc(struct kmem_cache *cachep, gfp_t flags) static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags) { struct zonelist *zonelist; - gfp_t local_flags; struct zoneref *z; struct zone *zone; enum zone_type high_zoneidx = gfp_zone(flags); @@ -3050,8 +3050,6 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags) if (flags & __GFP_THISNODE) return NULL; - local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); - retry_cpuset: cpuset_mems_cookie = read_mems_allowed_begin(); zonelist = node_zonelist(mempolicy_slab_node(), flags); @@ -3081,33 +3079,17 @@ retry: * We may trigger various forms of reclaim on the allowed * set and go into memory reserves if necessary. 
*/ - struct page *page; + nid = cache_grow(cache, flags, numa_mem_id()); + if (nid >= 0) { + obj = cache_alloc_node(cache, + gfp_exact_node(flags), nid); - if (gfpflags_allow_blocking(local_flags)) - local_irq_enable(); - kmem_flagcheck(cache, flags); - page = kmem_getpages(cache, local_flags, numa_mem_id()); - if (gfpflags_allow_blocking(local_flags)) - local_irq_disable(); - if (page) { /* -* Insert into the appropriate per node queues +* Another processor may allocate the objects in +
[PATCH v2 07/11] mm/slab: racy access/modify the slab color
From: Joonsoo Kim Slab color isn't needed to be changed strictly. Because locking for changing slab color could cause more lock contention so this patch implements racy access/modify the slab color. This is a preparation step to implement lockless allocation path when there is no free objects in the kmem_cache. Below is the result of concurrent allocation/free in slab allocation benchmark made by Christoph a long time ago. I make the output simpler. The number shows cycle count during alloc/free respectively so less is better. * Before Kmalloc N*alloc N*free(32): Average=365/806 Kmalloc N*alloc N*free(64): Average=452/690 Kmalloc N*alloc N*free(128): Average=736/886 Kmalloc N*alloc N*free(256): Average=1167/985 Kmalloc N*alloc N*free(512): Average=2088/1125 Kmalloc N*alloc N*free(1024): Average=4115/1184 Kmalloc N*alloc N*free(2048): Average=8451/1748 Kmalloc N*alloc N*free(4096): Average=16024/2048 * After Kmalloc N*alloc N*free(32): Average=355/750 Kmalloc N*alloc N*free(64): Average=452/812 Kmalloc N*alloc N*free(128): Average=559/1070 Kmalloc N*alloc N*free(256): Average=1176/980 Kmalloc N*alloc N*free(512): Average=1939/1189 Kmalloc N*alloc N*free(1024): Average=3521/1278 Kmalloc N*alloc N*free(2048): Average=7152/1838 Kmalloc N*alloc N*free(4096): Average=13438/2013 It shows that contention is reduced for object size >= 1024 and performance increases by roughly 15%. Acked-by: Christoph Lameter Signed-off-by: Joonsoo Kim --- mm/slab.c | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index 6e61461..a3422bc 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2561,20 +2561,7 @@ static int cache_grow(struct kmem_cache *cachep, } local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); - /* Take the node list lock to change the colour_next on this node */ check_irq_off(); - n = get_node(cachep, nodeid); - spin_lock(>list_lock); - - /* Get colour for the slab, and cal the next value. 
*/ - offset = n->colour_next; - n->colour_next++; - if (n->colour_next >= cachep->colour) - n->colour_next = 0; - spin_unlock(>list_lock); - - offset *= cachep->colour_off; - if (gfpflags_allow_blocking(local_flags)) local_irq_enable(); @@ -2595,6 +2582,19 @@ static int cache_grow(struct kmem_cache *cachep, if (!page) goto failed; + n = get_node(cachep, nodeid); + + /* Get colour for the slab, and cal the next value. */ + n->colour_next++; + if (n->colour_next >= cachep->colour) + n->colour_next = 0; + + offset = n->colour_next; + if (offset >= cachep->colour) + offset = 0; + + offset *= cachep->colour_off; + /* Get slab management. */ freelist = alloc_slabmgmt(cachep, page, offset, local_flags & ~GFP_CONSTRAINT_MASK, nodeid); -- 1.9.1
[PATCH v2 09/11] mm/slab: separate cache_grow() to two parts
From: Joonsoo Kim This is a preparation step to implement lockless allocation path when there is no free objects in kmem_cache. What we'd like to do here is to refill cpu cache without holding a node lock. To accomplish this purpose, refill should be done after new slab allocation but before attaching the slab to the management list. So, this patch separates cache_grow() to two parts, allocation and attaching to the list in order to add some code inbetween them in the following patch. Signed-off-by: Joonsoo Kim --- mm/slab.c | 74 --- 1 file changed, 52 insertions(+), 22 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index 1910589..2c28ad5 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -213,6 +213,11 @@ static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list); static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp); static void cache_reap(struct work_struct *unused); +static inline void fixup_objfreelist_debug(struct kmem_cache *cachep, + void **list); +static inline void fixup_slab_list(struct kmem_cache *cachep, + struct kmem_cache_node *n, struct page *page, + void **list); static int slab_early_init = 1; #define INDEX_NODE kmalloc_index(sizeof(struct kmem_cache_node)) @@ -1797,7 +1802,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep, /* * Needed to avoid possible looping condition -* in cache_grow() +* in cache_grow_begin() */ if (OFF_SLAB(freelist_cache)) continue; @@ -2543,7 +2548,8 @@ static void slab_map_pages(struct kmem_cache *cache, struct page *page, * Grow (by 1) the number of slabs within a cache. This is called by * kmem_cache_alloc() when there are no active objs left in a cache. 
 */
-static int cache_grow(struct kmem_cache *cachep, gfp_t flags, int nodeid)
+static struct page *cache_grow_begin(struct kmem_cache *cachep,
+				gfp_t flags, int nodeid)
 {
 	void *freelist;
 	size_t offset;
@@ -2609,21 +2615,40 @@ static int cache_grow(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 	if (gfpflags_allow_blocking(local_flags))
 		local_irq_disable();
 
-	check_irq_off();
-	spin_lock(&n->list_lock);
-
-	/* Make slab active. */
-	list_add_tail(&page->lru, &(n->slabs_free));
-	STATS_INC_GROWN(cachep);
-	n->free_objects += cachep->num;
-	spin_unlock(&n->list_lock);
-	return page_node;
+	return page;
+
 opps1:
 	kmem_freepages(cachep, page);
 failed:
 	if (gfpflags_allow_blocking(local_flags))
 		local_irq_disable();
-	return -1;
+	return NULL;
+}
+
+static void cache_grow_end(struct kmem_cache *cachep, struct page *page)
+{
+	struct kmem_cache_node *n;
+	void *list = NULL;
+
+	check_irq_off();
+
+	if (!page)
+		return;
+
+	INIT_LIST_HEAD(&page->lru);
+	n = get_node(cachep, page_to_nid(page));
+
+	spin_lock(&n->list_lock);
+	if (!page->active)
+		list_add_tail(&page->lru, &(n->slabs_free));
+	else
+		fixup_slab_list(cachep, n, page, &list);
+	STATS_INC_GROWN(cachep);
+	n->free_objects += cachep->num - page->active;
+	spin_unlock(&n->list_lock);
+
+	fixup_objfreelist_debug(cachep, &list);
 }
 
 #if DEBUG
@@ -2834,6 +2859,7 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
 	struct array_cache *ac;
 	int node;
 	void *list = NULL;
+	struct page *page;
 
 	check_irq_off();
 	node = numa_mem_id();
@@ -2861,7 +2887,6 @@ retry:
 	}
 
 	while (batchcount > 0) {
-		struct page *page;
 		/* Get slab alloc is to come from.
 */
 		page = get_first_slab(n, false);
 		if (!page)
@@ -2894,8 +2919,6 @@ alloc_done:
 	fixup_objfreelist_debug(cachep, &list);
 
 	if (unlikely(!ac->avail)) {
-		int x;
-
 		/* Check if we can use obj in pfmemalloc slab */
 		if (sk_memalloc_socks()) {
 			void *obj = cache_alloc_pfmemalloc(cachep, n, flags);
@@ -2904,14 +2927,18 @@ alloc_done:
 				return obj;
 		}
 
-		x = cache_grow(cachep, gfp_exact_node(flags), node);
+		page = cache_grow_begin(cachep, gfp_exact_node(flags), node);
+		cache_grow_end(cachep, page);
 
-		/* cache_grow can reenable interrupts, then ac could change. */
+		/*
+		 * cache_grow_begin() can reenable interrupts,
+		 * then ac could change.
+		 */
 		ac = cpu_cache_get(cachep);
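The begin/end split above is a general pattern: do the expensive, possibly-sleeping allocation with no lock held, and take the lock only to publish the result. A toy illustration (deliberately not kernel code; `lock_held` is a hypothetical stand-in for the node spinlock):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of a node with a singly linked free-slab list. */
struct slab { struct slab *next; };
struct node { struct slab *free_list; int lock_held; };

/* "begin": no lock held, so the allocation may block or fail safely. */
static struct slab *grow_begin(void)
{
	return malloc(sizeof(struct slab));
}

/* "end": only the cheap list insertion runs in the critical section. */
static void grow_end(struct node *n, struct slab *s)
{
	if (!s)
		return;

	n->lock_held = 1;	/* stands in for spin_lock(&n->list_lock) */
	s->next = n->free_list;
	n->free_list = s;
	n->lock_held = 0;	/* stands in for spin_unlock(&n->list_lock) */
}
```

The payoff is the same as in the patch: concurrent callers contend only for the short publish step, not for the whole grow operation.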
[PATCH v2 05/11] mm/slab: clean-up kmem_cache_node setup
From: Joonsoo Kim

There is mostly the same code for setting up a kmem_cache_node in
cpuup_prepare() and alloc_kmem_cache_node(). Factor it out and clean
it up.

v2
o Rename setup_kmem_cache_node_node to setup_kmem_cache_nodes
o Fix suspend-to-ram issue reported by Nishanth

Tested-by: Nishanth Menon
Tested-by: Jon Hunter
Signed-off-by: Joonsoo Kim
---
 mm/slab.c | 168 +++++++++++++++++++++++++-------------------------------------
 1 file changed, 68 insertions(+), 100 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 49af685..27cb390 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -898,6 +898,63 @@ static int init_cache_node_node(int node)
 	return 0;
 }
 
+static int setup_kmem_cache_node(struct kmem_cache *cachep,
+				int node, gfp_t gfp, bool force_change)
+{
+	int ret = -ENOMEM;
+	struct kmem_cache_node *n;
+	struct array_cache *old_shared = NULL;
+	struct array_cache *new_shared = NULL;
+	struct alien_cache **new_alien = NULL;
+	LIST_HEAD(list);
+
+	if (use_alien_caches) {
+		new_alien = alloc_alien_cache(node, cachep->limit, gfp);
+		if (!new_alien)
+			goto fail;
+	}
+
+	if (cachep->shared) {
+		new_shared = alloc_arraycache(node,
+			cachep->shared * cachep->batchcount, 0xbaadf00d, gfp);
+		if (!new_shared)
+			goto fail;
+	}
+
+	ret = init_cache_node(cachep, node, gfp);
+	if (ret)
+		goto fail;
+
+	n = get_node(cachep, node);
+	spin_lock_irq(&n->list_lock);
+	if (n->shared && force_change) {
+		free_block(cachep, n->shared->entry,
+				n->shared->avail, node, &list);
+		n->shared->avail = 0;
+	}
+
+	if (!n->shared || force_change) {
+		old_shared = n->shared;
+		n->shared = new_shared;
+		new_shared = NULL;
+	}
+
+	if (!n->alien) {
+		n->alien = new_alien;
+		new_alien = NULL;
+	}
+
+	spin_unlock_irq(&n->list_lock);
+	slabs_destroy(cachep, &list);
+
+fail:
+	kfree(old_shared);
+	kfree(new_shared);
+	free_alien_cache(new_alien);
+
+	return ret;
+}
+
 static void cpuup_canceled(long cpu)
 {
 	struct kmem_cache *cachep;
@@ -969,7 +1026,6 @@ free_slab:
 static int cpuup_prepare(long cpu)
 {
 	struct kmem_cache *cachep;
-	struct kmem_cache_node *n = NULL;
 	int node = cpu_to_mem(cpu);
 	int err;
 
@@ -988,44 +1044,9 @@ static int cpuup_prepare(long cpu)
 	 * array caches
 	 */
 	list_for_each_entry(cachep, &slab_caches, list) {
-		struct array_cache *shared = NULL;
-		struct alien_cache **alien = NULL;
-
-		if (cachep->shared) {
-			shared = alloc_arraycache(node,
-				cachep->shared * cachep->batchcount,
-				0xbaadf00d, GFP_KERNEL);
-			if (!shared)
-				goto bad;
-		}
-		if (use_alien_caches) {
-			alien = alloc_alien_cache(node, cachep->limit, GFP_KERNEL);
-			if (!alien) {
-				kfree(shared);
-				goto bad;
-			}
-		}
-		n = get_node(cachep, node);
-		BUG_ON(!n);
-
-		spin_lock_irq(&n->list_lock);
-		if (!n->shared) {
-			/*
-			 * We are serialised from CPU_DEAD or
-			 * CPU_UP_CANCELLED by the cpucontrol lock
-			 */
-			n->shared = shared;
-			shared = NULL;
-		}
-#ifdef CONFIG_NUMA
-		if (!n->alien) {
-			n->alien = alien;
-			alien = NULL;
-		}
-#endif
-		spin_unlock_irq(&n->list_lock);
-		kfree(shared);
-		free_alien_cache(alien);
+		err = setup_kmem_cache_node(cachep, node, GFP_KERNEL, false);
+		if (err)
+			goto bad;
 	}
 
 	return 0;
@@ -3676,72 +3697,19 @@ EXPORT_SYMBOL(kfree);
 /*
  * This initializes kmem_cache_node or resizes various caches for all nodes.
  */
-static int alloc_kmem_cache_node(struct kmem_cache *cachep, gfp_t gfp)
+static int setup_kmem_cache_nodes(struct kmem_cache *cachep, gfp_t gfp)
 {
+	int ret;
 	int node;
 	struct kmem_cache_node *n;
-	struct array_cache *new_shared;
-	struct alien_cache **new_alien = NULL;
 
 	for_each_online_node(node) {
-
-		if (use_alien_caches) {
-			new_alien = alloc_alien_cache(node,
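The factored-out setup_kmem_cache_node() follows a classic prealloc-then-swap discipline: allocate replacements outside the lock, install them under the lock only if the slot is empty (or a change is forced), and free whatever was not installed afterwards. A minimal userspace sketch of that discipline, with an `int *` standing in for the shared array cache (the critical-section markers are comments, not real locking):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy node: 'shared' stands in for kmem_cache_node's shared array. */
struct node { int *shared; };

static int setup_shared(struct node *n, int value, int force_change)
{
	int *new_shared = malloc(sizeof(int));
	int *old_shared = NULL;

	if (!new_shared)
		return -1;
	*new_shared = value;

	/* ---- the node lock would be taken here ---- */
	if (!n->shared || force_change) {
		old_shared = n->shared;	/* detach old value under the lock */
		n->shared = new_shared;
		new_shared = NULL;	/* replacement was consumed */
	}
	/* ---- the node lock would be dropped here ---- */

	free(old_shared);	/* freed outside the critical section */
	free(new_shared);	/* unused replacement, if the slot was kept */
	return 0;
}
```

Because no allocation or free happens while the "lock" is held, the critical section stays a handful of pointer swaps, which is the whole point of the clean-up.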
[PATCH v2 06/11] mm/slab: don't keep free slabs if free_objects exceeds free_limit
From: Joonsoo Kim

Currently, the decision to free a slab is made every time a freed
object is put back into the slab. This has the following problem.

Assume free_limit = 10 and nr_free = 9. Frees happen in the following
sequence and nr_free changes accordingly.

free(becomes a free slab)  free(does not become a free slab)
nr_free: 9 -> 10 (at first free) -> 11 (at second free)

If we check whether we can free the current slab on each object free,
we cannot free any slab in this situation, because the current slab is
not a free slab at the moment nr_free exceeds free_limit (at the second
free), even though a free slab exists. If we instead check at the end,
we can free one free slab. This problem can cause the slab subsystem
to keep too much memory.

This patch fixes it by checking the number of free objects after all
the free work is done. If there is a free slab at that time, we free as
many slabs as possible, so the number of kept free slabs stays minimal.

v2: explain more about the problem

Signed-off-by: Joonsoo Kim
---
 mm/slab.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 27cb390..6e61461 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3283,6 +3283,9 @@ static void free_block(struct kmem_cache *cachep, void **objpp,
 {
 	int i;
 	struct kmem_cache_node *n = get_node(cachep, node);
+	struct page *page;
+
+	n->free_objects += nr_objects;
 
 	for (i = 0; i < nr_objects; i++) {
 		void *objp;
@@ -3295,17 +3298,11 @@ static void free_block(struct kmem_cache *cachep, void **objpp,
 		check_spinlock_acquired_node(cachep, node);
 		slab_put_obj(cachep, page, objp);
 		STATS_DEC_ACTIVE(cachep);
-		n->free_objects++;
 
 		/* fixup slab chains */
-		if (page->active == 0) {
-			if (n->free_objects > n->free_limit) {
-				n->free_objects -= cachep->num;
-				list_add_tail(&page->lru, list);
-			} else {
-				list_add(&page->lru, &n->slabs_free);
-			}
-		} else {
+		if (page->active == 0)
+			list_add(&page->lru, &n->slabs_free);
+		else {
 			/* Unconditionally move a slab to the end of the
 			 * partial list on free - maximum time for the
 			 * other objects to be freed, too.
@@ -3313,6 +3310,14 @@ static void free_block(struct kmem_cache *cachep, void **objpp,
 			list_add_tail(&page->lru, &n->slabs_partial);
 		}
 	}
+
+	while (n->free_objects > n->free_limit && !list_empty(&n->slabs_free)) {
+		n->free_objects -= cachep->num;
+
+		page = list_last_entry(&n->slabs_free, struct page, lru);
+		list_del(&page->lru);
+		list_add(&page->lru, list);
+	}
 }
 
 static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
-- 
1.9.1
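The deferred check that this patch adds at the end of free_block() can be modeled in a few lines. The helper below is a simplified stand-in (not the kernel code): it applies the same loop condition — keep releasing fully-free slabs while the free-object count exceeds the limit and a free slab exists.

```c
#include <assert.h>

/*
 * Model of the deferred while-loop added to free_block(): after all
 * objects have been returned, release whole free slabs until the node
 * drops back under its free_objects limit or runs out of free slabs.
 * Returns how many slabs would be handed back to the page allocator.
 */
static int slabs_released_deferred(int nr_free, int free_limit,
				   int objs_per_slab, int free_slabs)
{
	int released = 0;

	while (nr_free > free_limit && free_slabs > 0) {
		nr_free -= objs_per_slab;	/* n->free_objects -= cachep->num */
		free_slabs--;			/* take one page off slabs_free */
		released++;
	}
	return released;
}
```

With the commit message's numbers (limit 10, nr_free ending at 11, one free slab), the per-object check frees nothing, while this end-of-batch check releases the one free slab.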
[PATCH v2 02/11] mm/slab: remove BAD_ALIEN_MAGIC again
From: Joonsoo Kim

The initial attempt to remove BAD_ALIEN_MAGIC was reverted by commit
edcad2509550 ("Revert "slab: remove BAD_ALIEN_MAGIC"") because it
caused a problem on m68k, which has many nodes but !CONFIG_NUMA. In
that case the alien cache isn't used at all, but, to cope with some
initialization paths, a garbage value is used, and that value is
BAD_ALIEN_MAGIC. Now that this patch sets use_alien_caches to 0 when
!CONFIG_NUMA, there is no initialization-path problem, so we don't
need BAD_ALIEN_MAGIC at all. So remove it.

Tested-by: Geert Uytterhoeven
Signed-off-by: Joonsoo Kim
---
 mm/slab.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index d8746c0..373b8be 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -421,8 +421,6 @@ static struct kmem_cache kmem_cache_boot = {
 	.name = "kmem_cache",
 };
 
-#define BAD_ALIEN_MAGIC 0x01020304ul
-
 static DEFINE_PER_CPU(struct delayed_work, slab_reap_work);
 
 static inline struct array_cache *cpu_cache_get(struct kmem_cache *cachep)
@@ -637,7 +635,7 @@ static int transfer_objects(struct array_cache *to,
 static inline struct alien_cache **alloc_alien_cache(int node,
 						int limit, gfp_t gfp)
 {
-	return (struct alien_cache **)BAD_ALIEN_MAGIC;
+	return NULL;
 }
 
 static inline void free_alien_cache(struct alien_cache **ac_ptr)
@@ -1205,7 +1203,7 @@ void __init kmem_cache_init(void)
 					sizeof(struct rcu_head));
 	kmem_cache = &kmem_cache_boot;
 
-	if (num_possible_nodes() == 1)
+	if (!IS_ENABLED(CONFIG_NUMA) || num_possible_nodes() == 1)
 		use_alien_caches = 0;
 
 	for (i = 0; i < NUM_INIT_LISTS; i++)
-- 
1.9.1
[PATCH v2 00/11] mm/slab: reduce lock contention in alloc path
From: Joonsoo Kim

Major changes from v1
o hold node lock instead of slab_mutex in kmem_cache_shrink()
o fix suspend-to-ram issue reported by Nishanth
o use synchronize_sched() instead of kick_all_cpus_sync()

While processing concurrent allocations, SLAB could be contended a lot
because it does a lot of work while holding a lock. This patchset tries
to reduce the number of critical sections in order to reduce lock
contention. The major changes are a lockless decision to allocate more
slabs and a lockless cpu cache refill from the newly allocated slab.

Below is the result of concurrent allocation/free in the slab
allocation benchmark made by Christoph a long time ago. I've made the
output simpler. The numbers show the cycle count during alloc/free,
respectively, so less is better.

* Before
Kmalloc N*alloc N*free(32): Average=365/806
Kmalloc N*alloc N*free(64): Average=452/690
Kmalloc N*alloc N*free(128): Average=736/886
Kmalloc N*alloc N*free(256): Average=1167/985
Kmalloc N*alloc N*free(512): Average=2088/1125
Kmalloc N*alloc N*free(1024): Average=4115/1184
Kmalloc N*alloc N*free(2048): Average=8451/1748
Kmalloc N*alloc N*free(4096): Average=16024/2048

* After
Kmalloc N*alloc N*free(32): Average=344/792
Kmalloc N*alloc N*free(64): Average=347/882
Kmalloc N*alloc N*free(128): Average=390/959
Kmalloc N*alloc N*free(256): Average=393/1067
Kmalloc N*alloc N*free(512): Average=683/1229
Kmalloc N*alloc N*free(1024): Average=1295/1325
Kmalloc N*alloc N*free(2048): Average=2513/1664
Kmalloc N*alloc N*free(4096): Average=4742/2172

It shows that performance improves greatly (roughly more than 50%) for
object classes whose size is more than 128 bytes.

Joonsoo Kim (11):
  mm/slab: fix the theoretical race by holding proper lock
  mm/slab: remove BAD_ALIEN_MAGIC again
  mm/slab: drain the free slab as much as possible
  mm/slab: factor out kmem_cache_node initialization code
  mm/slab: clean-up kmem_cache_node setup
  mm/slab: don't keep free slabs if free_objects exceeds free_limit
  mm/slab: racy access/modify the slab color
  mm/slab: make cache_grow() handle the page allocated on arbitrary node
  mm/slab: separate cache_grow() to two parts
  mm/slab: refill cpu cache through a new slab without holding a node lock
  mm/slab: lockless decision to grow cache

 mm/slab.c | 562 +++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 295 insertions(+), 267 deletions(-)

-- 
1.9.1
[PATCH v2 01/11] mm/slab: fix the theoretical race by holding proper lock
From: Joonsoo Kim

While processing concurrent allocations, SLAB could be contended a lot
because it does a lot of work while holding a lock. This patchset tries
to reduce the number of critical sections in order to reduce lock
contention. The major changes are a lockless decision to allocate more
slabs and a lockless cpu cache refill from the newly allocated slab.

Below is the result of concurrent allocation/free in the slab
allocation benchmark made by Christoph a long time ago. I've made the
output simpler. The numbers show the cycle count during alloc/free,
respectively, so less is better.

* Before
Kmalloc N*alloc N*free(32): Average=365/806
Kmalloc N*alloc N*free(64): Average=452/690
Kmalloc N*alloc N*free(128): Average=736/886
Kmalloc N*alloc N*free(256): Average=1167/985
Kmalloc N*alloc N*free(512): Average=2088/1125
Kmalloc N*alloc N*free(1024): Average=4115/1184
Kmalloc N*alloc N*free(2048): Average=8451/1748
Kmalloc N*alloc N*free(4096): Average=16024/2048

* After
Kmalloc N*alloc N*free(32): Average=344/792
Kmalloc N*alloc N*free(64): Average=347/882
Kmalloc N*alloc N*free(128): Average=390/959
Kmalloc N*alloc N*free(256): Average=393/1067
Kmalloc N*alloc N*free(512): Average=683/1229
Kmalloc N*alloc N*free(1024): Average=1295/1325
Kmalloc N*alloc N*free(2048): Average=2513/1664
Kmalloc N*alloc N*free(4096): Average=4742/2172

It shows that performance improves greatly (roughly more than 50%) for
object classes whose size is more than 128 bytes.

This patch (of 11):

If we hold neither the slab_mutex nor the node lock, the node's shared
array cache could be freed and re-populated. If __kmem_cache_shrink()
is called at the same time, it will call drain_array() with n->shared
without holding the node lock, so a problem can happen. This patch
fixes the situation by holding the node lock before trying to drain
the shared array.

In addition, add a debug check to confirm that the n->shared access
race doesn't exist.

v2:
o Hold the node lock instead of holding the slab_mutex (per Christoph)
o Add a debug check rather than adding a code comment (per Nikolay)

Signed-off-by: Joonsoo Kim
---
 mm/slab.c | 68 +++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 45 insertions(+), 23 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index a53a0f6..d8746c0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2173,6 +2173,11 @@ static void check_irq_on(void)
 	BUG_ON(irqs_disabled());
 }
 
+static void check_mutex_acquired(void)
+{
+	BUG_ON(!mutex_is_locked(&slab_mutex));
+}
+
 static void check_spinlock_acquired(struct kmem_cache *cachep)
 {
 #ifdef CONFIG_SMP
@@ -2192,13 +2197,27 @@ static void check_spinlock_acquired_node(struct kmem_cache *cachep, int node)
 #else
 #define check_irq_off()	do { } while(0)
 #define check_irq_on()	do { } while(0)
+#define check_mutex_acquired()	do { } while(0)
 #define check_spinlock_acquired(x)	do { } while(0)
 #define check_spinlock_acquired_node(x, y)	do { } while(0)
 #endif
 
-static void drain_array(struct kmem_cache *cachep, struct kmem_cache_node *n,
-			struct array_cache *ac,
-			int force, int node);
+static void drain_array_locked(struct kmem_cache *cachep, struct array_cache *ac,
+				int node, bool free_all, struct list_head *list)
+{
+	int tofree;
+
+	if (!ac || !ac->avail)
+		return;
+
+	tofree = free_all ? ac->avail : (ac->limit + 4) / 5;
+	if (tofree > ac->avail)
+		tofree = (ac->avail + 1) / 2;
+
+	free_block(cachep, ac->entry, tofree, node, list);
+	ac->avail -= tofree;
+	memmove(ac->entry, &(ac->entry[tofree]), sizeof(void *) * ac->avail);
+}
 
 static void do_drain(void *arg)
 {
@@ -,6 +2241,7 @@ static void drain_cpu_caches(struct kmem_cache *cachep)
 {
 	struct kmem_cache_node *n;
 	int node;
+	LIST_HEAD(list);
 
 	on_each_cpu(do_drain, cachep, 1);
 	check_irq_on();
@@ -2229,8 +2249,13 @@ static void drain_cpu_caches(struct kmem_cache *cachep)
 		if (n->alien)
 			drain_alien_cache(cachep, n->alien);
 
-	for_each_kmem_cache_node(cachep, node, n)
-		drain_array(cachep, n, n->shared, 1, node);
+	for_each_kmem_cache_node(cachep, node, n) {
+		spin_lock_irq(&n->list_lock);
+		drain_array_locked(cachep, n->shared, node, true, &list);
+		spin_unlock_irq(&n->list_lock);
+
+		slabs_destroy(cachep, &list);
+	}
 }
 
 /*
@@ -3873,29 +3898,26 @@ skip_setup:
  * if drain_array() is used on the shared array.
  */
 static void drain_array(struct kmem_cache *cachep, struct kmem_cache_node *n,
-			 struct array_cache *ac, int force, int node)
+			 struct array_cache *ac, int node)
 {
 	LIST_HEAD(list);
-	int tofree;
+
+	/* ac from n->shared can be freed if we don't hold the slab_mutex. */
+	check_mutex_acquired();
 
 	if (!ac || !ac->avail)
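The tofree arithmetic in drain_array_locked() above is easy to misread, so here it is extracted into a standalone helper (an illustration only, with plain `int` parameters standing in for the array_cache fields): drain everything when `free_all` is set, otherwise drain roughly a fifth of the configured limit, clamped to about half of what is actually available.

```c
#include <assert.h>

/*
 * Mirror of the tofree computation in drain_array_locked():
 * - free_all: drain every available object;
 * - otherwise: drain ceil(limit / 5), but never more than about
 *   half of the currently available objects.
 */
static int compute_tofree(int avail, int limit, int free_all)
{
	int tofree = free_all ? avail : (limit + 4) / 5;

	if (tofree > avail)
		tofree = (avail + 1) / 2;
	return tofree;
}
```

The clamp matters when the array is nearly empty: a partial drain then takes half of the few remaining objects instead of trying to free more than exist.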