date:20161225

Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash

2016-12-25 Thread Herbert Xu

On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote:
> 
> I actually do use incremental hashing later on.   BPF currently
> vmallocs() a big temporary buffer just so it can fill it and hash it.
> I change it to hash as it goes.

How much data is this supposed to hash on average? If it's a large
amount then perhaps using the existing crypto API would be a better
option than adding this.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
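
The incremental approach Andy describes, hashing data as it is produced
instead of first collecting it in one large buffer, can be sketched in
userspace C. This is only a sketch under stated assumptions: it uses 64-bit
FNV-1a as a stand-in for SHA-256, and the names hash_init, hash_update,
hash_final and hash_oneshot are hypothetical, not the kernel API. It shows
that feeding chunks one at a time gives the same result as hashing one
contiguous buffer, so the temporary buffer is not needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a real digest state such as SHA-256 (assumption: 64-bit FNV-1a). */
struct hash_state {
	uint64_t h;
};

static void hash_init(struct hash_state *st)
{
	st->h = 0xcbf29ce484222325ULL;	/* FNV-1a 64-bit offset basis */
}

/* Feed one chunk; may be called any number of times. */
static void hash_update(struct hash_state *st, const void *data, size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++) {
		st->h ^= p[i];
		st->h *= 0x100000001b3ULL;	/* FNV-1a 64-bit prime */
	}
}

static uint64_t hash_final(const struct hash_state *st)
{
	return st->h;
}

/* One-shot variant: needs the whole input in one contiguous buffer. */
static uint64_t hash_oneshot(const void *data, size_t len)
{
	struct hash_state st;

	hash_init(&st);
	hash_update(&st, data, len);
	return hash_final(&st);
}
```

A caller that produces data piecewise would call hash_update() once per
piece and hash_final() at the end, without ever allocating a buffer for
the whole input.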

Re: [GIT pull] smp/hotplug: Removal of notifiers

2016-12-25 Thread Markus Trippelsdorf

On 2016.12.25 at 14:39 +0100, Thomas Gleixner wrote:
> Linus,
> 
> please pull the latest smp-urgent-for-linus git tree from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git smp-urgent-for-linus
> 
> Thomas Gleixner (11):
>   cpu/hotplug: Prevent overwriting of callbacks

The following commit:

 commit dc280d93623927570da279e99393879dbbab39e7
 Author: Thomas Gleixner 
 Date:   Wed Dec 21 20:19:49 2016 +0100

 cpu/hotplug: Prevent overwriting of callbacks

results in an early oops during boot on my AMD machine.
I haven't written down the entire backtrace, but basically things start to
go wrong in mce_threshold_create_device() from
arch/x86/kernel/cpu/mcheck/mce_amd.c.

# CONFIG_HOTPLUG_CPU is not set 

Reverting the commit "fixes" the issue for me.

-- 
Markus

RE: [Intel-gfx] [PATCH] drm/i915: check if execlist_port is empty before using its content

2016-12-25 Thread Du, Changbin

> On Fri, Dec 23, 2016 at 01:46:36PM +0800, changbin...@intel.com wrote:
> > From: "Du, Changbin" 
> >
> > This patch fixes a crash in reset_common_ring(). In this case,
> > port[0].request is NULL when the render ring is reset, so a NULL
> > pointer dereference occurs. We need to check the execlist_port
> > status first.
> 
> No. The root cause is whatever got you into the illegal condition in the
> first place.
> -Chris
> 
Thanks, I will restudy the code after I finish my current task. Since this happens
on a GVT guest, it may be related to GVT emulation.

> --
> Chris Wilson, Intel Open Source Technology Centre
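
The guard the patch proposes can be sketched as follows. This is a
simplified illustration, not the i915 code: the structures and the function
reset_ring_guarded are hypothetical stand-ins. The early return avoids the
NULL dereference, but as Chris points out it does not explain why port[0]
was empty during a reset in the first place.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct request {
	int seqno;
};

struct execlist_port {
	struct request *request;
};

struct engine {
	struct execlist_port port[2];
};

/*
 * Guarded variant: bail out when port[0] is empty instead of
 * dereferencing a NULL request. Returns -1 when there was nothing
 * to reset, otherwise the seqno of the request in flight.
 */
static int reset_ring_guarded(const struct engine *e)
{
	if (e->port[0].request == NULL)
		return -1;	/* hides the symptom; the cause is elsewhere */
	return e->port[0].request->seqno;
}
```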

Re: [PATCH 3/4] mmc: mmc: enable ios.enhanced_strobe before mmc_set_timing

2016-12-25 Thread Shawn Lin


On 2016/12/23 17:18, Ritesh Harjani wrote:
> Hi Shawn,
>
> Do you think the change below should be fine?

Yes.

> I am still checking on what we discussed on Patch2.
>
> The reason I am asking is that for HS400 in SDHC-MSM, we do the DLL
> re-calibration as per the HW sequence. So it is done in both cases,
> for HS400 mode without and with enhanced strobe mode.
>
> This can be done as part of set_ios when mmc_set_timing is called in
> the sdhci-msm driver.
>
> I am still trying to work out what would be the more generic and
> appropriate way inside sdhci-msm. For that I would like to know whether
> setting ios.enhanced_strobe = true before calling mmc_set_timing would
> be acceptable or not.

Sure.

> Regards
> Ritesh
>
> On 12/20/2016 11:23 AM, Ritesh Harjani wrote:
>> Some controllers may need to configure a few registers based on enhanced
>> strobe mode while configuring HS400 timing, so set
>> ios.enhanced_strobe to true before mmc_set_timing in mmc_select_hs400es.
>>
>> Signed-off-by: Ritesh Harjani
>> ---
>>  drivers/mmc/core/mmc.c | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
>> index eb69497..052368e 100644
>> --- a/drivers/mmc/core/mmc.c
>> +++ b/drivers/mmc/core/mmc.c
>> @@ -1327,12 +1327,18 @@ static int mmc_select_hs400es(struct mmc_card *card)
>>  goto out_err;
>>  }
>>
>> +/*
>> + * Enable enhanced_strobe in ios, as some controllers
>> + * may need to configure few registers based on enhanced
>> + * strobe while changing HS400 timing.
>> + */
>> +host->ios.enhanced_strobe = true;
>> +
>>  /* Set host controller to HS400 timing and frequency */
>>  mmc_set_timing(host, MMC_TIMING_MMC_HS400);
>>  mmc_set_bus_speed(card);
>>
>>  /* Controller enable enhanced strobe function */
>> -host->ios.enhanced_strobe = true;
>>  if (host->ops->hs400_enhanced_strobe)
>>  host->ops->hs400_enhanced_strobe(host, &host->ios);

--
Best Regards
Shawn Lin
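
Why the ordering matters can be shown with a small userspace model. This is
a sketch under assumptions, not the MMC core: the structures, the
TIMING_HS400 value and all function names are hypothetical. It relies only
on what the thread states, namely that the controller's set_ios callback
runs when the timing is changed, so it can only see the enhanced strobe
flag if the flag was set beforehand.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for struct mmc_ios / struct mmc_host. */
struct ios {
	int timing;
	bool enhanced_strobe;
};

struct host {
	struct ios ios;
	bool saw_strobe_in_set_ios;	/* what the controller observed */
};

#define TIMING_HS400 10	/* arbitrary value for this sketch */

/* Controller callback: runs whenever the core changes ios settings. */
static void set_ios(struct host *h)
{
	if (h->ios.timing == TIMING_HS400)
		h->saw_strobe_in_set_ios = h->ios.enhanced_strobe;
}

/* Models the timing change: update ios, then notify the controller. */
static void set_timing(struct host *h, int timing)
{
	h->ios.timing = timing;
	set_ios(h);
}

/* Old order: the flag is set after the timing change. */
static bool select_hs400es_old(struct host *h)
{
	set_timing(h, TIMING_HS400);
	h->ios.enhanced_strobe = true;
	return h->saw_strobe_in_set_ios;
}

/* Patched order: the flag is set before the timing change. */
static bool select_hs400es_new(struct host *h)
{
	h->ios.enhanced_strobe = true;
	set_timing(h, TIMING_HS400);
	return h->saw_strobe_in_set_ios;
}
```

With the old order the callback sees enhanced_strobe as false while
switching to HS400; with the patched order it sees true and could pick the
matching register setup.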

[rfc patch-rt] posix_cpu_timers: Kill hotplug cpu notifier

2016-12-25 Thread Mike Galbraith

Shamelessly steal softirq.c thread initialization method.

Signed-off-by: Mike Galbraith 
---
 include/linux/cpuhotplug.h |1 
 kernel/time/posix-cpu-timers.c |  158 ++---
 2 files changed, 56 insertions(+), 103 deletions(-)

--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -20,6 +20,7 @@ enum cpuhp_state {
CPUHP_SLUB_DEAD,
CPUHP_MM_WRITEBACK_DEAD,
CPUHP_SOFTIRQ_DEAD,
+   CPUHP_POSIXCPUTMR_DEAD,
CPUHP_NET_MVNETA_DEAD,
CPUHP_CPUIDLE_DEAD,
CPUHP_ARM64_FPSIMD_DEAD,
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Called after updating RLIMIT_CPU to run cpu timer and update
@@ -1220,62 +1222,36 @@ static void __run_posix_cpu_timers(struc
 DEFINE_PER_CPU(struct task_struct *, posix_timer_task);
 DEFINE_PER_CPU(struct task_struct *, posix_timer_tasklist);
 
-static int posix_cpu_timers_thread(void *data)
+static void posix_cpu_timers_thread(unsigned int cpu)
 {
-   int cpu = (long)data;
+   struct task_struct *tsk = NULL;
+   struct task_struct *next = NULL;
 
-   BUG_ON(per_cpu(posix_timer_task,cpu) != current);
+   /* grab task list */
+   raw_local_irq_disable();
+   tsk = per_cpu(posix_timer_tasklist, cpu);
+   per_cpu(posix_timer_tasklist, cpu) = NULL;
+   raw_local_irq_enable();
+
+   /* Process task list */
+   while (tsk) {
+   /* save next */
+   next = tsk->posix_timer_list;
 
-   while (!kthread_should_stop()) {
-   struct task_struct *tsk = NULL;
-   struct task_struct *next = NULL;
-
-   if (cpu_is_offline(cpu))
-   goto wait_to_die;
-
-   /* grab task list */
-   raw_local_irq_disable();
-   tsk = per_cpu(posix_timer_tasklist, cpu);
-   per_cpu(posix_timer_tasklist, cpu) = NULL;
-   raw_local_irq_enable();
-
-   /* its possible the list is empty, just return */
-   if (!tsk) {
-   set_current_state(TASK_INTERRUPTIBLE);
-   schedule();
-   __set_current_state(TASK_RUNNING);
-   continue;
-   }
-
-   /* Process task list */
-   while (1) {
-   /* save next */
-   next = tsk->posix_timer_list;
+   /* run the task timers, clear its ptr and
+* unreference it
+*/
+   __run_posix_cpu_timers(tsk);
+   tsk->posix_timer_list = NULL;
+   put_task_struct(tsk);
 
-   /* run the task timers, clear its ptr and
-* unreference it
-*/
-   __run_posix_cpu_timers(tsk);
-   tsk->posix_timer_list = NULL;
-   put_task_struct(tsk);
-
-   /* check if this is the last on the list */
-   if (next == tsk)
-   break;
-   tsk = next;
-   }
+   /* check if this is the last on the list */
+   if (next == tsk)
+   break;
+   tsk = next;
}
-   return 0;
 
-wait_to_die:
-   /* Wait for kthread_stop */
-   set_current_state(TASK_INTERRUPTIBLE);
-   while (!kthread_should_stop()) {
-   schedule();
-   set_current_state(TASK_INTERRUPTIBLE);
-   }
-   __set_current_state(TASK_RUNNING);
-   return 0;
+   return;
 }
 
 static inline int __fastpath_timer_check(struct task_struct *tsk)
@@ -1322,72 +1298,48 @@ void run_posix_cpu_timers(struct task_st
}
 }
 
-/*
- * posix_cpu_thread_call - callback that gets triggered when a CPU is added.
- * Here we can start up the necessary migration thread for the new CPU.
- */
-static int posix_cpu_thread_call(struct notifier_block *nfb,
-unsigned long action, void *hcpu)
+static void posix_cpu_thread_setup(unsigned int cpu)
 {
-   int cpu = (long)hcpu;
-   struct task_struct *p;
-   struct sched_param param;
+   struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
 
-   switch (action) {
-   case CPU_UP_PREPARE:
-   p = kthread_create(posix_cpu_timers_thread, hcpu,
-   "posixcputmr/%d",cpu);
-   if (IS_ERR(p))
-   return NOTIFY_BAD;
-   p->flags |= PF_NOFREEZE;
-   kthread_bind(p, cpu);
-   /* Must be high prio to avoid getting starved */
-   param.sched_priority = MAX_RT_PRIO-1;
-   sched_setscheduler(p, SCHED_FIFO, &param);
-   per_cpu(posix_timer_task,cpu) = p;
-   break;
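
The list walk that the patch moves into the new thread function can be
modelled in userspace. This is a minimal sketch: struct task, push_task and
drain_tasklist are hypothetical names, the timers_run counter stands in for
__run_posix_cpu_timers(), and the way entries are queued in push_task is an
assumption inferred from the "next == tsk" check. The interrupt disabling
around the detach step and the task reference counting are left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for task_struct: only the list link and a counter. */
struct task {
	struct task *posix_timer_list;	/* next task; the last one points to itself */
	int timers_run;
};

/* Push a task onto a self-terminated list and return the new head. */
static struct task *push_task(struct task *head, struct task *t)
{
	t->posix_timer_list = head ? head : t;
	return t;
}

/*
 * Detach the whole list, then walk it, as the loop in the patch does.
 * Returns the number of tasks processed.
 */
static int drain_tasklist(struct task **headp)
{
	struct task *tsk = *headp;
	struct task *next;
	int processed = 0;

	*headp = NULL;	/* grab task list */

	while (tsk) {
		next = tsk->posix_timer_list;	/* save next before clearing */

		tsk->timers_run++;		/* stands in for running the timers */
		tsk->posix_timer_list = NULL;
		processed++;

		if (next == tsk)		/* last entry points to itself */
			break;
		tsk = next;
	}
	return processed;
}
```

Because an empty list simply skips the loop, the function can return
immediately, which is why the patched thread function no longer needs its
own sleep and wait_to_die handling.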

[patch-rt] kvm: Convert pvclock_gtod_sync_lock to raw_spinlock_t

2016-12-25 Thread Mike Galbraith

Fix splat below.  Lock is short hold, make it raw.

[15528.614216] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
[15528.614218] in_atomic(): 0, irqs_disabled(): 1, pid: 19619, name: qemu-system-x86
[15528.614218] no locks held by qemu-system-x86/19619.
[15528.614219] irq event stamp: 321840
[15528.614224] hardirqs last  enabled at (321839): [] entry_SYSCALL_64_fastpath+0x5/0xc2
[15528.614244] hardirqs last disabled at (321840): [] kvm_arch_vm_ioctl+0x234/0xda0 [kvm]
[15528.614246] softirqs last  enabled at (0): [] copy_process.part.36+0x5ba/0x20b0
[15528.614247] softirqs last disabled at (0): [<  (null)>]   (null)
[15528.614250] CPU: 7 PID: 19619 Comm: qemu-system-x86 Tainted: GE   4.9.0-rt1-virgin #1
[15528.614250] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[15528.614253]  c9000b98bc30 8136874d  8803e76db200
[15528.614255]  c9000b98bc68 810abe9d 8800353472d0 8800353472d0
[15528.614257]  7ffc53dbc2b0 000b 7ffc53dbc2b0 c9000b98bc88
[15528.614257] Call Trace:
[15528.614262]  [] dump_stack+0x85/0xc8
[15528.614266]  [] ___might_sleep+0x15d/0x260
[15528.614268]  [] rt_spin_lock+0x24/0x80
[15528.614283]  [] __get_kvmclock_ns+0x22/0xf0 [kvm]
[15528.614297]  [] kvm_arch_vm_ioctl+0x23c/0xda0 [kvm]
[15528.614300]  [] ? __lock_acquire+0x305/0x16a0
[15528.614301]  [] ? unpin_current_cpu+0x16/0x70
[15528.614314]  [] kvm_vm_ioctl+0x9d/0x920 [kvm]
[15528.614316]  [] ? __fget+0x107/0x220
[15528.614318]  [] ? __lock_is_held+0x49/0x70
[15528.614320]  [] do_vfs_ioctl+0x96/0x6c0
[15528.614321]  [] ? __fget+0x124/0x220
[15528.614322]  [] ? __fget+0x5/0x220
[15528.614324]  [] SyS_ioctl+0x41/0x70
[15528.614326]  [] entry_SYSCALL_64_fastpath+0x1f/0xc2

Signed-off-by: Mike Galbraith 
---
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/x86.c  |   20 ++--
 2 files changed, 11 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -755,7 +755,7 @@ struct kvm_arch {
u64 cur_tsc_generation;
int nr_vcpus_matched_tsc;
 
-   spinlock_t pvclock_gtod_sync_lock;
+   raw_spinlock_t pvclock_gtod_sync_lock;
bool use_master_clock;
u64 master_kernel_ns;
cycle_t master_cycle_now;
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1540,7 +1540,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu
kvm_vcpu_write_tsc_offset(vcpu, offset);
raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
-   spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+   raw_spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
if (!matched) {
kvm->arch.nr_vcpus_matched_tsc = 0;
} else if (!already_matched) {
@@ -1548,7 +1548,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu
}
 
kvm_track_tsc_matching(vcpu);
-   spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
+   raw_spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
 }
 
 EXPORT_SYMBOL_GPL(kvm_write_tsc);
@@ -1715,7 +1715,7 @@ static void kvm_gen_update_masterclock(s
struct kvm_vcpu *vcpu;
struct kvm_arch *ka = &kvm->arch;
 
-   spin_lock(&ka->pvclock_gtod_sync_lock);
+   raw_spin_lock(&ka->pvclock_gtod_sync_lock);
kvm_make_mclock_inprogress_request(kvm);
/* no guest entries from this point */
pvclock_update_vm_gtod_copy(kvm);
@@ -1727,7 +1727,7 @@ static void kvm_gen_update_masterclock(s
kvm_for_each_vcpu(i, vcpu, kvm)
clear_bit(KVM_REQ_MCLOCK_INPROGRESS, &vcpu->requests);
 
-   spin_unlock(&ka->pvclock_gtod_sync_lock);
+   raw_spin_unlock(&ka->pvclock_gtod_sync_lock);
 #endif
 }
 
@@ -1736,15 +1736,15 @@ static u64 __get_kvmclock_ns(struct kvm
struct kvm_arch *ka = &kvm->arch;
struct pvclock_vcpu_time_info hv_clock;
 
-   spin_lock(&ka->pvclock_gtod_sync_lock);
+   raw_spin_lock(&ka->pvclock_gtod_sync_lock);
if (!ka->use_master_clock) {
-   spin_unlock(&ka->pvclock_gtod_sync_lock);
+   raw_spin_unlock(&ka->pvclock_gtod_sync_lock);
return ktime_get_boot_ns() + ka->kvmclock_offset;
}
 
hv_clock.tsc_timestamp = ka->master_cycle_now;
hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
-   spin_unlock(&ka->pvclock_gtod_sync_lock);
+   raw_spin_unlock(&ka->pvclock_gtod_sync_lock);
 
kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
   &hv_clock.tsc_shift,
@@ -1835,13 +1835,13 @@ static int kvm_guest_time_update(struct
 * If the host uses TSC clock, then passthrough TSC as stable
 * to the guest.
 */
-   spin_lock(&ka->pvclock_gtod_sync_lock);
+   raw_spin_lock(&ka->pvclock_gtod_sync_lock);
use_master_clock = ka->use_master_clock;
if (use_master_clock) {
host_tsc = ka->master_cycle_now;
kernel_ns =
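
The reason for the splat can be illustrated with a toy userspace model.
This is not kernel code and every name in it is hypothetical. It assumes
only the well-known PREEMPT_RT behaviour that a spinlock_t becomes an
rtmutex-based lock that may sleep, while a raw_spinlock_t keeps spinning,
and the fact from the log that the lock was taken with interrupts disabled.

```c
#include <stdbool.h>

/* Toy model of the execution context; not kernel code. */
struct context {
	bool irqs_disabled;
	int splats;	/* "sleeping function called from invalid context" count */
};

/* Models the might_sleep() check: complain when sleeping is not allowed. */
static void might_sleep_check(struct context *ctx)
{
	if (ctx->irqs_disabled)
		ctx->splats++;
}

/* Models spinlock_t on PREEMPT_RT: a lock that may sleep on contention. */
static void rt_spin_lock_model(struct context *ctx)
{
	might_sleep_check(ctx);
	/* ... would block on contention ... */
}

/* Models raw_spinlock_t: busy-waits and never sleeps, so no check is needed. */
static void raw_spin_lock_model(struct context *ctx)
{
	(void)ctx;
	/* ... would spin on contention ... */
}
```

In this model, taking the sleeping lock with irqs_disabled set records a
splat, while the raw variant does not, which is the effect the conversion
in the patch aims for. The trade-off is that a raw lock must only be held
briefly, as the commit message notes.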

[PATCH] coresight: fix kernel panic caused by invalid CPU

2016-12-25 Thread Wang Nan

Commit d52c9750f150 ("coresight: reset 'enable_sink' flag when need be")
caused a kernel panic by using an invalid value: after
'for_each_cpu(cpu, mask)' completes, the local variable 'cpu' no longer
holds a valid CPU number, so the following 'cpu_to_node' accesses an
invalid memory area.

This patch brings the deleted 'cpu = cpumask_first(mask)' back.

Panic log:

 $ perf record -e cs_etm// ls

 Unable to handle kernel paging request at virtual address fffe801804af4f10
 pgd = 8017ce031600
 [fffe801804af4f10] *pgd=, *pud=
 Internal error: Oops: 9604 [#1] SMP
 Modules linked in:
 CPU: 33 PID: 1619 Comm: perf Not tainted 4.7.1+ #16
 Hardware name: Huawei Taishan 2280 /CH05TEVBA, BIOS 1.10 11/24/2016
 task: 8017cb0c8400 ti: 8017cb154000 task.ti: 8017cb154000
 PC is at tmc_alloc_etf_buffer+0x60/0xd4
 LR is at tmc_alloc_etf_buffer+0x44/0xd4
 pc : [] lr : [] pstate: 6145
 sp : 8017cb157b40
 x29: 8017cb157b40 x28: 
 ...skip...
 7a60: 08c64dc8 0006 0253 
 7a80:   080872cc 0001
 [] tmc_alloc_etf_buffer+0x60/0xd4
 [] etm_setup_aux+0x1dc/0x1e8
 [] rb_alloc_aux+0x2b0/0x338
 [] perf_mmap+0x414/0x568
 [] mmap_region+0x324/0x544
 [] do_mmap+0x334/0x3e0
 [] vm_mmap_pgoff+0xa4/0xc8
 [] SyS_mmap_pgoff+0xb0/0x22c
 [] sys_mmap+0x18/0x28
 [] el0_svc_naked+0x24/0x28
 Code: 912040a5 d0001c00 f873d821 911c6000 (b8656822)
 ---[ end trace 98933da8f92b0c9a ]---

Signed-off-by: Wang Nan 
Cc: Xia Kaixu 
Cc: Li Zefan 
Cc: Mathieu Poirier 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 1774196..26cfac3 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -242,6 +242,7 @@ static void *etm_setup_aux(int event_cpu, void **pages,
if (!sink_ops(sink)->alloc_buffer)
goto err;
 
+   cpu = cpumask_first(mask);
/* Get the AUX specific data from the sink buffer */
event_data->snk_config =
sink_ops(sink)->alloc_buffer(sink, cpu, pages,
-- 
2.10.1
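
The bug class described above can be reproduced with a toy cpumask in
userspace. This is a sketch under assumptions: NR_CPUS, mask_next,
mask_first and cpu_after_loop are hypothetical names that only mimic the
shape of the kernel's cpumask iteration. It shows that the iterator is past
the end of the mask once the loop finishes, so it has to be reset (here
with mask_first) before being used as a CPU number again.

```c
#define NR_CPUS 8	/* assumption for this sketch */

/* Toy cpumask: bit n set means CPU n is in the mask. */
static int mask_next(int cpu, unsigned int mask)
{
	for (cpu++; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS;	/* past the end: no further CPU in the mask */
}

static int mask_first(unsigned int mask)
{
	return mask_next(-1, mask);
}

/* Returns the value of the iterator after a full walk over the mask. */
static int cpu_after_loop(unsigned int mask)
{
	int cpu;

	for (cpu = mask_first(mask); cpu < NR_CPUS; cpu = mask_next(cpu, mask))
		;	/* per-CPU work would go here */

	return cpu;	/* always NR_CPUS here: not a valid CPU number */
}
```

Using the value returned by cpu_after_loop() as an index into a per-CPU
table would read out of bounds, whereas mask_first() yields a CPU that is
actually in the mask.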

[lkp-developer] [mm, memcg] d18e2b2aca: WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size

2016-12-25 Thread kernel test robot


FYI, we noticed the following commit:

commit: d18e2b2aca0396849f588241e134787a829c707d ("mm, memcg: fix (Re: OOM: Better, but still there on)")
url: https://github.com/0day-ci/linux/commits/Michal-Hocko/mm-memcg-fix-Re-OOM-Better-but-still-there-on/20161223-225057
base: git://git.cmpxchg.org/linux-mmotm.git master

in testcase: boot

on test machine: qemu-system-i386 -enable-kvm -m 360M

caused below changes:


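+--------------------------------------------------------+------------+------------+
|                                                        | c7d85b880b | d18e2b2aca |
+--------------------------------------------------------+------------+------------+
| boot_successes                                         | 8          | 0          |
| boot_failures                                          | 0          | 2          |
| WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size | 0          | 2          |
| kernel_BUG_at_mm/memcontrol.c                          | 0          | 2          |
| invalid_opcode:#[##]DEBUG_PAGEALLOC                    | 0          | 2          |
| Kernel_panic-not_syncing:Fatal_exception               | 0          | 2          |
+--------------------------------------------------------+------------+------------+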



[   95.226364] init: tty6 main process (990) killed by TERM signal
[   95.314020] init: plymouth-upstart-bridge main process (1039) terminated with status 1
[   97.588568] ------------[ cut here ]------------
[   97.594364] WARNING: CPU: 0 PID: 1055 at mm/memcontrol.c:1032 mem_cgroup_update_lru_size+0xdd/0x12b
[   97.606654] mem_cgroup_update_lru_size(40297f00, 0, -1): lru_size 1 but empty
[   97.615140] Modules linked in:
[   97.618834] CPU: 0 PID: 1055 Comm: killall5 Not tainted 4.9.0-mm1-00095-gd18e2b2 #82
[   97.628008] Call Trace:
[   97.631025]  dump_stack+0x16/0x18
[   97.635107]  __warn+0xaf/0xc6
[   97.638729]  ? mem_cgroup_update_lru_size+0xdd/0x12b
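
The warning text reports a size counter of 1 for a list that is empty. The
kind of consistency check that produces such a message can be sketched as
below. This is an illustration only: struct lru and lru_consistent are
hypothetical, and the exact condition used in mm/memcontrol.c is an
assumption here. The invariant shown is that the counter is never negative
and is zero exactly when the list is empty.

```c
#include <stdbool.h>

/* Toy LRU: a bookkeeping counter kept alongside the list itself. */
struct lru {
	long lru_size;	/* what the counter claims */
	long on_list;	/* how many pages are actually linked on the list */
};

/* True when counter and list agree: size is 0 exactly when the list is empty. */
static bool lru_consistent(const struct lru *l)
{
	bool empty = (l->on_list == 0);

	return l->lru_size >= 0 && empty == (l->lru_size == 0);
}
```

The state from the log, a counter of 1 with nothing on the list, fails
this check.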


To reproduce:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k  job-script  # job-script is attached in this email



Thanks,
Xiaolong
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.9.0-mm1 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
CONFIG_KERNEL_LZO=y
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_IRQ_DOMAIN_DEBUG=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT

[PATCH] Staging: comedi: comedi_compat32: fixed a syntax error

2016-12-25 Thread Jonathan Villatoro

Fixed a syntax error in the function definition's parameter.

Signed-off-by: Jonathan Horacio Villatoro Córdoba 
---
 drivers/staging/comedi/comedi_compat32.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/comedi_compat32.h b/drivers/staging/comedi/comedi_compat32.h
index 5ce77f3..91d15c6 100644
--- a/drivers/staging/comedi/comedi_compat32.h
+++ b/drivers/staging/comedi/comedi_compat32.h
@@ -25,7 +25,7 @@
 #ifdef CONFIG_COMPAT
 
 struct file;
-long comedi_compat_ioctl(struct file *, unsigned int cmd, unsigned long arg);
+long comedi_compat_ioctl(struct file *f, unsigned int cmd, unsigned long arg);
 
 #else /* CONFIG_COMPAT */
 
-- 
2.10.2

Re: [PATCHSET] cgroup: reorganize cgroup source files

2016-12-25 Thread Zefan Li

On 2016/12/21 5:08, Tejun Heo wrote:
> cgroup source files are getting a bit too unwieldy.  Let's reorganize
> them.
> 
> * Move cgroup files in kernel to kernel/cgroup.
> 
> * Split cgroup.c into multiple files.
> 
> The only functional change this patchset causes is loss of a
> WARN_ON(); otherwise, everything remains identical.
> 
> This patchset contains the following eight patches.
> 
>  0001-cgroup-reorder-css_set-fields.patch
>  0002-cgroup-move-cgroup-files-under-kernel-cgroup.patch
>  0003-cgroup-move-cgroup-v1-specific-code-to-kernel-cgroup.patch
>  0004-cgroup-refactor-mount-path-and-clearly-distinguish-v.patch
>  0005-cgroup-separate-out-cgroup1_kf_syscall_ops.patch
>  0006-cgroup-move-v1-mount-functions-to-kernel-cgroup-cgro.patch
>  0007-cgroup-rename-functions-for-consistency.patch
>  0008-cgroup-move-namespace-code-to-kernel-cgroup-namespac.patch
> 
> The patchset is available in the following git branch
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 
> review-cgroup-reorg
> 
> and on top of the following git branch
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 
> review-cgroup2-procs
> 
> diffstat follows.  Thanks.
> 
>  include/linux/cgroup-defs.h |   54 -
>  kernel/Makefile |5 
>  kernel/cgroup/Makefile  |5 
>  kernel/cgroup/cgroup-internal.h |  155 +++
>  kernel/cgroup/cgroup-v1.c   | 1395 
>  kernel/cgroup/cgroup.c  | 1940 
> +++-
>  kernel/cgroup/namespace.c   |  155 +++
>  7 files changed, 1907 insertions(+), 1802 deletions(-)
> 

Acked-by: Zefan Li

Re: [PATCHSET] kernfs, cgroup: reimplement "cgroup.procs" reading for v2

2016-12-25 Thread Zefan Li

On 2016/12/21 0:12, Tejun Heo wrote:
> On cgroup v1, the pid listings in "cgroup.procs" and "tasks" are
> sorted which adds a lot of complications and overhead.  v2 doesn't
> have such requirement and has been intentionally using a modified
> sorting order so that the output doesn't look sorted to users.
> 
> This patchset re-implements "cgroup.procs" reading for v2 which simply
> keeps a css_task_iter open while the file is being read.  Keeping the
> iterator open makes it unnecessary to skip to the right position on
> each read segment and associated errors - e.g. incorrectly skipping
> over pids because earlier pids disappeared between the reads.
> 
> Using persistent iterator across multiple read calls requires
> ->release() callback to clean it up.  kernfs operations
> ->open/release() are added and piped through cftype.
> 
> This patchset contains the following five patches.
> 
>  0001-kernfs-make-kernfs_open_file-mmapped-a-bitfield.patch
>  0002-kernfs-add-kernfs_ops-open-release-callbacks.patch
>  0003-cgroup-add-cftype-open-release-callbacks.patch
>  0004-cgroup-reimplement-reading-cgroup.procs-on-cgroup-v2.patch
>  0005-cgroup-remove-cgroup_pid_fry-and-friends.patch
> 
> 0001 is a misc kernfs patch and 0002 adds ->open/release() to kernfs.
> 0003 pipes ->open/release() through cftype.  0004 implements the new
> cgroup.procs for v2 and 0005 removes the now unused sort order frying
> logic.
> 
> Greg, would it be okay to route the kernfs patches through
> cgroup/for-4.11?
> 
> The patches are also available in the following git branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 
> review-cgroup2-procs
> 
> diffstat follows.  Thanks.
> 
>  fs/kernfs/dir.c |2 
>  fs/kernfs/file.c|   53 +++--
>  fs/kernfs/kernfs-internal.h |2 
>  include/linux/cgroup-defs.h |3 +
>  include/linux/kernfs.h  |   12 +++-
>  kernel/cgroup.c |  130 
> +---
>  6 files changed, 148 insertions(+), 54 deletions(-)
> 

Acked-by: Zefan Li
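
For readers following the quoted cover letter, the skipped-pid problem can be
shown with a small userspace sketch. This is not the kernel code: struct task,
read_by_offset(), read_by_iter() and demo() are hypothetical stand-ins, and the
sketch only ever removes a task that has already been returned. It assumes the
old behaviour is "walk from the head and skip what was already returned" and
the new one is "keep a cursor open across read segments".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup's task list: a singly linked list. */
struct task {
	int pid;
	struct task *next;
};

/* Unlink the first task with the given pid; returns 1 if one was removed. */
static int remove_pid(struct task **head, int pid)
{
	for (struct task **pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->pid == pid) {
			*pp = (*pp)->next;
			return 1;
		}
	}
	return 0;
}

/* Offset-based reader: each read segment restarts at the head and skips
 * the *pos entries that earlier segments already returned. */
static size_t read_by_offset(const struct task *head, size_t *pos,
			     int *out, size_t max)
{
	size_t skipped = 0, n = 0;
	const struct task *t = head;

	assert(out != NULL);
	while (t && skipped < *pos) {
		t = t->next;
		skipped++;
	}
	for (; t && n < max; t = t->next)
		out[n++] = t->pid;
	*pos += n;
	return n;
}

/* Persistent reader: the cursor stays open between read segments. */
struct task_iter {
	const struct task *next;
};

static size_t read_by_iter(struct task_iter *it, int *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	for (; it->next && n < max; it->next = it->next->next)
		out[n++] = it->next->pid;
	return n;
}

/* Read two pids, let pid 10 exit, then read two more.
 * Returns how many pids were reported in total. */
static size_t demo(int use_iter, int *out)
{
	struct task tasks[4] = {
		{ 10, &tasks[1] }, { 11, &tasks[2] }, { 12, &tasks[3] }, { 13, NULL },
	};
	struct task *head = &tasks[0];
	struct task_iter it = { head };
	size_t pos = 0, n = 0;

	n += use_iter ? read_by_iter(&it, out + n, 2)
		      : read_by_offset(head, &pos, out + n, 2);
	remove_pid(&head, 10);	/* an already-reported task disappears */
	n += use_iter ? read_by_iter(&it, out + n, 2)
		      : read_by_offset(head, &pos, out + n, 2);
	return n;
}
```

With the offset-based reader the second segment skips two entries of the
shortened list and pid 12 is never reported; with the open cursor all four pids
are returned. The cost is the one the cover letter names: something has to
close the cursor when the file is released.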

[PATCH] x86/apic: Fix two typos in comments

2016-12-25 Thread Dou Liyang

s/nr_logical_cpuidi/nr_logical_cpuids/
s/generic_processor_info()/__generic_processor_info()/

Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/apic/apic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 5b7e43e..c32a3ad 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2028,7 +2028,7 @@ void disconnect_bsp_APIC(int virt_wire_setup)
 /*
  * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
  * contiguously, it equals to current allocated max logical CPU ID plus 1.
- * All allocated CPU ID should be in [0, nr_logical_cpuidi), so the maximum of
+ * All allocated CPU ID should be in [0, nr_logical_cpuids), so the maximum of
  * nr_logical_cpuids is nr_cpu_ids.
  *
  * NOTE: Reserve 0 for BSP.
@@ -2094,7 +2094,7 @@ int __generic_processor_info(int apicid, int version, bool enabled)
 * Since fixing handling of boot_cpu_physical_apicid requires
 * another discussion and tests on each platform, we leave it
 * for now and here we use read_apic_id() directly in this
-* function, generic_processor_info().
+* function, __generic_processor_info().
 */
if (disabled_cpu_apicid != BAD_APICID &&
disabled_cpu_apicid != read_apic_id() &&
-- 
2.5.5

Re: [PATCH v2] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs

2016-12-25 Thread Boqun Feng

Hi Waiman,

On Sun, Dec 25, 2016 at 03:26:01PM -0500, Waiman Long wrote:
> A number of cmpxchg calls in qspinlock_paravirt.h were replaced by more
> relaxed versions to improve performance on architectures that use LL/SC.
> 
> All the locking related cmpxchg's are replaced with the _acquire
> variants:
>  - pv_queued_spin_steal_lock()
>  - trylock_clear_pending()
> 
> The cmpxchg's related to hashing are replaced by either by the _release
> or the _relaxed variants. See the inline comment for details.
> 
> Signed-off-by: Waiman Long 
> 
>  v1->v2:
>   - Add comments in changelog and code for the rationale of the change.
> 
> ---
>  kernel/locking/qspinlock_paravirt.h | 50 
> -
>  1 file changed, 33 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/locking/qspinlock_paravirt.h 
> b/kernel/locking/qspinlock_paravirt.h
> index e3b5520..c31d1ab 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -72,7 +72,7 @@ static inline bool pv_queued_spin_steal_lock(struct 
> qspinlock *lock)
>   struct __qspinlock *l = (void *)lock;
>  
>   if (!(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
> - (cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
> + (cmpxchg_acquire(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
>   qstat_inc(qstat_pv_lock_stealing, true);
>   return true;
>   }
> @@ -101,16 +101,16 @@ static __always_inline void clear_pending(struct 
> qspinlock *lock)
>  
>  /*
>   * The pending bit check in pv_queued_spin_steal_lock() isn't a memory
> - * barrier. Therefore, an atomic cmpxchg() is used to acquire the lock
> - * just to be sure that it will get it.
> + * barrier. Therefore, an atomic cmpxchg_acquire() is used to acquire the
> + * lock to provide the proper memory barrier.
>   */
>  static __always_inline int trylock_clear_pending(struct qspinlock *lock)
>  {
>   struct __qspinlock *l = (void *)lock;
>  
>   return !READ_ONCE(l->locked) &&
> -(cmpxchg(&l->locked_pending, _Q_PENDING_VAL, _Q_LOCKED_VAL)
> - == _Q_PENDING_VAL);
> +(cmpxchg_acquire(&l->locked_pending, _Q_PENDING_VAL,
> + _Q_LOCKED_VAL) == _Q_PENDING_VAL);
>  }
>  #else /* _Q_PENDING_BITS == 8 */
>  static __always_inline void set_pending(struct qspinlock *lock)
> @@ -138,7 +138,7 @@ static __always_inline int trylock_clear_pending(struct 
> qspinlock *lock)
>*/
>   old = val;
>   new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
> - val = atomic_cmpxchg(&lock->val, old, new);
> + val = atomic_cmpxchg_acquire(&lock->val, old, new);
>  
>   if (val == old)
>   return 1;
> @@ -209,9 +209,15 @@ static struct qspinlock **pv_hash(struct qspinlock 
> *lock, struct pv_node *node)
>   struct pv_hash_entry *he;
>   int hopcnt = 0;
>  
> + /*
> +  * Synchronizing with the node state variable will control who does
> +  * the hashing - the lock holder or lock waiter. The control
> +  * dependency will ensure that node value is written after the lock
> +  * value. So we don't need other ordering guarantee.
> +  */

By this comment, you mean that in

    cmpxchg_relaxed(&he->lock, NULL, lock);
      r1 = ll he->lock;

      sc he->lock, lock // succeeded

    if (r1)
        WRITE_ONCE(he->node, node);

the sc and the WRITE_ONCE() cannot be reordered because of the control
dependency? I don't think this is true. Yes, the sc must execute before
the WRITE_ONCE(), but their memory/cache effects may still be reordered.
IOW, the following may happen:

    CPU 0                       CPU 1
    ===================         ===================
    {x = 0, y = 0}              if (!cmpxchg_relaxed(&y, 0, 1))
                                        WRITE_ONCE(x, 1);
    r1 = READ_ONCE(x);

    smp_rmb();

    r2 = READ_ONCE(y);

The following result is possible:

    y = 1 && r1 = 1 && r2 = 0

Or am I missing your point here? ;-)

Regards,
Boqun
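
As an illustration of the litmus test above (not part of the patch, and not a
proposal for how the patch should be changed), here is a userspace sketch in
C11 atomics. It assumes one portable way to forbid "r1 = 1 && r2 = 0": keep
the compare-and-swap on y relaxed, but make the store to x a release and the
load of x an acquire. The names writer() and reader_saw_reordering() are made
up for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int x, y;	/* both start at 0, as in the litmus test */

/* CPU 1: claim y with a relaxed compare-and-swap, then publish x.
 * Returns 1 if this call won the compare-and-swap. */
static int writer(void)
{
	int expected = 0;

	if (atomic_compare_exchange_strong_explicit(&y, &expected, 1,
						    memory_order_relaxed,
						    memory_order_relaxed)) {
		/* The release store keeps the earlier write to y ordered
		 * before the write to x, as seen by an acquiring reader. */
		atomic_store_explicit(&x, 1, memory_order_release);
		return 1;
	}
	return 0;
}

/* CPU 0: returns 1 if it observed the outcome r1 == 1 && r2 == 0. */
static int reader_saw_reordering(void)
{
	int r1 = atomic_load_explicit(&x, memory_order_acquire);
	int r2 = atomic_load_explicit(&y, memory_order_relaxed);

	return r1 == 1 && r2 == 0;
}
```

Calling the two functions from one thread only checks that the sketch is
consistent; the interesting case is writer() and reader_saw_reordering()
running on different CPUs. If the store to x is downgraded to
memory_order_relaxed, nothing orders the two writes any more, which is the
situation the litmus test describes.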

>   for_each_hash_entry(he, offset, hash) {
>   hopcnt++;
> - if (!cmpxchg(&he->lock, NULL, lock)) {
> + if (!cmpxchg_relaxed(&he->lock, NULL, lock)) {
>   WRITE_ONCE(he->node, node);
>   qstat_hop(hopcnt);
>   return &he->lock;
> @@ -309,7 +315,7 @@ static void pv_wait_node(struct mcs_spinlock *node, 
> struct mcs_spinlock *prev)
>* MB MB
>* [L] pn->locked   [RmW] pn->state = vcpu_hashed
>*
> -  * Matches the cmpxchg() from pv_kick_node().
> +  * Matches the cmpxchg_release() from pv_kick_node().
>*/
>   smp_store_mb(pn->state, vcpu_halted);
>  
> @@ -323,8 +329,14 @@ static void pv_wait_node(struct mcs_spinlock *node, 
>

Re: [PATCH] serial: 8250_dw: #ifdef out dw8250_acpi_match if ACPI isn't set

2016-12-25 Thread Jisheng Zhang

Hi Greg,

On Fri, 23 Dec 2016 19:09:23 +0100 Greg KH wrote:
> 
> On Fri, Dec 23, 2016 at 08:33:55PM +0800, Jisheng Zhang wrote:
> > dw8250_acpi_match is used only when ACPI is enabled. Fix the following
> > gcc warning when W=1 is set:
> > 
> > drivers/tty/serial/8250/8250_dw.c:640:36: warning: 'dw8250_acpi_match'
> > defined but not used [-Wunused-const-variable=]  
> 
> Don't set W=1 then, this type of "fix" is foolish and is just going to
> make code messier, sorry.

Per my understanding of commit c9c6837d3931 ("kbuild: move
-Wunused-const-variable to W=1 warning level"), which says "Once we have
eliminated the majority of the warnings for both, we can put them back
into the default list.", we'll get this warning one day even without W=1.

And FWICT, other device drivers which can be used with both ACPI and DT
also put "#ifdef CONFIG_ACPI" around the ACPI match table, such as:

drivers/ata/ahci_xgene.c
drivers/dma/dw/platform.c
drivers/i2c/busses/i2c-designware-platdrv.c
drivers/hid/i2c-hid/i2c-hid.c
drivers/usb/dwc3/core.c

Thanks,
Jisheng
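
For reference, the pattern under discussion looks roughly like the userspace
sketch below. The struct definitions and the ACPI_PTR() macro here are
simplified stand-ins for the kernel ones, and demo_acpi_match,
demo_uart_driver and the "DEMO0001" ID are hypothetical names used only for
illustration. The sketch assumes that CONFIG_ACPI is either defined or not at
build time.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct acpi_device_id. */
struct acpi_device_id {
	const char *id;
};

/* Simplified stand-in for the kernel's ACPI_PTR(): the table pointer when
 * ACPI is enabled, NULL otherwise (the argument is then never expanded). */
#ifdef CONFIG_ACPI
#define ACPI_PTR(ptr) (ptr)
#else
#define ACPI_PTR(ptr) (NULL)
#endif

#ifdef CONFIG_ACPI
/* Only built when ACPI is enabled, so a !ACPI build has no unused table. */
static const struct acpi_device_id demo_acpi_match[] = {
	{ "DEMO0001" },	/* hypothetical ID */
	{ NULL },
};
#endif

/* Hypothetical driver structure carrying the match table pointer. */
struct demo_driver {
	const char *name;
	const struct acpi_device_id *acpi_match_table;
};

static const struct demo_driver demo_uart_driver = {
	.name = "demo-uart",
	.acpi_match_table = ACPI_PTR(demo_acpi_match),
};
```

Built without -DCONFIG_ACPI, the table is not compiled at all and the driver
carries a NULL match table, so -Wunused-const-variable has nothing to report.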

[RFC PATCH 2/6] Documentation: samsung-phy: add the exynos-pcie-phy binding

2016-12-25 Thread Jaehoon Chung

Add the exynos-pcie-phy binding for the Exynos PCIe PHY, so that it can
be used through the generic PHY framework.

Signed-off-by: Jaehoon Chung 
---
 .../devicetree/bindings/phy/samsung-phy.txt | 21 +
 1 file changed, 21 insertions(+)

diff --git a/Documentation/devicetree/bindings/phy/samsung-phy.txt b/Documentation/devicetree/bindings/phy/samsung-phy.txt
index 9872ba8..eb1085e 100644
--- a/Documentation/devicetree/bindings/phy/samsung-phy.txt
+++ b/Documentation/devicetree/bindings/phy/samsung-phy.txt
@@ -191,3 +191,24 @@ Example:
usbdrdphy0 = _phy0;
usbdrdphy1 = _phy1;
};
+
+Samsung Exynos SoC series PCIe PHY controller
+---------------------------------------------
+Required properties:
+- compatible : Should be set to "samsung,exynos5433-pcie-phy"
+- #phy-cells : must be zero
+- reg : a list of registers used by phy driver
+
+Optional properties:
+- samsung,pmureg-phandle - handle to syscon used to control PMU registers
+- samsung,fsys-sysreg    - handle to syscon used to control the system registers
+
+Example:
+   pcie_phy: pcie-phy@1568 {
+   #phy-cells = <0>;
+   compatible = "samsung,exynos5433-pcie-phy";
+   reg = <0x1568 0x1000>;
+   samsung,pmureg-phandle = <_system_controller>;
+   samsung,fsys-sysreg = <&syscon_fsys>;
+   status = "okay";
+   };
-- 
2.10.2

[RFC PATCH 3/6] ARM64: dts: exynos5433: add the pcie_phy node for PCIe

2016-12-25 Thread Jaehoon Chung

Add the pcie_phy node so that the generic PHY framework can be used.

Signed-off-by: Jaehoon Chung 
---
 arch/arm64/boot/dts/exynos/exynos5433.dtsi | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/boot/dts/exynos/exynos5433.dtsi b/arch/arm64/boot/dts/exynos/exynos5433.dtsi
index 64226d5..2a15f18 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433.dtsi
+++ b/arch/arm64/boot/dts/exynos/exynos5433.dtsi
@@ -805,6 +805,11 @@
reg = <0x145f 0x1038>;
};
 
+   syscon_fsys: syscon@156f {
+   compatible = "syscon";
+   reg = <0x156f 0x1044>;
+   };
+
gsc_0: video-scaler@13C0 {
compatible = "samsung,exynos5433-gsc";
reg = <0x13c0 0x1000>;
@@ -1443,6 +1448,15 @@
status = "disabled";
};
};
+
+   pcie_phy: pcie-phy@1568 {
+   #phy-cells = <0>;
+   compatible = "samsung,exynos5433-pcie-phy";
+   reg = <0x1568 0x1000>;
+   samsung,pmureg-phandle = <_system_controller>;
+   samsung,fsys-sysreg = <_fsys>;
+   status = "okay";
+   };
};
 
timer: timer {
-- 
2.10.2

[RFC PATCH 6/6] ARM64: exynos: add the pcie node for TM2

2016-12-25 Thread Jaehoon Chung

Add the Exynos5433 PCIe node for TM2.
This PCIe device is used to support WiFi.

Some GPIOs are already requested through pinctrl, so they do not need to
be initialized here.
GPJ2-0 is used to supply the WiFi PCIe chip.

Signed-off-by: Jaehoon Chung 
---
 arch/arm64/boot/dts/exynos/exynos5433-pinctrl.dtsi |  7 +++
 arch/arm64/boot/dts/exynos/exynos5433-tm2.dts  | 11 +--
 arch/arm64/boot/dts/exynos/exynos5433.dtsi | 23 ++
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/exynos/exynos5433-pinctrl.dtsi b/arch/arm64/boot/dts/exynos/exynos5433-pinctrl.dtsi
index ad71247..3e8b728 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433-pinctrl.dtsi
+++ b/arch/arm64/boot/dts/exynos/exynos5433-pinctrl.dtsi
@@ -183,6 +183,13 @@
interrupt-controller;
#interrupt-cells = <2>;
};
+
+   pcie_wlanen: pcie-wlanen {
+   samsung,pins = "gpj2-0";
+   samsung,pin-function = <0>;
+   samsung,pin-pud = <3>;
+   samsung,pin-drv = <3>;
+   };
 };
 
 _finger {
diff --git a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
index f21bdc2..c84a2ad 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
+++ b/arch/arm64/boot/dts/exynos/exynos5433-tm2.dts
@@ -737,6 +737,15 @@
bus-width = <4>;
 };
 
+ {
+   assigned-clocks = <_fsys CLK_MOUT_SCLK_PCIE_100_USER>,
+   <_top CLK_MOUT_SCLK_PCIE_100>;
+   assigned-clock-parents = <_top CLK_SCLK_PCIE_100_FSYS>,
+   <_top CLK_MOUT_BUS_PLL_USER>;
+   assigned-clock-rates = <0>, <1>;
+   status = "okay";
+};
+
 _alive {
pinctrl-names = "default";
pinctrl-0 = <_alive>;
@@ -836,7 +845,6 @@
pinctrl-0 = <_ese>;
 
initial_ese: initial-state {
-   PIN(IN, gpj2-0, DOWN, LV1);
PIN(IN, gpj2-1, DOWN, LV1);
PIN(IN, gpj2-2, DOWN, LV1);
};
@@ -851,7 +859,6 @@
PIN(IN, gpr3-1, DOWN, LV1);
PIN(IN, gpr3-2, DOWN, LV1);
PIN(IN, gpr3-3, DOWN, LV1);
-   PIN(IN, gpr3-7, NONE, LV1);
};
 };
 
diff --git a/arch/arm64/boot/dts/exynos/exynos5433.dtsi b/arch/arm64/boot/dts/exynos/exynos5433.dtsi
index 2a15f18..da287f4 100644
--- a/arch/arm64/boot/dts/exynos/exynos5433.dtsi
+++ b/arch/arm64/boot/dts/exynos/exynos5433.dtsi
@@ -1457,6 +1457,29 @@
samsung,fsys-sysreg = <_fsys>;
status = "okay";
};
+
+   pcie: pcie@1570 {
+   compatible = "samsung,exynos5433-pcie", "snps,dw-pcie";
+   #address-cells = <3>;
+   #size-cells = <2>;
+   device_type = "pci";
+   interrupts = ;
+   interrupt-names = "intr";
+   clocks = <_fsys CLK_PCIE>,
+  <_fsys CLK_PCLK_PCIE_PHY>;
+   clock-names = "pcie", "pcie_bus";
+   num-lanes = <1>;
+   pinctrl-names = "default";
+   phys = <_phy>;
+   phy-names = "pcie-phy";
+   pinctrl-0 = <_bus _wlanen>;
+   reg = <0x156b 0x1000>, <0x1570 0x1000>,
+   <0x0c00 0x1000>;
+   reg-names = "elbi", "dbi", "config";
+   ranges = <0x8100 0 0  0x0c001000 0 0x0001
+ 0x8200 0 0x0c011000 0x0c011000 0 0x3feefff>;
+   status = "disabled";
+   };
};
 
timer: timer {
-- 
2.10.2

[RFC PATCH 5/6] Documentation: pci: add the exynos5433-pcie binding

2016-12-25 Thread Jaehoon Chung

Signed-off-by: Jaehoon Chung 
---
 .../devicetree/bindings/pci/exynos5433-pcie.txt| 36 ++
 1 file changed, 36 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/exynos5433-pcie.txt

diff --git a/Documentation/devicetree/bindings/pci/exynos5433-pcie.txt b/Documentation/devicetree/bindings/pci/exynos5433-pcie.txt
new file mode 100644
index 000..932a847
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/exynos5433-pcie.txt
@@ -0,0 +1,36 @@
+* Samsung Exynos5433 PCIe interface
+
+This PCIe host controller is based on the Synopsys DesignWare PCIe IP
+and thus inherits all the common properties defined in designware-pcie.txt.
+
+Required properties:
+- compatible: "samsung,exynos5433-pcie"
+- reg: base addresses and lengths of the PCIe controller,
+   the PHY controller, and the additional registers for the PHY controller.
+- reg-names: Must be "elbi", "phy" and "dbi", one for each reg entry.
+- interrupt-names: Must be "intr" for the legacy interrupt pin.
+
+For other common properties, refer to
+   Documentation/devicetree/bindings/pci/designware-pcie.txt
+
+Example:
+
+	pcie: pcie@1570 {
+		compatible = "samsung,exynos5433-pcie", "snps,dw-pcie";
+		#address-cells = <3>;
+		#size-cells = <2>;
+		device_type = "pci";
+		interrupts = ;
+		interrupt-names = "intr";
+		clocks = <_fsys CLK_PCIE>, <_fsys CLK_PCLK_PCIE_PHY>;
+		clock-names = "pcie", "pcie_bus";
+		num-lanes = <1>;
+		pinctrl-names = "default";
+		pinctrl-0 = <_bus>;
+		reg = <0x156b 0x1000>, <0x1568 0x1000>,
+		      <0x1570 0x1000>, <0x0c00 0x1000>;
+		reg-names = "elbi", "phy", "dbi", "config";
+		ranges = <0x8100 0 0  0x0c001000 0 0x0001
+			  0x8200 0 0x0c011000 0x0c011000 0 0x3feefff>;
+		status = "disabled";
+	};
-- 
2.10.2

[RFC PATCH 4/6] PCI: exynos5433: Add new exynos pci host controller for Exynos5433

2016-12-25 Thread Jaehoon Chung

Exynos5433 supports PCIe.
This patch adds a new pci-exynos5433.c file for ARM64 Exynos SoCs.

Signed-off-by: Jaehoon Chung 
---
 drivers/pci/host/Kconfig  |   9 +
 drivers/pci/host/Makefile |   1 +
 drivers/pci/host/pci-exynos5433.c | 338 ++
 3 files changed, 348 insertions(+)
 create mode 100644 drivers/pci/host/pci-exynos5433.c

diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
index d7e7c0a..3d77d0b 100644
--- a/drivers/pci/host/Kconfig
+++ b/drivers/pci/host/Kconfig
@@ -60,6 +60,15 @@ config PCI_EXYNOS
select PCIEPORTBUS
select PCIE_DW
 
+config PCI_EXYNOS5433
+   bool "Samsung Exynos5433 PCIe controller"
+   depends on ARCH_EXYNOS && ARM64
+   depends on PCI_MSI_IRQ_DOMAIN
+   select PCIEPORTBUS
+   select PCIE_DW
+   help
+ If you want support for Exynos5433 PCIe host controller, say Y.
+
 config PCI_IMX6
bool "Freescale i.MX6 PCIe controller"
depends on SOC_IMX6Q
diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile
index 084cb49..2168de2 100644
--- a/drivers/pci/host/Makefile
+++ b/drivers/pci/host/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_PCIE_DW) += pcie-designware.o
 obj-$(CONFIG_PCIE_DW_PLAT) += pcie-designware-plat.o
 obj-$(CONFIG_PCI_DRA7XX) += pci-dra7xx.o
 obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
+obj-$(CONFIG_PCI_EXYNOS5433) += pci-exynos5433.o
 obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
 obj-$(CONFIG_PCI_HYPERV) += pci-hyperv.o
 obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
diff --git a/drivers/pci/host/pci-exynos5433.c b/drivers/pci/host/pci-exynos5433.c
new file mode 100644
index 000..ff254ca
--- /dev/null
+++ b/drivers/pci/host/pci-exynos5433.c
@@ -0,0 +1,338 @@
+/*
+ * PCIe host controller driver for Samsung EXYNOS5433 SoCs
+ *
+ * Copyright (C) 2016 Samsung Electronics Co., Ltd.
+ * http://www.samsung.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pcie-designware.h"
+
+#define to_exynos_pcie(x)	container_of(x, struct exynos_pcie, pp)
+
+/* Pcie structure for Exynos specific data */
+struct exynos_pcie {
+	void __iomem		*elbi_base;
+	struct clk		*clk;
+	struct clk		*bus_clk;
+	struct pcie_port	pp;
+	struct phy		*phy;
+};
+
+/* PCIe ELBI registers */
+#define PCIE_IRQ_PULSE			0x000
+#define IRQ_INTA_ASSERT			BIT(0)
+#define IRQ_INTB_ASSERT			BIT(2)
+#define IRQ_INTC_ASSERT			BIT(4)
+#define IRQ_INTD_ASSERT			BIT(6)
+#define IRQ_INTX_ASSERT			(IRQ_INTA_ASSERT | IRQ_INTB_ASSERT | \
+					IRQ_INTC_ASSERT | IRQ_INTD_ASSERT)
+#define PCIE_IRQ_EN_PULSE		0x00c
+#define PCIE_IRQ_EN_LEVEL		0x010
+#define PCIE_SW_WAKE			0x018
+#define PCIE_BUS_EN			BIT(1)
+#define PCIE_APP_LTSSM_ENABLE		0x02c
+#define PCIE_ELBI_LTSSM_ENABLE		0x1
+#define PCIE_ELBI_DEBUG_L		0x074
+#define PCIE_ELBI_XMLH_LINK_UP		BIT(4)
+#define PCIE_ELBI_SLV_AWMISC		0x11c
+#define PCIE_ELBI_SLV_ARMISC		0x120
+#define PCIE_ELBI_SLV_DBI_ENABLE	BIT(21)
+
+/* DBI register */
+#define PCIE_MISC_CONTROL_1_OFF		0x8BC
+#define DBI_RO_WR_EN			BIT(0)
+
+static inline void exynos_pcie_writel(void __iomem *base, u32 val, u32 offset)
+{
+   writel(val, base + offset);
+}
+
+static inline u32 exynos_pcie_readl(void __iomem *base, u32 offset)
+{
+   return readl(base + offset);
+}
+
+static void exynos_pcie_clear_irq_pulse(struct exynos_pcie *ep)
+{
+   u32 val;
+
+   val = exynos_pcie_readl(ep->elbi_base, PCIE_IRQ_PULSE);
+   val &= ~IRQ_INTX_ASSERT;
+   exynos_pcie_writel(ep->elbi_base, val, PCIE_IRQ_PULSE);
+}
+
+static void exynos_pcie_enable_irq_pulse(struct exynos_pcie *ep)
+{
+   exynos_pcie_writel(ep->elbi_base, IRQ_INTX_ASSERT, PCIE_IRQ_EN_PULSE);
+
+   /* Clear PCIE_IRQ_EN_LEVEL register */
+   exynos_pcie_writel(ep->elbi_base, 0, PCIE_IRQ_EN_LEVEL);
+}
+
+static irqreturn_t exynos_pcie_irq_handler(int irq, void *arg)
+{
+   struct pcie_port *pp = arg;
+   struct exynos_pcie *ep = to_exynos_pcie(pp);
+
+   exynos_pcie_clear_irq_pulse(ep);
+
+   return IRQ_HANDLED;
+}
+
+static void exynos_pcie_sideband_dbi_w_mode(struct exynos_pcie *ep, bool on)
+{
+   u32 val;
+
+   val = exynos_pcie_readl(ep->elbi_base, PCIE_ELBI_SLV_AWMISC);
+	if (on)
+		val |= PCIE_ELBI_SLV_DBI_ENABLE;
+	else
+		val &= ~PCIE_ELBI_SLV_DBI_ENABLE;

[RFC PATCH 0/6] Support the PCIe for TM2(exynos5433)

2016-12-25 Thread Jaehoon Chung

This patchset adds PCIe support for exynos5433.
TM2 (exynos5433) uses PCIe for WiFi. drivers/pci/host/ already contains
pci-exynos.c, but I can't touch it. These are the reasons why I added a
new file:
1. I don't have the exynos5440 TRM.
- I can't check anything for exynos5440.
- So I can't convert it to use the generic PHY framework.
2. I can't test on an exynos5440 board.
- If it used the generic PHY framework, I couldn't ensure that it still
works.
3. There is no maintainer for exynos5440.
- I don't know where pci-exynos5440 is used.

As far as I know, Bjorn (PCIe maintainer) agreed to adding the new file.

So I added the new pci-exynos5433 file in drivers/pci/host/,
and added phy-exynos-pcie.c to use the generic PHY framework.
With the generic PHY framework, controlling PCIe is easier than it is now.

Future work:
  - Supporting MSI
  - If possible, merging this file into pci-exynos.c

This is based on httt://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git 
(for-next branch)
(Because the PCI git repository isn't synchronized yet.)

Below are the messages from a working boot:

[0.817081] OF: PCI: host bridge /soc/pcie@1570 ranges:
[0.817264] OF: PCI:   No bus range found for /soc/pcie@1570, using [bus 00-ff]
[0.821377] OF: PCI:IO 0x0c001000..0x0c010fff -> 0x
[0.827270] OF: PCI:   MEM 0x0c011000..0x0ffe -> 0x0c011000
[0.934306] exynos5433-pcie 156b.pcie: link up
[0.934649] exynos5433-pcie 156b.pcie: PCI host bridge to bus :00
[0.934867] pci_bus :00: root bus resource [bus 00-ff]
[0.935045] pci_bus :00: root bus resource [io  0x-0x]
[0.935243] pci_bus :00: root bus resource [mem 0x0c011000-0x0ffe]
[0.953719] pci :00:00.0: BAR 8: assigned [mem 0x0c20-0x0c7f]
[0.953941] pci :01:00.0: BAR 2: assigned [mem 0x0c40-0x0c7f 64bit]
[0.956672] pci :01:00.0: BAR 0: assigned [mem 0x0c20-0x0c207fff 64bit]
[0.963959] pci :00:00.0: PCI bridge to [bus 01]
[0.968368] pci :00:00.0:   bridge window [mem 0x0c20-0x0c7f]
[0.975241] pcieport :00:00.0: of_irq_parse_pci() failed with rc=-22
[0.982124] pcieport :00:00.0: Signaling PME through PCIe PME interrupt

Jaehoon Chung (6):
  phy: exynos-pcie: Add support for Exynos PCIe phy
  Documetation: samsung-phy: add the exynos-pcie-phy binding
  ARM64: dts: exynos5433: add the pcie_phy node for PCIe
  PCI: exynos5433: Add new exynos pci host controller for Exynos5433
  Documentation: pci: add the exynos5433-pcie binding
  ARM64: exynos: add the pcie node for TM2

 .../devicetree/bindings/pci/exynos5433-pcie.txt|  36 +++
 .../devicetree/bindings/phy/samsung-phy.txt|  21 ++
 arch/arm64/boot/dts/exynos/exynos5433-pinctrl.dtsi |   7 +
 arch/arm64/boot/dts/exynos/exynos5433-tm2.dts  |  11 +-
 arch/arm64/boot/dts/exynos/exynos5433.dtsi |  37 +++
 drivers/pci/host/Kconfig   |   9 +
 drivers/pci/host/Makefile  |   1 +
 drivers/pci/host/pci-exynos5433.c  | 338 +
 drivers/phy/Kconfig|   9 +
 drivers/phy/Makefile   |   1 +
 drivers/phy/phy-exynos-pcie.c  | 227 ++
 11 files changed, 695 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/pci/exynos5433-pcie.txt
 create mode 100644 drivers/pci/host/pci-exynos5433.c
 create mode 100644 drivers/phy/phy-exynos-pcie.c

-- 
2.10.2

[RFC PATCH 1/6] phy: exynos-pcie: Add support for Exynos PCIe phy

2016-12-25 Thread Jaehoon Chung

This patch adds support for using the generic PHY framework with the
Exynos PCIe PHY.
When an Exynos SoC that supports PCIe wants to use it,
it needs to control the PHY registers.
But controlling them from each PCIe device driver would be more complex.

Signed-off-by: Jaehoon Chung 
---
 drivers/phy/Kconfig   |   9 ++
 drivers/phy/Makefile  |   1 +
 drivers/phy/phy-exynos-pcie.c | 227 ++
 3 files changed, 237 insertions(+)
 create mode 100644 drivers/phy/phy-exynos-pcie.c

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index fe00f91..94b0433 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -341,6 +341,15 @@ config PHY_EXYNOS5_USBDRD
  This driver provides PHY interface for USB 3.0 DRD controller
  present on Exynos5 SoC series.
 
+config PHY_EXYNOS_PCIE
+   bool "Exynos PCIe PHY driver"
+   depends on ARCH_EXYNOS && OF
+   depends on PCI_EXYNOS5433
+   select GENERIC_PHY
+   help
+ Enable PCIe PHY support for Exynos SoC series.
+ This driver provides PHY interface for Exynos PCIe controller.
+
 config PHY_PISTACHIO_USB
tristate "IMG Pistachio USB2.0 PHY driver"
depends on MACH_PISTACHIO
diff --git a/drivers/phy/Makefile b/drivers/phy/Makefile
index a534cf5..586344d 100644
--- a/drivers/phy/Makefile
+++ b/drivers/phy/Makefile
@@ -38,6 +38,7 @@ phy-exynos-usb2-$(CONFIG_PHY_EXYNOS4X12_USB2) += phy-exynos4x12-usb2.o
 phy-exynos-usb2-$(CONFIG_PHY_EXYNOS5250_USB2)  += phy-exynos5250-usb2.o
 phy-exynos-usb2-$(CONFIG_PHY_S5PV210_USB2) += phy-s5pv210-usb2.o
 obj-$(CONFIG_PHY_EXYNOS5_USBDRD)   += phy-exynos5-usbdrd.o
+obj-$(CONFIG_PHY_EXYNOS_PCIE)  += phy-exynos-pcie.o
 obj-$(CONFIG_PHY_QCOM_APQ8064_SATA)+= phy-qcom-apq8064-sata.o
 obj-$(CONFIG_PHY_ROCKCHIP_USB) += phy-rockchip-usb.o
 obj-$(CONFIG_PHY_ROCKCHIP_INNO_USB2)   += phy-rockchip-inno-usb2.o
diff --git a/drivers/phy/phy-exynos-pcie.c b/drivers/phy/phy-exynos-pcie.c
new file mode 100644
index 000..0f5eefd
--- /dev/null
+++ b/drivers/phy/phy-exynos-pcie.c
@@ -0,0 +1,227 @@
+/*
+ * Samsung EXYNOS SoC series PCIe PHY driver
+ *
+ * Phy provider for PCIe controller on Exynos SoC series
+ *
+ * Copyright (C) 2016 Samsung Electronics Co., Ltd.
+ * Jaehoon Chung 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PCIE_EXYNOS5433_PMU_PHY_OFFSET	0x730
+#define PCIE_PHY_OFFSET(x)		((x) * 0x4)
+
+/* Sysreg Fsys register offset and bit for Exynos5433 */
+#define PCIE_PHY_MAC_RESET		0x208
+#define PCIE_MAC_RESET_MASK		0xFF
+#define PCIE_MAC_RESET			BIT(4)
+#define PCIE_L1SUB_CM_CON		0x1010
+#define PCIE_REFCLK_GATING_EN		BIT(0)
+#define PCIE_PHY_COMMON_RESET		0x1020
+#define PCIE_PHY_RESET			BIT(0)
+#define PCIE_PHY_GLOBAL_RESET		0x1040
+#define PCIE_GLOBAL_RESET		BIT(0)
+#define PCIE_REFCLK			BIT(1)
+#define PCIE_REFCLK_MASK		0x16
+#define PCIE_APP_REQ_EXIT_L1_MODE	BIT(5)
+
+enum exynos_pcie_phy_data_type {
+   PCIE_PHY_TYPE_EXYNOS5433,
+};
+
+struct exynos_pcie_phy_data {
+   enum exynos_pcie_phy_data_type  ctrl_type;
+   u32 pmureg_offset; /* PMU_REG offset */
+   struct phy_ops  *ops;
+};
+
+/* for Exynos pcie phy */
+struct exynos_pcie_phy {
+   const struct exynos_pcie_phy_data *drv_data;
+   struct regmap *pmureg;
+   struct regmap *fsysreg;
+   void __iomem *phy_base;
+};
+
+static void exynos_pcie_phy_writel(void __iomem *base, u32 val, u32 offset)
+{
+   writel(val, base + offset);
+}
+
+static int exynos_pcie_phy_init(struct phy *phy)
+{
+   struct exynos_pcie_phy *ep = phy_get_drvdata(phy);
+
+   if (ep->fsysreg) {
+   regmap_update_bits(ep->fsysreg, PCIE_PHY_COMMON_RESET,
+   PCIE_PHY_RESET, 1);
+   regmap_update_bits(ep->fsysreg, PCIE_PHY_MAC_RESET,
+   PCIE_MAC_RESET, 0);
+   /* PHY refclk 24MHz */
+   regmap_update_bits(ep->fsysreg, PCIE_PHY_GLOBAL_RESET,
+   PCIE_REFCLK_MASK, PCIE_REFCLK);
+   regmap_update_bits(ep->fsysreg, PCIE_PHY_GLOBAL_RESET,
+   PCIE_GLOBAL_RESET, 0);
+   }
+
+   exynos_pcie_phy_writel(ep->phy_base, 0x11, PCIE_PHY_OFFSET(0x3));
+
+   /* band gap reference on */
+   exynos_pcie_phy_writel(ep->phy_base, 0, PCIE_PHY_OFFSET(0x20));
+   exynos_pcie_phy_writel(ep->phy_base, 0, PCIE_PHY_OFFSET(0x4b));
+
+   /* jitter tuning */
+   exynos_pcie_phy_writel(ep->phy_base, 0x34, PCIE_PHY_OFFSET(0x4));

RE: [PATCH v2 REGRESSION RESEND] usb: ohci-at91: use descriptor-based gpio APIs correctly

2016-12-25 Thread Wenyou.Yang



> -Original Message-
> From: Greg Kroah-Hartman [mailto:gre...@linuxfoundation.org]
> Sent: December 24, 2016 2:05
> To: Peter Rosin 
> Cc: linux-kernel@vger.kernel.org; Alan Stern ;
> linux-...@vger.kernel.org; Wenyou Yang - A41535
> 
> Subject: Re: [PATCH v2 REGRESSION RESEND] usb: ohci-at91: use descriptor-
> based gpio APIs correctly
> 
> On Thu, Dec 22, 2016 at 09:38:08PM +0100, Peter Rosin wrote:
> > On 2016-12-22 18:27, Greg Kroah-Hartman wrote:
> > > On Thu, Dec 22, 2016 at 08:43:55AM +0100, Peter Rosin wrote:
> > >> The gpiod_get* function family does not want the -gpio suffix.
> > >> Use devm_gpiod_get_index_optional instead of devm_gpiod_get_optional.
> > >> The descriptor based APIs handle active high/low automatically.
> > >> The vbus-gpios are output, request enable while getting the gpio.
> > >> Don't try to get any vbus-gpios for ports outside num-ports.
> > >>
> > >> WTF? Big sigh.
> > >>
> > >> Fixes: 054d4b7b577d ("usb: ohci-at91: Use descriptor-based gpio
> > >> APIs")
> > >> Signed-off-by: Peter Rosin 
> > >> ---
> > >>
> > >> Hi!
> > >>
> > >> Resending this, since the only response I've got is that the merge
> > >> window is open and that this patch has been put on hold due to that.
> > >> But I think this regression (which happened between v4.9 and current
> > >> master) should be fixed before the merge window closes.
> > >
> > > I don't merge patches before -rc1 comes out, sorry, people should
> > > have tested linux-next better :)
> >
> > Neat, shift the blame for the shit patch over to the messenger :)
> 
> Not at all, I blame the original developer :)

I am very very sorry. It is my ignorance. Sorry.

I tested this patch on the linux-next branch on the SAMA5D3 and SAMA5D4
Xplained boards.

> 
> > > I'll catch up the first week of January, relax.
> >
> > As we all know, unrelated regressions are painful when trying to
> > locate other problems. It seems silly to have a few extra for no good
> > reason.
> 
> I am supposed to be on vacation and not reading email until the 3rd of
> January, relax, we will catch up on stuff like this, and other minor
> things, soon enough, in plenty of time for 4.10-final.
> 
> thanks,
> 
> greg k-h


Best Regards,
Wenyou Yang

Re: Detecting kprobes generated code addresses

2016-12-25 Thread Masami Hiramatsu

On Sun, 25 Dec 2016 15:16:00 +0900
Masami Hiramatsu  wrote:

> On Sun, 25 Dec 2016 12:13:20 +0900
> Masami Hiramatsu  wrote:
> 
> > On Thu, 22 Dec 2016 00:42:19 -0600
> > Josh Poimboeuf  wrote:
> > 
> > > Hi Masami,
> > > 
> > > I would like to make __kernel_text_address() be able to detect whether
> > > an address belongs to code which was generated by kprobes.  As far as I
> > > can tell, that information seems to be in the 'pages' lists of
> > > kprobe_insn_slots and kprobe_optinsn_slots.  But they seem to be
> > > protected by mutexes.
> > 
> > Right. It is currently under mutex because it may kick
> > page allocation. But I think it is easy to fix that :)
> 
> Hmm, IMHO, it seems that we should add a dummy (auto-generated)
> symbol for optprobe trampoline code to kallsyms so that
> __kernel_text_address() automatically returns true on it.

Sorry, I reconsidered this idea and concluded it was overkill.
As ftrace does, maybe it is enough to add a check
routine to __kernel_text_address().
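
A minimal userspace sketch of what such a check routine could look like, assuming the generated-code pages can be walked as a plain list of address ranges. The names insn_page, is_generated_insn_addr and text_address are hypothetical and only model the idea; they are not the kernel's data structures, and locking is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one page holding generated instruction slots. */
struct insn_page {
	uintptr_t start;         /* first address covered by the page */
	size_t size;             /* number of bytes covered */
	struct insn_page *next;  /* next page in the list, or NULL */
};

/* Return true if addr lies inside any page on the list. */
static bool is_generated_insn_addr(const struct insn_page *pages, uintptr_t addr)
{
	for (const struct insn_page *p = pages; p; p = p->next) {
		if (addr >= p->start && addr - p->start < p->size)
			return true;
	}
	return false;
}

/*
 * Stand-in for a text-address test: first the static text range
 * [text_start, text_end), then the extra check for generated code.
 */
static bool text_address(uintptr_t text_start, uintptr_t text_end,
			 const struct insn_page *pages, uintptr_t addr)
{
	assert(text_start <= text_end);
	if (addr >= text_start && addr < text_end)
		return true;
	return is_generated_insn_addr(pages, addr);
}
```

The point of the sketch is only that the extra check is a fallback after the normal range test, so addresses in ordinary kernel text never pay for the list walk.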

Thanks,

-- 
Masami Hiramatsu

[PATCH] lib: bitmap: introduce bitmap_find_next_zero_area_and_size

2016-12-25 Thread Jaewon Kim

There was no bitmap API that returns both the index of the next zero bit and
the size of the zero area starting at that index.

This is helpful for examining fragmentation. The following test code prints
the size of each zero area.
The test result is '10+9+994=>1013 found of total: 1024'.

	unsigned long search_idx, found_idx, nr_found_tot;
	unsigned long bitmap_max;
	unsigned int nr_found;
	unsigned long *bitmap;

	search_idx = nr_found_tot = 0;
	bitmap_max = 1024;
	bitmap = kzalloc(BITS_TO_LONGS(bitmap_max) * sizeof(long),
			 GFP_KERNEL);

	/* test bitmap_set offset, count */
	bitmap_set(bitmap, 10, 1);
	bitmap_set(bitmap, 20, 10);

	for (;;) {
		found_idx = bitmap_find_next_zero_area_and_size(bitmap,
				bitmap_max, search_idx, &nr_found);
		if (found_idx >= bitmap_max)
			break;
		if (nr_found_tot == 0)
			printk("%u", nr_found);
		else
			printk("+%u", nr_found);
		nr_found_tot += nr_found;
		search_idx = found_idx + nr_found;
	}
	printk("=>%lu found of total: %lu\n", nr_found_tot, bitmap_max);
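
For readers who want to try the loop outside the kernel, here is a userspace sketch of the same walk. It assumes naive bit-by-bit stand-ins for find_next_zero_bit() and find_next_bit(); the helper names set_bits, next_bit_equal, zero_area_and_size and collect_zero_runs are made up for this sketch and are not part of the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Set bits [start, start + len) in map. */
static void set_bits(unsigned long *map, unsigned long start, unsigned long len)
{
	for (unsigned long i = start; i < start + len; i++)
		map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
}

/*
 * Naive stand-in for find_next_bit()/find_next_zero_bit(): first index
 * >= start whose bit equals value, or size if there is none.
 */
static unsigned long next_bit_equal(const unsigned long *map, unsigned long size,
				    unsigned long start, int value)
{
	for (unsigned long i = start; i < size; i++) {
		int bit = (int)((map[i / BITS_PER_LONG] >> (i % BITS_PER_LONG)) & 1UL);

		if (bit == value)
			return i;
	}
	return size;
}

/* Same contract as the proposed helper: index of next zero run, length in *nr. */
static unsigned long zero_area_and_size(const unsigned long *map, unsigned long size,
					unsigned long start, unsigned int *nr)
{
	unsigned long index, end;

	*nr = 0;
	index = next_bit_equal(map, size, start, 0);
	if (index >= size)
		return index;
	end = next_bit_equal(map, size, index, 1);
	*nr = (unsigned int)(end - index);
	return index;
}

/* Record the length of every zero run; returns how many runs were stored. */
static size_t collect_zero_runs(const unsigned long *map, unsigned long size,
				unsigned int *runs, size_t max_runs)
{
	unsigned long search = 0, found;
	unsigned int nr;
	size_t n = 0;

	assert(runs != NULL);
	while (n < max_runs) {
		found = zero_area_and_size(map, size, search, &nr);
		if (found >= size)
			break;
		runs[n++] = nr;
		search = found + nr;
	}
	return n;
}
```

With bit 10 and bits 20..29 set in a 1024-bit map, the zero runs are 0..9, 11..19 and 30..1023, that is 10, 9 and 994 bits, which matches the test result quoted above.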

Signed-off-by: Jaewon Kim 
---
 include/linux/bitmap.h |  6 ++
 lib/bitmap.c   | 25 +
 2 files changed, 31 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 3b77588..b724a6c 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -46,6 +46,7 @@
  * bitmap_clear(dst, pos, nbits)   Clear specified bit area
  * bitmap_find_next_zero_area(buf, len, pos, n, mask)  Find bit free area
  * bitmap_find_next_zero_area_off(buf, len, pos, n, mask)  as above
+ * bitmap_find_next_zero_area_and_size(buf, len, pos, nr)  Find bit free area and its size
  * bitmap_shift_right(dst, src, n, nbits)  *dst = *src >> n
  * bitmap_shift_left(dst, src, n, nbits)   *dst = *src << n
  * bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src)
@@ -123,6 +124,11 @@ extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
unsigned long align_mask,
unsigned long align_offset);
 
+extern unsigned long bitmap_find_next_zero_area_and_size(unsigned long *map,
+unsigned long size,
+unsigned long start,
+unsigned int *nr);
+
 /**
  * bitmap_find_next_zero_area - find a contiguous aligned zero area
  * @map: The address to base the search on
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 0b66f0e..d02817c 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -332,6 +332,31 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
 }
 EXPORT_SYMBOL(bitmap_find_next_zero_area_off);
 
+/**
+ * bitmap_find_next_zero_area_and_size - find the next zero area and its size
+ * @map: The address to base the search on
+ * @size: The bitmap size in bits
+ * @start: The bitnumber to start searching at
+ * @nr: The number of zeroed bits we've found
+ */
+unsigned long bitmap_find_next_zero_area_and_size(unsigned long *map,
+unsigned long size,
+unsigned long start,
+unsigned int *nr)
+{
+   unsigned long index, i;
+
+   *nr = 0;
+   index = find_next_zero_bit(map, size, start);
+
+   if (index >= size)
+   return index;
+   i = find_next_bit(map, size, index);
+   *nr = i - index;
+   return index;
+}
+EXPORT_SYMBOL(bitmap_find_next_zero_area_and_size);
+
 /*
  * Bitmap printing & parsing functions: first version by Nadia Yvette Chambers,
  * second version by Paul Jackson, third by Joe Korty.
-- 
1.9.1

[PATCH] [linux-next] tools/power turbostat: Fix typo in turbostat.8

2016-12-25 Thread Masanari Iida

This patch fixes two spelling typos in turbostat.8

Signed-off-by: Masanari Iida 
---
 tools/power/x86/turbostat/turbostat.8 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/power/x86/turbostat/turbostat.8 b/tools/power/x86/turbostat/turbostat.8
index 492e84fbebfa..6765e0ae9830 100644
--- a/tools/power/x86/turbostat/turbostat.8
+++ b/tools/power/x86/turbostat/turbostat.8
@@ -171,7 +171,7 @@ The remaining rows show what maximum turbo frequency is possible
 depending on the number of idle cores.  Note that not all information is
 available on all processors.
 .PP
-The --debug option adds additional columns to the measurement ouput, including CPU idle power-state residency processor temperature sensor readinds.
+The --debug option adds additional columns to the measurement output, including CPU idle power-state residency processor temperature sensor readings.
 See the field definitions above.
 .SH FORK EXAMPLE
 If turbostat is invoked with a command, it will fork that command
-- 
2.11.0.161.g6610af872f64

[PATCH v3 4/4] clk: rockchip: add new pll-type for rk3328

2016-12-25 Thread Elaine Zhang

The rk3328's PLL and clocks are similar to the rk3036's, but
pll_mode_mask differs: the rk3328 SoC PLL mode field is only
one bit wide (the rk3036 SoC uses two bits), so the rk3328
gets its own PLL type, separate from the rk3036 one.
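
To make the effect of the narrower mask concrete, here is a small sketch of how a mux value could be composed for a register with a write-enable mask in its upper 16 bits, which is what CLK_MUX_HIWORD_MASK selects. The helper hiword_mux_write is hypothetical, the two-bit value 0x3 for PLL_MODE_MASK is an assumption based on the description above, and the shift of 8 in the usage note is an arbitrary example rather than a real rk3328 register layout.

```c
#include <assert.h>
#include <stdint.h>

#define PLL_MODE_MASK        0x3u /* assumed: two-bit mode field (rk3036 style) */
#define PLL_RK3328_MODE_MASK 0x1u /* one-bit mode field, as added by this patch */

/*
 * Compose a write for a "hiword mask" register: the low half carries the
 * new field value, the high half carries the write-enable bits for exactly
 * the field being changed.
 */
static uint32_t hiword_mux_write(uint32_t index, unsigned int shift, uint32_t mask)
{
	assert(shift < 16 && (mask << shift) <= 0xffffu);
	return ((index & mask) << shift) | (mask << (shift + 16));
}
```

With a shift of 8, selecting parent 1 gives 0x01000100 with the one-bit mask and 0x03000100 with the two-bit mask. The two-bit mask therefore also enables writing to the neighbouring bit, which is why a one-bit mode field wants its own mask.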

Changes in v3:
  fix up the pll type pll_rk3328 description and use

Signed-off-by: Elaine Zhang 
---
 drivers/clk/rockchip/clk-pll.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/rockchip/clk-pll.c b/drivers/clk/rockchip/clk-pll.c
index 6ed605776abd..eec51893a7e6 100644
--- a/drivers/clk/rockchip/clk-pll.c
+++ b/drivers/clk/rockchip/clk-pll.c
@@ -29,6 +29,7 @@
 #define PLL_MODE_SLOW  0x0
 #define PLL_MODE_NORM  0x1
 #define PLL_MODE_DEEP  0x2
+#define PLL_RK3328_MODE_MASK   0x1
 
 struct rockchip_clk_pll {
struct clk_hw   hw;
@@ -848,7 +849,8 @@ struct clk *rockchip_clk_register_pll(struct rockchip_clk_provider *ctx,
struct clk *pll_clk, *mux_clk;
char pll_name[20];
 
-   if (num_parents != 2) {
+   if ((pll_type != pll_rk3328 && num_parents != 2) ||
+   (pll_type == pll_rk3328 && num_parents != 1)) {
pr_err("%s: needs two parent clocks\n", __func__);
return ERR_PTR(-EINVAL);
}
@@ -865,13 +867,17 @@ struct clk *rockchip_clk_register_pll(struct rockchip_clk_provider *ctx,
pll_mux = &pll->pll_mux;
pll_mux->reg = ctx->reg_base + mode_offset;
pll_mux->shift = mode_shift;
-   pll_mux->mask = PLL_MODE_MASK;
+   if (pll_type == pll_rk3328)
+   pll_mux->mask = PLL_RK3328_MODE_MASK;
+   else
+   pll_mux->mask = PLL_MODE_MASK;
pll_mux->flags = 0;
pll_mux->lock = &ctx->lock;
pll_mux->hw.init = &init;
 
if (pll_type == pll_rk3036 ||
pll_type == pll_rk3066 ||
+   pll_type == pll_rk3328 ||
pll_type == pll_rk3399)
pll_mux->flags |= CLK_MUX_HIWORD_MASK;
 
@@ -884,7 +890,10 @@ struct clk *rockchip_clk_register_pll(struct rockchip_clk_provider *ctx,
init.flags = CLK_SET_RATE_PARENT;
init.ops = pll->pll_mux_ops;
init.parent_names = pll_parents;
-   init.num_parents = ARRAY_SIZE(pll_parents);
+   if (pll_type == pll_rk3328)
+   init.num_parents = 2;
+   else
+   init.num_parents = ARRAY_SIZE(pll_parents);
 
mux_clk = clk_register(NULL, &pll_mux->hw);
if (IS_ERR(mux_clk))
@@ -918,6 +927,7 @@ struct clk *rockchip_clk_register_pll(struct rockchip_clk_provider *ctx,
 
switch (pll_type) {
case pll_rk3036:
+   case pll_rk3328:
if (!pll->rate_table || IS_ERR(ctx->grf))
init.ops = &rockchip_rk3036_pll_clk_norate_ops;
else
-- 
1.9.1

[PATCH v3 3/4] clk: rockchip: add clock controller for rk3328

2016-12-25 Thread Elaine Zhang

Add the clock tree definition for the new rk3328 SoC.

Changes in v3:
  fix up the pll parent only xin24m.
Changes in v2:
  fix up these *_sample error description.
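
As a reading aid for the rate tables in this patch, here is a sketch of how one table row maps to an output frequency, assuming the single 24 MHz parent (xin24m) mentioned above and the usual refdiv/fbdiv/postdiv arrangement. The function pll_rate_hz is hypothetical and is not the driver's recalc routine; for fractional rows (dsmpd == 0) it only approximates the result, since the order of the integer divisions differs from what a real driver may do.

```c
#include <assert.h>
#include <stdint.h>

#define XIN24M_HZ 24000000ULL /* assumed reference clock, from "xin24m" */

/*
 * Output rate for one table row:
 *   fin * (fbdiv + frac / 2^24) / refdiv / postdiv1 / postdiv2
 * The fractional term is only used when dsmpd is 0.
 */
static uint64_t pll_rate_hz(uint64_t fin, unsigned int refdiv, unsigned int fbdiv,
			    unsigned int postdiv1, unsigned int postdiv2,
			    unsigned int dsmpd, uint32_t frac)
{
	uint64_t rate;

	assert(refdiv != 0 && postdiv1 != 0 && postdiv2 != 0);
	rate = fin * fbdiv;
	if (!dsmpd)
		rate += (fin * frac) >> 24; /* frac is a 24-bit fraction */
	rate /= refdiv;
	rate /= postdiv1;
	rate /= postdiv2;
	return rate;
}
```

For an integer row such as (1, 67, 1, 1, 1, 0) this gives 24 MHz * 67 = 1608 MHz, and for (1, 64, 4, 4, 1, 0) it gives 24 MHz * 64 / 16 = 96 MHz.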

Signed-off-by: Elaine Zhang 
---
 drivers/clk/rockchip/Makefile |1 +
 drivers/clk/rockchip/clk-rk3328.c | 1068 +
 drivers/clk/rockchip/clk.h|   23 +
 3 files changed, 1092 insertions(+)
 create mode 100644 drivers/clk/rockchip/clk-rk3328.c

diff --git a/drivers/clk/rockchip/Makefile b/drivers/clk/rockchip/Makefile
index 16e098c36f90..68b04bfca282 100644
--- a/drivers/clk/rockchip/Makefile
+++ b/drivers/clk/rockchip/Makefile
@@ -16,5 +16,6 @@ obj-y += clk-rk3036.o
 obj-y  += clk-rk3188.o
 obj-y  += clk-rk3228.o
 obj-y  += clk-rk3288.o
+obj-y  += clk-rk3328.o
 obj-y  += clk-rk3368.o
 obj-y  += clk-rk3399.o
diff --git a/drivers/clk/rockchip/clk-rk3328.c b/drivers/clk/rockchip/clk-rk3328.c
new file mode 100644
index 000000000000..9958ce7d0dcd
--- /dev/null
+++ b/drivers/clk/rockchip/clk-rk3328.c
@@ -0,0 +1,1068 @@
+/*
+ * Copyright (c) 2016 Rockchip Electronics Co. Ltd.
+ * Author: Elaine 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "clk.h"
+
+#define RK3328_GRF_SOC_STATUS0 0x480
+#define RK3328_GRF_MAC_CON1 0x904
+#define RK3328_GRF_MAC_CON2 0x908
+
+enum rk3328_plls {
+   apll, dpll, cpll, gpll, npll,
+};
+
+static struct rockchip_pll_rate_table rk3328_pll_rates[] = {
+   /* _mhz, _refdiv, _fbdiv, _postdiv1, _postdiv2, _dsmpd, _frac */
+   RK3036_PLL_RATE(1608000000, 1, 67, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1584000000, 1, 66, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1560000000, 1, 65, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1536000000, 1, 64, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1512000000, 1, 63, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1488000000, 1, 62, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1464000000, 1, 61, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1440000000, 1, 60, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1416000000, 1, 59, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1392000000, 1, 58, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1368000000, 1, 57, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1344000000, 1, 56, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1320000000, 1, 55, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1296000000, 1, 54, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1272000000, 1, 53, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1248000000, 1, 52, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1200000000, 1, 50, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1188000000, 2, 99, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1104000000, 1, 46, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1100000000, 12, 550, 1, 1, 1, 0),
+   RK3036_PLL_RATE(1008000000, 1, 84, 2, 1, 1, 0),
+   RK3036_PLL_RATE(1000000000, 6, 500, 2, 1, 1, 0),
+   RK3036_PLL_RATE(984000000, 1, 82, 2, 1, 1, 0),
+   RK3036_PLL_RATE(960000000, 1, 80, 2, 1, 1, 0),
+   RK3036_PLL_RATE(936000000, 1, 78, 2, 1, 1, 0),
+   RK3036_PLL_RATE(912000000, 1, 76, 2, 1, 1, 0),
+   RK3036_PLL_RATE(900000000, 4, 300, 2, 1, 1, 0),
+   RK3036_PLL_RATE(888000000, 1, 74, 2, 1, 1, 0),
+   RK3036_PLL_RATE(864000000, 1, 72, 2, 1, 1, 0),
+   RK3036_PLL_RATE(840000000, 1, 70, 2, 1, 1, 0),
+   RK3036_PLL_RATE(816000000, 1, 68, 2, 1, 1, 0),
+   RK3036_PLL_RATE(800000000, 6, 400, 2, 1, 1, 0),
+   RK3036_PLL_RATE(700000000, 6, 350, 2, 1, 1, 0),
+   RK3036_PLL_RATE(696000000, 1, 58, 2, 1, 1, 0),
+   RK3036_PLL_RATE(600000000, 1, 75, 3, 1, 1, 0),
+   RK3036_PLL_RATE(594000000, 2, 99, 2, 1, 1, 0),
+   RK3036_PLL_RATE(504000000, 1, 63, 3, 1, 1, 0),
+   RK3036_PLL_RATE(500000000, 6, 250, 2, 1, 1, 0),
+   RK3036_PLL_RATE(408000000, 1, 68, 2, 2, 1, 0),
+   RK3036_PLL_RATE(312000000, 1, 52, 2, 2, 1, 0),
+   RK3036_PLL_RATE(216000000, 1, 72, 4, 2, 1, 0),
+   RK3036_PLL_RATE(96000000, 1, 64, 4, 4, 1, 0),
+   { /* sentinel */ },
+};
+
+static struct rockchip_pll_rate_table rk3328_pll_frac_rates[] = {
+   /* _mhz, _refdiv, _fbdiv, _postdiv1, _postdiv2, _dsmpd, _frac */
+   RK3036_PLL_RATE(1016064000, 3, 127, 1, 1, 0, 134217),
+   /* vco = 1016064000 */
+   RK3036_PLL_RATE(983040000, 24, 983, 1, 1, 0, 671088),
+   /* vco = 983040000 */
+   RK3036_PLL_RATE(491520000, 24, 983, 2, 1, 0, 671088),
+   /* vco = 983040000 */
+   RK3036_PLL_RATE(61440000, 6, 215, 7, 2, 0, 671088),
+   /* vco = 860156000 */
+   RK3036_PLL_RATE(56448000, 12, 451, 4, 4, 0, 9797894),
+
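
As a rough illustration of how one row of the rate tables above maps to an
output frequency, the sketch below assumes the usual RK3036-style PLL relation
with the 24 MHz xin24m crystal as the reference: the reference is multiplied by
fbdiv, divided by refdiv, and then divided by both post-dividers. When dsmpd is
0, a 24-bit fractional part is added to the feedback term first. The helper
name pll_row_rate_hz() and the XIN24M_HZ constant are made up for this example
and are not part of the patch.

```c
#include <stdint.h>

/* Assumed reference clock: the 24 MHz crystal ("xin24m"). */
#define XIN24M_HZ 24000000ULL

/*
 * Hypothetical helper (not part of the patch): compute the output rate in Hz
 * for one rate-table row. dsmpd == 1 selects integer mode; dsmpd == 0 adds
 * the 24-bit fractional part before the post-dividers are applied.
 */
static uint64_t pll_row_rate_hz(uint32_t refdiv, uint32_t fbdiv,
                                uint32_t postdiv1, uint32_t postdiv2,
                                uint32_t dsmpd, uint32_t frac)
{
    /* Integer part of the VCO frequency. */
    uint64_t rate = XIN24M_HZ * fbdiv / refdiv;

    if (dsmpd == 0) {
        /* Fractional part: frac / 2^24 of one reference step. */
        uint64_t frac_rate = XIN24M_HZ * frac / refdiv;

        rate += frac_rate >> 24;
    }

    /* Both post-dividers scale the VCO down to the output rate. */
    rate /= postdiv1;
    rate /= postdiv2;
    return rate;
}
```

For the row with refdiv 1, fbdiv 67 and both post-dividers 1, this gives
24 MHz * 67 = 1608 MHz. For the fractional row with refdiv 24, fbdiv 983 and
frac 671088, the result lands within a few Hz of 983.04 MHz because of integer
truncation.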

[PATCH v3 2/4] dt-bindings: add bindings for rk3328 clock controller

2016-12-25 Thread Elaine Zhang

Add devicetree bindings for the Rockchip CRU (clock and reset unit)
found on Rockchip SoCs.

Signed-off-by: Elaine Zhang 
---
 .../bindings/clock/rockchip,rk3328-cru.txt | 57 ++
 1 file changed, 57 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/rockchip,rk3328-cru.txt

diff --git a/Documentation/devicetree/bindings/clock/rockchip,rk3328-cru.txt b/Documentation/devicetree/bindings/clock/rockchip,rk3328-cru.txt
new file mode 100644
index 000000000000..20053494d49f
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/rockchip,rk3328-cru.txt
@@ -0,0 +1,57 @@
+* Rockchip RK3328 Clock and Reset Unit
+
+The RK3328 clock controller generates and supplies clock to various
+controllers within the SoC and also implements a reset controller for SoC
+peripherals.
+
+Required Properties:
+
+- compatible: should be "rockchip,rk3328-cru"
+- reg: physical base address of the controller and length of memory mapped
+  region.
+- #clock-cells: should be 1.
+- #reset-cells: should be 1.
+
+Optional Properties:
+
+- rockchip,grf: phandle to the syscon managing the "general register files".
+  If missing, pll rates are not changeable, due to the missing pll lock status.
+
+Each clock is assigned an identifier and client nodes can use this identifier
+to specify the clock which they consume. All available clocks are defined as
+preprocessor macros in the dt-bindings/clock/rk3328-cru.h headers and can be
+used in device tree sources. Similar macros exist for the reset sources in
+these files.
+
+External clocks:
+
+There are several clocks that are generated outside the SoC. It is expected
+that they are defined using standard clock bindings with the following
+clock-output-names:
+ - "xin24m" - crystal input - required,
+ - "clkin_i2s" - external I2S clock - optional,
+ - "gmac_clkin" - external GMAC clock - optional
+ - "phy_50m_out" - output clock of the pll in the mac phy
+
+Example: Clock controller node:
+
+   cru: clock-controller@ff440000 {
+   compatible = "rockchip,rk3328-cru", "rockchip,cru", "syscon";
+   reg = <0x0 0xff440000 0x0 0x1000>;
+   rockchip,grf = <>;
+
+   #clock-cells = <1>;
+   #reset-cells = <1>;
+   };
+
+Example: UART controller node that consumes the clock generated by the clock
+  controller:
+
+   uart0: serial@ff120000 {
+   compatible = "snps,dw-apb-uart";
+   reg = <0xff120000 0x100>;
+   interrupts = ;
+   reg-shift = <2>;
+   reg-io-width = <4>;
+   clocks = <&cru SCLK_UART0>;
+   };
-- 
1.9.1
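
To make the "#clock-cells = <1>" convention in the binding above concrete:
the single cell in a consumer's clocks property is just the numeric clock ID
from the rk3328-cru.h header, and the provider resolves it to one of its
clocks. The sketch below models that lookup in plain C. The table, the struct
and the clock name strings are hypothetical and only illustrate the idea; the
three IDs are the values given in the header of this series.

```c
#include <stddef.h>

/* IDs as given in include/dt-bindings/clock/rk3328-cru.h of this series. */
#define SCLK_UART0 38
#define SCLK_UART1 39
#define SCLK_UART2 40

/* Hypothetical provider-side entry: one per exported clock ID. */
struct demo_clk {
    unsigned int id;
    const char *name;   /* illustrative name only */
};

static const struct demo_clk demo_clks[] = {
    { SCLK_UART0, "sclk_uart0" },
    { SCLK_UART1, "sclk_uart1" },
    { SCLK_UART2, "sclk_uart2" },
};

/* Resolve the single specifier cell of a consumer to a clock name. */
static const char *demo_clk_lookup(unsigned int id)
{
    size_t i;

    for (i = 0; i < sizeof(demo_clks) / sizeof(demo_clks[0]); i++) {
        if (demo_clks[i].id == id)
            return demo_clks[i].name;
    }
    return NULL;    /* unknown ID */
}
```

A consumer that passes SCLK_UART0, as the UART example node does, would
resolve to the first table entry; an ID that the provider does not export
yields NULL.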

[PATCH v3 0/4] clk: rockchip: support clk controller for rk3328 SoC

2016-12-25 Thread Elaine Zhang

Changes in v3:
  fix up the pll type pll_rk3328 description and use.
Changes in v2:
  add bindings for rk3328 clock controller

Elaine Zhang (4):
  clk: rockchip: add dt-binding header for rk3328
  dt-bindings: add bindings for rk3328 clock controller
  clk: rockchip: add clock controller for rk3328
  clk: rockchip: add new pll-type for rk3328

 .../bindings/clock/rockchip,rk3328-cru.txt |   57 ++
 drivers/clk/rockchip/Makefile  |1 +
 drivers/clk/rockchip/clk-pll.c |   16 +-
 drivers/clk/rockchip/clk-rk3328.c  | 1068 
 drivers/clk/rockchip/clk.h |   23 +
 include/dt-bindings/clock/rk3328-cru.h |  403 
 6 files changed, 1565 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/clock/rockchip,rk3328-cru.txt
 create mode 100644 drivers/clk/rockchip/clk-rk3328.c
 create mode 100644 include/dt-bindings/clock/rk3328-cru.h

-- 
1.9.1

[PATCH v3 1/4] clk: rockchip: add dt-binding header for rk3328

2016-12-25 Thread Elaine Zhang

Add the dt-bindings header for the rk3328, which is shared between
the clock controller driver and the clock references in the dts.
Also add the softreset IDs for rk3328.

Signed-off-by: Elaine Zhang 
---
 include/dt-bindings/clock/rk3328-cru.h | 403 +
 1 file changed, 403 insertions(+)
 create mode 100644 include/dt-bindings/clock/rk3328-cru.h

diff --git a/include/dt-bindings/clock/rk3328-cru.h b/include/dt-bindings/clock/rk3328-cru.h
new file mode 100644
index 000000000000..545ed7541316
--- /dev/null
+++ b/include/dt-bindings/clock/rk3328-cru.h
@@ -0,0 +1,403 @@
+/*
+ * Copyright (c) 2016 Rockchip Electronics Co. Ltd.
+ * Author: Elaine 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _DT_BINDINGS_CLK_ROCKCHIP_RK3328_H
+#define _DT_BINDINGS_CLK_ROCKCHIP_RK3328_H
+
+/* core clocks */
+#define PLL_APLL   1
+#define PLL_DPLL   2
+#define PLL_CPLL   3
+#define PLL_GPLL   4
+#define PLL_NPLL   5
+#define ARMCLK 6
+
+/* sclk gates (special clocks) */
+#define SCLK_RTC32K   30
+#define SCLK_SDMMC_EXT 31
+#define SCLK_SPI   32
+#define SCLK_SDMMC 33
+#define SCLK_SDIO  34
+#define SCLK_EMMC  35
+#define SCLK_TSADC 36
+#define SCLK_SARADC   37
+#define SCLK_UART0 38
+#define SCLK_UART1 39
+#define SCLK_UART2 40
+#define SCLK_I2S0  41
+#define SCLK_I2S1  42
+#define SCLK_I2S2  43
+#define SCLK_I2S1_OUT  44
+#define SCLK_I2S2_OUT  45
+#define SCLK_SPDIF 46
+#define SCLK_TIMER0   47
+#define SCLK_TIMER1   48
+#define SCLK_TIMER2   49
+#define SCLK_TIMER3   50
+#define SCLK_TIMER4   51
+#define SCLK_TIMER5   52
+#define SCLK_WIFI  53
+#define SCLK_CIF_OUT   54
+#define SCLK_I2C0  55
+#define SCLK_I2C1  56
+#define SCLK_I2C2  57
+#define SCLK_I2C3  58
+#define SCLK_CRYPTO   59
+#define SCLK_PWM   60
+#define SCLK_PDM   61
+#define SCLK_EFUSE 62
+#define SCLK_OTP   63
+#define SCLK_DDRCLK   64
+#define SCLK_VDEC_CABAC   65
+#define SCLK_VDEC_CORE 66
+#define SCLK_VENC_DSP  67
+#define SCLK_VENC_CORE 68
+#define SCLK_RGA   69
+#define SCLK_HDMI_SFC  70
+#define SCLK_HDMI_CEC  71
+#define SCLK_USB3_REF  72
+#define SCLK_USB3_SUSPEND  73
+#define SCLK_SDMMC_DRV 74
+#define SCLK_SDIO_DRV  75
+#define SCLK_EMMC_DRV  76
+#define SCLK_SDMMC_EXT_DRV 77
+#define SCLK_SDMMC_SAMPLE  78
+#define SCLK_SDIO_SAMPLE   79
+#define SCLK_EMMC_SAMPLE   80
+#define SCLK_SDMMC_EXT_SAMPLE  81
+#define SCLK_VOP   82
+#define SCLK_MAC2PHY_RXTX  83
+#define SCLK_MAC2PHY_SRC   84
+#define SCLK_MAC2PHY_REF   85
+#define SCLK_MAC2PHY_OUT   86
+#define SCLK_MAC2IO_RX 87
+#define SCLK_MAC2IO_TX 88
+#define SCLK_MAC2IO_REFOUT 89
+#define SCLK_MAC2IO_REF   90
+#define SCLK_MAC2IO_OUT   91
+#define SCLK_TSP   92
+#define SCLK_HSADC_TSP 93
+#define SCLK_USB3PHY_REF   94
+#define SCLK_REF_USB3OTG   95
+#define SCLK_USB3OTG_REF   96
+#define SCLK_USB3OTG_SUSPEND   97
+#define SCLK_REF_USB3OTG_SRC   98
+#define SCLK_MAC2IO_SRC   99
+
+/* dclk gates */
+#define DCLK_LCDC  180
+#define DCLK_HDMIPHY   181
+#define HDMIPHY   182
+#define USB480M   183
+#define DCLK_LCDC_SRC  184
+
+/* aclk gates */
+#define ACLK_AXISRAM   190
+#define ACLK_VOP_PRE   191
+#define ACLK_USB3OTG   192
+#define ACLK_RGA_PRE   193
+#define ACLK_DMAC  194
+#define ACLK_GPU   195
+#define ACLK_BUS_PRE   196
+#define ACLK_PERI_PRE  197
+#define ACLK_RKVDEC_PRE   198
+#define ACLK_RKVDEC   199
+#define ACLK_RKVENC   200
+#define ACLK_VPU_PRE   201
+#define ACLK_VIO_PRE   202
+#define ACLK_VPU   203
+#define ACLK_VIO   204
+#define ACLK_VOP   205
+#define ACLK_GMAC  206
+#define ACLK_H265  207
+#define ACLK_H264  208
+#define ACLK_MAC2PHY   209
+#define

Re: [PATCH v2 0/4] vfio-mdev: Fix remove race, clean namespace and better define ABI

2016-12-25 Thread Kirti Wankhede


On 12/23/2016 1:51 AM, Alex Williamson wrote:
> Cleanup the namespace a bit by prefixing structures with mdev_ and
> also more concretely define the mdev interface.  Structs with comments
> defining which fields are private vs public tempts poor behavior,
> especially for an interface where we expect out of tree vendor drivers.
> 

Patch 2/4-4/4 looks good to me.

Reviewed-by: Kirti Wankhede 

Thanks,
Kirti

> Additionally in v2, the patch removing the next field from mdev_device
> is dropped, instead using it to fix a remove race, my From address is
> fixed, and Documentation is updated.  Jike, I left your R-b on the
> patches that didn't change only.  If I've missed any relevant doc
> updates, please let me know.  Thanks,
> 
> Alex
> 
> ---
> 
> Alex Williamson (4):
>   vfio-mdev: Fix remove race
>   vfio-mdev: de-polute the namespace, rename parent_device & parent_ops
>   vfio-mdev: Make mdev_parent private
>   vfio-mdev: Make mdev_device private and abstract interfaces
> 
> 
>  Documentation/vfio-mediated-device.txt |   27 +
>  drivers/gpu/drm/i915/gvt/kvmgt.c   |   22 ---
>  drivers/vfio/mdev/mdev_core.c  |  100 
> +++-
>  drivers/vfio/mdev/mdev_private.h   |   29 -
>  drivers/vfio/mdev/mdev_sysfs.c |8 +--
>  drivers/vfio/mdev/vfio_mdev.c  |   12 ++--
>  include/linux/mdev.h   |   54 -
>  samples/vfio-mdev/mtty.c   |   28 +
>  8 files changed, 173 insertions(+), 107 deletions(-)
>

Re: [PATCH v2 1/4] vfio-mdev: Fix remove race

2016-12-25 Thread Kirti Wankhede



On 12/26/2016 1:10 AM, Alex Williamson wrote:
> On Sun, 25 Dec 2016 22:39:47 +0530
> Kirti Wankhede  wrote:
> 
>> On 12/23/2016 1:51 AM, Alex Williamson wrote:
>>> Using the mtty mdev sample driver we can generate a remove race by
>>> starting one shell that continuously creates mtty devices and several
>>> other shells all attempting to remove devices, in my case four remove
>>> shells.  The fault occurs in mdev_remove_sysfs_files() where the
>>> passed type arg is NULL, which suggests we've received a struct device
>>> in mdev_device_remove() but it's in some sort of teardown state.  The
>>> solution here is to make use of the accidentally unused list_head on
>>> the mdev_device such that the mdev core keeps a list of all the mdev
>>> devices.  This allows us to validate that we have a valid mdev before
>>> we start removal, remove it from the list to prevent others from
>>> working on it, and if the vendor driver refuses to remove, we can
>>> re-add it to the list.
>>>   
>>
>> Alex,
>>
>> Writing 1 to 'remove' first removes the attribute itself, i.e. calls
>> device_remove_file_self(dev, attr). So if the file has already been
>> removed, device_remove_file_self() should return false. Doesn't it?
>> kernfs_remove_self() holds the mutex that should handle this condition.
> 
> In theory, I agree.  In practice I was able to generate the race
> described.  We're getting through to call mdev_device_remove with
> a struct device that resolves to an mdev where the type_kobj is
> NULL, presumably it's been freed.  Maybe there's a better fix
> within kernfs, but this sanitizes the mdev on our end to resolve
> it.  To see the issue, simply run 'while true; do uuidgen >
> create; done', then from a few other shells loop finding mdev
> devices and remove any that are found.  Set dmesg to only print
> critical messages or else it'll slow create and delete to the
> point where it'll be difficult to get the race.  Thanks,
> 

I see. pci-sysfs also takes a mutex around its remove function even after
device_remove_file_self() has returned true. Yes, kernfs might have a
better fix.
This change looks good to me.

Thanks,
Kirti
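
The approach described in the quoted commit message (keep every mdev on a
core-owned list, validate and unlink under a lock before removal, and re-add
if the vendor driver refuses) can be sketched in user-space C as follows.
This is only a model of the idea under stated assumptions, not the kernel
code: all names (demo_mdev, demo_mdev_remove and so on) are hypothetical, a
pthread mutex stands in for the kernel mutex, and the "active" flag is an
invented way to make the stand-in vendor callback refuse.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an mdev device tracked by the core. */
struct demo_mdev {
    struct demo_mdev *next;
    int active;     /* nonzero: the vendor callback refuses removal */
};

static struct demo_mdev *demo_mdev_list;
static pthread_mutex_t demo_mdev_lock = PTHREAD_MUTEX_INITIALIZER;

/* Put a device on the list of valid, removable devices. */
static void demo_mdev_add(struct demo_mdev *m)
{
    pthread_mutex_lock(&demo_mdev_lock);
    m->next = demo_mdev_list;
    demo_mdev_list = m;
    pthread_mutex_unlock(&demo_mdev_lock);
}

/*
 * Validate and claim: unlink m only if it is still on the list. The pointer
 * is compared, not dereferenced, until a match is found, so a concurrent
 * remover that lost the race simply gets -ENODEV.
 */
static int demo_mdev_claim(struct demo_mdev *m)
{
    struct demo_mdev **pp;
    int ret = -ENODEV;

    pthread_mutex_lock(&demo_mdev_lock);
    for (pp = &demo_mdev_list; *pp; pp = &(*pp)->next) {
        if (*pp == m) {
            *pp = m->next;
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&demo_mdev_lock);
    return ret;
}

/* Stand-in for the vendor driver's remove callback. */
static int demo_vendor_remove(struct demo_mdev *m)
{
    return m->active ? -EBUSY : 0;
}

static int demo_mdev_remove(struct demo_mdev *m)
{
    int ret = demo_mdev_claim(m);

    if (ret)
        return ret;         /* not, or no longer, a valid device */

    ret = demo_vendor_remove(m);
    if (ret)
        demo_mdev_add(m);   /* vendor refused: make it removable again */
    return ret;
}
```

With this shape, only one of several racing removers can claim a given
device; the others see -ENODEV instead of operating on a device that is
already being torn down.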

Re: [PATCH] nohz: Fix collision between tick and other hrtimers

2016-12-25 Thread Rik van Riel

On Sat, 2016-12-24 at 17:15 +0100, Frederic Weisbecker wrote:
> When the tick is stopped and an interrupt occurs afterward, we check on
> that interrupt exit if the next tick needs to be rescheduled. If it
> doesn't need any update, we don't want to do anything.
> 
> In order to check if the tick needs an update, we compare it against the
> clockevent device deadline. Now that's a problem because the clockevent
> device is at a lower level than the tick itself if it is implemented
> on top of hrtimer.

Oh, good find. That is one subtle bug.

Acked-by: Rik van Riel 

-- 
All Rights Reversed.


Re: [PATCH net 0/9] several fixups for virtio-net XDP

2016-12-25 Thread Jason Wang




On 2016-12-24 01:10, John Fastabend wrote:
> On 16-12-23 06:37 AM, Jason Wang wrote:
>> Merry Xmas and a Happy New year to all:
>>
>> This series tries to fix several issues for virtio-net XDP which
>> could be categorized into several parts:
>>
>> - fix several issues during XDP linearizing
>> - allow csumed packet to work for XDP_PASS
>> - make EWMA rxbuf size estimation work for XDP
>> - forbid XDP when GUEST_UFO is supported
>> - remove big packet XDP support
>> - add XDP support for small buffers
>>
>> Please see individual patches for details.
>>
>> Thanks
>>
>> Jason Wang (9):
>>    virtio-net: remove the warning before XDP linearizing
>>    virtio-net: correctly xmit linearized page on XDP_TX
>>    virtio-net: fix page miscount during XDP linearizing
>>    virtio-net: correctly handle XDP_PASS for linearized packets
>>    virtio-net: unbreak csumed packets for XDP_PASS
>>    virtio-net: make rx buf size estimation works for XDP
>>    virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support
>>    virtio-net: remove big packet XDP codes
>>    virtio-net: XDP support for small buffers
>>
>>   drivers/net/virtio_net.c | 172 ---
>>   1 file changed, 102 insertions(+), 70 deletions(-)
>>
>
> Thanks a lot Jason. The last piece needed to complete XDP support is
> to get the adjust_head part correct. I'll send out a patch in a bit
> but will need to merge it on top of this set.
>
> .John
>

Yes, glad to see your patch.

Thanks.

Re: [PATCH net 7/9] virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support

2016-12-25 Thread Jason Wang




On 2016-12-24 00:10, John Fastabend wrote:

On 16-12-23 08:02 AM, John Fastabend wrote:

On 16-12-23 06:37 AM, Jason Wang wrote:

When VIRTIO_NET_F_GUEST_UFO is negotiated, host could still send UFO
packet that exceeds a single page which could not be handled
correctly by XDP. So this patch forbids setting XDP when GUEST_UFO is
supported. While at it, forbid XDP for ECN (which comes only from GRO)
too to prevent user from misconfiguration.


Is sending packets greater than single page though normal in this case?


Yes. When NETIF_F_UFO is enabled for tap, it does not segment UFO packets
and sends them directly to the guest. (This can be reproduced with
UDP_STREAM between two guests or from host to guest.)


Thanks


I don't have any need to support big packet mode other than MST asked
for it. And I wasn't seeing this in my tests. MTU is capped at 4k - hdr
when XDP is enabled.

.John


Cc: John Fastabend 
Signed-off-by: Jason Wang 
---
  drivers/net/virtio_net.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 77ae358..c1f66d8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1684,7 +1684,9 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	int i, err;
 
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
-	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6)) {
+	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
+	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
+	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
 		netdev_warn(dev, "can't set XDP while host is implementing LRO, disable LRO first\n");
 		return -EOPNOTSUPP;
 	}


Acked-by: John Fastabend
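
The check added by this patch amounts to refusing XDP whenever any of four
guest offload features has been negotiated. A stand-alone sketch of that
predicate over a plain feature bitmask might look like the following. The bit
numbers are the ones the virtio-net specification assigns to these features;
the function name demo_xdp_allowed() and the use of a bare 64-bit mask in
place of virtio_has_feature() are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as assigned by the virtio-net specification. */
#define VIRTIO_NET_F_GUEST_TSO4  7
#define VIRTIO_NET_F_GUEST_TSO6  8
#define VIRTIO_NET_F_GUEST_ECN   9
#define VIRTIO_NET_F_GUEST_UFO   10

/*
 * Hypothetical helper: XDP may be attached only if none of the guest
 * offloads that can deliver packets larger than one page is negotiated.
 */
static bool demo_xdp_allowed(uint64_t features)
{
    const uint64_t blockers =
        (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
        (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
        (1ULL << VIRTIO_NET_F_GUEST_ECN)  |
        (1ULL << VIRTIO_NET_F_GUEST_UFO);

    return (features & blockers) == 0;
}
```

A device that negotiated only checksum offload would pass this check, while
one with GUEST_UFO set would be rejected, which corresponds to the
-EOPNOTSUPP path in the hunk above.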

[PATCH] clk: uniphier: remove unneeded #include

2016-12-25 Thread Masahiro Yamada

This include was needed to suppress build error when this driver
was initially merged because  did not include
 at that time.  (developers' headache across
sub-systems)

The root cause has been fixed by commit adf08d481b52 ("regmap:
include  from include/linux/regmap.h"), so this
line can be dropped now.

Signed-off-by: Masahiro Yamada 
---

This patch must be applied on v4.10-rc1 or later
so that you can see both commit d08f1f0d596c and adf08d481b52.


 drivers/clk/uniphier/clk-uniphier-cpugear.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/clk/uniphier/clk-uniphier-cpugear.c b/drivers/clk/uniphier/clk-uniphier-cpugear.c
index 9bff26e..ec11f55 100644
--- a/drivers/clk/uniphier/clk-uniphier-cpugear.c
+++ b/drivers/clk/uniphier/clk-uniphier-cpugear.c
@@ -14,7 +14,6 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 
-- 
2.7.4

Re: [PATCH net 4/9] virtio-net: correctly handle XDP_PASS for linearized packets

2016-12-25 Thread Jason Wang




On 2016年12月23日 23:57, John Fastabend wrote:

On 16-12-23 06:37 AM, Jason Wang wrote:

When XDP_PASS is determined for linearized packets, we try to get
new buffers from the virtqueue and build skbs from them. This is wrong;
we should create skbs based on the existing buffers instead. Fix this by
creating the skb based on xdp_page.

With this patch "ping 192.168.100.4 -s 3900 -M do" works for XDP_PASS.

Cc: John Fastabend 
Signed-off-by: Jason Wang 
---
  drivers/net/virtio_net.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 58ad40e..470293e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -578,8 +578,14 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		act = do_xdp_prog(vi, rq, xdp_prog, xdp_page, offset, len);
 		switch (act) {
 		case XDP_PASS:
-			if (unlikely(xdp_page != page))
-				__free_pages(xdp_page, 0);
+			/* We can only create skb based on xdp_page. */
+			if (unlikely(xdp_page != page)) {
+				rcu_read_unlock();
+				put_page(page);
+				head_skb = page_to_skb(vi, rq, xdp_page,
+						       0, len, PAGE_SIZE);
+				return head_skb;
+			}
 			break;
 		case XDP_TX:
 			if (unlikely(xdp_page != page))


Great thanks. This was likely working before because of the memory
leak fixed in 3/9.


It looks like that is not the case: without this patch and 3/9, the code
will try to get new buffers and build the skb for a new packet instead of
using the existing buffers.


Thanks



Acked-by: John Fastabend

Re: [PATCH net 3/9] virtio-net: fix page miscount during XDP linearizing

2016-12-25 Thread Jason Wang




On 2016年12月23日 23:54, John Fastabend wrote:

On 16-12-23 06:37 AM, Jason Wang wrote:

We don't put the page during linearizing, which causes a leak when
transmitting through XDP_TX or when the packet exceeds PAGE_SIZE. Fix
this by putting the page accordingly. Also decrease the number of buffers
during linearizing to make sure the caller can free buffers correctly when
the packet exceeds PAGE_SIZE. With this patch, we won't get an OOM after
linearizing a huge number of packets.

Cc: John Fastabend 
Signed-off-by: Jason Wang 
---

Thanks! looks good. By the way do you happen to have any actual
configuration where this path is hit? I obviously didn't test this
very long other than a quick test with my hacked vhost driver.

Acked-by: John Fastabend 


Yes, I have. Just increase the MTU above 1500 for both virtio and tap
and generate some traffic with packet sizes that cause the rx buffer
size to be underestimated.


Thanks

Re: [PATCH] x86/e820: make e820_search_gap() static and remove unused variables

2016-12-25 Thread Yinghai Lu

On Sun, Dec 25, 2016 at 02:35:51PM +, Wei Yang wrote:
> e820_search_gap() is just used locally now and the start_addr and end_addr
> are fixed. Also gapstart is not checked in this function.
> 
> The patch makes e820_search_gap() static and removes those unused variables.
> 
> Signed-off-by: Wei Yang 

Acked-by: Yinghai Lu

[PATCH] arm64: defconfig: enable CONFIG_MMC_SDHCI_CADENCE

2016-12-25 Thread Masahiro Yamada

Enable the Cadence SD/SDIO/eMMC controller.  This is used on
Socionext UniPhier SoC family.

Signed-off-by: Masahiro Yamada 
---

 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 0888cab..23045b4 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -384,6 +384,7 @@ CONFIG_MMC_SDHCI_ACPI=y
 CONFIG_MMC_SDHCI_PLTFM=y
 CONFIG_MMC_SDHCI_OF_ARASAN=y
 CONFIG_MMC_SDHCI_OF_ESDHC=y
+CONFIG_MMC_SDHCI_CADENCE=y
 CONFIG_MMC_SDHCI_TEGRA=y
 CONFIG_MMC_SDHCI_MSM=y
 CONFIG_MMC_SPI=y
-- 
2.7.4

fs/crypto/crypto.c:360:6: error: dereferencing pointer to incomplete type

2016-12-25 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   7ce7d89f48834cefece7804d38fc5d85382edf77
commit: d475a507457b5cafa428871a473d0dcc828c5f68 ubifs: Add skeleton for 
fscrypto
date:   13 days ago
config: x86_64-randconfig-n0-12260905 (attached as .config)
compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4
reproduce:
git checkout d475a507457b5cafa428871a473d0dcc828c5f68
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   fs/crypto/crypto.c: In function 'fscrypt_zeroout_range':
   fs/crypto/crypto.c:355:3: error: implicit declaration of function 
'bio_alloc' [-Werror=implicit-function-declaration]
  bio = bio_alloc(GFP_NOWAIT, 1);
  ^
   fs/crypto/crypto.c:355:7: warning: assignment makes pointer from integer 
without a cast [enabled by default]
  bio = bio_alloc(GFP_NOWAIT, 1);
  ^
>> fs/crypto/crypto.c:360:6: error: dereferencing pointer to incomplete type
  bio->bi_bdev = inode->i_sb->s_bdev;
 ^
   fs/crypto/crypto.c:361:6: error: dereferencing pointer to incomplete type
  bio->bi_iter.bi_sector =
 ^
   fs/crypto/crypto.c:363:3: error: implicit declaration of function 
'bio_set_op_attrs' [-Werror=implicit-function-declaration]
  bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
  ^
   fs/crypto/crypto.c:364:3: error: implicit declaration of function 
'bio_add_page' [-Werror=implicit-function-declaration]
  ret = bio_add_page(bio, ciphertext_page,
  ^
   fs/crypto/crypto.c:369:4: error: implicit declaration of function 'bio_put' 
[-Werror=implicit-function-declaration]
   bio_put(bio);
   ^
   fs/crypto/crypto.c:373:3: error: implicit declaration of function 
'submit_bio_wait' [-Werror=implicit-function-declaration]
  err = submit_bio_wait(bio);
  ^
   fs/crypto/crypto.c:374:24: error: dereferencing pointer to incomplete type
  if ((err == 0) && bio->bi_error)
   ^
   fs/crypto/crypto.c: In function 'completion_pages':
   fs/crypto/crypto.c:457:2: error: implicit declaration of function 
'bio_for_each_segment_all' [-Werror=implicit-function-declaration]
 bio_for_each_segment_all(bv, bio, i) {
 ^
   fs/crypto/crypto.c:457:39: error: expected ';' before '{' token
 bio_for_each_segment_all(bv, bio, i) {
  ^
   cc1: some warnings being treated as errors

vim +360 fs/crypto/crypto.c

0b81d077 Jaegeuk Kim   2015-05-15  349  err = 
do_page_crypto(inode, FS_ENCRYPT, lblk,
b32e4482 Jaegeuk Kim   2016-04-11  350  
ZERO_PAGE(0), ciphertext_page,
7821d4dd David Gstir   2016-11-13  351  
PAGE_SIZE, 0, GFP_NOFS);
0b81d077 Jaegeuk Kim   2015-05-15  352  if (err)
0b81d077 Jaegeuk Kim   2015-05-15  353  goto errout;
0b81d077 Jaegeuk Kim   2015-05-15  354  
b32e4482 Jaegeuk Kim   2016-04-11 @355  bio = 
bio_alloc(GFP_NOWAIT, 1);
0b81d077 Jaegeuk Kim   2015-05-15  356  if (!bio) {
0b81d077 Jaegeuk Kim   2015-05-15  357  err = -ENOMEM;
0b81d077 Jaegeuk Kim   2015-05-15  358  goto errout;
0b81d077 Jaegeuk Kim   2015-05-15  359  }
0b81d077 Jaegeuk Kim   2015-05-15 @360  bio->bi_bdev = 
inode->i_sb->s_bdev;
0b81d077 Jaegeuk Kim   2015-05-15  361  bio->bi_iter.bi_sector =
0b81d077 Jaegeuk Kim   2015-05-15  362  pblk << 
(inode->i_sb->s_blocksize_bits - 9);
95fe6c1a Mike Christie 2016-06-05  363  bio_set_op_attrs(bio, 
REQ_OP_WRITE, 0);

:: The code at line 360 was first introduced by commit
:: 0b81d0779072696371822e5ed9e7c6292e547024 fs crypto: move per-file 
encryption from f2fs tree to fs/crypto

:: TO: Jaegeuk Kim 
:: CC: Jaegeuk Kim 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip
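
For readers unfamiliar with the "dereferencing pointer to incomplete type" diagnostic in the report above, here is a minimal standalone sketch of what it means in C. The names struct blob and blob_len() are hypothetical and unrelated to the kernel's struct bio; the sketch does not attempt to diagnose why the definition is missing in this particular randconfig build.

```c
#include <stddef.h>

/*
 * Forward declaration only: 'struct blob' is an incomplete type here.
 * Pointers to it can still be declared and passed around.
 */
struct blob;
static size_t blob_len(const struct blob *b);

/*
 * Accessing b->len at this point would fail to compile with
 * "dereferencing pointer to incomplete type", because the compiler
 * has not yet seen the member layout.
 */

/* Once the full definition is visible, member access compiles. */
struct blob {
	size_t len;
};

static size_t blob_len(const struct blob *b)
{
	return b->len;
}
```

In general, this class of error goes away once the header that carries the full structure definition is included in the failing configuration.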

Re: [PATCH v5 09/14] ACPI: platform: setup MSI domain for ACPI based platform device

2016-12-25 Thread Hanjun Guo

Hi Rafael,

Happy holidays! reply inline.

On 2016/12/26 8:31, Rafael J. Wysocki wrote:
> On Sat, Dec 24, 2016 at 8:34 AM, Hanjun Guo  wrote:
>> Hi Rafael,
>>
>> Thank you for your comments. When I was trying out your suggestion,
>> I got a little confused; please see my comments below.
>>
> [cut]
>
 +
 +/**
   * acpi_create_platform_device - Create platform device for ACPI device 
 node
   * @adev: ACPI device node to create a platform device for.
   * @properties: Optional collection of build-in properties.
 @@ -109,6 +119,7 @@ struct platform_device 
 *acpi_create_platform_device(struct acpi_device *adev,
 pdevinfo.num_res = count;
 pdevinfo.fwnode = acpi_fwnode_handle(adev);
 pdevinfo.properties = properties;
 +   pdevinfo.pre_add_cb = acpi_platform_pre_add_cb;
>>> Why don't you point that directly to acpi_configure_pmsi_domain()?  It
>>> doesn't look like the wrapper is necessary at all.
>> I was thinking that we can add something more in the future
>> if we need to extend the function of the callback, I can just
>> use acpi_configure_pmsi_domain() here.
> So you can add the wrapper in the future just fine as well.  At this
> point it is just redundant.
>
>>> And I'm not sure why the new callback is necessary ->
>> I was demoing your suggestion but...
>>
 if (acpi_dma_supported(adev))
 pdevinfo.dma_mask = DMA_BIT_MASK(32);
 diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
 index bc68d93..6b72fcb 100644
 --- a/drivers/acpi/arm64/iort.c
 +++ b/drivers/acpi/arm64/iort.c
 @@ -527,6 +527,49 @@ struct irq_domain *iort_get_device_domain(struct 
 device *dev, u32 req_id)
 return irq_find_matching_fwnode(handle, DOMAIN_BUS_PCI_MSI);
  }

 +/**
 + * iort_get_platform_device_domain() - Find MSI domain related to a
 + * platform device
 + * @dev: the dev pointer associated with the platform device
 + *
 + * Returns: the MSI domain for this device, NULL otherwise
 + */
 +static struct irq_domain *iort_get_platform_device_domain(struct device 
 *dev)
 +{
 +   struct acpi_iort_node *node, *msi_parent;
 +   struct fwnode_handle *iort_fwnode;
 +   struct acpi_iort_its_group *its;
 +
 +   /* find its associated iort node */
 +   node = iort_scan_node(ACPI_IORT_NODE_NAMED_COMPONENT,
 + iort_match_node_callback, dev);
 +   if (!node)
 +   return NULL;
 +
 +   /* then find its msi parent node */
 +   msi_parent = iort_node_get_id(node, NULL, IORT_MSI_TYPE, 0);
 +   if (!msi_parent)
 +   return NULL;
 +
 +   /* Move to ITS specific data */
 +   its = (struct acpi_iort_its_group *)msi_parent->node_data;
 +
 +   iort_fwnode = iort_find_domain_token(its->identifiers[0]);
 +   if (!iort_fwnode)
 +   return NULL;
 +
 +   return irq_find_matching_fwnode(iort_fwnode, 
 DOMAIN_BUS_PLATFORM_MSI);
 +}
 +
 +void acpi_configure_pmsi_domain(struct device *dev)
 +{
 +   struct irq_domain *msi_domain;
 +
 +   msi_domain = iort_get_platform_device_domain(dev);
 +   if (msi_domain)
 +   dev_set_msi_domain(dev, msi_domain);
 +}
 +
  static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data)
  {
 u32 *rid = data;
 diff --git a/drivers/base/platform.c b/drivers/base/platform.c
 index c4af003..3e68f31 100644
 --- a/drivers/base/platform.c
 +++ b/drivers/base/platform.c
 @@ -537,6 +537,9 @@ struct platform_device *platform_device_register_full(
 goto err;
 }

 +   if (pdevinfo->pre_add_cb)
 +   pdevinfo->pre_add_cb(&pdev->dev);
 +
>>> -> because it looks like this might be done in acpi_platform_notify()
>>> for platform devices.
>> It works and I just simply add the code below:
>>
>> diff --git a/drivers/acpi/glue.c b/drivers/acpi/glue.c
>> index f8d6564..e0cd649 100644
>> --- a/drivers/acpi/glue.c
>> +++ b/drivers/acpi/glue.c
>> @@ -13,6 +13,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>
>>  #include "internal.h"
>> @@ -315,6 +316,8 @@ static int acpi_platform_notify(struct device *dev)
>> if (!adev)
>> goto out;
>>
>> + acpi_configure_pmsi_domain(dev);
>> +
> But that should apply to platform devices only I suppose?

Yes, it's only for the platform device.

>
>> if (type && type->setup)
>> type->setup(dev);
>> else if (adev->handler && adev->handler->bind)
>>
>> Are you suggesting that the MSI domain be configured this way,
>> or that the function be added to the type->setup() callback (which
>> would require introducing a new ACPI bus type)?
> A type->setup() would be

Re: [PATCH 2/2] mm: add PageWaiters indicating tasks are waiting for a page bit

2016-12-25 Thread Nicholas Piggin

On Sun, 25 Dec 2016 13:51:17 -0800
Linus Torvalds  wrote:

> On Sat, Dec 24, 2016 at 7:00 PM, Nicholas Piggin  wrote:
> > Add a new page flag, PageWaiters, to indicate the page waitqueue has
> > tasks waiting. This can be tested rather than testing waitqueue_active
> > which requires another cacheline load.  
> 
> Ok, I applied this one too. I think there's room for improvement, but
> I don't think it's going to help to just wait another release cycle
> and hope something happens.
> 
> Example room for improvement from a profile of unlock_page():
> 
>    46.44 │  lock   andb $0xfe,(%rdi)
>    34.22 │  mov    (%rdi),%rax
> 
> this has the old "do atomic op on a byte, then load the whole word"
> issue that we used to have with the nasty zone lookup code too. And it
> causes a horrible pipeline hiccup because the load will not forward
> the data from the (partial) store.
> 
> It's really a misfeature of our asm optimizations of the atomic bit
> ops. Using "andb" is slightly smaller, but in this case in particular,
> an "andq" would be a ton faster, and the mask still fits in an imm8,
> so it's not even hugely larger.

I did actually play around with that. I could not get my Skylake
to forward the result from a lock op to a subsequent load: the
latency was the same whether you use lock ; andb or lock ; andl
(32 cycles for my test loop), whereas with non-atomic versions I
was getting about 15 cycles for andb vs 2 for andl.

I guess the lock op drains the store queue to coherency and does
not allow forwarding so as to provide the memory ordering
semantics.

> But it might also be a good idea to simply use a "cmpxchg" loop here.
> That also gives atomicity guarantees that we don't have with the
> "clear bit and then load the value".

cmpxchg ends up at 19 cycles including the initial load, so it
may be worthwhile. Powerpc has a similar problem with doing a
clear_bit; test_bit (not the size mismatch, but forwarding from
atomic ops being less capable).

Thanks,
Nick
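
The "cmpxchg loop" idea from this exchange can be sketched in portable C11 atomics. This is a userspace illustration under stated assumptions, not the kernel's unlock_page(): the helper name clear_lock_bit() is hypothetical, and bit 0 is assumed to be the lock bit only because the profiled instruction masks with $0xfe.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: atomically clear bit 0 of *flags and return the
 * whole word as it was just before the clear. Because the clear and the
 * observation of the other bits happen in one atomic step, no separate
 * load of the word is needed afterwards.
 */
static uint64_t clear_lock_bit(_Atomic uint64_t *flags)
{
	uint64_t old = atomic_load_explicit(flags, memory_order_relaxed);

	/* On failure, 'old' is refreshed with the current value; retry. */
	while (!atomic_compare_exchange_weak_explicit(flags, &old,
						      old & ~1ULL,
						      memory_order_release,
						      memory_order_relaxed))
		;

	return old;
}
```

A caller could then test any other flag bit in the returned value (for example a waiters bit) without touching the word again, which is the atomicity guarantee mentioned in the quoted text.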

Linux 4.10-rc1

2016-12-25 Thread Linus Torvalds

It's Christmas Day, and it's two weeks since the merge window opened.
Thus, the merge window is now closed.

I did a few final pulls today, but I also rejected a couple of pulls
that came in late in the window and looked dodgy. You know who you
are.

On the whole, this wasn't all that big a release - nothing like 4.9.
Although it wasn't tiny either. I think 4.7 was smaller. 4.8 might
have been too. It's xmas day, and right now I can't be arsed to
actually do the statistics I'd normally do.

Everything looks pretty normal, although we had an unusual amount of
tree-wide final cleanups in the last days of the merge window. But the
general statistics look fairly common: a bit over half is drivers,
maybe slightly less arch updates than normal, and a fair amount of
documentation updates due to the sphinx conversion. And then the usual
misc noise all over, although the perf tooling updates do stand out.

The shortlog is much too big, as it always is during the merge window,
so as usual you just get the merge-log.

Linus

---

Al Viro (3):
vfs updates
more vfs updates
final vfs updates

Alex Williamson (1):
VFIO updates

Alexandre Belloni (1):
RTC updates

Andrew Morton (3):
updates
more updates
final set of updates

Arnd Bergmann (7):
ARM SoC non-urgent fixes
ARM SoC platform updates
ARM SoC defconfig updates
ARM SoC 64-bit updates
ARM DT updates
ARM 64-bit DT updates
ARM SoC driver updates

Bart Van Assche (1):
scsi target cleanups

Bjorn Andersson (2):
remoteproc updates
rpmsg updates

Bjorn Helgaas (1):
PCI updates

Borislav Petkov (1):
EDAC updates

Brian Norris (1):
MTD updates

Bruce Fields (1):
nfsd updates

Catalin Marinas (1):
arm64 updates

Chris Mason (1):
btrfs updates

Chris Metcalf (1):
arch/tile updates

Christoph Hellwig (1):
configfs update

Corey Minyard (1):
IPMI updates

Dan Williams (1):
libnvdimm updates

Darren Hart (2):
x86 platform driver updates
more x86 platform driver updates

Dave Airlie (2):
drm updates
drm fixes

Dave Chinner (1):
xfs updates

David Kleikamp (1):
jfs update

David Miller (5):
networking updates
sparc updates
networking fixes and cleanups
networking fixes and cleanups
networking fixes

David Teigland (1):
dlm fixes

Dmitry Torokhov (1):
input subsystem updates

Doug Ledford (2):
rdma updates
rdma fixes

Eric Biederman (1):
namespace updates

Geert Uytterhoeven (1):
m68k updates

Greg KH (5):
USB/PHY updates
tty/serial updates
staging/IIO updates
driver core updates
char/misc driver updates

Greg Ungerer (1):
m68knommu updates

Guenter Roeck (1):
hwmon updates

Hans-Christian Noren Egtvedt (1):
AVR32 updates

Helge Deller (1):
parisc updates

Herbert Xu (2):
crypto updates
crypto fixes

Ilya Dryomov (1):
ceph updates

Ingo Molnar (24):
RCU updates
SMP bootup updates
EFI updates
locking updates
mm/PAT cleanup
perf updates
scheduler updates
hotplug API fix
x86 RAS updates
x86 apic updates
x86 asm updates
x86 boot updates
x86 build updates
x86 cleanups
x86 CPU updates
x86 FPU updates
x86 header fixlet
x86 idle updates
x86 microcode update
x86 platform updates
irq fix
perf fixes
timer fix
x86 fixes

Jacek Anaszewski (2):
LED updates
LED maintainer email update

Jaegeuk Kim (1):
f2fs updates

James Bottomley (2):
SCSI updates
late SCSI updates

James Morris (2):
security subsystem updates
SElinux fix

Jan Kara (1):
quota, fsnotify and ext2 updates

Jassi Brar (1):
mailbox updates

Jean Delvare (1):
dmi fix

Jens Axboe (4):
block layer updates
fs meta data unmap optimization
block IO fixes
block layer fixes

Jesper Nilsson (1):
CRIS updates

Jessica Yu (1):
modules updates

Jiri Kosina (3):
HID updates
livepatch update
trivial updates

Joerg Roedel (1):
IOMMU updates

Jon Mason (1):
NTB update

Jonathan Corbet (3):
documentation update
more documentation updates
documentation fix

Juergen Gross (1):
xen updates

Kees Cook (2):
pstore updates
gcc plugins updates

Konrad Rzeszutek Wilk (1):
swiotlb updates

Lee Jones (1):
MFD updates

Len Brown (1):
turbostat updates

Ley Foon Tan (1):
arch/nios2 updates

Linus Walleij (2):
pinctrl updates
GPIO updates

Luis de Bethencourt (1):
befs updates

Mark Brown (3):
regmap fixlet
regulator updates
spi updates

Martin Schwidefsky (2):
s390 updates
more s390 updates

Mauro Carvalho Chehab (2):
edac updates
media updates

Max Filippov (1):
Xtensa updates

Michael Ellerman (1):
powerpc updates

Michael Tsirkin (1):
virtio updates

Michal Marek (3):
kbuild updates
kconfig updates

[PATCH/RESEND 5/5] z3fold: add kref refcounting

2016-12-25 Thread Vitaly Wool

With both the upcoming and the already present locking optimizations,
introducing kref to reference-count z3fold objects is the right
thing to do. Moreover, it makes the buddied list unnecessary and
allows for simpler handling of headless pages.
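
As a rough userspace sketch of the reference-counting pattern this
relies on (not the kernel's kref implementation; struct ref, ref_init(),
ref_get_unless_zero(), ref_put() and struct page_hdr are names invented
for this example):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct ref { atomic_int count; };

static void ref_init(struct ref *r) { atomic_init(&r->count, 1); }

/* Take a reference only if the object is still live (count > 0). */
static bool ref_get_unless_zero(struct ref *r)
{
    int c = atomic_load(&r->count);

    while (c != 0)
        if (atomic_compare_exchange_weak(&r->count, &c, c + 1))
            return true;
    return false;
}

/* Drop a reference; call release() and return true if it was the last. */
static bool ref_put(struct ref *r, void (*release)(struct ref *))
{
    if (atomic_fetch_sub(&r->count, 1) == 1) {
        release(r);
        return true;
    }
    return false;
}

/* Stand-in for a page header that embeds its reference count. */
struct page_hdr { int freed; struct ref refcount; };

static void page_release(struct ref *r)
{
    /* Recover the enclosing object, as container_of() does. */
    struct page_hdr *h = (struct page_hdr *)
        ((char *)r - offsetof(struct page_hdr, refcount));

    h->freed = 1;   /* a real release would unlink and free the page */
}
```

The true return value of ref_put() is what lets a caller adjust its
page accounting exactly once, when the last reference goes away.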

Signed-off-by: Vitaly Wool 
---
 mm/z3fold.c | 137 ++--
 1 file changed, 68 insertions(+), 69 deletions(-)

diff --git a/mm/z3fold.c b/mm/z3fold.c
index 729a2da..4593493 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -52,6 +52,7 @@ enum buddy {
  * z3fold page, except for HEADLESS pages
  * @buddy: links the z3fold page into the relevant list in the pool
  * @page_lock: per-page lock
+ * @refcount:  reference count for the z3fold page
  * @first_chunks:  the size of the first buddy in chunks, 0 if free
  * @middle_chunks: the size of the middle buddy in chunks, 0 if free
  * @last_chunks:   the size of the last buddy in chunks, 0 if free
@@ -60,6 +61,7 @@ enum buddy {
 struct z3fold_header {
struct list_head buddy;
raw_spinlock_t page_lock;
+   struct kref refcount;
unsigned short first_chunks;
unsigned short middle_chunks;
unsigned short last_chunks;
@@ -95,8 +97,6 @@ struct z3fold_header {
  * @unbuddied: array of lists tracking z3fold pages that contain 2- buddies;
  * the lists each z3fold page is added to depends on the size of
  * its free region.
- * @buddied:   list tracking the z3fold pages that contain 3 buddies;
- * these z3fold pages are full
  * @lru:   list tracking the z3fold pages in LRU order by most recently
  * added buddy.
  * @pages_nr:  number of z3fold pages in the pool.
@@ -109,7 +109,6 @@ struct z3fold_header {
 struct z3fold_pool {
spinlock_t lock;
struct list_head unbuddied[NCHUNKS];
-   struct list_head buddied;
struct list_head lru;
atomic64_t pages_nr;
const struct z3fold_ops *ops;
@@ -162,9 +161,21 @@ static struct z3fold_header *init_z3fold_page(struct page *page)
 }
 
 /* Resets the struct page fields and frees the page */
-static void free_z3fold_page(struct z3fold_header *zhdr)
+static void free_z3fold_page(struct page *page)
 {
-   __free_page(virt_to_page(zhdr));
+   __free_page(page);
+}
+
+static void release_z3fold_page(struct kref *ref)
+{
+   struct z3fold_header *zhdr = container_of(ref, struct z3fold_header,
+   refcount);
+   struct page *page = virt_to_page(zhdr);
+   if (!list_empty(&zhdr->buddy))
+   list_del(&zhdr->buddy);
+   if (!list_empty(&page->lru))
+   list_del(&page->lru);
+   free_z3fold_page(page);
 }
 
 /* Lock a z3fold page */
@@ -256,9 +267,9 @@ static struct z3fold_pool *z3fold_create_pool(gfp_t gfp,
if (!pool)
return NULL;
spin_lock_init(&pool->lock);
+   kref_init(>refcount);
for_each_unbuddied_list(i, 0)
INIT_LIST_HEAD(&pool->unbuddied[i]);
-   INIT_LIST_HEAD(&pool->buddied);
INIT_LIST_HEAD(&pool->lru);
atomic64_set(&pool->pages_nr, 0);
pool->ops = ops;
@@ -383,7 +394,7 @@ static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
spin_lock(&pool->lock);
zhdr = list_first_entry_or_null(&pool->unbuddied[i],
struct z3fold_header, buddy);
-   if (!zhdr) {
+   if (!zhdr || !kref_get_unless_zero(&zhdr->refcount)) {
spin_unlock(&pool->lock);
continue;
}
@@ -403,10 +414,12 @@ static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
else if (zhdr->middle_chunks == 0)
bud = MIDDLE;
else {
+   z3fold_page_unlock(zhdr);
spin_lock(&pool->lock);
-   list_add(&zhdr->buddy, &pool->buddied);
+   if (kref_put(&zhdr->refcount,
+release_z3fold_page))
+   atomic64_dec(&pool->pages_nr);
spin_unlock(&pool->lock);
-   z3fold_page_unlock(zhdr);
pr_err("No free chunks in unbuddied\n");
WARN_ON(1);
continue;
@@ -447,9 +460,6 @@ static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
/* Add to unbuddied list */
freechunks = num_free_chunks(zhdr);
list_add(&zhdr->buddy, &pool->unbuddied[freechunks]);
-   } else {
-   /* Add to buddied list */
-   list_add(&zhdr->buddy, &pool->buddied);
}
 
 headless:
@@ -515,50 +525,39 @@ static void z3fold_free(struct

[PATCH/RESEND 4/5] z3fold: fix header size related issues

2016-12-25 Thread Vitaly Wool

Currently the whole kernel build fails if the size of struct
z3fold_header is greater than the size of one chunk, which is 64 bytes by
default. This patch instead defines the offset for z3fold objects as the
size of the z3fold header in chunks.

Also fixed are the calculation in num_free_chunks() and the address the
middle chunk is moved to during in-page compaction in
z3fold_compact_page().
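
To make the arithmetic concrete, here is a small standalone sketch of
the header-size-in-chunks calculation. It assumes 4 KiB pages and uses
made-up header sizes (40 and 72 bytes) purely as examples; the SKETCH_*
names and the two helpers are illustrative and do not exist in
mm/z3fold.c.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT     12  /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE      (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_NCHUNKS_ORDER  6
#define SKETCH_CHUNK_SHIFT    (SKETCH_PAGE_SHIFT - SKETCH_NCHUNKS_ORDER)
#define SKETCH_CHUNK_SIZE     (1UL << SKETCH_CHUNK_SHIFT)

/* Header size rounded up to a whole number of chunks. */
static unsigned long hdr_chunks(size_t hdr_size)
{
    return (hdr_size + SKETCH_CHUNK_SIZE - 1) >> SKETCH_CHUNK_SHIFT;
}

/* Chunks left for objects once the header has been accounted for. */
static unsigned long nchunks(size_t hdr_size)
{
    return (SKETCH_PAGE_SIZE >> SKETCH_CHUNK_SHIFT) - hdr_chunks(hdr_size);
}
```

With 64-byte chunks, a 40-byte header occupies one chunk and leaves 63,
while a 72-byte header (for example, with lock debugging enabled)
occupies two and leaves 62.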

Signed-off-by: Vitaly Wool 
---
 mm/z3fold.c | 161 
 1 file changed, 87 insertions(+), 74 deletions(-)

diff --git a/mm/z3fold.c b/mm/z3fold.c
index 28c0a2d..729a2da 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -34,29 +34,60 @@
 /*
  * Structures
 */
+struct z3fold_pool;
+struct z3fold_ops {
+   int (*evict)(struct z3fold_pool *pool, unsigned long handle);
+};
+
+enum buddy {
+   HEADLESS = 0,
+   FIRST,
+   MIDDLE,
+   LAST,
+   BUDDIES_MAX
+};
+
+/*
+ * struct z3fold_header - z3fold page metadata occupying the first chunk of each
+ * z3fold page, except for HEADLESS pages
+ * @buddy: links the z3fold page into the relevant list in the pool
+ * @page_lock: per-page lock
+ * @first_chunks:  the size of the first buddy in chunks, 0 if free
+ * @middle_chunks: the size of the middle buddy in chunks, 0 if free
+ * @last_chunks:   the size of the last buddy in chunks, 0 if free
+ * @first_num: the starting number (for the first handle)
+ */
+struct z3fold_header {
+   struct list_head buddy;
+   raw_spinlock_t page_lock;
+   unsigned short first_chunks;
+   unsigned short middle_chunks;
+   unsigned short last_chunks;
+   unsigned short start_middle;
+   unsigned short first_num:2;
+};
+
 /*
  * NCHUNKS_ORDER determines the internal allocation granularity, effectively
  * adjusting internal fragmentation.  It also determines the number of
  * freelists maintained in each pool. NCHUNKS_ORDER of 6 means that the
- * allocation granularity will be in chunks of size PAGE_SIZE/64. As one chunk
- * in allocated page is occupied by z3fold header, NCHUNKS will be calculated
- * to 63 which shows the max number of free chunks in z3fold page, also there
- * will be 63 freelists per pool.
+ * allocation granularity will be in chunks of size PAGE_SIZE/64. Some chunks
+ * in the beginning of an allocated page are occupied by z3fold header, so
+ * NCHUNKS will be calculated to 63 (or 62 in case CONFIG_DEBUG_SPINLOCK=y),
+ * which shows the max number of free chunks in z3fold page, also there will
+ * be 63, or 62, respectively, freelists per pool.
  */
 #define NCHUNKS_ORDER  6
 
 #define CHUNK_SHIFT (PAGE_SHIFT - NCHUNKS_ORDER)
 #define CHUNK_SIZE (1 << CHUNK_SHIFT)
-#define ZHDR_SIZE_ALIGNED CHUNK_SIZE
+#define ZHDR_SIZE_ALIGNED round_up(sizeof(struct z3fold_header), CHUNK_SIZE)
+#define ZHDR_CHUNKS (ZHDR_SIZE_ALIGNED >> CHUNK_SHIFT)
+#define TOTAL_CHUNKS   (PAGE_SIZE >> CHUNK_SHIFT)
 #define NCHUNKS ((PAGE_SIZE - ZHDR_SIZE_ALIGNED) >> CHUNK_SHIFT)
 
 #define BUDDY_MASK (0x3)
 
-struct z3fold_pool;
-struct z3fold_ops {
-   int (*evict)(struct z3fold_pool *pool, unsigned long handle);
-};
-
 /**
  * struct z3fold_pool - stores metadata for each z3fold pool
  * @lock:  protects all pool fields and first|last_chunk fields of any
@@ -86,33 +117,6 @@ struct z3fold_pool {
const struct zpool_ops *zpool_ops;
 };
 
-enum buddy {
-   HEADLESS = 0,
-   FIRST,
-   MIDDLE,
-   LAST,
-   BUDDIES_MAX
-};
-
-/*
- * struct z3fold_header - z3fold page metadata occupying the first chunk of each
- * z3fold page, except for HEADLESS pages
- * @buddy: links the z3fold page into the relevant list in the pool
- * @page_lock: per-page lock
- * @first_chunks:  the size of the first buddy in chunks, 0 if free
- * @middle_chunks: the size of the middle buddy in chunks, 0 if free
- * @last_chunks:   the size of the last buddy in chunks, 0 if free
- * @first_num: the starting number (for the first handle)
- */
-struct z3fold_header {
-   struct list_head buddy;
-   raw_spinlock_t page_lock;
-   unsigned short first_chunks;
-   unsigned short middle_chunks;
-   unsigned short last_chunks;
-   unsigned short start_middle;
-   unsigned short first_num:2;
-};
 
 /*
  * Internal z3fold page flags
@@ -123,6 +127,7 @@ enum z3fold_page_flags {
MIDDLE_CHUNK_MAPPED,
 };
 
+
 /*
  * Helpers
 */
@@ -220,9 +225,10 @@ static int num_free_chunks(struct z3fold_header *zhdr)
 */
if (zhdr->middle_chunks != 0) {
int nfree_before = zhdr->first_chunks ?
-   0 : zhdr->start_middle - 1;
+   0 : zhdr->start_middle - ZHDR_CHUNKS;
int nfree_after = zhdr->last_chunks ?
-   0 : NCHUNKS -

[PATCH/RESEND 3/5] z3fold: use per-page spinlock

2016-12-25 Thread Vitaly Wool

Most z3fold operations are in-page, such as modifying the z3fold page
header or moving z3fold objects within a page.  Taking the per-pool
spinlock to protect per-page objects is therefore suboptimal, and the idea
of having a per-page spinlock (or rwlock) has been around for some time.

This patch implements a raw spinlock-based per-page locking mechanism
which is lightweight enough to fit into the z3fold header in the normal
case.
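
The idea can be sketched in userspace with a tiny C11 spinlock embedded
in each page header. This is only an illustration of per-object locking,
not the kernel's raw_spinlock_t; struct spin, spin_acquire(),
spin_release() and set_first() are names invented for the example.

```c
#include <stdatomic.h>

struct spin { atomic_flag locked; };

static void spin_init(struct spin *s)
{
    atomic_flag_clear(&s->locked);
}

static void spin_acquire(struct spin *s)
{
    while (atomic_flag_test_and_set_explicit(&s->locked,
                                             memory_order_acquire))
        ;   /* busy-wait until the holder releases the lock */
}

static void spin_release(struct spin *s)
{
    atomic_flag_clear_explicit(&s->locked, memory_order_release);
}

/* Stand-in for a page header carrying its own lock. */
struct page_hdr {
    struct spin page_lock;
    unsigned short first_chunks;
    unsigned short last_chunks;
};

/* In-page update: only this page's lock is taken, no pool-wide lock. */
static void set_first(struct page_hdr *h, unsigned short chunks)
{
    spin_acquire(&h->page_lock);
    h->first_chunks = chunks;
    spin_release(&h->page_lock);
}
```

Because each header has its own lock, an update to one page does not
wait on a lock held for a different page.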

Signed-off-by: Vitaly Wool 
---
 mm/z3fold.c | 148 +++-
 1 file changed, 106 insertions(+), 42 deletions(-)

diff --git a/mm/z3fold.c b/mm/z3fold.c
index d2e8aec..28c0a2d 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -98,6 +98,7 @@ enum buddy {
  * struct z3fold_header - z3fold page metadata occupying the first chunk of each
  * z3fold page, except for HEADLESS pages
  * @buddy: links the z3fold page into the relevant list in the pool
+ * @page_lock: per-page lock
  * @first_chunks:  the size of the first buddy in chunks, 0 if free
  * @middle_chunks: the size of the middle buddy in chunks, 0 if free
  * @last_chunks:   the size of the last buddy in chunks, 0 if free
@@ -105,6 +106,7 @@ enum buddy {
  */
 struct z3fold_header {
struct list_head buddy;
+   raw_spinlock_t page_lock;
unsigned short first_chunks;
unsigned short middle_chunks;
unsigned short last_chunks;
@@ -144,6 +146,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page)
clear_bit(PAGE_HEADLESS, &page->private);
clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
 
+   raw_spin_lock_init(&zhdr->page_lock);
zhdr->first_chunks = 0;
zhdr->middle_chunks = 0;
zhdr->last_chunks = 0;
@@ -159,6 +162,19 @@ static void free_z3fold_page(struct z3fold_header *zhdr)
__free_page(virt_to_page(zhdr));
 }
 
+/* Lock a z3fold page */
+static inline void z3fold_page_lock(struct z3fold_header *zhdr)
+{
+   raw_spin_lock(&zhdr->page_lock);
+}
+
+/* Unlock a z3fold page */
+static inline void z3fold_page_unlock(struct z3fold_header *zhdr)
+{
+   raw_spin_unlock(&zhdr->page_lock);
+}
+
+
 /*
  * Encodes the handle of a particular buddy within a z3fold page
  * Pool lock should be held as this function accesses first_num
@@ -347,50 +363,60 @@ static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
bud = HEADLESS;
else {
chunks = size_to_chunks(size);
-   spin_lock(&pool->lock);
 
/* First, try to find an unbuddied z3fold page. */
zhdr = NULL;
for_each_unbuddied_list(i, chunks) {
-   if (!list_empty(&pool->unbuddied[i])) {
-   zhdr = list_first_entry(&pool->unbuddied[i],
+   spin_lock(&pool->lock);
+   zhdr = list_first_entry_or_null(&pool->unbuddied[i],
struct z3fold_header, buddy);
-   page = virt_to_page(zhdr);
-   if (zhdr->first_chunks == 0) {
-   if (zhdr->middle_chunks != 0 &&
-   chunks >= zhdr->start_middle)
-   bud = LAST;
-   else
-   bud = FIRST;
-   } else if (zhdr->last_chunks == 0)
+   if (!zhdr) {
+   spin_unlock(&pool->lock);
+   continue;
+   }
+   list_del_init(&zhdr->buddy);
+   spin_unlock(&pool->lock);
+
+   page = virt_to_page(zhdr);
+   z3fold_page_lock(zhdr);
+   if (zhdr->first_chunks == 0) {
+   if (zhdr->middle_chunks != 0 &&
+   chunks >= zhdr->start_middle)
bud = LAST;
-   else if (zhdr->middle_chunks == 0)
-   bud = MIDDLE;
-   else {
-   pr_err("No free chunks in unbuddied\n");
-   WARN_ON(1);
-   continue;
-   }
-   list_del(&zhdr->buddy);
-   goto found;
+   else
+   bud = FIRST;
+   } else if (zhdr->last_chunks == 0)
+   bud = LAST;
+   else if (zhdr->middle_chunks == 0)
+   bud = MIDDLE;
+   else {
+   spin_lock(&pool->lock);
+   list_add(&zhdr->buddy, &pool->buddied);
+

[PATCH/RESEND 2/5] mm/z3fold.c: extend compaction function

2016-12-25 Thread Vitaly Wool

z3fold_compact_page() currently only handles the case where the middle
chunk is the only object in the z3fold page.  However, it may be worth
moving the middle chunk closer to the first or last chunk, whichever is
present, if the gap between them is big enough.

In terms of compression ratio, it always makes sense to move the middle
chunk as close as possible to another in-page z3fold object, because then
the third object can use all the remaining space.  However, moving a big
object by just one chunk hurts performance without gaining much
compression ratio.  So the gap between the middle object and the edge
object should be big enough to justify the move.

This patch improves compression ratio because in-page compaction
becomes more comprehensive.  Somewhat surprisingly, it also increases
performance in fio randrw tests (I am not 100% sure why, but probably
because denser in-page allocation means fewer actual page allocations
on the hot path).

This patch adds the relevant code, using the BIG_CHUNK_GAP define as the
threshold for a middle chunk to be worth moving.

Signed-off-by: Vitaly Wool 
---
 mm/z3fold.c | 60 +++-
 1 file changed, 47 insertions(+), 13 deletions(-)

diff --git a/mm/z3fold.c b/mm/z3fold.c
index 2273789..d2e8aec 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -254,26 +254,60 @@ static void z3fold_destroy_pool(struct z3fold_pool *pool)
kfree(pool);
 }
 
+static inline void *mchunk_memmove(struct z3fold_header *zhdr,
+   unsigned short dst_chunk)
+{
+   void *beg = zhdr;
+   return memmove(beg + (dst_chunk << CHUNK_SHIFT),
+  beg + (zhdr->start_middle << CHUNK_SHIFT),
+  zhdr->middle_chunks << CHUNK_SHIFT);
+}
+
+#define BIG_CHUNK_GAP  3
 /* Has to be called with lock held */
 static int z3fold_compact_page(struct z3fold_header *zhdr)
 {
struct page *page = virt_to_page(zhdr);
-   void *beg = zhdr;
+   int ret = 0;
+
+   if (test_bit(MIDDLE_CHUNK_MAPPED, &page->private))
+   goto out;
 
+   if (zhdr->middle_chunks != 0) {
+   if (zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
+   mchunk_memmove(zhdr, 1); /* move to the beginning */
+   zhdr->first_chunks = zhdr->middle_chunks;
+   zhdr->middle_chunks = 0;
+   zhdr->start_middle = 0;
+   zhdr->first_num++;
+   ret = 1;
+   goto out;
+   }
 
-   if (!test_bit(MIDDLE_CHUNK_MAPPED, &page->private) &&
-   zhdr->middle_chunks != 0 &&
-   zhdr->first_chunks == 0 && zhdr->last_chunks == 0) {
-   memmove(beg + ZHDR_SIZE_ALIGNED,
-   beg + (zhdr->start_middle << CHUNK_SHIFT),
-   zhdr->middle_chunks << CHUNK_SHIFT);
-   zhdr->first_chunks = zhdr->middle_chunks;
-   zhdr->middle_chunks = 0;
-   zhdr->start_middle = 0;
-   zhdr->first_num++;
-   return 1;
+   /*
+* moving data is expensive, so let's only do that if
+* there's substantial gain (at least BIG_CHUNK_GAP chunks)
+*/
+   if (zhdr->first_chunks != 0 && zhdr->last_chunks == 0 &&
+   zhdr->start_middle > zhdr->first_chunks + BIG_CHUNK_GAP) {
+   mchunk_memmove(zhdr, zhdr->first_chunks + 1);
+   zhdr->start_middle = zhdr->first_chunks + 1;
+   ret = 1;
+   goto out;
+   }
+   if (zhdr->last_chunks != 0 && zhdr->first_chunks == 0 &&
+   zhdr->middle_chunks + zhdr->last_chunks <=
+   NCHUNKS - zhdr->start_middle - BIG_CHUNK_GAP) {
+   unsigned short new_start = NCHUNKS - zhdr->last_chunks -
+   zhdr->middle_chunks;
+   mchunk_memmove(zhdr, new_start);
+   zhdr->start_middle = new_start;
+   ret = 1;
+   goto out;
+   }
}
-   return 0;
+out:
+   return ret;
 }
 
 /**
-- 
2.4.2
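
For readers following the three cases in the patch above, here is a rough
userspace sketch of the same decision logic.  It is only a model, not the
kernel code: struct page_model, move_middle() and compact_model() are
hypothetical names, the 64-byte chunk size and 4096-byte page are assumed
values, the header fields live outside the data array, and the
MIDDLE_CHUNK_MAPPED check and first_num update are left out.

```c
#include <assert.h>
#include <string.h>

/* Illustrative constants; the kernel derives its own from PAGE_SIZE. */
#define CHUNK_SHIFT   6                            /* assumed 64-byte chunks */
#define PAGE_BYTES    4096                         /* assumed page size */
#define NCHUNKS       (PAGE_BYTES >> CHUNK_SHIFT)  /* 64 chunks per page */
#define BIG_CHUNK_GAP 3

/* Hypothetical stand-in for a z3fold page; chunk 0 models the header slot. */
struct page_model {
    unsigned short first_chunks;
    unsigned short middle_chunks;
    unsigned short last_chunks;
    unsigned short start_middle;
    unsigned char data[PAGE_BYTES];
};

/* Move the middle object so that it starts at dst_chunk. */
static void move_middle(struct page_model *p, unsigned short dst_chunk)
{
    assert(dst_chunk + p->middle_chunks <= NCHUNKS);
    memmove(p->data + ((size_t)dst_chunk << CHUNK_SHIFT),
            p->data + ((size_t)p->start_middle << CHUNK_SHIFT),
            (size_t)p->middle_chunks << CHUNK_SHIFT);
}

/* Returns 1 if the middle object was moved, 0 otherwise. */
static int compact_model(struct page_model *p)
{
    if (p->middle_chunks == 0)
        return 0;

    /* Case 1: middle object is alone, so it becomes the first object. */
    if (p->first_chunks == 0 && p->last_chunks == 0) {
        move_middle(p, 1);
        p->first_chunks = p->middle_chunks;
        p->middle_chunks = 0;
        p->start_middle = 0;
        return 1;
    }

    /* Case 2: only a first object; pull middle down if the gap is big. */
    if (p->first_chunks != 0 && p->last_chunks == 0 &&
        p->start_middle > p->first_chunks + BIG_CHUNK_GAP) {
        move_middle(p, p->first_chunks + 1);
        p->start_middle = p->first_chunks + 1;
        return 1;
    }

    /* Case 3: only a last object; push middle up against it. */
    if (p->last_chunks != 0 && p->first_chunks == 0 &&
        p->middle_chunks + p->last_chunks <=
        NCHUNKS - p->start_middle - BIG_CHUNK_GAP) {
        unsigned short new_start = NCHUNKS - p->last_chunks -
                                   p->middle_chunks;
        move_middle(p, new_start);
        p->start_middle = new_start;
        return 1;
    }
    return 0;
}
```

With first_chunks = 2 and start_middle = 10 the middle object is moved down
to chunk 3, while with start_middle = 5 the gap does not exceed
BIG_CHUNK_GAP and nothing is moved.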

[PATCH/RESEND 1/5] mm/z3fold.c: make pages_nr atomic

2016-12-25 Thread Vitaly Wool

Convert pages_nr per-pool counter to atomic64_t so that we won't have
to care about locking for reading/updating it.

Signed-off-by: Vitaly Wool 
---
 mm/z3fold.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/mm/z3fold.c b/mm/z3fold.c
index 207e5dd..2273789 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -80,7 +80,7 @@ struct z3fold_pool {
struct list_head unbuddied[NCHUNKS];
struct list_head buddied;
struct list_head lru;
-   u64 pages_nr;
+   atomic64_t pages_nr;
const struct z3fold_ops *ops;
struct zpool *zpool;
const struct zpool_ops *zpool_ops;
@@ -238,7 +238,7 @@ static struct z3fold_pool *z3fold_create_pool(gfp_t gfp,
INIT_LIST_HEAD(&pool->unbuddied[i]);
INIT_LIST_HEAD(&pool->buddied);
INIT_LIST_HEAD(&pool->lru);
-   pool->pages_nr = 0;
+   atomic64_set(&pool->pages_nr, 0);
pool->ops = ops;
return pool;
 }
@@ -350,7 +350,7 @@ static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
if (!page)
return -ENOMEM;
spin_lock(&pool->lock);
-   pool->pages_nr++;
+   atomic64_inc(&pool->pages_nr);
zhdr = init_z3fold_page(page);
 
if (bud == HEADLESS) {
@@ -443,10 +443,9 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
return;
}
 
-   if (bud != HEADLESS) {
-   /* Remove from existing buddy list */
+   /* Remove from existing buddy list */
+   if (bud != HEADLESS)
list_del(&zhdr->buddy);
-   }
 
if (bud == HEADLESS ||
(zhdr->first_chunks == 0 && zhdr->middle_chunks == 0 &&
@@ -455,7 +454,7 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
list_del(&page->lru);
clear_bit(PAGE_HEADLESS, &page->private);
free_z3fold_page(zhdr);
-   pool->pages_nr--;
+   atomic64_dec(&pool->pages_nr);
} else {
z3fold_compact_page(zhdr);
/* Add to the unbuddied list */
@@ -573,7 +572,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
 */
clear_bit(PAGE_HEADLESS, &page->private);
free_z3fold_page(zhdr);
-   pool->pages_nr--;
+   atomic64_dec(&pool->pages_nr);
spin_unlock(&pool->lock);
return 0;
}  else if (!test_bit(PAGE_HEADLESS, &page->private)) {
@@ -676,12 +675,11 @@ static void z3fold_unmap(struct z3fold_pool *pool, unsigned long handle)
  * z3fold_get_pool_size() - gets the z3fold pool size in pages
  * @pool:  pool whose size is being queried
  *
- * Returns: size in pages of the given pool.  The pool lock need not be
- * taken to access pages_nr.
+ * Returns: size in pages of the given pool.
  */
 static u64 z3fold_get_pool_size(struct z3fold_pool *pool)
 {
-   return pool->pages_nr;
+   return atomic64_read(&pool->pages_nr);
 }
 
 /*
-- 
2.4.2
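
To illustrate the idea behind this conversion outside the kernel, here is a
small userspace analogue using C11 <stdatomic.h>.  It is a sketch only:
struct pool_model and the pool_*() helpers are hypothetical names, and C11
atomics merely stand in for the kernel's atomic64_t, which has its own API
and ordering rules.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-pool page counter. */
struct pool_model {
    atomic_uint_fast64_t pages_nr;
};

/* Loosely mirrors atomic64_set(&pool->pages_nr, 0) at pool creation. */
static void pool_init(struct pool_model *pool)
{
    atomic_init(&pool->pages_nr, 0);
}

/* Loosely mirrors atomic64_inc(): no pool lock needed for the update. */
static void pool_page_added(struct pool_model *pool)
{
    atomic_fetch_add(&pool->pages_nr, 1);
}

/* Loosely mirrors atomic64_dec(). */
static void pool_page_freed(struct pool_model *pool)
{
    atomic_fetch_sub(&pool->pages_nr, 1);
}

/* Loosely mirrors atomic64_read() in z3fold_get_pool_size(). */
static uint64_t pool_size(struct pool_model *pool)
{
    return (uint64_t)atomic_load(&pool->pages_nr);
}
```

Because each update is a single atomic read-modify-write, the counter stays
consistent even when callers do not hold a common lock, which is what lets
later patches in the series narrow the scope of the pool lock.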
