[tip: ras/core] x86/mce: Add _ASM_EXTABLE_CPY for copy user access

2020-10-07 Thread tip-bot2 for Youquan Song
The following commit has been merged into the ras/core branch of tip:

Commit-ID:     278b917f8cb9b02923c15249f9d1a5769d2c1976
Gitweb:        https://git.kernel.org/tip/278b917f8cb9b02923c15249f9d1a5769d2c1976
Author:        Youquan Song
AuthorDate:    Tue, 06 Oct 2020 14:09:07 -07:00
Committer:     Borislav Petkov
CommitterDate: Wed, 07 Oct 2020 11:19:11 +02:00

x86/mce: Add _ASM_EXTABLE_CPY for copy user access

_ASM_EXTABLE_UA is a general exception entry to record the exception fixup
for all exception spots between kernel and user space access.

To enable recovery from machine checks while copying data from user
addresses, it is necessary to be able to distinguish the places that are
looping, copying data, from those that copy a single byte/word/etc.

Add a new macro _ASM_EXTABLE_CPY and use it in place of _ASM_EXTABLE_UA
in the copy functions.

Record the exception reason number in regs->ax in the new fixup handler;
it is used to check whether an MCE triggered the exception.

The new fixup routine ex_handler_copy() is almost an exact copy of
ex_handler_uaccess(). The difference is that it sets regs->ax to the trap
number. Following patches use this to avoid trying to copy remaining
bytes from the tail of the copy and possibly hitting the poison again.

The new mce.kflags bit MCE_IN_KERNEL_COPYIN will be used by the
mce_severity() calculation to indicate that a machine check is recoverable
because the kernel was copying from user space.
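
The mechanism described above can be pictured with a small user-space model.
This is only a sketch under stated assumptions: every fake_* name, the
fixup_kind enum and the table layout are hypothetical stand-ins invented for
illustration, not the kernel's real exception table format. It shows two
kinds of fixup entries, where only the copy kind reports the trap number in
ax, and a lookup that tells copy sites apart from single-access sites.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real kernel definitions. */
enum fixup_kind { FIXUP_UACCESS, FIXUP_COPY };

struct fake_regs {
	unsigned long ip;	/* address of the faulting instruction */
	unsigned long ax;	/* used here to report the trap number */
};

struct fake_extable_entry {
	unsigned long insn;	/* instruction that may fault */
	unsigned long fixup;	/* where to resume after a fault */
	enum fixup_kind kind;	/* which handler covers this instruction */
};

#define FAKE_TRAP_MC 18		/* machine-check vector number on x86 */

/* Linear search for the entry covering ip; returns NULL if there is none. */
static const struct fake_extable_entry *
fake_search_extable(const struct fake_extable_entry *tbl, size_t n,
		    unsigned long ip)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == ip)
			return &tbl[i];
	return NULL;
}

/*
 * Apply the fixup. Both kinds redirect execution to the fixup address;
 * only the copy kind also stores the trap number in ax.
 */
static bool fake_fixup_exception(const struct fake_extable_entry *tbl,
				 size_t n, struct fake_regs *regs, int trapnr)
{
	const struct fake_extable_entry *e;

	assert(tbl != NULL && regs != NULL);
	e = fake_search_extable(tbl, n, regs->ip);
	if (!e)
		return false;
	regs->ip = e->fixup;
	if (e->kind == FIXUP_COPY)
		regs->ax = (unsigned long)trapnr;
	return true;
}

/* In this model a fault counts as "copy-in" only at FIXUP_COPY sites. */
static bool fake_mce_in_copyin(const struct fake_extable_entry *tbl,
			       size_t n, unsigned long ip)
{
	const struct fake_extable_entry *e = fake_search_extable(tbl, n, ip);

	return e != NULL && e->kind == FIXUP_COPY;
}
```

With a table holding one FIXUP_COPY and one FIXUP_UACCESS entry, a fault at
the copy site leaves FAKE_TRAP_MC in ax, while a fault at the other site
leaves ax untouched.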

Signed-off-by: Youquan Song 
Signed-off-by: Tony Luck 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20201006210910.21062-4-tony.l...@intel.com
---
 arch/x86/include/asm/asm.h  |  6 ++-
 arch/x86/include/asm/mce.h  | 15 ++-
 arch/x86/lib/copy_user_64.S | 96 ++--
 arch/x86/mm/extable.c   | 14 -
 4 files changed, 82 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 5c15f95..0359cbb 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -135,6 +135,9 @@
 # define _ASM_EXTABLE_UA(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_uaccess)
 
+# define _ASM_EXTABLE_CPY(from, to)\
+   _ASM_EXTABLE_HANDLE(from, to, ex_handler_copy)
+
 # define _ASM_EXTABLE_FAULT(from, to)  \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)
 
@@ -160,6 +163,9 @@
 # define _ASM_EXTABLE_UA(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_uaccess)
 
+# define _ASM_EXTABLE_CPY(from, to)\
+   _ASM_EXTABLE_HANDLE(from, to, ex_handler_copy)
+
 # define _ASM_EXTABLE_FAULT(from, to)  \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)
 
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index ba2062d..a0f1478 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -136,9 +136,24 @@
 #define MCE_HANDLED_NFIT    BIT_ULL(3)
 #define MCE_HANDLED_EDAC    BIT_ULL(4)
 #define MCE_HANDLED_MCELOG  BIT_ULL(5)
+
+/*
+ * Indicates an MCE which has happened in kernel space but from
+ * which the kernel can recover simply by executing fixup_exception()
+ * so that an error is returned to the caller of the function that
+ * hit the machine check.
+ */
 #define MCE_IN_KERNEL_RECOV BIT_ULL(6)
 
 /*
+ * Indicates an MCE that happened in kernel space while copying data
+ * from user. In this case fixup_exception() gets the kernel to the
+ * error exit for the copy function. Machine check handler can then
+ * treat it like a fault taken in user mode.
+ */
+#define MCE_IN_KERNEL_COPYIN   BIT_ULL(7)
+
+/*
  * This structure contains all data related to the MCE log.  Also
  * carries a signature to make it easier to find from external
  * debugging tools.  Each entry is only valid when its finished flag
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 816f128..5b68e94 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -36,8 +36,8 @@
jmp .Lcopy_user_handle_tail
.previous
 
-   _ASM_EXTABLE_UA(100b, 103b)
-   _ASM_EXTABLE_UA(101b, 103b)
+   _ASM_EXTABLE_CPY(100b, 103b)
+   _ASM_EXTABLE_CPY(101b, 103b)
.endm
 
 /*
@@ -116,26 +116,26 @@ SYM_FUNC_START(copy_user_generic_unrolled)
 60:jmp .Lcopy_user_handle_tail /* ecx is zerorest also */
.previous
 
-   _ASM_EXTABLE_UA(1b, 30b)
-   _ASM_EXTABLE_UA(2b, 30b)
-   _ASM_EXTABLE_UA(3b, 30b)
-   _ASM_EXTABLE_UA(4b, 30b)
-   _ASM_EXTABLE_UA(5b, 30b)
-   _ASM_EXTABLE_UA(6b, 30b)
-   _ASM_EXTABLE_UA(7b, 30b)
-   _ASM_EXTABLE_UA(8b, 30b)
-   _ASM_EXTABLE_UA(9b, 30b)
-   _ASM_EXTABLE_UA(10b, 30b)
-   _ASM_EXTABLE_UA(11b, 30b)
-   _ASM_EXTABLE_UA(12b, 30b)
-   _ASM_EXTABLE_UA(13b, 30b)
-   _ASM_EXTABLE_UA(14b, 30b)
-   _ASM_EXTABLE_UA(15

[tip: ras/core] x86/mce: Pass pointer to saved pt_regs to severity calculation routines

2020-10-07 Thread tip-bot2 for Youquan Song
The following commit has been merged into the ras/core branch of tip:

Commit-ID:     41ce0564bfe2e129d56730418d8c0a9f9f2d31b5
Gitweb:        https://git.kernel.org/tip/41ce0564bfe2e129d56730418d8c0a9f9f2d31b5
Author:        Youquan Song
AuthorDate:    Tue, 06 Oct 2020 14:09:05 -07:00
Committer:     Borislav Petkov
CommitterDate: Wed, 07 Oct 2020 10:51:42 +02:00

x86/mce: Pass pointer to saved pt_regs to severity calculation routines

New recovery features require additional information about processor
state when a machine check occurred. Pass pt_regs down to the routines
that need it.

No functional change.

Signed-off-by: Youquan Song 
Signed-off-by: Tony Luck 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20201006210910.21062-2-tony.l...@intel.com
---
 arch/x86/kernel/cpu/mce/core.c | 14 +++---
 arch/x86/kernel/cpu/mce/internal.h |  3 ++-
 arch/x86/kernel/cpu/mce/severity.c | 14 --
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b5b70f4..2d6caf0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -807,7 +807,7 @@ log_it:
goto clear_it;
 
mce_read_aux(&m, i);
-   m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
+   m.severity = mce_severity(&m, NULL, mca_cfg.tolerant, NULL, false);
/*
 * Don't get the IP here because it's unlikely to
 * have anything to do with the actual error location.
@@ -856,7 +856,7 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
quirk_no_way_out(i, m, regs);
 
m->bank = i;
-   if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
+   if (mce_severity(m, regs, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
mce_read_aux(m, i);
*msg = tmp;
return 1;
@@ -956,7 +956,7 @@ static void mce_reign(void)
 */
if (m && global_worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
/* call mce_severity() to get "msg" for panic */
-   mce_severity(m, mca_cfg.tolerant, &msg, true);
+   mce_severity(m, NULL, mca_cfg.tolerant, &msg, true);
mce_panic("Fatal machine check", m, msg);
}
 
@@ -1167,7 +1167,7 @@ static noinstr bool mce_check_crashing_cpu(void)
return false;
 }
 
-static void __mc_scan_banks(struct mce *m, struct mce *final,
+static void __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
unsigned long *toclear, unsigned long *valid_banks,
int no_way_out, int *worst)
 {
@@ -1202,7 +1202,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
/* Set taint even when machine check was not enabled. */
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
 
-   severity = mce_severity(m, cfg->tolerant, NULL, true);
+   severity = mce_severity(m, regs, cfg->tolerant, NULL, true);
 
/*
 * When machine check was for corrected/deferred handler don't
@@ -1354,7 +1354,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
order = mce_start(&no_way_out);
}
 
-   __mc_scan_banks(&m, final, toclear, valid_banks, no_way_out, &worst);
+   __mc_scan_banks(&m, regs, final, toclear, valid_banks, no_way_out, &worst);
 
if (!no_way_out)
mce_clear_state(toclear);
@@ -1376,7 +1376,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 * make sure we have the right "msg".
 */
if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
-   mce_severity(&m, cfg->tolerant, &msg, true);
+   mce_severity(&m, regs, cfg->tolerant, &msg, true);
mce_panic("Local fatal machine check!", &m, msg);
}
}
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index b122610..88dcc79 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -38,7 +38,8 @@ int mce_gen_pool_add(struct mce *mce);
 int mce_gen_pool_init(void);
 struct llist_node *mce_gen_pool_prepare_records(void);
 
-extern int (*mce_severity)(struct mce *a, int tolerant, char **msg, bool is_excp);
+extern int (*mce_severity)(struct mce *a, struct pt_regs *regs,
+  int tolerant, char **msg, bool is_excp);
 struct dentry *mce_get_debugfs_dir(void);
 
 extern 

[PATCH 14/24] x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP

2018-04-17 Thread Youquan Song
From: Ingo Molnar 

(cherry picked from commit d72f4e29e6d84b7ec02ae93088aa459ac70e733b)

firmware_restrict_branch_speculation_*() recently started using
preempt_enable()/disable(), but those are relatively high level
primitives and cause build failures on some 32-bit builds.

Since we want to keep <asm/nospec-branch.h> low level, convert
them to macros to avoid header hell...
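
As a rough illustration of why macros help here: a macro body is only
resolved where it is expanded, so the header that defines it does not have
to pull in the declarations it uses. The sketch below is a user-space model
under that assumption; all fake_* names are hypothetical, a counter stands
in for the preempt count and a plain variable stands in for the MSR. It also
shows the do { } while (0) wrapper that lets a multi-statement macro be used
like a single statement.

```c
#include <assert.h>

/* Hypothetical stand-ins: a nesting counter and a variable, not real MSRs. */
static int fake_preempt_count;
static unsigned int fake_spec_ctrl;

#define fake_preempt_disable()	(fake_preempt_count++)
#define fake_preempt_enable()	(fake_preempt_count--)

/* do { } while (0) makes the two statements behave as one statement. */
#define fake_restrict_start()		\
do {					\
	fake_preempt_disable();		\
	fake_spec_ctrl = 1;		\
} while (0)

#define fake_restrict_end()		\
do {					\
	fake_spec_ctrl = 0;		\
	fake_preempt_enable();		\
} while (0)

/*
 * Models a firmware call made with the restriction active and returns the
 * value of fake_spec_ctrl observed while it was active.
 */
static unsigned int fake_call_firmware(int do_call)
{
	unsigned int seen;

	if (do_call)
		fake_restrict_start();	/* safe in an unbraced if/else */
	else
		return 0;

	seen = fake_spec_ctrl;
	assert(fake_preempt_count > 0);
	fake_restrict_end();
	return seen;
}
```

Without the do { } while (0) wrapper, the unbraced if/else in
fake_call_firmware() would not compile, because the else would no longer be
attached to the if.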

Cc: David Woodhouse 
Cc: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: arjan.van.de@intel.com
Cc: b...@alien8.de
Cc: dave.han...@intel.com
Cc: jmatt...@google.com
Cc: karah...@amazon.de
Cc: k...@vger.kernel.org
Cc: pbonz...@redhat.com
Cc: rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Youquan Song  [v4.4 backport]
---
 arch/x86/include/asm/nospec-branch.h | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 27582aa..4675f65 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,20 +214,22 @@ static inline void indirect_branch_prediction_barrier(void)
 /*
  * With retpoline, we must use IBRS to restrict branch prediction
  * before calling into firmware.
+ *
+ * (Implemented as CPP macros due to header hell.)
  */
-static inline void firmware_restrict_branch_speculation_start(void)
-{
-   preempt_disable();
-   alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
- X86_FEATURE_USE_IBRS_FW);
-}
+#define firmware_restrict_branch_speculation_start()   \
+do {   \
+   preempt_disable();  \
+   alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,   \
+ X86_FEATURE_USE_IBRS_FW); \
+} while (0)
 
-static inline void firmware_restrict_branch_speculation_end(void)
-{
-   alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
- X86_FEATURE_USE_IBRS_FW);
-   preempt_enable();
-}
+#define firmware_restrict_branch_speculation_end() \
+do {   \
+   alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,\
+ X86_FEATURE_USE_IBRS_FW); \
+   preempt_enable();   \
+} while (0)
 
 #endif /* __ASSEMBLY__ */
 
-- 
1.8.3.1



[PATCH 14/23] x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP

2018-04-16 Thread Youquan Song
From: Ingo Molnar 

(cherry picked from commit d72f4e29e6d84b7ec02ae93088aa459ac70e733b)

firmware_restrict_branch_speculation_*() recently started using
preempt_enable()/disable(), but those are relatively high level
primitives and cause build failures on some 32-bit builds.

Since we want to keep <asm/nospec-branch.h> low level, convert
them to macros to avoid header hell...

Cc: David Woodhouse 
Cc: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: arjan.van.de@intel.com
Cc: b...@alien8.de
Cc: dave.han...@intel.com
Cc: jmatt...@google.com
Cc: karah...@amazon.de
Cc: k...@vger.kernel.org
Cc: pbonz...@redhat.com
Cc: rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman 
[Youquan Song: port to 4.4]
Signed-off-by: Youquan Song 
---
 arch/x86/include/asm/nospec-branch.h | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 27582aa..4675f65 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,20 +214,22 @@ static inline void indirect_branch_prediction_barrier(void)
 /*
  * With retpoline, we must use IBRS to restrict branch prediction
  * before calling into firmware.
+ *
+ * (Implemented as CPP macros due to header hell.)
  */
-static inline void firmware_restrict_branch_speculation_start(void)
-{
-   preempt_disable();
-   alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
- X86_FEATURE_USE_IBRS_FW);
-}
+#define firmware_restrict_branch_speculation_start()   \
+do {   \
+   preempt_disable();  \
+   alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,   \
+ X86_FEATURE_USE_IBRS_FW); \
+} while (0)
 
-static inline void firmware_restrict_branch_speculation_end(void)
-{
-   alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
- X86_FEATURE_USE_IBRS_FW);
-   preempt_enable();
-}
+#define firmware_restrict_branch_speculation_end() \
+do {   \
+   alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,\
+ X86_FEATURE_USE_IBRS_FW); \
+   preempt_enable();   \
+} while (0)
 
 #endif /* __ASSEMBLY__ */
 
-- 
1.9.1



[PATCH 1/3] dmar: Fix domain id not update to newly create

2013-12-12 Thread Youquan Song
In domain_context_mapping_one(), if the domain has not yet been assigned a
domain id, a new domain id is allocated for it. However, the newly allocated
id is not written back to the domain, so the domain keeps an unknown domain id.

This causes problems such as flushing the wrong domain in
iommu->flush.flush_iotlb and freeing/releasing the wrong domain.

Tested-by: Zhiyuan Zhou 
Signed-off-by: Youquan Song 
---
 drivers/iommu/intel-iommu.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 43b9bfe..9cd522f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1625,6 +1625,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
}
}
 
+   domain->id = id;
context_set_domain_id(context, id);
 
if (translation != CONTEXT_TT_PASS_THROUGH) {
-- 
1.7.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/3] dmar: Move the confuse comments to proper place

2013-12-12 Thread Youquan Song
The "found = 1" assignment means "there are other devices owned by the
domain". The comment sits in the wrong place and makes the code confusing to
review, so move it to the correct place.

Signed-off-by: Youquan Song 
---
 drivers/iommu/intel-iommu.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9cd522f..aa821fc 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3813,10 +3813,6 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
continue;
}
 
-   /* if there is no other devices under the same iommu
-* owned by this domain, clear this iommu in iommu_bmp
-* update iommu count and coherency
-*/
if (iommu == device_to_iommu(info->segment, info->bus,
info->devfn))
found = 1;
@@ -3824,6 +3820,10 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
 
spin_unlock_irqrestore(&device_domain_lock, flags);
 
+   /* if there is no other devices under the same iommu
+* owned by this domain, clear this iommu in iommu_bmp
+* update iommu count and coherency
+*/
if (found == 0) {
unsigned long tmp_flags;
spin_lock_irqsave(&domain->iommu_lock, tmp_flags);
-- 
1.7.7.4



[PATCH 3/3] dmar: reduce loop to find multi-devices owned by IOMMU

2013-12-12 Thread Youquan Song
When checking whether the IOMMU owns other devices in the domain besides the
device being removed, the code loops over all devices in the domain if the
removed device is the first one in the domain's device list.

This patch improves that: the loop stops as soon as both the removed device
and one other device have been found, which saves loop iterations and makes
the code clearer.
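
The early-exit idea can be sketched on a plain array. This is a simplified
model, not the driver code: struct fake_dev and fake_scan() are hypothetical
names, and the sketch uses a logical && where the patch uses a bitwise & on
two 0/1 flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified record; the driver uses struct device_domain_info. */
struct fake_dev {
	int id;		/* identifies the device */
	int iommu;	/* identifies the IOMMU the device sits behind */
};

/*
 * Look for the device being removed (target_id) and for any other device
 * behind target_iommu, stopping once both are known. Returns the number of
 * entries visited; *found reports whether another device shares the IOMMU.
 */
static size_t fake_scan(const struct fake_dev *devs, size_t n,
			int target_id, int target_iommu, int *found)
{
	int del = 0;
	size_t visited = 0;

	assert(devs != NULL && found != NULL);
	*found = 0;
	for (size_t i = 0; i < n; i++) {
		visited++;
		if (devs[i].id == target_id)
			del = 1;
		else if (devs[i].iommu == target_iommu)
			*found = 1;

		if (*found && del)
			break;
	}
	return visited;
}
```

When the removed device is first in the list and its neighbour shares the
IOMMU, the scan visits two entries instead of the whole list.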

Signed-off-by: Youquan Song 
---
 drivers/iommu/intel-iommu.c |   15 ++-
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index aa821fc..9f3bf3f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3785,7 +3785,7 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
struct device_domain_info *info, *tmp;
struct intel_iommu *iommu;
unsigned long flags;
-   int found = 0;
+   int found = 0, del = 0;
 
iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
pdev->devfn);
@@ -3806,16 +3806,13 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
free_devinfo_mem(info);
 
spin_lock_irqsave(&device_domain_lock, flags);
-
-   if (found)
-   break;
-   else
-   continue;
-   }
-
-   if (iommu == device_to_iommu(info->segment, info->bus,
+   del = 1;
+   } else if (iommu == device_to_iommu(info->segment, info->bus,
info->devfn))
found = 1;
+
+   if (found & del)
+   break;
}
 
spin_unlock_irqrestore(&device_domain_lock, flags);
-- 
1.7.7.4



Re: [PATCH 1/2] dma: Add interface to calculate data transferred

2013-10-14 Thread Youquan Song
On Sun, Oct 13, 2013 at 08:56:33PM +0530, Vinod Koul wrote:
> On Fri, Oct 11, 2013 at 06:33:43AM -0700, Greg KH wrote:
> > On Fri, Oct 11, 2013 at 05:42:17PM -0400, Youquan Song wrote:
> > > Currently, the DMA channel calculates its data transferred only at network
> > > device driver. When other devices like UART or SPI etc, transfers data by DMA
> > > mode, but it always shows 0 at /sys/class/dma/dma0chan*/bytes_transferred.
> > 
> > Is that really a problem?  I have never heard anyone complaining about
> > it.  Where are the reports of this?
> Right, am not still getting the point on what is the problem that this series is
> trying to fix..

The issue is this: I use a UART to transfer data between two COM ports, and
the UART uses a DesignWare DMA controller channel. When I check that DMA
channel with "cat /sys/class/dma/dma0chan3/bytes_transferred", it always
shows "0". I have transferred data through the UART port, so why does its DMA
channel report "0" bytes transferred? My first guess was that either the DMA
device driver has a bug or the data is not going through the DesignWare DMA
channel at all. After checking the code, I noticed that the amount of data
transferred is recorded only when the DMA channel is used by a network device
driver; other device drivers do not account for it. Since the DMA channel is
also used by other device drivers, why is only networking special? Because
this is a common interface, the current
/sys/class/dma/dma0chan*/bytes_transferred is very likely to mislead the user.


Thanks
-Youquan
 


[PATCH 2/2] dma: calculate the data tranferred by 8250

2013-10-11 Thread Youquan Song
When a UART transfers data in DMA mode, 
/sys/class/dma/dma0chan*/bytes_transferred always shows 0.

Call the new function to account for the amount of data transferred in
DMA mode.

Signed-off-by: Youquan Song 
---
 drivers/tty/serial/8250/8250_dma.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
index 7046769..b22ef80 100644
--- a/drivers/tty/serial/8250/8250_dma.c
+++ b/drivers/tty/serial/8250/8250_dma.c
@@ -83,7 +83,7 @@ int serial8250_tx_dma(struct uart_8250_port *p)
desc->callback = __dma_tx_complete;
desc->callback_param = p;
 
-   dma->tx_cookie = dmaengine_submit(desc);
+   dma->tx_cookie = dma_tx_submit_cal(desc, dma->txchan, dma->tx_size);
 
dma_sync_single_for_device(dma->txchan->device->dev, dma->tx_addr,
   UART_XMIT_SIZE, DMA_TO_DEVICE);
-- 
1.7.7.4



DMA: Calculate how many data transferred by DMA

2013-10-11 Thread Youquan Song
Currently, the amount of data transferred over a DMA channel is accounted
for only by network device drivers. When other devices such as UART or SPI
transfer data in DMA mode, /sys/class/dma/dma0chan*/bytes_transferred always
shows 0. This may mislead the user into thinking that the DMA engine does
not work.

This patch adds a new function that accounts for the amount of data
transferred in DMA mode. It can be used by other modules and also simplifies
the current duplicated code.

Add the interface where the UART transfers data through the DesignWare DMA
engine, so that the data already transferred on the DMA channel is counted.

If the patch works, I will add the interface to other modules when needed.



[PATCH 1/2] dma: Add interface to calculate data transferred

2013-10-11 Thread Youquan Song
Currently, the amount of data transferred over a DMA channel is accounted
for only by network device drivers. When other devices such as UART or SPI
transfer data in DMA mode, /sys/class/dma/dma0chan*/bytes_transferred always
shows 0.

This patch adds a new function that accounts for the amount of data
transferred in DMA mode. It can be used by other modules and also simplifies
the current duplicated code.
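
The shape of such a helper can be sketched in user space as follows. This is
a reduced model with hypothetical fake_* types: it uses plain counters,
whereas the patch updates per-CPU counters with preemption disabled, and the
cookie is modelled as an int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-ins for the dmaengine structures. */
struct fake_chan {
	unsigned long bytes_transferred;
	unsigned long memcpy_count;
};

struct fake_desc {
	int (*tx_submit)(struct fake_desc *desc);	/* returns a cookie */
};

/* Submit the descriptor, then account for len bytes on the channel. */
static int fake_submit_and_count(struct fake_desc *desc,
				 struct fake_chan *chan, size_t len)
{
	int cookie;

	assert(desc != NULL && desc->tx_submit != NULL && chan != NULL);
	cookie = desc->tx_submit(desc);
	chan->bytes_transferred += len;
	chan->memcpy_count++;
	return cookie;
}

/* Demonstration callback: pretends the submission succeeded with cookie 1. */
static int fake_tx_submit(struct fake_desc *desc)
{
	(void)desc;
	return 1;
}
```

Any caller that submits through the wrapper gets its bytes counted, which is
the point of sharing one helper instead of repeating the accounting at each
call site.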

Signed-off-by: Youquan Song 
---
 drivers/dma/dmaengine.c   |   35 +++
 include/linux/dmaengine.h |3 +++
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 9162ac8..4356a7e 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -901,6 +901,23 @@ void dma_async_device_unregister(struct dma_device *device)
 }
 EXPORT_SYMBOL(dma_async_device_unregister);
 
+dma_cookie_t
+dma_tx_submit_cal(struct dma_async_tx_descriptor *tx,
+   struct dma_chan *chan, size_t len)
+{
+
+   dma_cookie_t cookie;
+   cookie = tx->tx_submit(tx);
+
+   preempt_disable();
+   __this_cpu_add(chan->local->bytes_transferred, len);
+   __this_cpu_inc(chan->local->memcpy_count);
+   preempt_enable();
+
+   return cookie;
+
+}
+
 /**
  * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
  * @chan: DMA channel to offload copy to
@@ -920,7 +937,6 @@ dma_async_memcpy_buf_to_buf(struct dma_chan *chan, void *dest,
struct dma_device *dev = chan->device;
struct dma_async_tx_descriptor *tx;
dma_addr_t dma_dest, dma_src;
-   dma_cookie_t cookie;
unsigned long flags;
 
dma_src = dma_map_single(dev->dev, src, len, DMA_TO_DEVICE);
@@ -937,14 +953,8 @@ dma_async_memcpy_buf_to_buf(struct dma_chan *chan, void *dest,
}
 
tx->callback = NULL;
-   cookie = tx->tx_submit(tx);
-
-   preempt_disable();
-   __this_cpu_add(chan->local->bytes_transferred, len);
-   __this_cpu_inc(chan->local->memcpy_count);
-   preempt_enable();
 
-   return cookie;
+   return dma_tx_submit_cal(tx, chan, len);
 }
 EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
 
@@ -968,7 +978,6 @@ dma_async_memcpy_buf_to_pg(struct dma_chan *chan, struct page *page,
struct dma_device *dev = chan->device;
struct dma_async_tx_descriptor *tx;
dma_addr_t dma_dest, dma_src;
-   dma_cookie_t cookie;
unsigned long flags;
 
dma_src = dma_map_single(dev->dev, kdata, len, DMA_TO_DEVICE);
@@ -983,14 +992,8 @@ dma_async_memcpy_buf_to_pg(struct dma_chan *chan, struct page *page,
}
 
tx->callback = NULL;
-   cookie = tx->tx_submit(tx);
 
-   preempt_disable();
-   __this_cpu_add(chan->local->bytes_transferred, len);
-   __this_cpu_inc(chan->local->memcpy_count);
-   preempt_enable();
-
-   return cookie;
+   return dma_tx_submit_cal(tx, chan, len);
 }
 EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
 
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 0bc7275..0025f8e 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -1084,4 +1084,7 @@ dma_cookie_t dma_memcpy_pg_to_iovec(struct dma_chan *chan, struct iovec *iov,
struct dma_pinned_list *pinned_list, struct page *page,
unsigned int offset, size_t len);
 
+dma_cookie_t dma_tx_submit_cal(struct dma_async_tx_descriptor *tx,
+   struct dma_chan *chan, size_t len);
+
 #endif /* DMAENGINE_H */
-- 
1.7.7.4



Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native

2013-08-17 Thread Youquan Song
> Firstly, please use the customary (multi-line) comment 
> style:
> 
>   /*
>    * Comment .
>    * .. goes here.
>    */
> 
> specified in Documentation/CodingStyle.
> 
> Secondly, please send a patch against a vanilla (e.g. 
> v3.11-rc5) kernel, as I've already zapped your previous 
> patch from tip:x86/apic per your request.
Hi Ingo,

The latest vanilla kernel does not include the patch yet, so I think it
will be fine to just drop it from the tip tree.

Thanks
-Youquan
 


Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native

2013-08-16 Thread Youquan Song
> No problem - you might want to send another patch adding some comments to 
> the code, explaining why we don't switch to physical mode, quoting from 
> the SDM and so.

Here is the revert patch.

Subject: [PATCH] Revert "x86/apic: Enable x2APIC physical mode on native hardware too, when there are fewer than 256 CPUs"

x2APIC without interrupt remapping is not architectural, and there is no
guarantee that it will work in the future.

SDM volume 3, section 10.12.7 "Initialization by System Software", says:

  Routing of device interrupts to local APIC units operating in
  x2APIC mode requires use of the interrupt-remapping architecture
  specified in the Intel Virtualization Technology for Directed I/O,
  Revision 1.3. Because of this, BIOS must enumerate support for and
  software must enable this interrupt remapping with Extended Interrupt
  Mode Enabled before it enabling x2APIC mode in the local APIC units.

This reverts commit 3d1acb49d22fbbae96524040e9e2d4cbbb3adbef: do not use
x2APIC physical mode if interrupt remapping is not enabled, even when the
number of CPUs is fewer than 256.
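
The resulting policy can be summarised as a small decision function. This is
a sketch of one reading of the restored check, not the kernel code: the
fake_* names and the three-way enum are hypothetical, and the assumption
that the remaining hypervisor case ends up in physical mode only is not
shown in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the platform state the check looks at. */
struct fake_platform {
	bool ir_enabled;	/* interrupt remapping was enabled */
	bool hypervisor_x2apic;	/* a hypervisor advertises x2APIC support */
	unsigned int max_apicid;/* largest physical APIC ID present */
};

enum fake_x2apic_mode {
	FAKE_X2APIC_OFF,
	FAKE_X2APIC_PHYS_ONLY,	/* assumption: physical mode without IR */
	FAKE_X2APIC_FULL,
};

static enum fake_x2apic_mode fake_pick_x2apic(const struct fake_platform *p)
{
	assert(p != NULL);

	if (p->ir_enabled)
		return FAKE_X2APIC_FULL;

	/* Without IR: skip x2APIC on native hardware or with APIC ID > 255. */
	if (p->max_apicid > 255 || !p->hypervisor_x2apic)
		return FAKE_X2APIC_OFF;

	return FAKE_X2APIC_PHYS_ONLY;
}
```

On native hardware with interrupt remapping unavailable the function returns
FAKE_X2APIC_OFF regardless of the CPU count, which is the behaviour the
revert restores.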

Signed-off-by: Youquan Song 
---
 arch/x86/kernel/apic/apic.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index d9dd5a6..eca89c5 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1622,8 +1622,11 @@ void __init enable_IR_x2apic(void)
goto skip_x2apic;
 
if (ret < 0) {
-   /* IR is required if there is APIC ID > 255 */
-   if (max_physical_apicid > 255) {
+   /* IR is required if there is APIC ID > 255 even when running
+* under KVM
+*/
+   if (max_physical_apicid > 255 ||
+   !hypervisor_x2apic_available()) {
if (x2apic_preenabled)
disable_x2apic();
goto skip_x2apic;
-- 
1.6.4.2

 


Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native

2013-08-13 Thread Youquan Song
> In order to make sure the patch without involving unexpected issues beyond
> I can understand, I will confirm with our expert about it.
> 
> so please pend the patch going to mainline. If the patch can move on, I
> think I will also provide other patch changing, like direct EOI.

Hi Yinghai and Ingo,

I have confirmed this with our experts. x2APIC without interrupt
remapping is not architectural, and there is no guarantee that it will work
in the future.

What's more, SDM volume 3, section 10.12.7 "Initialization by System
Software", says:

  Routing of device interrupts to local APIC units operating in
  x2APIC mode requires use of the interrupt-remapping architecture
  specified in the Intel Virtualization Technology for Directed I/O,
  Revision 1.3. Because of this, BIOS must enumerate support for and
  software must enable this interrupt remapping with Extended Interrupt
  Mode Enabled before it enabling x2APIC mode in the local APIC units.

Ingo, please drop the patch from the -tip tree:
3d1acb49d22fbbae96524040e9e2d4cbbb3adbef "x86/apic: Enable x2APIC
physical mode on native hardware too, when there are fewer than 256 CPUs"

Sorry for making a fuss here; it is my fault.

Thanks
-Youquan


Re: RFC: revert request for cpuidle patches e11538d1 and 69a37bea

2013-07-29 Thread Youquan Song
Hi Jeremy,

I tried to reproduce your result so that I could fix the issue, but I have
not been able to reproduce it yet.

I ran netperf-2.6.0 with one machine as the server (netserver) and the
other as the client (netperf -t TCP_RR -H $SERVER_IP -l 60). The target
machine was used as both client and server. I did not see the performance
drop. I also noticed that the result is not stable: sometimes it is high
and sometimes it is low. In summary, it is hard to reach a definite result.

Could you tell me how to reproduce the issue? How do you get the C0
data?

What is your kernel config? Do you enable CONFIG_NO_HZ_FULL=y or
only CONFIG_NO_HZ=y?


Thanks
-Youquan 


Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native

2013-07-28 Thread Youquan Song
> Yes. It would be great, if Youquan can point out where is the intel doc
> about the change.
> 
> Also if the patch can move on,  hypervisor_x2apic_available() related
> declaration and define
> could be dropped.

Hi Yinghai,

Sorry, I do not know of any document change, but I also cannot find any
wording that says x2APIC physical mode needs interrupt remapping support
when there are fewer than 256 CPUs. Of course, x2APIC cluster mode must
have interrupt remapping support.

I have tested many machines, both old and recent, from desktop to
server, and x2APIC physical mode works without interrupt remapping
when there are fewer than 256 CPUs.

Neither in theory nor in real tests have I found any issue with the patch.

To make sure the patch does not introduce unexpected issues beyond what
I understand, I will confirm it with our experts.

So please hold the patch back from mainline for now. If the patch can move
on, I think I will also provide other patches, such as one for direct EOI.

Thanks
-Youquan
 


Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native

2013-07-28 Thread Youquan Song
> > Thanks Ingo!
> > The machines will be affected: CPU support x2APIC and CPU number < 256,
> > chipset does not support VT-d2 or VT-d is disabled in BIOS. 
> 
> I mean, can you guess what rough percentage of new systems 
> shipping (or significant number of older systems already 
> shipped) will be affected by this?
> 
> My feeling is that this should be relatively rare (only 
> when a user reconfigures the BIOS, etc.), but I might be 
> wrong.

Sorry, I do not know what percentage of shipped systems will be affected.
I have encountered one affected machine whose CPU supports x2APIC but whose
BIOS does not support VT-d (the BIOS has no option to enable it). After
applying the patch, it works in x2APIC physical mode.

Of course, most affected machines are ones where VT-d is disabled by a BIOS
option or where the intremap=off kernel option is used.

From what I understand, x2APIC physical mode should be compatible with
legacy mode when there are fewer than 256 CPUs, even without interrupt
remapping support.

I have tested many machines, both old and recent, from desktop to server,
and x2APIC physical mode works without interrupt remapping when there are
fewer than 256 CPUs.

Thanks
-Youquan
 


[tip:x86/apic] x86/apic: Enable x2APIC physical mode on native hardware too, when there are fewer than 256 CPUs

2013-07-23 Thread tip-bot for Youquan Song
Commit-ID:  3d1acb49d22fbbae96524040e9e2d4cbbb3adbef
Gitweb: http://git.kernel.org/tip/3d1acb49d22fbbae96524040e9e2d4cbbb3adbef
Author: Youquan Song 
AuthorDate: Thu, 11 Jul 2013 21:22:39 -0400
Committer:  Ingo Molnar 
CommitDate: Tue, 23 Jul 2013 11:15:42 +0200

x86/apic: Enable x2APIC physical mode on native hardware too, when there are fewer than 256 CPUs

x2APIC extends the APIC ID from 8 bits to 32 bits, but device
interrupts routed from the IOAPIC or delivered in MSI mode keep an
8-bit destination APIC ID.  To support x2APIC, VT-d interrupt
remapping was introduced to translate the destination APIC ID to
32 bits in x2APIC mode and keep devices compatible in this way.

x2APIC supports both logical and physical destination modes.

In logical destination mode, the 32-bit logical APIC ID has two
sub-fields: a 16-bit cluster ID and a 16-bit logical ID within the
cluster. VT-d interrupt remapping is required in x2APIC cluster
mode.

In physical destination mode, the 8-bit physical ID is compatible
with the 32-bit physical ID when there are fewer than 256 CPUs.

When interrupt remapping initialization fails on platforms with
fewer than 256 CPUs, the current kernel only enables x2APIC physical
mode in virtualization environments, while we could also enable
x2APIC physical mode on a native kernel in this situation.

In this case device interrupts will use an 8-bit destination
APIC ID in physical mode, which is compatible with x2APIC physical
mode when there are fewer than 256 CPUs.

So we can benefit from x2APIC vs xAPIC MMIO:

 - x2APIC MSR reads/writes are faster than xAPIC MMIO accesses.

 - x2APIC needs only an ICR write to deliver an interrupt, without
   polling the ICR delivery status bit, while xAPIC needs to poll
   the ICR delivery status bit.

 - x2APIC uses one 64-bit ICR access instead of xAPIC's two 32-bit
   accesses.

Signed-off-by: Youquan Song 
Cc: Youquan Song 
Cc: h...@linux.intel.com
Cc: ying...@kernel.org
Link: http://lkml.kernel.org/r/1373592159-459-1-git-send-email-youquan.s...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/apic/apic.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index eca89c5..d9dd5a6 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1622,11 +1622,8 @@ void __init enable_IR_x2apic(void)
goto skip_x2apic;
 
if (ret < 0) {
-   /* IR is required if there is APIC ID > 255 even when running
-* under KVM
-*/
-   if (max_physical_apicid > 255 ||
-   !hypervisor_x2apic_available()) {
+   /* IR is required if there is APIC ID > 255 */
+   if (max_physical_apicid > 255) {
if (x2apic_preenabled)
disable_x2apic();
goto skip_x2apic;
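
The effect of the hunk above can be modeled in plain C. This is only a userspace sketch under stated assumptions, not kernel code: the struct and function names are hypothetical, and the two fields stand in for the kernel's `max_physical_apicid` and for the result of `hypervisor_x2apic_available()` at the point where interrupt-remapping (IR) setup has already failed.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state that the real check consults after
 * interrupt-remapping initialization has failed. */
struct ir_fail_state {
	unsigned int max_physical_apicid;   /* highest APIC ID seen */
	bool hypervisor_x2apic_available;   /* hypervisor offers x2APIC */
};

/* Condition removed by the patch: skip x2APIC when an APIC ID exceeds
 * 255 or when not running under a hypervisor that provides x2APIC. */
static bool skip_x2apic_old(const struct ir_fail_state *s)
{
	return s->max_physical_apicid > 255 || !s->hypervisor_x2apic_available;
}

/* Condition added by the patch: skip x2APIC only when an APIC ID
 * exceeds 255. */
static bool skip_x2apic_new(const struct ir_fail_state *s)
{
	return s->max_physical_apicid > 255;
}
```

For a native machine whose APIC IDs all fit in 8 bits, the old condition skips x2APIC and the new one does not; for a machine with an APIC ID above 255, both skip it.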


Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native

2013-07-23 Thread Youquan Song
On Tue, Jul 23, 2013 at 11:17:29AM +0200, Ingo Molnar wrote:
> 
> * Youquan Song  wrote:
> 
> > x2APIC extends APICID from 8 bits to 32 bits, but the device interrupt 
> > routed from IOAPIC or delivered in MSI mode will keep 8 bits destination 
> > APICID. In order to support x2APIC, the VT-d interrupt remapping is 
> > introduced to translate the destination APICID to 32 bits in x2APIC mode 
> > and keep the device compatible in this way.
> > 
> > x2APIC support both logical and physical mode in destination mode.  In 
> > logical destination mode, the 32 bits Logical APICID has 2 sub-fields:
> >  16 bits cluster ID and 16 bits logical ID within the cluster and it is 
> > required VT-d interrupt remapping in x2APIC cluster mode. In physical 
> > destination mode, the 8 bits physical id is compatible with 32 bits 
> > physical id when CPU number < 256. When interrupt remapping 
> > initialization fail on platform with CPU number < 256, current kernel 
> > only enables x2APIC physical mode in virutalization environment, while 
> > we also can enable x2APIC physcial mode in native kernel this situation, 
> > and the device interrupt will use 8 bits destination APICID in physical 
> > mode and be compatible with x2APIC physical when < 256 CPUs.
> >  
> > So we can benefit from x2APIC vs xAPIC MMIO:
> >  - x2APIC MSR read/write is faster than xAPIC mmio
> >  - x2APIC only ICR write to deliver interrupt without polling ICR deliver 
> >status bit and xAPIC need poll to read ICR deliver status bit.
> >  - x2APIC 64 bits ICR access instead of xAPIC two 32 bits access.
> 
> That looks interesting. How many systems are affected by this change in 
> practice? Have you tested it on affected hardware?

Thanks Ingo!
The affected machines are those where the CPU supports x2APIC, there are
fewer than 256 CPUs, and either the chipset does not support VT-d2 or VT-d
is disabled in the BIOS.

I have tested it on one affected machine, and it works.

Thanks
-Youquan


[PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native

2013-07-11 Thread Youquan Song
x2APIC extends the APIC ID from 8 bits to 32 bits, but device interrupts routed
from the IOAPIC or delivered in MSI mode keep an 8-bit destination APIC ID.
To support x2APIC, VT-d interrupt remapping was introduced to translate the
destination APIC ID to 32 bits in x2APIC mode and keep devices compatible in
this way.

x2APIC supports both logical and physical destination modes.
In logical destination mode, the 32-bit logical APIC ID has two sub-fields:
a 16-bit cluster ID and a 16-bit logical ID within the cluster. VT-d
interrupt remapping is required in x2APIC cluster mode.
In physical destination mode, the 8-bit physical ID is compatible with the
32-bit physical ID when there are fewer than 256 CPUs.
When interrupt remapping initialization fails on a platform with fewer than
256 CPUs, the current kernel only enables x2APIC physical mode in
virtualization environments, while we can also enable x2APIC physical mode on
a native kernel in this situation. Device interrupts will then use an 8-bit
destination APIC ID in physical mode, which is compatible with x2APIC physical
mode when there are fewer than 256 CPUs.

So we can benefit from x2APIC vs xAPIC MMIO:
 - x2APIC MSR reads/writes are faster than xAPIC MMIO accesses.
 - x2APIC needs only an ICR write to deliver an interrupt, without polling
   the ICR delivery status bit, while xAPIC needs to poll the ICR delivery
   status bit.
 - x2APIC uses one 64-bit ICR access instead of xAPIC's two 32-bit accesses.
  
Signed-off-by: Youquan Song 
---
 arch/x86/kernel/apic/apic.c |7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 904611b..51a065a 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1603,11 +1603,8 @@ void __init enable_IR_x2apic(void)
goto skip_x2apic;
 
if (ret < 0) {
-   /* IR is required if there is APIC ID > 255 even when running
-* under KVM
-*/
-   if (max_physical_apicid > 255 ||
-   !hypervisor_x2apic_available()) {
+   /* IR is required if there is APIC ID > 255 */
+   if (max_physical_apicid > 255) {
if (x2apic_preenabled)
disable_x2apic();
goto skip_x2apic;
-- 
1.7.7.4
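
The last benefit listed in the patch description above (one 64-bit ICR access instead of two 32-bit accesses) and the 8-bit versus 32-bit destination field can be illustrated with a small sketch. It only composes register values and touches no hardware. The helper names are hypothetical, and the bit positions are my reading of the ICR layout in the Intel SDM (destination in bits 31:24 of the high xAPIC dword, bits 63:32 of the x2APIC ICR), so treat them as assumptions.

```c
#include <stdint.h>

/* xAPIC: the ICR is split into two 32-bit registers; the 8-bit
 * destination APIC ID sits in bits 31:24 of the high one. */
static uint32_t xapic_icr_high(uint8_t dest_apicid)
{
	return (uint32_t)dest_apicid << 24;
}

/* x2APIC: the ICR is a single 64-bit value; the 32-bit destination
 * APIC ID occupies bits 63:32 and the command bits occupy the low half. */
static uint64_t x2apic_icr(uint32_t dest_apicid, uint32_t icr_low)
{
	return ((uint64_t)dest_apicid << 32) | icr_low;
}
```

An xAPIC sender would write the value from `xapic_icr_high()` and then the low dword as two separate accesses, while an x2APIC sender would issue one write of the value from `x2apic_icr()`.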



[PATCH] ata: Fix DVD not dectected at some platform with Wellsburg PCH

2013-07-11 Thread Youquan Song
Patch b55f84e2d527182e7c611d466cd0bb6ddce201de "ata_piix: Fix DVD
not dectected at some Haswell platforms" fixed an issue where the DVD
drive was not recognized on Haswell desktop platforms with Lynx Point.
Recently, the same issue has been found on some platforms with the
Wellsburg PCH.

So deliver a similar patch that fixes it by disabling 32-bit PIO in IDE mode.

Signed-off-by: Youquan Song 
Cc: sta...@vger.kernel.org
---
 drivers/ata/ata_piix.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 9a8a674..424bcbe 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -330,7 +330,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
/* SATA Controller IDE (Wellsburg) */
{ 0x8086, 0x8d00, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
/* SATA Controller IDE (Wellsburg) */
-   { 0x8086, 0x8d08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+   { 0x8086, 0x8d08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
/* SATA Controller IDE (Wellsburg) */
{ 0x8086, 0x8d60, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
/* SATA Controller IDE (Wellsburg) */
-- 
1.6.4.2



[PATCH] ata: Fix DVD not dectected at some platform with Wellsburg PCH

2013-07-02 Thread Youquan Song
Patch b55f84e2d527182e7c611d466cd0bb6ddce201de "ata_piix: Fix DVD
not dectected at some Haswell platforms" fixed an issue where the DVD
drive was not recognized on Haswell desktop platforms with Lynx Point.
Recently, the same issue has been found on some platforms with the
Wellsburg PCH.

So deliver a similar patch that fixes it by disabling 32-bit PIO in IDE mode.

Signed-off-by: Youquan Song 
---
 drivers/ata/ata_piix.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 9a8a674..424bcbe 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -330,7 +330,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
/* SATA Controller IDE (Wellsburg) */
{ 0x8086, 0x8d00, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
/* SATA Controller IDE (Wellsburg) */
-   { 0x8086, 0x8d08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+   { 0x8086, 0x8d08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
/* SATA Controller IDE (Wellsburg) */
{ 0x8086, 0x8d60, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
/* SATA Controller IDE (Wellsburg) */
-- 
1.6.4.2



[PATCH] ata: Fix DVD not dectected at some platform with Wellsburg PCH

2013-06-27 Thread Youquan Song
Patch b55f84e2d527182e7c611d466cd0bb6ddce201de "ata_piix: Fix DVD
not dectected at some Haswell platforms" fixed an issue where the DVD
drive was not recognized on Haswell desktop platforms with Lynx Point.
Recently, the same issue has been found on some platforms with the
Wellsburg PCH.

So deliver a similar patch that fixes it by disabling 32-bit PIO in IDE mode.

Signed-off-by: Youquan Song 
---
 drivers/ata/ata_piix.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 9a8a674..424bcbe 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -330,7 +330,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
/* SATA Controller IDE (Wellsburg) */
{ 0x8086, 0x8d00, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
/* SATA Controller IDE (Wellsburg) */
-   { 0x8086, 0x8d08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+   { 0x8086, 0x8d08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
/* SATA Controller IDE (Wellsburg) */
{ 0x8086, 0x8d60, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
/* SATA Controller IDE (Wellsburg) */
-- 
1.6.4.2



Re: cpu hotplug: possible_cpus broken (again?) next-20130607

2013-06-12 Thread Youquan Song
> > Interesting, you are changing long standing meaning of maxcpus=
> > 
> > We always use maxcpus=1 to have one cpu up, and later in user space
> > to online other cpus like
> > echo 1 > /sys/devices/system/cpuX/online.
> > 
> > aka maxcpus= is a soft limit or initial online nr.
> > 
> > we already have nr_cpus= for hard limit.
> > 
> > So need to drop
> >  commit 3e275a5ba367ab74b3a4e49114307baed989fcac
> >  Author: Youquan Song 
> >  Date:   Fri Jun 7 10:07:08 2013 +1000
> > 
> >  drivers/base/cpu.c: fix maxcpus boot option
> 
> Agreed.

Yes, I also agree to drop it; the fix needs more consideration.
I tried to use maxcpus to limit the number of CPUs in order to debug a
well-known application, because it fails to run when the number of CPUs
is larger than 69.
When I use maxcpus to limit the number of CPUs at boot, udev
automatically enables all of the CPUs on a 3.10 kernel.
I also tried maxcpus on a 3.0 kernel, and it does not show the maxcpus issue.
I recently tracked it down to a commit in the 3.2 kernel,
8a25a2fd126c621f44f3aeaef80d51f00fc11639 "cpu: convert 'cpu' and
'machinecheck' sysdev_class to a regular subsystem", which results in udev
automatically enabling all of the CPUs even though maxcpus has been provided.

So next, I need to look at why udev tries to enable all of the CPUs even
though maxcpus is provided. Possibly this can also be fixed in the udev daemon.

Secondly, I think the description of the maxcpus= option in
Documentation/kernel-parameters.txt is too confusing. The maxcpus and
nr_cpus options need to switch their names.
Currently:

maxcpus=[SMP] Maximum number of processors that an SMP kernel
should make use of.  maxcpus=n : n >= 0 limits the
kernel to using 'n' processors.  n=0 is a special case,
it is equivalent to "nosmp", which also disables
the IO APIC.

How about changing it to:

maxcpus=[SMP] Maximum number of processors that an SMP kernel
brings up during boot.  maxcpus=n : n >= 0 limits the
kernel to using 'n' processors.  n=0 is a special case,
it is equivalent to "nosmp", which also disables
the IO APIC.


Thanks
-Youquan


Re: cpu hotplug: possible_cpus broken (again?) next-20130607

2013-06-11 Thread Youquan Song
> On 06/12/2013 05:03 AM, Youquan Song wrote:
> > +#ifdef CONFIG_SMP
> > +  /* return when cpu number greater than maximum number of
> > CPUs */
> > +   if (setup_max_cpus <= num_online_cpus() + 1) {
> > +   cpu_hotplug_driver_unlock();
> > +   return -EINVAL;
> > +   }
> > +#endif
> > from_nid = cpu_to_node(cpuid);
> > ret = cpu_up(cpuid);
> 
> Your patch is line-wrapped.
> 
> Also, the #ifdef is unnecessary.  If CONFIG_SMP is off:
> 
>   static const unsigned int setup_max_cpus = NR_CPUS;
>   #define num_online_cpus() 1U
> 
> The compiler will take care of optimizing out the the if() without the
> explicit #ifdef.
> 
> Also, the +1 looks goofy to me.  Doesn't this do the same thing (and
> isn't it much easier to read)?
> 
>   if (num_online_cpus() >= setup_max_cpus)
> 

Thanks. Here is a formal patch for it. Please review and try it.

Subject: [PATCH] core: Fix maxcpus boot option broken

The maxcpus boot option limits the maximum number of CPUs on the system, but
it is broken on recent kernels. Although we use maxcpus to limit the number of
CPUs, the current kernel registers all present CPUs in sysfs.
udev enumerates all CPUs registered in sysfs and brings up each CPU that
is offline. So the maxcpus option is broken.

This patch keeps the number of online CPUs from exceeding the maxcpus limit.
The limit therefore holds during udev enumeration and for any other attempt
to bring up CPUs beyond it, for example:
echo 1 > /sys/devices/system/cpu/online

Signed-off-by: Youquan Song 
---
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 3d48fc8..e32fffa 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -60,6 +60,13 @@ static ssize_t __ref store_online(struct device *dev,
kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
break;
case '1':
+  /* Return when online cpu number equal or greater than
+   *  maximum number of CPUs */
+   if (num_online_cpus() >= setup_max_cpus) {
+   cpu_hotplug_driver_unlock();
+   return -EINVAL;
+   }
+
from_nid = cpu_to_node(cpuid);
ret = cpu_up(cpuid);
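
The two forms of the check that appear in this message (the quoted draft with `+ 1` and the formal patch) can be compared with a small userspace model. The function names are hypothetical, and plain parameters stand in for `num_online_cpus()` and `setup_max_cpus`; this is a sketch of the comparison logic only, not kernel code.

```c
#include <stdbool.h>

/* Draft check: refuse to online a CPU when max_cpus <= online + 1. */
static bool refuse_online_draft(unsigned int online, unsigned int max_cpus)
{
	return max_cpus <= online + 1;
}

/* Formal patch: refuse to online a CPU when online >= max_cpus. */
static bool refuse_online_formal(unsigned int online, unsigned int max_cpus)
{
	return online >= max_cpus;
}
```

The two differ at the boundary. With a limit of 10 and 9 CPUs already online, the draft form refuses the request, which caps the system at 9 online CPUs, while the formal form allows the tenth CPU and refuses only once 10 are online.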
 


Re: cpu hotplug: possible_cpus broken (again?) next-20130607

2013-06-11 Thread Youquan Song
On Tue, Jun 11, 2013 at 04:32:34PM -0600, Toshi Kani wrote:
> On Wed, 2013-06-12 at 00:34 +0200, Rafael J. Wysocki wrote:
> > On Tuesday, June 11, 2013 03:17:28 PM Dave Hansen wrote:
> > > On 06/11/2013 03:05 PM, Rafael J. Wysocki wrote:
> > > > On Tuesday, June 11, 2013 02:51:33 PM Dave Hansen wrote:
> > > >> possible_cpus looks broken again.  I'm booting with:
> > > >>
> > > >>  maxcpus=10 possible_cpus=160
> > > >>
> > > >> But I only get 0-9 in sysfs:
> > > >>
> > > >>> # ls /sys/devices/system/cpu/
> > > >>> cpu0  cpu2  cpu4  cpu6  cpu8  cpufreq  kernel_max  offline  possible  probe  uevent
> > > >>> cpu1  cpu3  cpu5  cpu7  cpu9  cpuidle  modalias  online  present  release
> > > > 
> > > > Can you please test the acpi-hotplug branch of the linux-pm.git tree?
> > > 
> > > That branch seems to work happily.
> > 
> > In that case the problem may have been reintroduced by a merge conflict fix
> > in linux-next.
> 
> I believe the problem was introduced by the following change.  From the
> description, though, this is exactly what this patch was trying to
> change...  Adding Youguan to the list.
> 
> commit 3e275a5ba367ab74b3a4e49114307baed989fcac
> Author: Youquan Song 
> Date:   Fri Jun 7 10:07:08 2013 +1000
> 
> drivers/base/cpu.c: fix maxcpus boot option
> 
Hi Toshi,

Thanks for the information, Toshi.
Please try the patch below, which fixes the issue by moving the code to
store_online.

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 3d48fc8..2378f42 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -60,6 +60,13 @@ static ssize_t __ref store_online(struct device *dev,
kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
break;
case '1':
+#ifdef CONFIG_SMP
+  /* return when cpu number greater than maximum number of
CPUs */
+   if (setup_max_cpus <= num_online_cpus() + 1) {
+   cpu_hotplug_driver_unlock();
+   return -EINVAL;
+   }
+#endif
from_nid = cpu_to_node(cpuid);
ret = cpu_up(cpuid);

Thanks
-Youquan



[PATCH] core: Fix maxcpus boot option broken

2013-05-29 Thread Youquan Song
The maxcpus boot option limits the maximum number of CPUs on the system, but
it is broken on recent kernels. Although we use maxcpus to limit the number of
CPUs, the current kernel registers all present CPUs in sysfs.
udev enumerates all CPUs registered in sysfs and brings up each CPU that
is offline. So the maxcpus option is broken.

With this patch, only CPUs within the maxcpus limit are registered in
sysfs. The limit therefore holds during udev enumeration and for any other
attempt to bring up CPUs beyond it, for example:
echo 1 > /sys/devices/system/cpu/online

Signed-off-by: Youquan Song 
---
 drivers/base/cpu.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 3d48fc8..c7d603a 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -272,6 +272,10 @@ int __cpuinit register_cpu(struct cpu *cpu, int num)
 {
int error;
 
+   /* return when cpu number greater than maximum number of CPUs */
+   if (num >= setup_max_cpus)
+   return 0;
+
cpu->node_id = cpu_to_node(num);
memset(&cpu->dev, 0x00, sizeof(struct device));
cpu->dev.id = num;
-- 
1.7.7.4
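
A small userspace model may help show what the added early return in register_cpu() implies for sysfs. The helper names are hypothetical, and `max_cpus` stands in for `setup_max_cpus`; the sketch only mirrors the `num >= setup_max_cpus` comparison from the hunk above.

```c
#include <stdbool.h>

/* Mirrors the added check: a CPU whose number is at or above the
 * limit returns early and never gets a sysfs device. */
static bool cpu_gets_sysfs_entry(unsigned int num, unsigned int max_cpus)
{
	return num < max_cpus;
}

/* Count how many of the first `present` CPU numbers would be registered. */
static unsigned int count_registered(unsigned int present, unsigned int max_cpus)
{
	unsigned int n = 0;

	for (unsigned int cpu = 0; cpu < present; cpu++)
		if (cpu_gets_sysfs_entry(cpu, max_cpus))
			n++;
	return n;
}
```

Under this model, a system with 160 CPU numbers and a limit of 10 registers only cpu0 through cpu9, which is consistent with the sysfs listing reported in the cpu hotplug thread earlier in this archive.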



Re: [PATCH v3] ata: Fix DVD not dectected at some Haswell platforms

2013-03-24 Thread Youquan Song
> 
> Can you look at the patch which required by some Haswell platforms?
> 
Hi Jeff,

What is your opinion of the patch? Without it, installation is blocked on
some new platforms.

Thanks
-Youquan 


[PATCH] perf: Fix parameter type mismatch

2013-03-20 Thread Youquan Song
Building tools/perf fails with a blocking error:
cc1: warnings being treated as errors
util/scripting-engines/trace-event-perl.c: In function ‘perl_process_tracepoint’:
util/scripting-engines/trace-event-perl.c:285: error: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘__u64’
make: *** [util/scripting-engines/trace-event-perl.o] Error 1

Signed-off-by: Youquan Song 
---
 .../perf/util/scripting-engines/trace-event-perl.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c
index f80605e..b2b3bdb 100644
--- a/tools/perf/util/scripting-engines/trace-event-perl.c
+++ b/tools/perf/util/scripting-engines/trace-event-perl.c
@@ -282,7 +282,7 @@ static void perl_process_tracepoint(union perf_event *perf_event __maybe_unused,
 
event = find_cache_event(evsel);
if (!event)
-   die("ug! no event found for type %" PRIu64, evsel->attr.config);
+   die("ug! no event found for type %" PRIu64, (u64)(evsel->attr.config));
 
pid = raw_field_value(event, "common_pid", data);
 
-- 
1.7.7.4
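
The build error quoted in this patch comes from a mismatch between the printf format that `PRIu64` expands to and the actual type of the argument. A minimal userspace sketch of the same fix follows; the function name is hypothetical, and `unsigned long long` stands in for a `__u64` that is not the same type as the C library's `uint64_t` (an assumption about the failing build configuration).

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a config value as the patched die() call does: cast the
 * argument to the 64-bit type that PRIu64 is defined for, so that the
 * format string and the argument type always agree. */
static int format_config(char *buf, size_t len, unsigned long long config)
{
	return snprintf(buf, len, "no event found for type %" PRIu64,
			(uint64_t)config);
}
```

The cast changes no value here; it only makes the argument type match the conversion specifier, which is what silences the warning that was being treated as an error.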



Re: [PATCH] x86,apic: Blacklist x2APIC on some platforms

2013-03-18 Thread Youquan Song
> 
> I found this patch after some googling and for the record, it makes my
> W520 boot with VT-d enabled and the discrete NVidia card.
> Is it still being considered?
> 

Yes, I am still pushing the patch upstream. The patch is good and has been
reviewed by Yinghai, but it depends on a patch of Yinghai's that is not
upstream yet:
http://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/diff/?id=de38757e964cfee20e6da1977572a2191d7f4aa0

Refer to https://bugzilla.kernel.org/show_bug.cgi?id=43054

Peter, will you take it?

Thanks
-Youquan



Re: [PATCH v3] ata: Fix DVD not dectected at some Haswell platforms

2013-03-13 Thread Youquan Song
Hi Maintainer,

Could you look at the patch, which is required by some Haswell platforms?

Thanks
-Youquan

On Wed, Mar 06, 2013 at 10:49:05AM -0500, Youquan Song wrote:
> There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d 
> "ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge
>  chipsets(v2)" fixing the 4 ports IDE controller 32bit PIO mode. 
> 
> We've hit a problem with DVD not recognized on Haswell Desktop platform which
> includes Lynx Point 2-port SATA controller.
> 
> This quirk patch disables 32bit PIO on this controller in IDE mode.
> 
> v2: Change spelling error in statememnt pointed by Sergei Shtylyov.
> v3: Change comment statememnt and spliting line over 80 characters pointed by
> Libor Pechacek and also rebase the patch against 3.8-rc7 kernel.
> 
> Tested-by: Lee, Chun-Yi 
> Signed-off-by: Youquan Song 
> Cc: sta...@vger.kernel.org
> ---
>  drivers/ata/ata_piix.c |   14 +-
>  1 files changed, 13 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
> index 174eca6..4aab550 100644
> --- a/drivers/ata/ata_piix.c
> +++ b/drivers/ata/ata_piix.c
> @@ -150,6 +150,7 @@ enum piix_controller_ids {
>   tolapai_sata,
>   piix_pata_vmw,  /* PIIX4 for VMware, spurious DMA_ERR */
>   ich8_sata_snb,
> + ich8_2port_sata_snb,
>  };
>  
>  struct piix_map_db {
> @@ -304,7 +305,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
>   /* SATA Controller IDE (Lynx Point) */
>   { 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
>   /* SATA Controller IDE (Lynx Point) */
> - { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
> + { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
>   /* SATA Controller IDE (Lynx Point) */
>   { 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
>   /* SATA Controller IDE (Lynx Point-LP) */
> @@ -422,6 +423,7 @@ static const struct piix_map_db *piix_map_db_table[] = {
>   [ich8m_apple_sata]  = &ich8m_apple_map_db,
>   [tolapai_sata]  = &tolapai_map_db,
>   [ich8_sata_snb] = &ich8_map_db,
> + [ich8_2port_sata_snb]   = &ich8_2port_map_db,
>  };
>  
>  static struct pci_bits piix_enable_bits[] = {
> @@ -1225,6 +1227,16 @@ static struct ata_port_info piix_port_info[] = {
>   .udma_mask  = ATA_UDMA6,
>   .port_ops   = &piix_sata_ops,
>   },
> +
> + [ich8_2port_sata_snb] =
> + {
> + .flags  = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR
> + | PIIX_FLAG_PIO16,
> + .pio_mask   = ATA_PIO4,
> + .mwdma_mask = ATA_MWDMA2,
> + .udma_mask  = ATA_UDMA6,
> + .port_ops   = &piix_sata_ops,
> + },
>  };
>  
>  #define AHCI_PCI_BAR 5
> -- 
> 1.7.7.4
> 
> 


[PATCH v3] ata: Fix DVD not dectected at some Haswell platforms

2013-03-05 Thread Youquan Song
The quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d
"ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge
 chipsets(v2)" fixed 32-bit PIO mode on the 4-port IDE controller.

We've hit a problem where the DVD drive is not recognized on a Haswell desktop
platform that includes the Lynx Point 2-port SATA controller.

This quirk patch disables 32-bit PIO on this controller in IDE mode.

v2: Fix a spelling error in a statement, pointed out by Sergei Shtylyov.
v3: Change the comment statement and split a line over 80 characters, as
pointed out by Libor Pechacek; also rebase the patch against the 3.8-rc7 kernel.

Tested-by: Lee, Chun-Yi 
Signed-off-by: Youquan Song 
Cc: sta...@vger.kernel.org
---
 drivers/ata/ata_piix.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 174eca6..4aab550 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -150,6 +150,7 @@ enum piix_controller_ids {
tolapai_sata,
piix_pata_vmw,  /* PIIX4 for VMware, spurious DMA_ERR */
ich8_sata_snb,
+   ich8_2port_sata_snb,
 };
 
 struct piix_map_db {
@@ -304,7 +305,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
/* SATA Controller IDE (Lynx Point) */
{ 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
/* SATA Controller IDE (Lynx Point) */
-   { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+   { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
/* SATA Controller IDE (Lynx Point) */
{ 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
/* SATA Controller IDE (Lynx Point-LP) */
@@ -422,6 +423,7 @@ static const struct piix_map_db *piix_map_db_table[] = {
[ich8m_apple_sata]  = &ich8m_apple_map_db,
[tolapai_sata]  = &tolapai_map_db,
[ich8_sata_snb] = &ich8_map_db,
+   [ich8_2port_sata_snb]   = &ich8_2port_map_db,
 };
 
 static struct pci_bits piix_enable_bits[] = {
@@ -1225,6 +1227,16 @@ static struct ata_port_info piix_port_info[] = {
.udma_mask  = ATA_UDMA6,
.port_ops   = &piix_sata_ops,
},
+
+   [ich8_2port_sata_snb] =
+   {
+   .flags  = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR
+   | PIIX_FLAG_PIO16,
+   .pio_mask   = ATA_PIO4,
+   .mwdma_mask = ATA_MWDMA2,
+   .udma_mask  = ATA_UDMA6,
+   .port_ops   = &piix_sata_ops,
+   },
 };
 
 #define AHCI_PCI_BAR 5
-- 
1.7.7.4
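
For readers following the patch, the table-driven quirk can be sketched in
isolation. This is a simplified model, not the driver code: every demo_* name
and the flag bit values are hypothetical, and only the two device IDs and the
idea of a PIO16 flag come from the patch above. It shows how mapping a PCI ID
to a different controller id selects a table entry that disables 32bit PIO.

```c
#include <stddef.h>

/* Hypothetical flag bits for illustration; the driver defines its own. */
#define DEMO_FLAG_SIDPR (1u << 0)
#define DEMO_FLAG_PIO16 (1u << 1)

enum demo_controller_id {
	demo_2port_sata,
	demo_2port_sata_snb,
	demo_controller_count
};

struct demo_port_info {
	unsigned int flags;
};

/* Per-controller settings, indexed by controller id. */
static const struct demo_port_info demo_port_table[demo_controller_count] = {
	[demo_2port_sata]     = { .flags = DEMO_FLAG_SIDPR },
	[demo_2port_sata_snb] = { .flags = DEMO_FLAG_SIDPR | DEMO_FLAG_PIO16 },
};

struct demo_pci_id {
	unsigned short vendor;
	unsigned short device;
	enum demo_controller_id id;
};

/* Device IDs taken from the patch; 0x8c08 maps to the quirked entry. */
static const struct demo_pci_id demo_pci_tbl[] = {
	{ 0x8086, 0x8c08, demo_2port_sata_snb },
	{ 0x8086, 0x8c09, demo_2port_sata },
};

/*
 * Returns 1 if 32bit PIO may be used, 0 if the controller is limited to
 * 16bit PIO, and -1 if the device is not in the table.
 */
static int demo_pio32_allowed(unsigned short vendor, unsigned short device)
{
	size_t n = sizeof(demo_pci_tbl) / sizeof(demo_pci_tbl[0]);

	for (size_t i = 0; i < n; i++) {
		const struct demo_pci_id *e = &demo_pci_tbl[i];

		if (e->vendor == vendor && e->device == device)
			return !(demo_port_table[e->id].flags & DEMO_FLAG_PIO16);
	}
	return -1;
}
```

In this model, demo_pio32_allowed(0x8086, 0x8c08) reports that 32bit PIO is
off, while 0x8c09 keeps the unquirked entry, matching the scope of the patch.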



Re: [PATCH v3] ata: Fix DVD not dectected at some Haswell platforms

2013-03-04 Thread Youquan Song
Hi Maintainer,

Can you take the patch which is needed by some new platforms?

Thanks
-Youquan

On Mon, Feb 18, 2013 at 11:00:55AM -0500, Youquan Song wrote:
> There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d 
> "ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge
>  chipsets(v2)" fixing the 4 ports IDE controller 32bit PIO mode. 
> 
> We've hit a problem with DVD not recognized on Haswell Desktop platform which
> includes Lynx Point 2-port SATA controller.
> 
> This quirk patch disables 32bit PIO on this controller in IDE mode.
> 
> v2: Fix spelling errors in the description, pointed out by Sergei Shtylyov.
> v3: Reword the description and split a line over 80 characters, as pointed out
> by Libor Pechacek; also rebase the patch against the 3.8-rc7 kernel.
> 
> Tested-by: Lee, Chun-Yi 
> Signed-off-by: Youquan Song 
> Cc: sta...@vger.kernel.org
> ---
>  drivers/ata/ata_piix.c |   14 +-
>  1 files changed, 13 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
> index 174eca6..4aab550 100644
> --- a/drivers/ata/ata_piix.c
> +++ b/drivers/ata/ata_piix.c
> @@ -150,6 +150,7 @@ enum piix_controller_ids {
>   tolapai_sata,
>   piix_pata_vmw,  /* PIIX4 for VMware, spurious DMA_ERR */
>   ich8_sata_snb,
> + ich8_2port_sata_snb,
>  };
>  
>  struct piix_map_db {
> @@ -304,7 +305,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
>   /* SATA Controller IDE (Lynx Point) */
>   { 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
>   /* SATA Controller IDE (Lynx Point) */
> - { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
> + { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
>   /* SATA Controller IDE (Lynx Point) */
>   { 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
>   /* SATA Controller IDE (Lynx Point-LP) */
> @@ -422,6 +423,7 @@ static const struct piix_map_db *piix_map_db_table[] = {
>   [ich8m_apple_sata]  = &ich8m_apple_map_db,
>   [tolapai_sata]  = &tolapai_map_db,
>   [ich8_sata_snb] = &ich8_map_db,
> + [ich8_2port_sata_snb]   = &ich8_2port_map_db,
>  };
>  
>  static struct pci_bits piix_enable_bits[] = {
> @@ -1225,6 +1227,16 @@ static struct ata_port_info piix_port_info[] = {
>   .udma_mask  = ATA_UDMA6,
>   .port_ops   = &piix_sata_ops,
>   },
> +
> + [ich8_2port_sata_snb] =
> + {
> + .flags  = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR
> + | PIIX_FLAG_PIO16,
> + .pio_mask   = ATA_PIO4,
> + .mwdma_mask = ATA_MWDMA2,
> + .udma_mask  = ATA_UDMA6,
> + .port_ops   = &piix_sata_ops,
> + },
>  };
>  
>  #define AHCI_PCI_BAR 5
> -- 
> 1.7.7.4
> 


Re: [PATCH v2] ata: Fix DVD not dectected at some Haswell platforms

2013-02-17 Thread Youquan Song
> 
> As to my understanding Sergei did not suggest citing the whole commit message.
> I also find the numerous references to Sandy Bridge confusing as this is a fix
> for Lynx Point chipset.
> 
> How about rephrasing the commit message in a way similar to the following one?
> --8<-
> We've hit a problem with DVD not recognized on Haswell Desktop platform which
> includes Lynx Point 2-port SATA controller.  This quirk patch disables 32bit
> PIO on the controller in IDE mode.
> -->8-
Thanks Libor!
I have updated the description and sent a v3 patch to LKML.


> > +   .flags  = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR | 
> > PIIX_FLAG_PIO16,
> 
> The line might be worth splitting as it's over 80 characters.
> 
> Otherwise the patch looks OK to me.
> 
This is also changed in the v3 patch.

Thanks
-Youquan


[PATCH v3] ata: Fix DVD not dectected at some Haswell platforms

2013-02-17 Thread Youquan Song
There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d 
"ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge
 chipsets(v2)" fixing the 4 ports IDE controller 32bit PIO mode. 

We've hit a problem with DVD not recognized on Haswell Desktop platform which
includes Lynx Point 2-port SATA controller.

This quirk patch disables 32bit PIO on this controller in IDE mode.

v2: Fix spelling errors in the description, pointed out by Sergei Shtylyov.
v3: Reword the description and split a line over 80 characters, as pointed out
by Libor Pechacek; also rebase the patch against the 3.8-rc7 kernel.

Tested-by: Lee, Chun-Yi 
Signed-off-by: Youquan Song 
Cc: sta...@vger.kernel.org
---
 drivers/ata/ata_piix.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 174eca6..4aab550 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -150,6 +150,7 @@ enum piix_controller_ids {
tolapai_sata,
piix_pata_vmw,  /* PIIX4 for VMware, spurious DMA_ERR */
ich8_sata_snb,
+   ich8_2port_sata_snb,
 };
 
 struct piix_map_db {
@@ -304,7 +305,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
/* SATA Controller IDE (Lynx Point) */
{ 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
/* SATA Controller IDE (Lynx Point) */
-   { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+   { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
/* SATA Controller IDE (Lynx Point) */
{ 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
/* SATA Controller IDE (Lynx Point-LP) */
@@ -422,6 +423,7 @@ static const struct piix_map_db *piix_map_db_table[] = {
[ich8m_apple_sata]  = &ich8m_apple_map_db,
[tolapai_sata]  = &tolapai_map_db,
[ich8_sata_snb] = &ich8_map_db,
+   [ich8_2port_sata_snb]   = &ich8_2port_map_db,
 };
 
 static struct pci_bits piix_enable_bits[] = {
@@ -1225,6 +1227,16 @@ static struct ata_port_info piix_port_info[] = {
.udma_mask  = ATA_UDMA6,
.port_ops   = &piix_sata_ops,
},
+
+   [ich8_2port_sata_snb] =
+   {
+   .flags  = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR
+   | PIIX_FLAG_PIO16,
+   .pio_mask   = ATA_PIO4,
+   .mwdma_mask = ATA_MWDMA2,
+   .udma_mask  = ATA_UDMA6,
+   .port_ops   = &piix_sata_ops,
+   },
 };
 
 #define AHCI_PCI_BAR 5
-- 
1.7.7.4



Re: [PATCH] ata: Fix DVD not dectected at some Haswell platforms

2013-01-31 Thread Youquan Song
>> +{ 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
>>  /* SATA Controller IDE (Lynx Point) */
>>  { 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
>
>Also, are you sure this one and the following Lynx Point controllers are 
> not affected?

I am not sure. The 0x8c09 controller is possibly used on mobile PCs rather than
desktops. One of my machines includes the chipset, but its 2-port IDE
controller is not wired out for use; only 2 ports of the 4-port IDE controller
are wired out, so I cannot verify it. I think notebook/mobile PCs are not
required to wire out all of the IDE ports.

This patch only fixes the 0x8c08 2-port IDE controller because it blocks
installation. If an issue is reported for 0x8c09, we can fix it later.

Thanks
-Youquan



Re: [PATCH] ata: Fix DVD not dectected at some Haswell platforms

2013-01-31 Thread Youquan Song
> On 30-01-2013 21:19, Youquan Song wrote:
>
>> There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d
>
>   Please also specify the summary of that patch in parens.
>
>> fix the 4 ports
>
>s/fix/fixing/
>
>> IDE controller 32bit PIO mode.
>> Recently, the problem was showed
>
>s/showed/shown/
>
>> at Haswell platform which includes 2 ports IDE controller.
>
>> So introduce a qurik
>
>Quirk.
>
>> patch to disable 32bit PIO at this IDE controller.
>
>s/at/on/
>
>> Signed-off-by: Youquan Song 
>
> MBR, Sergei

Thanks a lot! I have sent out a patch that fixes these.

-Youquan



[PATCH v2] ata: Fix DVD not dectected at some Haswell platforms

2013-01-31 Thread Youquan Song
There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d 
"ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge
 chipsets(v2)

This quirk patch fixes one kind of bug inside some Intel Sandybridge
chipsets, see reports from

   https://bugzilla.kernel.org/show_bug.cgi?id=40592.

Many guys also have reported the problem before:

https://bugs.launchpad.net/bugs/737388
https://bugs.launchpad.net/bugs/794642
https://bugs.launchpad.net/bugs/782389
..

With help from Tejun, the problem is found to be caused by 32bit PIO
mode, so introduce the quirk patch to disable 32bit PIO on SATA piix
for some Sandybridge CPT chipsets.

Seth also tested the patch on all five affected chipsets
(pci device ID: 0x1c00, 0x1c01, 0x1d00, 0x1e00, 0x1e01), and found
the patch does fix the problem.
"

The above patch only fixes 32bit PIO mode on the 4-port IDE controller.

Recently, the problem was seen on the Haswell Desktop platform, which includes
a 2-port IDE controller.

So introduce a quirk patch to disable 32bit PIO on this IDE controller.

v2: Fix spelling errors in the description, pointed out by Sergei Shtylyov.

Tested-by: Lee, Chun-Yi 
Signed-off-by: Youquan Song 
Cc: sta...@vger.kernel.org
---
 drivers/ata/ata_piix.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index ef773e1..1993e52 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -150,6 +150,7 @@ enum piix_controller_ids {
tolapai_sata,
piix_pata_vmw,  /* PIIX4 for VMware, spurious DMA_ERR */
ich8_sata_snb,
+   ich8_2port_sata_snb,
 };
 
 struct piix_map_db {
@@ -326,7 +327,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
/* SATA Controller IDE (Lynx Point) */
{ 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
/* SATA Controller IDE (Lynx Point) */
-   { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+   { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
/* SATA Controller IDE (Lynx Point) */
{ 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
/* SATA Controller IDE (Lynx Point-LP) */
@@ -502,6 +503,7 @@ static const struct piix_map_db *piix_map_db_table[] = {
[ich8m_apple_sata]  = &ich8m_apple_map_db,
[tolapai_sata]  = &tolapai_map_db,
[ich8_sata_snb] = &ich8_map_db,
+   [ich8_2port_sata_snb]   = &ich8_2port_map_db,
 };
 
 static struct ata_port_info piix_port_info[] = {
@@ -643,6 +645,16 @@ static struct ata_port_info piix_port_info[] = {
.port_ops   = &piix_sata_ops,
},
 
+   [ich8_2port_sata_snb] =
+   {
+   .flags  = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR | PIIX_FLAG_PIO16,
+   .pio_mask   = ATA_PIO4,
+   .mwdma_mask = ATA_MWDMA2,
+   .udma_mask  = ATA_UDMA6,
+   .port_ops   = &piix_sata_ops,
+   },
+
+
 };
 
 static struct pci_bits piix_enable_bits[] = {
-- 
1.6.4.2



[PATCH] ata: Fix DVD not dectected at some Haswell platforms

2013-01-29 Thread Youquan Song
There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d fix the 4 ports
IDE controller 32bit PIO mode. 
Recently, the problem was showed at Haswell platform which includes 2 ports 
IDE controller.

So introduce a qurik patch to disable 32bit PIO at this IDE controller. 

Signed-off-by: Youquan Song 
---
 drivers/ata/ata_piix.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index ef773e1..1993e52 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -150,6 +150,7 @@ enum piix_controller_ids {
tolapai_sata,
piix_pata_vmw,  /* PIIX4 for VMware, spurious DMA_ERR */
ich8_sata_snb,
+   ich8_2port_sata_snb,
 };
 
 struct piix_map_db {
@@ -326,7 +327,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
/* SATA Controller IDE (Lynx Point) */
{ 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
/* SATA Controller IDE (Lynx Point) */
-   { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+   { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
/* SATA Controller IDE (Lynx Point) */
{ 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
/* SATA Controller IDE (Lynx Point-LP) */
@@ -502,6 +503,7 @@ static const struct piix_map_db *piix_map_db_table[] = {
[ich8m_apple_sata]  = &ich8m_apple_map_db,
[tolapai_sata]  = &tolapai_map_db,
[ich8_sata_snb] = &ich8_map_db,
+   [ich8_2port_sata_snb]   = &ich8_2port_map_db,
 };
 
 static struct ata_port_info piix_port_info[] = {
@@ -643,6 +645,16 @@ static struct ata_port_info piix_port_info[] = {
.port_ops   = &piix_sata_ops,
},
 
+   [ich8_2port_sata_snb] =
+   {
+   .flags  = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR | PIIX_FLAG_PIO16,
+   .pio_mask   = ATA_PIO4,
+   .mwdma_mask = ATA_MWDMA2,
+   .udma_mask  = ATA_UDMA6,
+   .port_ops   = &piix_sata_ops,
+   },
+
+
 };
 
 static struct pci_bits piix_enable_bits[] = {
-- 
1.6.4.2



[tip:perf/urgent] x86/perf: Add IvyBridge EP support

2013-01-24 Thread tip-bot for Youquan Song
Commit-ID:  923d8697e24847000490c187de1aeaca622611a3
Gitweb: http://git.kernel.org/tip/923d8697e24847000490c187de1aeaca622611a3
Author: Youquan Song 
AuthorDate: Tue, 18 Dec 2012 12:20:23 -0500
Committer:  Ingo Molnar 
CommitDate: Thu, 24 Jan 2013 16:14:04 +0100

x86/perf: Add IvyBridge EP support

Running the perf utility on an Ivybridge EP server, we encounter
"not supported" events:

L1-dcache-loads
L1-dcache-load-misses
L1-dcache-stores
L1-dcache-store-misses
L1-dcache-prefetches
L1-dcache-prefetch-misses

This patch adds support for this processor.

Signed-off-by: Youquan Song 
Cc: Andi Kleen 
Cc: Youquan Song 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/r/1355851223-27705-1-git-send-email-youquan.s...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/cpu/perf_event_intel.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index cb313a5..4914e94 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2087,6 +2087,7 @@ __init int intel_pmu_init(void)
pr_cont("SandyBridge events, ");
break;
case 58: /* IvyBridge */
+   case 62: /* IvyBridge EP */
memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
   sizeof(hw_cache_event_ids));
memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs,


Re: [PATCH] x86,apic: Blacklist x2APIC on some platforms

2013-01-04 Thread Youquan Song
On Tue, Dec 18, 2012 at 09:42:30AM -0800, Yinghai Lu wrote:
> On Tue, Dec 18, 2012 at 9:33 AM, H. Peter Anvin  wrote:
> > On 12/18/2012 09:07 AM, Youquan Song wrote:
> >> Blacklist x2apic when Nivida graphics enabled on Lenovo ThinkPad T420.
> >> Also set blacklist x2apic for Lenovo ThinkPad W520 and L520.
> >
> > I thought we had gotten reports that the Nvidia correlation was false?
> 
> that's T520.

Hi hpa,

Yinghai's T520 works with x2APIC enabled, so it does not need to be blacklisted.

Would you like to take the patch?

Thanks
-Youquan
 


Re: [PATCH] x86,perf: Add IvyBridge EP support

2013-01-04 Thread Youquan Song

Would you like to take it? It is needed by Linux OSVs.

Thanks
-Youquan

On Tue, Dec 18, 2012 at 12:20:23PM -0500, Youquan Song wrote:
> Running the perf utility on an Ivybridge EP server, we encounter
> "not supported" events:
> 
> L1-dcache-loads
> L1-dcache-load-misses
> L1-dcache-stores
> L1-dcache-store-misses
> L1-dcache-prefetches
> L1-dcache-prefetch-misses
> 
> This patch adds support for this processor.
> 
> Reviewed-by: Andi Kleen 
> Signed-off-by: Youquan Song 
> ---
>  arch/x86/kernel/cpu/perf_event_intel.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index 324bb52..aea3503 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -2075,6 +2075,7 @@ __init int intel_pmu_init(void)
>   pr_cont("SandyBridge events, ");
>   break;
>   case 58: /* IvyBridge */
> + case 62: /* IvyBridge EP */
>   memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
>  sizeof(hw_cache_event_ids));
>   memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs,
> -- 
> 1.6.4.2
> 
 


[PATCH] x86,idle: pr_debug information need separated

2012-12-17 Thread Youquan Song
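While debugging the kernel, the following output was found:
intel_idle: unaware of model 0x1a MWAIT 4 please contact lenb@kernel.orgACPI: 
Device input0 -> No ACPI support

The intel_idle message lacks a trailing newline, so the next message is
appended to it. This patch adds the newline to separate them.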

Signed-off-by: Youquan Song 
---
 drivers/idle/intel_idle.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index b0f6b4c..eae6e3b 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -518,7 +518,7 @@ static int intel_idle_cpuidle_driver_init(void)
if (*cpuidle_state_table[cstate].name == '\0')
pr_debug(PREFIX "unaware of model 0x%x"
" MWAIT %d please"
-   " contact l...@kernel.org",
+   " contact l...@kernel.org\n",
boot_cpu_data.x86_model, cstate);
continue;
}
-- 
1.6.4.2
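
The effect can be reproduced outside the kernel with a small model. This is
only an illustration under stated assumptions: demo_log_append is a
hypothetical helper that appends text to a buffer the way a console shows
successive messages, and it does not model printk internals.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append msg to the log buffer exactly as given. No separator is added, so
 * a message without a trailing newline runs into the next one.
 */
static void demo_log_append(char *log, size_t size, const char *msg)
{
	size_t used = strlen(log);

	if (used < size)
		snprintf(log + used, size - used, "%s", msg);
}
```

Appending "please contact the maintainer" followed by "ACPI: ...\n" yields one
merged line, while ending the first string with "\n" keeps them apart.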



[PATCH] x86,perf: Add IvyBridge EP support

2012-12-17 Thread Youquan Song
Running the perf utility on an Ivybridge EP server, we encounter
"not supported" events:

L1-dcache-loads
L1-dcache-load-misses
L1-dcache-stores
L1-dcache-store-misses
L1-dcache-prefetches
L1-dcache-prefetch-misses

This patch adds support for this processor.

Reviewed-by: Andi Kleen 
Signed-off-by: Youquan Song 
---
 arch/x86/kernel/cpu/perf_event_intel.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 324bb52..aea3503 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2075,6 +2075,7 @@ __init int intel_pmu_init(void)
pr_cont("SandyBridge events, ");
break;
case 58: /* IvyBridge */
+   case 62: /* IvyBridge EP */
memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
   sizeof(hw_cache_event_ids));
memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs,
-- 
1.6.4.2
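
The one-line change relies on case fall-through: model 62 shares the branch
that model 58 already takes. A minimal standalone sketch of that selection is
given here; demo_cache_event_family and the "snb" return string are
hypothetical, and only the two model numbers come from the patch.

```c
#include <stddef.h>

/*
 * Returns the name of the cache event table family used for a CPU model,
 * or NULL when the model has no table (the "not supported" case).
 */
static const char *demo_cache_event_family(int model)
{
	switch (model) {
	case 58: /* IvyBridge */
	case 62: /* IvyBridge EP: falls through to the same tables */
		return "snb";
	default:
		return NULL;
	}
}
```

Without the second case label, model 62 would reach the default branch, which
is the situation the L1-dcache events above were hitting.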



[PATCH] x86,apic: Blacklist x2APIC on some platforms

2012-12-17 Thread Youquan Song
Blacklist x2apic when Nvidia graphics is enabled on the Lenovo ThinkPad T420.
Also blacklist x2apic on the Lenovo ThinkPad W520 and L520.

There are 3 bug reports:
https://bugzilla.kernel.org/show_bug.cgi?id=43054
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/776999
https://bugs.launchpad.net/bugs/922037

This patch is based on
http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=patch;h=de38757e964cfee20e6da1977572a2191d7f4aa0

Reviewed-by: Yinghai Lu 
Signed-off-by: Youquan Song 
---
 arch/x86/include/asm/x86_init.h |1 +
 arch/x86/kernel/apic/apic.c |   51 +++
 arch/x86/kernel/early-quirks.c  |9 +++
 3 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 38155f6..88e39e6 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -202,5 +202,6 @@ extern struct x86_msi_ops x86_msi;
 extern struct x86_io_apic_ops x86_io_apic_ops;
 extern void x86_init_noop(void);
 extern void x86_init_uint_noop(unsigned int unused);
+extern int early_found_nvidia_display_card;
 
 #endif
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 24deb30..0822fe9 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -170,6 +170,54 @@ static __init int setup_nox2apic(char *str)
return 0;
 }
 early_param("nox2apic", setup_nox2apic);
+
+static __init int x2apic_set_blacklist_nvidia(const struct dmi_system_id *d)
+{
+   if (!early_found_nvidia_display_card)
+   return 1;
+
+   setup_nox2apic("");
+   pr_info("x2apic blacklisted when Nivida graphics enabled on %s\n",
+   d->ident);
+   return 0;
+}
+
+static __init int x2apic_set_blacklist(const struct dmi_system_id *d)
+{
+   setup_nox2apic("");
+   pr_info("x2apic blacklisted because of broken SMI on %s\n",
+   d->ident);
+   return 0;
+}
+
+static const struct dmi_system_id x2apic_dmi_table[] = {
+   {
+   .callback = x2apic_set_blacklist_nvidia,
+   .ident = "Lenovo ThinkPad T420",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+   DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T420"),
+   },
+   },
+   {
+   .callback = x2apic_set_blacklist,
+   .ident = "Lenovo ThinkPad W520",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+   DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad W520"),
+   },
+   },
+   {
+   .callback = x2apic_set_blacklist,
+   .ident = "Lenovo ThinkPad L520",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+   DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad L520"),
+   },
+   },
+   {}
+};
+
 #endif
 
 unsigned long mp_lapic_addr;
@@ -1542,6 +1590,9 @@ void __init enable_IR_x2apic(void)
int ret, x2apic_enabled = 0;
int hardware_init_ret;
 
+   if (x2apic_supported())
+   dmi_check_system(x2apic_dmi_table);
+
/* Make sure irq_remap_ops are initialized */
setup_irq_remapping_ops();
 
diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 7548932..852d7a0 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -19,6 +19,8 @@
 #include 
 #include 
 
+int early_found_nvidia_display_card __initdata;
+
 static void __init fix_hypertransport_config(int num, int slot, int func)
 {
u32 htcfg;
@@ -192,6 +194,11 @@ static void __init ati_bugs_contd(int num, int slot, int func)
 }
 #endif
 
+static void __init nvidia_x2apic_bugs(int num, int slot, int func)
+{
+   early_found_nvidia_display_card = 1;
+}
+
 #define QFLAG_APPLY_ONCE   0x1
 #define QFLAG_APPLIED  0x2
 #define QFLAG_DONE (QFLAG_APPLY_ONCE|QFLAG_APPLIED)
@@ -221,6 +228,8 @@ static struct chipset early_qrk[] __initdata = {
  PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
{ PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
  PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
+   { PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
+ PCI_CLASS_DISPLAY_VGA, 0xff00, 0, nvidia_x2apic_bugs},
{}
 };
 
-- 
1.7.1
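
The control flow of the blacklist can be modeled in userspace as a table of
system identifiers with per-entry callbacks. The sketch below is a
simplification under stated assumptions: all demo_* names are hypothetical,
matching is by exact string comparison rather than the kernel's DMI matching,
and two booleans stand in for early_found_nvidia_display_card and for the
effect of setup_nox2apic("").

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct demo_system_id {
	int (*callback)(const struct demo_system_id *id);
	const char *ident;
	const char *vendor;
	const char *product_version;
};

static bool demo_nvidia_found;    /* set by the early PCI quirk in the patch */
static bool demo_x2apic_disabled; /* models the effect of setup_nox2apic("") */

/* T420: only blacklist when an Nvidia display card was detected. */
static int demo_blacklist_nvidia(const struct demo_system_id *id)
{
	(void)id;
	if (!demo_nvidia_found)
		return 1;
	demo_x2apic_disabled = true;
	return 0;
}

/* W520 and L520: blacklist unconditionally. */
static int demo_blacklist(const struct demo_system_id *id)
{
	(void)id;
	demo_x2apic_disabled = true;
	return 0;
}

static const struct demo_system_id demo_table[] = {
	{ demo_blacklist_nvidia, "Lenovo ThinkPad T420", "LENOVO", "ThinkPad T420" },
	{ demo_blacklist,        "Lenovo ThinkPad W520", "LENOVO", "ThinkPad W520" },
	{ demo_blacklist,        "Lenovo ThinkPad L520", "LENOVO", "ThinkPad L520" },
	{ NULL, NULL, NULL, NULL } /* terminator */
};

/* Run the callback of every matching entry and return the match count. */
static int demo_check_system(const char *vendor, const char *version)
{
	int count = 0;

	for (const struct demo_system_id *d = demo_table; d->ident; d++) {
		if (strcmp(d->vendor, vendor) || strcmp(d->product_version, version))
			continue;
		count++;
		if (d->callback)
			d->callback(d);
	}
	return count;
}
```

In this model a T420 without an Nvidia card matches the table but keeps x2apic
enabled, while a W520 or L520 always disables it.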



[PATCH V2 1/4] x86,idle: Quickly notice prediction failure for repeat mode

2012-10-18 Thread Youquan Song
Predicting the future is difficult. When the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing
the failure quickly is therefore important for power saving.

The cpuidle menu governor has a method to detect a repeating pattern: if the
last 8 consecutive C-state residencies are the same or very close, it predicts
that the next residency will be the same.
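
The repeat detection described above can be sketched roughly as follows. This
is an assumption-laden model, not the governor's code: the fixed tolerance
test and all demo_* names are hypothetical, and only the window of 8 samples
comes from the description.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_INTERVALS 8

/*
 * Returns true and stores the average in *predicted_us when all recent
 * residencies lie within tolerance_us of their average.
 */
static bool demo_detect_repeat(const uint32_t intervals_us[DEMO_INTERVALS],
			       uint32_t tolerance_us, uint32_t *predicted_us)
{
	uint64_t sum = 0;
	uint32_t avg;

	for (int i = 0; i < DEMO_INTERVALS; i++)
		sum += intervals_us[i];
	avg = (uint32_t)(sum / DEMO_INTERVALS);

	for (int i = 0; i < DEMO_INTERVALS; i++) {
		uint32_t v = intervals_us[i];
		uint32_t diff = v > avg ? v - avg : avg - v;

		if (diff > tolerance_us)
			return false;
	}
	*predicted_us = avg;
	return true;
}
```

Eight residencies of about 20 us are reported as a repeat with a 20 us
prediction; a single long residency in the window breaks the pattern.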

A real case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3
or earlier. On Sandybridge, turbostat reads 10 registers one by one, so it
generates 10 IPIs that wake up idle CPUs. The menu governor then predicts
repeat mode, expects another IPI to wake the idle CPU soon, and keeps the CPU
in C1 even though it is completely idle. However, after reading the 10
registers turbostat sleeps for 5 seconds by default, so the idle CPU stays in
C1 for a long time until a break event occurs.
On an idle Sandybridge system running "./turbostat -v", deep C-state residency
fluctuates between 70% and 99%. With the patched kernel, deep C-state
residency stays above 99.98%.

In this patch, a timer is added when the menu governor detects repeat mode and
chooses a shallow C-state. The timer is set to a timeout greater than the
predicted time, and the repeat mode prediction is considered to have failed if
the timer fires. When repeat mode happens as expected, the CPU wakes from the
C-state before the timer fires and cancels the timer itself. When repeat mode
does not happen, the timer expires, so the menu governor quickly notices that
the repeat mode prediction failed and re-evaluates whether a deeper C-state is
possible.
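
The guard timer idea can be expressed as a small model. It is a sketch under
assumptions: the factor of 2 is hypothetical (the description only requires a
timeout greater than the predicted time), and the demo_* functions simulate
one idle period rather than using kernel timers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed multiplier; any timeout above the predicted time would do. */
#define DEMO_TIMEOUT_FACTOR 2

static uint64_t demo_guard_timeout_us(uint32_t predicted_us)
{
	return (uint64_t)predicted_us * DEMO_TIMEOUT_FACTOR;
}

/* True if the guard timer fires before the next real wakeup arrives. */
static bool demo_prediction_failed(uint32_t predicted_us, uint64_t next_event_us)
{
	return next_event_us > demo_guard_timeout_us(predicted_us);
}

/* Time spent in the shallow state before a wakeup or the guard timer. */
static uint64_t demo_shallow_time_us(uint32_t predicted_us, uint64_t next_event_us)
{
	uint64_t timeout = demo_guard_timeout_us(predicted_us);

	return next_event_us < timeout ? next_event_us : timeout;
}
```

With a 20 us prediction and a real wakeup after 1 second, the model leaves the
shallow state after 40 us instead of staying there for the whole second.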

Below is another case that clearly shows the benefit of the patch:

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		"  --help   -h  Print this help\n"
		"  --thread -n  Thread number\n"
		"  --loop   -l  Loop times in shallow Cstate\n"
		"  --delay  -t  Sleep time (us) in shallow Cstate\n");
}

/* Thread body: (loop - 1) short sleeps, then one long sleep, repeated. */
void *simple_loop(void *arg)
{
	int idle_num = 1;

	(void)arg;
	while (!(*shutdown)) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			sleep(1);
			idle_num = 0;
		}
		idle_num++;
	}

	return NULL;
}

/* SIGINT/SIGTERM handler: ask all threads to stop. */
static void sighand(int sig)
{
	(void)sig;
	*shutdown = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, thread_num = 8;
	pthread_t pt[1024];

	/* -h takes no argument */
	static char optstr[] = "n:l:t:h";

	while ((c = getopt(argc, argv, optstr)) != -1)
		switch (c) {
		case 'n':
			thread_num = atoi(optarg);
			break;
		case 'l':
			loop = atoi(optarg);
			break;
		case 't':
			delay = atoi(optarg);
			break;
		case 'h':
		default:
			usage();
			exit(1);
		}

	/* pt[] holds at most 1024 threads; loop is used as a divisor */
	if (thread_num < 1 || thread_num > 1024 || loop < 1) {
		usage();
		exit(1);
	}

	printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 0;

	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask(SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for (i = 0; i < thread_num; i++)
		pthread_create(&pt[i], NULL, simple_loop, NULL);

	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);

	exit(0);
}

Get powertop V2 from git://github.com/fenrus75/powertop and build it.
Build the test application above, then run both.
The test platform can be Intel Sandybridge or another recent platform.
#./idle_predict -l 10 &
#./powertop

Deep C-state residency fluctuates between 40% and 100%, and much of the time is
spent in the C1 state. This is because the menu governor wrongly predicts that
repeat mode continues, so it chooses the shallow C1 state even though the CPU
has a chance to sleep for 1 second in a deep C-state.
 
With the patched kernel, deep C-state residency stays at >99.

[PATCH V2 2/4] x86,idle: Quickly notice prediction failure in general case

2012-10-18 Thread Youquan Song
Predicting the future is difficult, and when the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing the
failure quickly is therefore important for power saving.

This patch extends the mechanism to the general case, where the prediction logic
produces a small predicted residency and so chooses a shallow C-state even
though the expected residency is large. If the prediction is wrong, the CPU
stays in the shallow C-state for a long time although it actually had the chance
to enter a deep C-state. So when the expected residency is long enough but the
governor chooses a shallow C-state, a timer is added to monitor whether the
prediction fails.

When the CPU wakes up from the C-state before the timer expires, it cancels the
timer itself. When the timer fires, the menu governor quickly notices the
prediction failure and re-evaluates whether a deeper C-state is possible.

Signed-off-by: Youquan Song 
Signed-off-by: Rik van Riel 
---
 drivers/cpuidle/governors/menu.c |   34 +-
 1 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 37c0ff6..c824b4f 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -34,7 +34,7 @@
 static DEFINE_PER_CPU(struct hrtimer, menu_hrtimer);
 static DEFINE_PER_CPU(int, hrtimer_status);
 /* menu hrtimer mode */
-enum {MENU_HRTIMER_STOP, MENU_HRTIMER_REPEAT};
+enum {MENU_HRTIMER_STOP, MENU_HRTIMER_REPEAT, MENU_HRTIMER_GENERAL};
 
 /*
  * Concepts and ideas behind the menu governor
@@ -116,6 +116,13 @@ enum {MENU_HRTIMER_STOP, MENU_HRTIMER_REPEAT};
  *
  */
 
+/*
+ * The C-state residency is so long that it is worthwhile to exit
+ * from the shallow C-state and re-enter into a deeper C-state.
+ */
+static unsigned int perfect_cstate_ms __read_mostly = 30;
+module_param(perfect_cstate_ms, uint, );
+
 struct menu_device {
int last_state_idx;
int needs_update;
@@ -216,7 +223,17 @@ EXPORT_SYMBOL_GPL(menu_hrtimer_cancel);
 static enum hrtimer_restart menu_hrtimer_notify(struct hrtimer *hrtimer)
 {
int cpu = smp_processor_id();
+   struct menu_device *data = &per_cpu(menu_devices, cpu);
 
+   /* In general case, the expected residency is much larger than
+*  deepest C-state target residency, but prediction logic still
+*  predicts a small predicted residency, so the prediction
+*  history is totally broken if the timer is triggered.
+*  So reset the correction factor.
+*/
+   if (per_cpu(hrtimer_status, cpu) == MENU_HRTIMER_GENERAL)
+   data->correction_factor[data->bucket] = RESOLUTION * DECAY;
+
per_cpu(hrtimer_status, cpu) = MENU_HRTIMER_STOP;
 
return HRTIMER_NORESTART;
@@ -353,6 +370,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
/* not deepest C-state chosen for low predicted residency */
if (low_predicted) {
unsigned int timer_us = 0;
+   unsigned int perfect_us = 0;
 
/*
 * Set a timer to detect whether this sleep is much
@@ -363,12 +381,26 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 */
timer_us = 2 * (data->predicted_us + MAX_DEVIATION);
 
+   perfect_us = perfect_cstate_ms * 1000;
+
if (repeat && (4 * timer_us < data->expected_us)) {
hrtimer_start(hrtmr, ns_to_ktime(1000 * timer_us),
HRTIMER_MODE_REL_PINNED);
/* In repeat case, menu hrtimer is started */
per_cpu(hrtimer_status, cpu) = MENU_HRTIMER_REPEAT;
+   } else if (perfect_us < data->expected_us) {
+   /*
+* The next timer is long. This could be because
+* we did not make a useful prediction.
+* In that case, it makes sense to re-enter
+* into a deeper C-state after some time.
+*/
+   hrtimer_start(hrtmr, ns_to_ktime(1000 * timer_us),
+   HRTIMER_MODE_REL_PINNED);
+   /* In general case, menu hrtimer is started */
+   per_cpu(hrtimer_status, cpu) = MENU_HRTIMER_GENERAL;
}
+
}
 
return data->last_state_idx;
-- 
1.7.7.4
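
The timer decision made in menu_select() above can be sketched in user space as
follows. This is only an illustration under stated assumptions, not the kernel
code: choose_timer_mode() and the MENU_TIMER_* names are hypothetical, and
MAX_DEVIATION_US is an assumed value, because MAX_DEVIATION is defined
elsewhere in menu.c and not shown in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Timer states, modelled on the enum extended by the patch. */
enum menu_timer_mode { MENU_TIMER_STOP, MENU_TIMER_REPEAT, MENU_TIMER_GENERAL };

/* Assumed value: MAX_DEVIATION is not shown in this patch. */
#define MAX_DEVIATION_US 60u

/*
 * Decide which watchdog timer, if any, to arm after a shallow C-state
 * was chosen because of a low predicted residency.
 * The timeout is written to *timer_us in every case.
 */
static enum menu_timer_mode
choose_timer_mode(int repeat, uint32_t predicted_us, uint32_t expected_us,
		  uint32_t perfect_cstate_ms, uint32_t *timer_us)
{
	uint32_t perfect_us = perfect_cstate_ms * 1000u;

	assert(timer_us != NULL);
	*timer_us = 2u * (predicted_us + MAX_DEVIATION_US);

	/* Repeat mode: arm only if the timeout is far below the next event. */
	if (repeat && 4u * *timer_us < expected_us)
		return MENU_TIMER_REPEAT;

	/* General case: the next known event is further away than the limit. */
	if (perfect_us < expected_us)
		return MENU_TIMER_GENERAL;

	return MENU_TIMER_STOP;
}
```

With predicted_us = 40, expected_us = 5000000 and perfect_cstate_ms = 30, the
sketch computes a 200 us timeout and selects the repeat timer when repeat is
set, or the general timer when it is not. With expected_us = 20000 and repeat
unset, no timer is armed.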

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 4/4] x86,idle: Get typical recent sleep interval

2012-10-18 Thread Youquan Song
The function detect_repeating_patterns was not very useful for
workloads with alternating long and short pauses, for example
virtual machines handling network requests for each other (say
a web and database server).

Instead, try to find a recent sleep interval that is somewhere
between the median and the mode sleep time, by discarding outliers
to the up side and recalculating the average and standard deviation
until that is no longer required.

This should do something sane with a sleep interval series like:

200 180 210 1 30 1000 170 200

The current code would simply discard such a series, while the
new code will guess a typical sleep interval just shy of 200.

The original patch comes from Rik van Riel.

Signed-off-by: Youquan Song 
Signed-off-by: Rik van Riel 
---
 drivers/cpuidle/governors/menu.c |   69 +
 1 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index c824b4f..2411c4c 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -245,36 +245,59 @@ static enum hrtimer_restart menu_hrtimer_notify(struct hrtimer *hrtimer)
  * of points is below a threshold. If it is... then use the
  * average of these 8 points as the estimated value.
  */
-static int detect_repeating_patterns(struct menu_device *data)
+static u32 get_typical_interval(struct menu_device *data)
 {
-   int i;
-   uint64_t avg = 0;
-   uint64_t stddev = 0; /* contains the square of the std deviation */
-   int ret = 0;
-
-   /* first calculate average and standard deviation of the past */
-   for (i = 0; i < INTERVALS; i++)
-   avg += data->intervals[i];
-   avg = avg / INTERVALS;
+   int i = 0, divisor = 0;
+   int64_t max = 0, avg = 0, stddev = 0;
+   int64_t thresh = LLONG_MAX; /* Discard outliers above this value. */
+   unsigned int ret = 0;
 
-   /* if the avg is beyond the known next tick, it's worthless */
-   if (avg > data->expected_us)
-   return 0;
-
-   for (i = 0; i < INTERVALS; i++)
-   stddev += (data->intervals[i] - avg) *
- (data->intervals[i] - avg);
+again:
 
-   stddev = stddev / INTERVALS;
+   /* first calculate average and standard deviation of the past */
+   max = avg = divisor = stddev = 0;
+   for (i = 0; i < INTERVALS; i++) {
+   int64_t value = data->intervals[i];
+   if (value <= thresh) {
+   avg += value;
+   divisor++;
+   if (value > max)
+   max = value;
+   }
+   }
+   do_div(avg, divisor);
 
+   for (i = 0; i < INTERVALS; i++) {
+   int64_t value = data->intervals[i];
+   if (value <= thresh) {
+   int64_t diff = value - avg;
+   stddev += diff * diff;
+   }
+   }
+   do_div(stddev, divisor);
+   stddev = int_sqrt(stddev);
/*
-* now.. if stddev is small.. then assume we have a
-* repeating pattern and predict we keep doing this.
+* If we have outliers to the upside in our distribution, discard
+* those by setting the threshold to exclude these outliers, then
+* calculate the average and standard deviation again. Once we get
+* down to the bottom 3/4 of our samples, stop excluding samples.
+*
+* This can deal with workloads that have long pauses interspersed
+* with sporadic activity with a bunch of short pauses.
+*
+* The typical interval is obtained when standard deviation is small
+* or standard deviation is small compared to the average interval.
 */
-
-   if (avg && stddev < STDDEV_THRESH) {
+   if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3))
+   || stddev <= 20) {
data->predicted_us = avg;
ret = 1;
+   return ret;
+
+   } else if ((divisor * 4) > INTERVALS * 3) {
+   /* Exclude the max interval */
+   thresh = max - 1;
+   goto again;
}
 
return ret;
@@ -330,7 +353,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
data->predicted_us = div_round64(data->expected_us * data->correction_factor[data->bucket],
 RESOLUTION * DECAY);
 
-   repeat = detect_repeating_patterns(data);
+   repeat = get_typical_interval(data);
 
/*
 * We want to default to C1 (hlt), not to busy polling
-- 
1.7.7.4
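
The outlier-discarding loop in get_typical_interval() can be tried outside the
kernel with a sketch like the one below. It follows the logic of the diff, but
typical_interval() and isqrt_u64() are hypothetical user-space stand-ins (the
latter for the kernel's int_sqrt()), the divisor guard is an addition, and the
result is returned through a pointer instead of being stored in struct
menu_device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INTERVALS 8

/* Floor of the square root; simple stand-in for the kernel's int_sqrt(). */
static uint64_t isqrt_u64(uint64_t x)
{
	uint64_t r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/*
 * Returns 1 and stores the typical interval in *typical_us if one is found,
 * otherwise returns 0 and leaves *typical_us untouched.
 */
static int typical_interval(const uint32_t intervals[INTERVALS],
			    uint32_t *typical_us)
{
	int64_t thresh = INT64_MAX;	/* discard samples above this value */

	assert(typical_us != NULL);
	for (;;) {
		int64_t max = 0, avg = 0, var = 0, stddev;
		int i, divisor = 0;

		/* average over the samples that are not excluded */
		for (i = 0; i < INTERVALS; i++) {
			int64_t value = intervals[i];

			if (value <= thresh) {
				avg += value;
				divisor++;
				if (value > max)
					max = value;
			}
		}
		if (divisor == 0)	/* guard added for the sketch */
			return 0;
		avg /= divisor;

		/* standard deviation over the same samples */
		for (i = 0; i < INTERVALS; i++) {
			int64_t value = intervals[i];

			if (value <= thresh) {
				int64_t diff = value - avg;

				var += diff * diff;
			}
		}
		var /= divisor;
		stddev = (int64_t)isqrt_u64((uint64_t)var);

		if ((avg > stddev * 6 && divisor * 4 >= INTERVALS * 3) ||
		    stddev <= 20) {
			*typical_us = (uint32_t)avg;
			return 1;
		}
		/* stop once only the bottom 3/4 of the samples are left */
		if (divisor * 4 <= INTERVALS * 3)
			return 0;
		/* exclude the largest remaining sample and try again */
		thresh = max - 1;
	}
}
```

For a hypothetical series of 200 180 210 190 205 1000 170 200, the first pass
rejects the average because the 1000 us sample inflates the standard deviation;
the second pass drops that sample and settles on a typical interval of 193 us
from the remaining seven samples.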


[PATCH V2 0/4]: x86,idle: Enhance menu governor C-state prediction

2012-10-18 Thread Youquan Song

V2: Add menu timer status enums, as suggested by Rafael.

Predicting the future is difficult, and when the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing the
failure quickly is therefore important for power saving.

The cpuidle menu governor has a method to detect a repeating pattern: if the
last 8 C-state residencies are the same or very close, it predicts that the next
C-state residency will be the same.

This patchset adds a timer when the menu governor chooses a non-deepest C-state,
so that the CPU wakes up quickly from the shallow C-state and does not stay
there too long after a prediction failure. The timeout is set to a value greater
than the predicted time, so if the timer fires we can confidently conclude that
the prediction failed. When the prediction succeeds, the CPU wakes up from the
C-state within the predicted time; the timer does not fire and is cancelled
right after the CPU wakes up. When the prediction fails, the timer fires and
wakes the CPU from the shallow C-state, so the menu governor quickly notices the
failure and re-evaluates whether a deeper C-state is possible. This patchset
improves the cpuidle prediction process for both repeat mode and general mode.

The patchset integrates one patch from Rik van Riel, which tries to find a
typical interval by cutting off the upside outliers in the historical sleep
intervals. That patch tends to choose a shallow C-state to achieve better
performance, and the prediction-failure enhancement then advises whether the
deepest C-state should be chosen instead.

Testing result:

The whole patchset achieves good results after a bunch of testing and tuning.
On a two-socket Sandybridge server, SPECPower2008 gets a 2%~5% increase in
ssj_ops/watt. Benchmarks from phoronix-test-suite (compress-7zip,
build-linux-kernel, apache, fio, etc.) also show increased performance/power.
What's more, the patchset not only boosts performance but also saves power.
 
There are also 2 cases that clearly show the benefit of this patchset.

One case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3 or
earlier. On Sandybridge, turbostat reads 10 registers one by one, so it
generates 10 IPIs to wake up idle CPUs. The cpuidle menu governor then predicts
repeat mode and expects another IPI to wake the idle CPU soon, so it keeps the
idle CPU in the C1 state even though the CPU is totally idle. However, after
reading the 10 registers turbostat sleeps for 5 seconds by default, so the idle
CPU stays in C1 for a long time, even though it is idle, until a break event
occurs.
On an idle Sandybridge system, run "./turbostat -v": deep C-state residency
fluctuates between 70% and 99%. With the patched kernel, deep C-state residency
stays above 99.98%.

Below is another case that clearly shows the benefit of the patch:

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

volatile int * shutdown;
volatile long * count;
int delay = 20;
int loop = 8;

void usage(void)
{
fprintf(stderr,
"Usage: idle_predict [options]\n"
"  --help   -h  Print this help\n"
"  --thread -n  Thread number\n"
"  --loop   -l  Loop times in shallow Cstate\n"
"  --delay  -t  Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop() {
int idle_num = 1;
while (!(*shutdown)) {
*count = *count + 1;

if (idle_num % loop)
usleep(delay);
else {
/* sleep 1 second */
usleep(1000000);
idle_num = 0;
}
idle_num++;
}

}

static void sighand(int sig)
{
*shutdown = 1;
}

int main(int argc, char *argv[])
{
sigset_t sigset;
int signum = SIGALRM;
int i, c, er = 0, thread_num = 8;
pthread_t pt[1024];

static char optstr[] = "n:l:t:h:";

while ((c = getopt(argc, argv, optstr)) != EOF)
switch (c) {
case 'n':
thread_num = atoi(optarg);
break;
case 'l':
loop = atoi(optarg);
break;
case 't':
delay = atoi(optarg);
break;
case 'h':
default:
usage();
exit(1);
}

printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
count = malloc(sizeof(long));
shutdown = malloc(sizeof(int));
*count = 0;
*shutdown = 0;

sigemptyset(&sigset);
sigadds

[PATCH V2 3/4] x86,idle: Set residency to 0 if target Cstate not enter

2012-10-18 Thread Youquan Song
When the cpuidle governor chooses a C-state for an idle CPU but then notices
that there are tasks waiting to be executed, the idle CPU does not actually
enter the target C-state and goes on to run the task instead.

In this situation, the governor uses the residency of the previously entered
C-state, which is obviously not reasonable.

So this patch fixes it by setting the target C-state residency to 0.

Signed-off-by: Youquan Song 
Signed-off-by: Rik van Riel 
---
 drivers/cpuidle/cpuidle.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index e28f6ea..01dca54 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -144,6 +144,10 @@ int cpuidle_idle_call(void)
/* ask the governor for the next state */
next_state = cpuidle_curr_governor->select(drv, dev);
if (need_resched()) {
+   dev->last_residency = 0;
+   /* give the governor an opportunity to reflect on the outcome */
+   if (cpuidle_curr_governor->reflect)
+   cpuidle_curr_governor->reflect(dev, next_state);
local_irq_enable();
return 0;
}
-- 
1.7.7.4
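
The control flow added to cpuidle_idle_call() can be illustrated with a small
user-space model. Everything here is hypothetical scaffolding (struct
idle_device, struct idle_governor, idle_call() and the demo callbacks); it only
shows that an aborted idle entry reports a residency of 0 to the governor
instead of leaving the previous value in place.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the cpuidle device and governor structures. */
struct idle_device {
	int last_residency;	/* microseconds spent in the last C-state */
};

struct idle_governor {
	int (*select)(struct idle_device *dev);
	void (*reflect)(struct idle_device *dev, int index);	/* optional */
};

/* Demo callbacks: always pick state 2 and record what was reflected. */
static int reflected_index = -1;

static int demo_select(struct idle_device *dev)
{
	(void)dev;
	return 2;
}

static void demo_reflect(struct idle_device *dev, int index)
{
	(void)dev;
	reflected_index = index;
}

/*
 * Returns the selected state, or -1 if idle entry was aborted because a
 * task became runnable between select() and entering the state.
 */
static int idle_call(struct idle_governor *gov, struct idle_device *dev,
		     int resched_pending)
{
	int next_state;

	assert(gov != NULL && gov->select != NULL && dev != NULL);
	next_state = gov->select(dev);

	if (resched_pending) {
		/* The state was never entered, so its residency is 0. */
		dev->last_residency = 0;
		if (gov->reflect)
			gov->reflect(dev, next_state);
		return -1;
	}

	/* Actually entering the C-state is outside the scope of this sketch. */
	return next_state;
}
```

Calling idle_call() with a governor built from demo_select() and
demo_reflect() and resched_pending set leaves dev.last_residency at 0 and
reflected_index at 2, whatever residency the device held before.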



[PATCH 2/5] x86,idle: Quickly notice prediction failure in general case

2012-10-16 Thread Youquan Song
Predicting the future is difficult, and when the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing the
failure quickly is therefore important for power saving.

This patch extends the mechanism to the general case, where the prediction logic
produces a small predicted residency and so chooses a shallow C-state even
though the expected residency is large. If the prediction is wrong, the CPU
stays in the shallow C-state for a long time although it actually had the chance
to enter a deep C-state. So when the expected residency is long enough but the
governor chooses a shallow C-state, a timer is added to monitor whether the
prediction fails.

When the CPU wakes up from the C-state before the timer expires, it cancels the
timer itself. When the timer fires, the menu governor quickly notices the
prediction failure and re-evaluates whether a deeper C-state is possible.

Signed-off-by: Youquan Song 
Signed-off-by: Rik van Riel 
---
 drivers/cpuidle/governors/menu.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index beeab6a..b34bf11 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -114,6 +114,13 @@ static DEFINE_PER_CPU(int, hrtimer_started);
  *
  */
 
+/*
+ * The C-state residency is so long that it is worthwhile to exit
+ * from the shallow C-state and re-enter into a deeper C-state.
+ */
+static unsigned int perfect_cstate_ms __read_mostly = 30;
+module_param(perfect_cstate_ms, uint, );
+
 struct menu_device {
int last_state_idx;
int needs_update;
@@ -351,6 +358,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
/* not deepest C-state chosen for low predicted residency */
if (low_predicted) {
unsigned int timer_us = 0;
+   unsigned int perfect_us = 0;
 
/*
 * Set a timer to detect whether this sleep is much
@@ -361,12 +369,26 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 */
timer_us = 2 * (data->predicted_us + MAX_DEVIATION);
 
+   perfect_us = perfect_cstate_ms * 1000;
+
if (repeat && (4 * timer_us < data->expected_us)) {
hrtimer_start(hrtmr, ns_to_ktime(1000 * timer_us),
HRTIMER_MODE_REL_PINNED);
/* menu hrtimer is started */
per_cpu(hrtimer_started, cpu) = 1;
+   } else if (perfect_us < data->expected_us) {
+   /*
+* The next timer is long. This could be because
+* we did not make a useful prediction.
+* In that case, it makes sense to re-enter
+* into a deeper C-state after some time.
+*/
+   hrtimer_start(hrtmr, ns_to_ktime(1000 * timer_us),
+   HRTIMER_MODE_REL_PINNED);
+   /* menu hrtimer is started */
+   per_cpu(hrtimer_started, cpu) = 1;
}
+
}
 
return data->last_state_idx;
-- 
1.7.7.4



[PATCH 5/5] x86,idle: Get typical recent sleep interval

2012-10-16 Thread Youquan Song
The function detect_repeating_patterns was not very useful for
workloads with alternating long and short pauses, for example
virtual machines handling network requests for each other (say
a web and database server).

Instead, try to find a recent sleep interval that is somewhere
between the median and the mode sleep time, by discarding outliers
to the up side and recalculating the average and standard deviation
until that is no longer required.

This should do something sane with a sleep interval series like:

200 180 210 1 30 1000 170 200

The current code would simply discard such a series, while the
new code will guess a typical sleep interval just shy of 200.

The original patch comes from Rik van Riel.

Signed-off-by: Youquan Song 
Signed-off-by: Rik van Riel 
---
 drivers/cpuidle/governors/menu.c |   69 +
 1 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 7dbac97..dbb9e1c 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -237,36 +237,59 @@ static enum hrtimer_restart menu_hrtimer_notify(struct hrtimer *hrtimer)
  * of points is below a threshold. If it is... then use the
  * average of these 8 points as the estimated value.
  */
-static int detect_repeating_patterns(struct menu_device *data)
+static u32 get_typical_interval(struct menu_device *data)
 {
-   int i;
-   uint64_t avg = 0;
-   uint64_t stddev = 0; /* contains the square of the std deviation */
-   int ret = 0;
-
-   /* first calculate average and standard deviation of the past */
-   for (i = 0; i < INTERVALS; i++)
-   avg += data->intervals[i];
-   avg = avg / INTERVALS;
+   int i = 0, divisor = 0;
+   int64_t max = 0, avg = 0, stddev = 0;
+   int64_t thresh = LLONG_MAX; /* Discard outliers above this value. */
+   unsigned int ret = 0;
 
-   /* if the avg is beyond the known next tick, it's worthless */
-   if (avg > data->expected_us)
-   return 0;
-
-   for (i = 0; i < INTERVALS; i++)
-   stddev += (data->intervals[i] - avg) *
- (data->intervals[i] - avg);
+again:
 
-   stddev = stddev / INTERVALS;
+   /* first calculate average and standard deviation of the past */
+   max = avg = divisor = stddev = 0;
+   for (i = 0; i < INTERVALS; i++) {
+   int64_t value = data->intervals[i];
+   if (value <= thresh) {
+   avg += value;
+   divisor++;
+   if (value > max)
+   max = value;
+   }
+   }
+   do_div(avg, divisor);
 
+   for (i = 0; i < INTERVALS; i++) {
+   int64_t value = data->intervals[i];
+   if (value <= thresh) {
+   int64_t diff = value - avg;
+   stddev += diff * diff;
+   }
+   }
+   do_div(stddev, divisor);
+   stddev = int_sqrt(stddev);
/*
-* now.. if stddev is small.. then assume we have a
-* repeating pattern and predict we keep doing this.
+* If we have outliers to the upside in our distribution, discard
+* those by setting the threshold to exclude these outliers, then
+* calculate the average and standard deviation again. Once we get
+* down to the bottom 3/4 of our samples, stop excluding samples.
+*
+* This can deal with workloads that have long pauses interspersed
+* with sporadic activity with a bunch of short pauses.
+*
+* The typical interval is obtained when standard deviation is small
+* or standard deviation is small compared to the average interval.
 */
-
-   if (avg && stddev < STDDEV_THRESH) {
+   if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3))
+   || stddev <= 20) {
data->predicted_us = avg;
ret = 1;
+   return ret;
+
+   } else if ((divisor * 4) > INTERVALS * 3) {
+   /* Exclude the max interval */
+   thresh = max - 1;
+   goto again;
}
 
return ret;
@@ -322,7 +345,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
data->predicted_us = div_round64(data->expected_us * data->correction_factor[data->bucket],
 RESOLUTION * DECAY);
 
-   repeat = detect_repeating_patterns(data);
+   repeat = get_typical_interval(data);
 
/*
 * We want to default to C1 (hlt), not to busy polling
-- 
1.7.7.4


[PATCH 4/5] x86,idle: Set residency to 0 if target Cstate not enter

2012-10-16 Thread Youquan Song
When the cpuidle governor chooses a C-state for an idle CPU but then notices
that there are tasks waiting to be executed, the idle CPU does not actually
enter the target C-state and goes on to run the task instead.

In this situation, the governor uses the residency of the previously entered
C-state, which is obviously not reasonable.

So this patch fixes it by setting the target C-state residency to 0.

Signed-off-by: Youquan Song 
Signed-off-by: Rik van Riel 
---
 drivers/cpuidle/cpuidle.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index e28f6ea..01dca54 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -144,6 +144,10 @@ int cpuidle_idle_call(void)
/* ask the governor for the next state */
next_state = cpuidle_curr_governor->select(drv, dev);
if (need_resched()) {
+   dev->last_residency = 0;
+   /* give the governor an opportunity to reflect on the outcome */
+   if (cpuidle_curr_governor->reflect)
+   cpuidle_curr_governor->reflect(dev, next_state);
local_irq_enable();
return 0;
}
-- 
1.7.7.4



[PATCH 3/5] x86,idle: Reset correction factor

2012-10-16 Thread Youquan Song
In the general case, the expected residency is much larger than the deepest
C-state target residency, but the prediction logic still predicts a small
residency, so the prediction history is totally broken. In this situation,
resetting the correction factor is the only choice.

Signed-off-by: Youquan Song 
Signed-off-by: Rik van Riel 
---
 drivers/cpuidle/governors/menu.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index b34bf11..7dbac97 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -221,6 +221,10 @@ EXPORT_SYMBOL_GPL(menu_hrtimer_cancel);
 static enum hrtimer_restart menu_hrtimer_notify(struct hrtimer *hrtimer)
 {
int cpu = smp_processor_id();
+   struct menu_device *data = &per_cpu(menu_devices, cpu);
+
+   if (per_cpu(hrtimer_started, cpu) == 2)
+   data->correction_factor[data->bucket] = RESOLUTION * DECAY;
 
per_cpu(hrtimer_started, cpu) = 0;
 
@@ -386,7 +390,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
hrtimer_start(hrtmr, ns_to_ktime(1000 * timer_us),
HRTIMER_MODE_REL_PINNED);
/* menu hrtimer is started */
-   per_cpu(hrtimer_started, cpu) = 1;
+   per_cpu(hrtimer_started, cpu) = 2;
}
 
}
-- 
1.7.7.4



[PATCH 1/5] x86,idle: Quickly notice prediction failure for repeat mode

2012-10-16 Thread Youquan Song
Predicting the future is difficult, and when the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing the
failure quickly is therefore important for power saving.

The cpuidle menu governor has a method to detect a repeating pattern: if the
last 8 C-state residencies are the same or very close, it predicts that the next
C-state residency will be the same.

A real case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3
or earlier. On Sandybridge, turbostat reads 10 registers one by one, so it
generates 10 IPIs to wake up idle CPUs. The cpuidle menu governor then predicts
repeat mode and expects another IPI to wake the idle CPU soon, so it keeps the
idle CPU in the C1 state even though the CPU is totally idle. However, after
reading the 10 registers turbostat sleeps for 5 seconds by default, so the idle
CPU stays in C1 for a long time, even though it is idle, until a break event
occurs.
On an idle Sandybridge system, run "./turbostat -v": deep C-state residency
fluctuates between 70% and 99%. With the patched kernel, deep C-state residency
stays above 99.98%.

The patch adds a timer that is armed when the menu governor detects repeat mode
and chooses a shallow C-state. The timeout is set to a value greater than the
predicted time, so if the timer fires we conclude that the repeat mode
prediction failed. When repeat mode happens as expected, the CPU wakes up from
the C-state before the timer fires and cancels the timer itself. When repeat
mode does not happen, the timer expires, the menu governor quickly notices that
the repeat mode prediction failed, and it re-evaluates whether a deeper C-state
is possible.

Below is another case that clearly shows the benefit of the patch:

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

volatile int * shutdown;
volatile long * count;
int delay = 20;
int loop = 8;

void usage(void)
{
fprintf(stderr,
"Usage: idle_predict [options]\n"
"  --help   -h  Print this help\n"
"  --thread -n  Thread number\n"
"  --loop   -l  Loop times in shallow Cstate\n"
"  --delay  -t  Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop() {
int idle_num = 1;
while (!(*shutdown)) {
*count = *count + 1;

if (idle_num % loop)
usleep(delay);
else {
/* sleep 1 second */
usleep(1000000);
idle_num = 0;
}
idle_num++;
}

}

static void sighand(int sig)
{
*shutdown = 1;
}

int main(int argc, char *argv[])
{
sigset_t sigset;
int signum = SIGALRM;
int i, c, er = 0, thread_num = 8;
pthread_t pt[1024];

static char optstr[] = "n:l:t:h:";

while ((c = getopt(argc, argv, optstr)) != EOF)
switch (c) {
case 'n':
thread_num = atoi(optarg);
break;
case 'l':
loop = atoi(optarg);
break;
case 't':
delay = atoi(optarg);
break;
case 'h':
default:
usage();
exit(1);
}

printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
count = malloc(sizeof(long));
shutdown = malloc(sizeof(int));
*count = 0;
*shutdown = 0;

sigemptyset(&sigset);
sigaddset(&sigset, signum);
sigprocmask (SIG_BLOCK, &sigset, NULL);
signal(SIGINT, sighand);
signal(SIGTERM, sighand);

for(i = 0; i < thread_num ; i++)
pthread_create(&pt[i], NULL, simple_loop, NULL);

for (i = 0; i < thread_num; i++)
pthread_join(pt[i], NULL);

exit(0);
}

Get powertop V2 from git://github.com/fenrus75/powertop and build it.
Build the test application above, then run both.
The test platform can be Intel Sandybridge or another recent platform.
#./idle_predict -l 10 &
#./powertop

Deep C-state residency fluctuates between 40% and 100%, and much of the time is
spent in the C1 state. This is because the menu governor wrongly predicts that
repeat mode continues, so it chooses the shallow C1 state even though the CPU
has a chance to sleep for 1 second in a deep C-state.
 
With the patched kernel, deep C-state residency stays at >99.

[PATCH 0/5] x86,idle: Enhance menu governor C-state prediction

2012-10-16 Thread Youquan Song


Predicting the future is difficult, and when the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing the
failure quickly is therefore important for power saving.

The cpuidle menu governor has a method to detect a repeating pattern: if the
last 8 C-state residencies are the same or very close, it predicts that the next
C-state residency will be the same.

This patchset adds a timer when the menu governor chooses a non-deepest C-state,
so that the CPU wakes up quickly from the shallow C-state and does not stay
there too long after a prediction failure. The timeout is set to a value greater
than the predicted time, so if the timer fires we can confidently conclude that
the prediction failed. When the prediction succeeds, the CPU wakes up from the
C-state within the predicted time; the timer does not fire and is cancelled
right after the CPU wakes up. When the prediction fails, the timer fires and
wakes the CPU from the shallow C-state, so the menu governor quickly notices the
failure and re-evaluates whether a deeper C-state is possible. This patchset
improves the cpuidle prediction process for both repeat mode and general mode.

The patchset integrates one patch from Rik van Riel, which tries to find a
typical interval by cutting off the upside outliers in the historical sleep
intervals. That patch tends to choose a shallow C-state to achieve better
performance, and the prediction-failure enhancement then advises whether the
deepest C-state should be chosen instead.

Testing result:

The whole patchset achieves good results after a bunch of testing and tuning.
On a two-socket Sandybridge server, SPECPower2008 gets a 2%~5% increase in
ssj_ops/watt. Benchmarks from phoronix-test-suite (compress-7zip,
build-linux-kernel, apache, fio, etc.) also show increased performance/power.
What's more, the patchset not only boosts performance but also saves power.
 
There are also 2 cases that clearly show the benefit of this patchset.

One case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3 or
earlier. On Sandybridge, turbostat reads 10 registers one by one, so it
generates 10 IPIs to wake up idle CPUs. The cpuidle menu governor then predicts
repeat mode and expects another IPI to wake the idle CPU soon, so it keeps the
idle CPU in the C1 state even though the CPU is totally idle. However, after
reading the 10 registers turbostat sleeps for 5 seconds by default, so the idle
CPU stays in C1 for a long time, even though it is idle, until a break event
occurs.
On an idle Sandybridge system, run "./turbostat -v": deep C-state residency
fluctuates between 70% and 99%. With the patched kernel, deep C-state residency
stays above 99.98%.

Below is another case which will clearly show the patch much benefit:

/*
 * Header names were lost in the list archive; these are the headers the
 * code below needs.
 */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
    fprintf(stderr,
        "Usage: idle_predict [options]\n"
        "  --help   -h  Print this help\n"
        "  --thread -n  Thread number\n"
        "  --loop   -l  Loop times in shallow Cstate\n"
        "  --delay  -t  Sleep time (uS) in shallow Cstate\n");
}

/* Sleep briefly (loop - 1) times, then take one long sleep, and repeat. */
void *simple_loop(void *arg)
{
    int idle_num = 1;

    (void)arg;
    while (!(*shutdown)) {
        /* Unsynchronized across threads; only a rough activity counter. */
        *count = *count + 1;

        if (idle_num % loop)
            usleep(delay);
        else {
            /* sleep 1 second */
            sleep(1);
            idle_num = 0;
        }
        idle_num++;
    }

    return NULL;
}

static void sighand(int sig)
{
    (void)sig;
    *shutdown = 1;
}

int main(int argc, char *argv[])
{
    sigset_t sigset;
    int signum = SIGALRM;
    int i, c, thread_num = 8;
    pthread_t pt[1024];

    /* -h takes no argument */
    static char optstr[] = "n:l:t:h";

    while ((c = getopt(argc, argv, optstr)) != -1)
        switch (c) {
        case 'n':
            thread_num = atoi(optarg);
            break;
        case 'l':
            loop = atoi(optarg);
            break;
        case 't':
            delay = atoi(optarg);
            break;
        case 'h':
        default:
            usage();
            exit(1);
        }

    /* Keep within pt[] and avoid a modulo by zero in simple_loop(). */
    if (thread_num < 1 || thread_num > 1024 || loop < 1) {
        usage();
        exit(1);
    }

    printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
    count = malloc(sizeof(long));
    shutdown = malloc(sizeof(int));
    *count = 0;
    *shutdown = 0;

    sigemptyset(&sigset);
    sigaddset(&sigset, signum);
    sigprocmask(SIG_BLOCK, &sigset,

Re: [PATCH V2 0/3] x86,idle: Enhance cpuidle prediction to handle its failure

2012-09-17 Thread Youquan Song
> > One case is turbostat utility (tools/power/x86/turbostat) at kernel 3.3 or 
> > early
> > . turbostat utility will read 10 registers one by one at Sandybridge, so it 
> > will
> > generate 10 IPIs to wake up idle CPUs. So cpuidle menu governor will 
> > predict it
> >  is repeat mode and there is another IPI wake up idle CPU soon, so it keeps 
> > idle
> >  CPU stay at C1 state even though CPU is totally idle. However, in the 
> > turbostat
> > , following 10 registers reading is sleep 5 seconds by default, so the idle 
> > CPU
> >  will keep at C1 for a long time though it is idle until break event occurs.
> > In a idle Sandybridge system, run "./turbostat -v", we will notice that 
> > deep 
> > C-state dangles between "70% ~ 99%". After patched the kernel, we will 
> > notice
> > deep C-state stays at >99.98%.
> 
> Is there an impact on performances ?

In this case, turbostat is a utility that measures CPU idle status, and it is
itself a workload on the system. Its purpose is to show CPU C-state
information every 5 seconds. With the patched kernel it does the same thing
as before, so I think there is little or no performance impact.

I have not found any performance impact in my tests. If you have cases where
performance might be affected, or other suggestions, I will be very glad to
try them.

Thanks
-Youquan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/3] x86,idle: Quickly notice prediction failure for repeat mode

2012-09-17 Thread Youquan Song
> Could I convince you to try out my variation on
> detect_repeating_intervals? :)
>
> http://people.redhat.com/riel/cstate/cstate-stddev-converge.patch
>
> I suspect that small change might help your code adapt to changed
> conditions even faster.

Yes, of course. Your cstate-stddev-converge patch makes a good point: filter
out some noise first, then do the further calculation. I will try to integrate
the patch into my patchset and then ask you to review it tomorrow.

Thanks
-Youquan



[PATCH V2 3/3] x86,idle: Set residency to 0 if target Cstate not really enter

2012-09-17 Thread Youquan Song
After the cpuidle governor has chosen a C-state for an idle CPU, it may notice
that a task is waiting to be executed. In that case the idle CPU does not
actually enter the target C-state and goes on to run the task.

In this situation, the governor uses the residency of the previously entered
C-state, which is obviously not reasonable.

This patch fixes that by setting the target C-state residency to 0.

Signed-off-by: Youquan Song 
---
 drivers/cpuidle/cpuidle.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)


diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 2f0083a..7992417 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -136,6 +136,10 @@ int cpuidle_idle_call(void)
/* ask the governor for the next state */
next_state = cpuidle_curr_governor->select(drv, dev);
if (need_resched()) {
+   dev->last_residency = 0;
+   /* give the governor an opportunity to reflect on the outcome */
+   if (cpuidle_curr_governor->reflect)
+   cpuidle_curr_governor->reflect(dev, next_state);
local_irq_enable();
return 0;
}
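
The effect of the hunk above can be modelled outside the kernel. The sketch
below is only an illustration under stated assumptions: struct idle_dev,
reflect_fn and idle_call() are hypothetical stand-ins, not the cpuidle API,
and the measured residency is passed in as a plain parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU cpuidle device. */
struct idle_dev {
    int last_residency;   /* microseconds spent in the last idle state */
};

/* Hypothetical governor callback that consumes last_residency. */
typedef void (*reflect_fn)(struct idle_dev *dev, int state);

/*
 * Model of one idle call. Returns true if the chosen state was entered.
 * When a reschedule is pending the state is skipped, so the residency is
 * reset to 0 instead of keeping the stale value from the previous entry.
 */
static bool idle_call(struct idle_dev *dev, int next_state,
                      bool need_resched, int measured_us,
                      reflect_fn reflect)
{
    assert(dev != NULL);

    if (need_resched) {
        dev->last_residency = 0;
        if (reflect)
            reflect(dev, next_state);
        return false;
    }

    dev->last_residency = measured_us;
    return true;
}
```

Without the reset, a governor reading last_residency after a skipped entry
would see the value left over from the previous idle period.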


[PATCH V2 2/3] x86,idle: Quickly notice prediction failure in general case

2012-09-17 Thread Youquan Song
Predicting the future is difficult. When the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing
the failure quickly is therefore important for power saving.

This patch extends the repeat-mode patch by adding a timer whenever the menu
governor chooses a shallow C-state.
The timer times out after 50 milliseconds by default. The special twist behind
this value is that sleeping longer than that yields no further power-saving
gain.

When the CPU wakes up from the C-state before the added timer expires, the
timer is cancelled proactively. When the timer fires, the menu governor
quickly notices the prediction failure and re-evaluates whether a deeper
C-state is possible.

Signed-off-by: Youquan Song 
---
 drivers/cpuidle/governors/menu.c |   48 ++
 1 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 8c23fbd..9f92dd4 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -113,6 +113,13 @@ static DEFINE_PER_CPU(int, hrtimer_started);
  * represented in the system load average.
  *
  */
+
+/*
+ * Default set to 50 milliseconds based on special twist mentioned above that
+ * there are no power gains sleep longer than it.
+ */
+static unsigned int perfect_cstate_ms __read_mostly = 50;
+module_param(perfect_cstate_ms, uint, );
 
 struct menu_device {
int last_state_idx;
@@ -343,26 +350,37 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
data->exit_us = s->exit_latency;
}
}
-
+
+   /* not deepest C-state chosen */
if (data->last_state_idx < drv->state_count - 1) { 
+   unsigned int repeat_us = 0;
+   unsigned int perfect_us = 0;
+
+   /*
+* Set enough timer to recognize the repeat mode broken.
+* If the timer is time out, the repeat mode prediction
+* fails,then re-evaluate deeper C-states possibility.
+* If the timer is not triggered, the timer will be
+* cancelled when CPU waken up.
+*/
+   repeat_us =
+   (repeat ? (2 * data->predicted_us + MAX_DEVIATION) : 0);
+   perfect_us = perfect_cstate_ms * 1000;
 
/* Repeat mode detected */
-   if (repeat) {
-   unsigned int repeat_us = 0;
-   /* 
-* Set enough timer to recognize the repeat mode broken.
-* If the timer is time out, the repeat mode prediction
-* fails,then re-evaluate deeper C-states possibility. 
-* If the timer is not triggered, the timer will be
-* cancelled when CPU waken up.
-*/
-   repeat_us = 2 * data->predicted_us + MAX_DEVIATION;
-   hrtimer_start(hrtmr, ns_to_ktime(1000 * repeat_us),
-   HRTIMER_MODE_REL_PINNED);
+   if (repeat && (repeat_us  < perfect_us)) {
+   hrtimer_start(hrtmr, ns_to_ktime(1000 * repeat_us),
+   HRTIMER_MODE_REL_PINNED);
+   /* menu hrtimer is started */
+   per_cpu(hrtimer_started, cpu) = 1;
+   } else if (perfect_us < data->expected_us) {
+   /* expected time is larger than adding timer time */
+   hrtimer_start(hrtmr, ns_to_ktime(1000 * perfect_us),
+   HRTIMER_MODE_REL_PINNED);
/* menu hrtimer is started */
per_cpu(hrtimer_started, cpu) = 1;
-   }
-   }
+   }
+   }
 
return data->last_state_idx;
 }
-- 
1.6.4.2
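
For readers who want to try the timer selection outside the kernel, the
decision made in the menu_select() hunk above can be written as a small
standalone function. This is a sketch, not the patch itself: pick_timer_us()
is a hypothetical helper, and MAX_DEVIATION_US is an assumed placeholder
because the value of MAX_DEVIATION is not shown in this thread. The function
models only the case where a non-deepest C-state was chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DEVIATION_US 8   /* assumed placeholder value */

/*
 * Return the timeout (in microseconds) to arm after a non-deepest
 * C-state was chosen, or 0 when no timer should be started.
 */
static uint32_t pick_timer_us(bool repeat, uint32_t predicted_us,
                              uint32_t expected_us,
                              uint32_t perfect_cstate_ms)
{
    uint32_t repeat_us = 0;
    uint32_t perfect_us;

    assert(perfect_cstate_ms <= UINT32_MAX / 1000);
    perfect_us = perfect_cstate_ms * 1000;

    if (repeat) {
        assert(predicted_us <= (UINT32_MAX - MAX_DEVIATION_US) / 2);
        repeat_us = 2 * predicted_us + MAX_DEVIATION_US;
    }

    /* Repeat mode: arm the short timer that detects a broken pattern. */
    if (repeat && repeat_us < perfect_us)
        return repeat_us;

    /* General case: cap the shallow sleep if the expected idle is longer. */
    if (perfect_us < expected_us)
        return perfect_us;

    return 0;
}
```

With the default of 50 ms, a repeat prediction of 20 us arms a 48 us timer in
this sketch, while a non-repeat case with 100 ms of expected idle time arms
the 50 ms timer.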



[PATCH V2 1/3] x86,idle: Quickly notice prediction failure for repeat mode

2012-09-17 Thread Youquan Song
Predicting the future is difficult. When the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing
the failure quickly is therefore important for power saving.

The cpuidle menu governor detects a repeat pattern when the last 8 C-state
residencies are consecutive and the same or very close. It then predicts that
the next C-state residency will have the same duration.

A real case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3
or earlier. On Sandybridge, turbostat reads 10 registers one by one, so it
generates 10 IPIs to wake up idle CPUs. The cpuidle menu governor therefore
predicts repeat mode, expecting another IPI to wake the idle CPU soon, and
keeps the idle CPU in C1 even though the CPU is completely idle. However,
after reading the 10 registers turbostat sleeps for 5 seconds by default, so
the idle CPU stays in C1 for a long time, until a break event occurs.
On an idle Sandybridge system, run "./turbostat -v": deep C-state residency
fluctuates between 70% and 99%. With the patched kernel, deep C-state
residency stays at >99.98%.

In this patch, a timer is added when the menu governor detects repeat mode and
chooses a shallow C-state. The timer is set to a timeout value greater than
the predicted time, and we conclude that the repeat mode prediction has failed
if the timer fires. When repeat mode continues as expected, the timer does not
fire; the CPU wakes up from the C-state and cancels the timer proactively.
When repeat mode does not continue, the timer expires and the menu governor
quickly notices that the repeat mode prediction has failed and re-evaluates
whether a deeper C-state is possible.
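
As a worked example of the timeout rule, the sketch below uses the formula
from the patch (twice the predicted time plus MAX_DEVIATION). The helper
names and the value of MAX_DEVIATION_US are assumptions for illustration,
since MAX_DEVIATION itself is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DEVIATION_US 8   /* assumed placeholder value */

/* Timeout armed when repeat mode is detected: 2 * predicted + deviation. */
static uint32_t repeat_timeout_us(uint32_t predicted_us)
{
    assert(predicted_us <= (UINT32_MAX - MAX_DEVIATION_US) / 2);
    return 2 * predicted_us + MAX_DEVIATION_US;
}

/*
 * The prediction counts as failed when the CPU would still be idle at the
 * moment the timer expires, i.e. the timer fires before any other wakeup.
 */
static bool repeat_prediction_failed(uint32_t predicted_us,
                                     uint32_t actual_idle_us)
{
    return actual_idle_us >= repeat_timeout_us(predicted_us);
}
```

With a predicted residency of 20 us the timer is armed for 48 us in this
sketch. A wakeup after 25 us cancels it, while a 1 second idle period lets it
fire and exposes the failed prediction.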

Below is another case that clearly shows the benefit of the patch:

/*
 * Header names were lost in the list archive; these are the headers the
 * code below needs.
 */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
    fprintf(stderr,
        "Usage: idle_predict [options]\n"
        "  --help   -h  Print this help\n"
        "  --thread -n  Thread number\n"
        "  --loop   -l  Loop times in shallow Cstate\n"
        "  --delay  -t  Sleep time (uS) in shallow Cstate\n");
}

/* Sleep briefly (loop - 1) times, then take one long sleep, and repeat. */
void *simple_loop(void *arg)
{
    int idle_num = 1;

    (void)arg;
    while (!(*shutdown)) {
        /* Unsynchronized across threads; only a rough activity counter. */
        *count = *count + 1;

        if (idle_num % loop)
            usleep(delay);
        else {
            /* sleep 1 second */
            sleep(1);
            idle_num = 0;
        }
        idle_num++;
    }

    return NULL;
}

static void sighand(int sig)
{
    (void)sig;
    *shutdown = 1;
}

int main(int argc, char *argv[])
{
    sigset_t sigset;
    int signum = SIGALRM;
    int i, c, thread_num = 8;
    pthread_t pt[1024];

    /* -h takes no argument */
    static char optstr[] = "n:l:t:h";

    while ((c = getopt(argc, argv, optstr)) != -1)
        switch (c) {
        case 'n':
            thread_num = atoi(optarg);
            break;
        case 'l':
            loop = atoi(optarg);
            break;
        case 't':
            delay = atoi(optarg);
            break;
        case 'h':
        default:
            usage();
            exit(1);
        }

    /* Keep within pt[] and avoid a modulo by zero in simple_loop(). */
    if (thread_num < 1 || thread_num > 1024 || loop < 1) {
        usage();
        exit(1);
    }

    printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
    count = malloc(sizeof(long));
    shutdown = malloc(sizeof(int));
    *count = 0;
    *shutdown = 0;

    sigemptyset(&sigset);
    sigaddset(&sigset, signum);
    sigprocmask(SIG_BLOCK, &sigset, NULL);
    signal(SIGINT, sighand);
    signal(SIGTERM, sighand);

    for (i = 0; i < thread_num; i++)
        pthread_create(&pt[i], NULL, simple_loop, NULL);

    for (i = 0; i < thread_num; i++)
        pthread_join(pt[i], NULL);

    exit(0);
}

Get powertop V2 from git://github.com/fenrus75/powertop and build it.
Build the test application above, then run it.
The test platform can be Intel Sandybridge or another recent platform.
#./idle_predict -l 10 &
#./powertop

We find that deep C-state residency fluctuates between 40% and 100% and that
much time is spent in the C1 state. This is because the menu governor wrongly
predicts that repeat mode continues, so it chooses the shallow C1 state even
though it has the chance to sleep for 1 second in a deep C-state.

With the patched kernel, we find that deep C-state residency stays at >99.6

[PATCH V2 0/3] x86,idle: Enhance cpuidle prediction to handle its failure

2012-09-17 Thread Youquan Song

Predicting the future is difficult. When the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing
the failure quickly is therefore important for power saving.

The cpuidle menu governor detects a repeat pattern when the last 8 C-state
residencies are consecutive and the same or very close. It then predicts that
the next C-state residency will have the same duration.

This patchset adds a timer whenever the menu governor chooses a non-deepest
C-state, so that the CPU wakes up quickly instead of staying too long in a
shallow C-state after a failed prediction. The timer is set to a timeout value
greater than the predicted time, so if the timer fires we can confidently
conclude that the prediction has failed. When the prediction succeeds, the CPU
wakes up from the C-state within the predicted time; the timer does not fire
and is cancelled right after the CPU wakes up. When the prediction fails, the
timer fires and wakes the CPU from the shallow C-state, so the menu governor
quickly notices the failure and re-evaluates whether a deeper C-state is
possible. This patchset improves the cpuidle prediction process for both
repeat mode and general mode.

There are 2 cases that clearly show the benefit of this patchset.

One case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3 or
earlier. On Sandybridge, turbostat reads 10 registers one by one, so it
generates 10 IPIs to wake up idle CPUs. The cpuidle menu governor therefore
predicts repeat mode, expecting another IPI to wake the idle CPU soon, and
keeps the idle CPU in C1 even though the CPU is completely idle. However,
after reading the 10 registers turbostat sleeps for 5 seconds by default, so
the idle CPU stays in C1 for a long time, until a break event occurs.
On an idle Sandybridge system, run "./turbostat -v": deep C-state residency
fluctuates between 70% and 99%. With the patched kernel, deep C-state
residency stays at >99.98%.

Below is another case that clearly shows the benefit of the patch:

/*
 * Header names were lost in the list archive; these are the headers the
 * code below needs.
 */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
    fprintf(stderr,
        "Usage: idle_predict [options]\n"
        "  --help   -h  Print this help\n"
        "  --thread -n  Thread number\n"
        "  --loop   -l  Loop times in shallow Cstate\n"
        "  --delay  -t  Sleep time (uS) in shallow Cstate\n");
}

/* Sleep briefly (loop - 1) times, then take one long sleep, and repeat. */
void *simple_loop(void *arg)
{
    int idle_num = 1;

    (void)arg;
    while (!(*shutdown)) {
        /* Unsynchronized across threads; only a rough activity counter. */
        *count = *count + 1;

        if (idle_num % loop)
            usleep(delay);
        else {
            /* sleep 1 second */
            sleep(1);
            idle_num = 0;
        }
        idle_num++;
    }

    return NULL;
}

static void sighand(int sig)
{
    (void)sig;
    *shutdown = 1;
}

int main(int argc, char *argv[])
{
    sigset_t sigset;
    int signum = SIGALRM;
    int i, c, thread_num = 8;
    pthread_t pt[1024];

    /* -h takes no argument */
    static char optstr[] = "n:l:t:h";

    while ((c = getopt(argc, argv, optstr)) != -1)
        switch (c) {
        case 'n':
            thread_num = atoi(optarg);
            break;
        case 'l':
            loop = atoi(optarg);
            break;
        case 't':
            delay = atoi(optarg);
            break;
        case 'h':
        default:
            usage();
            exit(1);
        }

    /* Keep within pt[] and avoid a modulo by zero in simple_loop(). */
    if (thread_num < 1 || thread_num > 1024 || loop < 1) {
        usage();
        exit(1);
    }

    printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
    count = malloc(sizeof(long));
    shutdown = malloc(sizeof(int));
    *count = 0;
    *shutdown = 0;

    sigemptyset(&sigset);
    sigaddset(&sigset, signum);
    sigprocmask(SIG_BLOCK, &sigset, NULL);
    signal(SIGINT, sighand);
    signal(SIGTERM, sighand);

    for (i = 0; i < thread_num; i++)
        pthread_create(&pt[i], NULL, simple_loop, NULL);

    for (i = 0; i < thread_num; i++)
        pthread_join(pt[i], NULL);

    exit(0);
}

Get powertop v2 from git://github.com/fenrus75/powertop and build it.
Build the test application above, then run it.
The test platform can be Intel Sandybridge or another recent platform.
#./idle_predict -l 10 &
#./powertop

We find that deep C-state residency fluctuates between 40% and 100% and that
much time is spent in the C1 state. This is because the menu governor wrongly
predicts that repeat mode continues, so it chooses the shallow C1 state even
though it has c

Re: KS/Plumbers: c-state governor BOF

2012-09-11 Thread Youquan Song
> Your patches could make a lot of sense when integrated with my
> patches:
>
> http://people.redhat.com/riel/cstate/
> However, we should probably get the tracepoint upstream first,
> so we can know for sure :)

I cannot access the patches in that directory. Can you send them to me?
I will look at your patches and integrate them with mine tomorrow to see
what happens.

Do you have a test case to share, or ideas on how to show the benefit?

I have run many tests on my patches. They show a benefit, large or small, in
various cases, and at least no negative effect has shown up.

I have two convincing test cases that show a great benefit:
1. turbostat v1 (before 3.5)
2. A simple test application I wrote, which also shows a great benefit.
   Run it with #./idle_predict -l 8


I wrote a simple application using usleep. It makes clear that a repeat mode
prediction failure greatly affects an application with such a repeat
pattern.

---
/*
 * Header names were lost in the list archive; these are the headers the
 * code below needs.
 */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* In this version the flag is 1 while running and 0 once a signal arrives. */
volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
    fprintf(stderr,
        "Usage: idle_predict [options]\n"
        "  --help   -h  Print this help\n"
        "  --thread -n  Thread number\n"
        "  --loop   -l  Loop times in shallow Cstate\n"
        "  --delay  -t  Sleep time (uS) in shallow Cstate\n");
}

/* Sleep briefly (loop - 1) times, then take one long sleep, and repeat. */
void *simple_loop(void *arg)
{
    int idle_num = 1;

    (void)arg;
    while (*shutdown) {
        /* Unsynchronized across threads; only a rough activity counter. */
        *count = *count + 1;

        if (idle_num % loop)
            usleep(delay);
        else {
            /* sleep 1 second */
            sleep(1);
            idle_num = 0;
        }
        idle_num++;
    }

    return NULL;
}

static void sighand(int sig)
{
    (void)sig;
    *shutdown = 0;
}

int main(int argc, char *argv[])
{
    sigset_t sigset;
    int signum = SIGALRM;
    int i, c, thread_num = 8;
    pthread_t pt[1024];

    /* -h takes no argument */
    static char optstr[] = "n:l:t:h";

    while ((c = getopt(argc, argv, optstr)) != -1)
        switch (c) {
        case 'n':
            thread_num = atoi(optarg);
            break;
        case 'l':
            loop = atoi(optarg);
            break;
        case 't':
            delay = atoi(optarg);
            break;
        case 'h':
        default:
            usage();
            exit(1);
        }

    /* Keep within pt[] and avoid a modulo by zero in simple_loop(). */
    if (thread_num < 1 || thread_num > 1024 || loop < 1) {
        usage();
        exit(1);
    }

    printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
    count = malloc(sizeof(long));
    shutdown = malloc(sizeof(int));
    *count = 0;
    *shutdown = 1;

    sigemptyset(&sigset);
    sigaddset(&sigset, signum);
    sigprocmask(SIG_BLOCK, &sigset, NULL);
    signal(SIGINT, sighand);
    signal(SIGTERM, sighand);

    for (i = 0; i < thread_num; i++)
        pthread_create(&pt[i], NULL, simple_loop, NULL);

    for (i = 0; i < thread_num; i++)
        pthread_join(pt[i], NULL);

    exit(0);
}



Re: KS/Plumbers: c-state governor BOF

2012-09-11 Thread Youquan Song
> After talking about my RFC patches to the c-state governor with
> Matthew and Arjan, it is clear that the whole concept of how
> things are done could use some more discussion.
>
> Since a good number of us will be in San Diego next week, at
> Kernel Summit / Plumbers / etc, I will organize a c-state
> governor BOF for those who are interested.
>
> Things to think about:
> - what should the c-state governor do?
> - how to best predict the future?
> - what kinds of odd workloads do we need to accomodate?

Hi Rik,

I just noticed that there is a topic to discuss the menu governor at Kernel
Summit. Actually, I posted a patchset on May 11 2012 to bring up the topic.
At that time, turbostat v1 was the only convincing, proven application I had
to show that my patches are useful. I tried to find other workloads to prove
that the patchset is solidly useful, but I got stuck on other high-priority
tasks, so I have moved slowly on it.
Since you have brought up the issue, I guess that you already have a real
workload that shows it.
My patchset improves not only repeat mode prediction failure but also general
prediction failure. Let's discuss it.

Here is the patchset posted on May 11 2012.

http://lwn.net/Articles/496919/ "x86,idle: Enhance cpuidle prediction to
handle its failure"
http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02267.html
"[PATCH 1/3] x86,idle: Quickly notice prediction failure for repeat mode"
http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02268.html
"[PATCH 2/3] x86,idle: Quickly notice prediction failure in general case"
http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02269.html
"[PATCH 3/3] x86,idle: Set residency to 0 if target Cstate not really
enter"

Thanks
-Youquan