[tip: ras/core] x86/mce: Add _ASM_EXTABLE_CPY for copy user access
The following commit has been merged into the ras/core branch of tip:

Commit-ID:     278b917f8cb9b02923c15249f9d1a5769d2c1976
Gitweb:        https://git.kernel.org/tip/278b917f8cb9b02923c15249f9d1a5769d2c1976
Author:        Youquan Song
AuthorDate:    Tue, 06 Oct 2020 14:09:07 -07:00
Committer:     Borislav Petkov
CommitterDate: Wed, 07 Oct 2020 11:19:11 +02:00

x86/mce: Add _ASM_EXTABLE_CPY for copy user access

_ASM_EXTABLE_UA is a general exception entry to record the exception
fixup for all exception spots between kernel and user space access.

To enable recovery from machine checks while copying data from user
addresses it is necessary to be able to distinguish the places that are
looping copying data from those that copy a single byte/word/etc.

Add a new macro _ASM_EXTABLE_CPY and use it in place of _ASM_EXTABLE_UA
in the copy functions.

Record the exception reason number to regs->ax at ex_handler_uaccess,
which is used to check whether an MCE was triggered.

The new fixup routine ex_handler_copy() is almost an exact copy of
ex_handler_uaccess(). The difference is that it sets regs->ax to the
trap number. Following patches use this to avoid trying to copy
remaining bytes from the tail of the copy and possibly hitting the
poison again.

New mce.kflags bit MCE_IN_KERNEL_COPYIN will be used by mce_severity()
calculation to indicate that a machine check is recoverable because the
kernel was copying from user space.

Signed-off-by: Youquan Song
Signed-off-by: Tony Luck
Signed-off-by: Borislav Petkov
Link: https://lkml.kernel.org/r/20201006210910.21062-4-tony.l...@intel.com
---
 arch/x86/include/asm/asm.h  |  6 ++-
 arch/x86/include/asm/mce.h  | 15 ++-
 arch/x86/lib/copy_user_64.S | 96 ++--
 arch/x86/mm/extable.c       | 14 -
 4 files changed, 82 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 5c15f95..0359cbb 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -135,6 +135,9 @@
 # define _ASM_EXTABLE_UA(from, to)				\
 	_ASM_EXTABLE_HANDLE(from, to, ex_handler_uaccess)
 
+# define _ASM_EXTABLE_CPY(from, to)				\
+	_ASM_EXTABLE_HANDLE(from, to, ex_handler_copy)
+
 # define _ASM_EXTABLE_FAULT(from, to)				\
 	_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)
 
@@ -160,6 +163,9 @@
 # define _ASM_EXTABLE_UA(from, to)				\
 	_ASM_EXTABLE_HANDLE(from, to, ex_handler_uaccess)
 
+# define _ASM_EXTABLE_CPY(from, to)				\
+	_ASM_EXTABLE_HANDLE(from, to, ex_handler_copy)
+
 # define _ASM_EXTABLE_FAULT(from, to)				\
 	_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)
 
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index ba2062d..a0f1478 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -136,9 +136,24 @@
 #define	MCE_HANDLED_NFIT	BIT_ULL(3)
 #define	MCE_HANDLED_EDAC	BIT_ULL(4)
 #define	MCE_HANDLED_MCELOG	BIT_ULL(5)
+
+/*
+ * Indicates an MCE which has happened in kernel space but from
+ * which the kernel can recover simply by executing fixup_exception()
+ * so that an error is returned to the caller of the function that
+ * hit the machine check.
+ */
 #define MCE_IN_KERNEL_RECOV	BIT_ULL(6)
 
 /*
+ * Indicates an MCE that happened in kernel space while copying data
+ * from user. In this case fixup_exception() gets the kernel to the
+ * error exit for the copy function. Machine check handler can then
+ * treat it like a fault taken in user mode.
+ */
+#define MCE_IN_KERNEL_COPYIN	BIT_ULL(7)
+
+/*
  * This structure contains all data related to the MCE log. Also
  * carries a signature to make it easier to find from external
  * debugging tools. Each entry is only valid when its finished flag
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 816f128..5b68e94 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -36,8 +36,8 @@
 	jmp .Lcopy_user_handle_tail
 	.previous
 
-	_ASM_EXTABLE_UA(100b, 103b)
-	_ASM_EXTABLE_UA(101b, 103b)
+	_ASM_EXTABLE_CPY(100b, 103b)
+	_ASM_EXTABLE_CPY(101b, 103b)
 	.endm
 
 /*
@@ -116,26 +116,26 @@ SYM_FUNC_START(copy_user_generic_unrolled)
 60:	jmp .Lcopy_user_handle_tail /* ecx is zerorest also */
 	.previous
 
-	_ASM_EXTABLE_UA(1b, 30b)
-	_ASM_EXTABLE_UA(2b, 30b)
-	_ASM_EXTABLE_UA(3b, 30b)
-	_ASM_EXTABLE_UA(4b, 30b)
-	_ASM_EXTABLE_UA(5b, 30b)
-	_ASM_EXTABLE_UA(6b, 30b)
-	_ASM_EXTABLE_UA(7b, 30b)
-	_ASM_EXTABLE_UA(8b, 30b)
-	_ASM_EXTABLE_UA(9b, 30b)
-	_ASM_EXTABLE_UA(10b, 30b)
-	_ASM_EXTABLE_UA(11b, 30b)
-	_ASM_EXTABLE_UA(12b, 30b)
-	_ASM_EXTABLE_UA(13b, 30b)
-	_ASM_EXTABLE_UA(14b, 30b)
-	_ASM_EXTABLE_UA(15
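
The difference between the two fixup handlers described in the commit message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `model_*` name is hypothetical, the table lookup is a plain linear scan, and the trap numbers 14 (page fault) and 18 (machine check) are the x86 vector numbers rather than values taken from the patch text. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the x86 trap vector numbers. */
enum { MODEL_TRAP_PF = 14, MODEL_TRAP_MC = 18 };

/* Hypothetical, heavily reduced register frame. */
struct model_regs {
	unsigned long ip;
	unsigned long ax;
};

/* One exception table entry: faulting address, fixup address, handler. */
struct model_extable_entry {
	unsigned long fault_ip;
	unsigned long fixup_ip;
	bool (*handler)(const struct model_extable_entry *e,
			struct model_regs *regs, int trapnr);
};

/* Models the generic user-access handler: only redirect execution. */
static bool model_handler_uaccess(const struct model_extable_entry *e,
				  struct model_regs *regs, int trapnr)
{
	(void)trapnr;
	regs->ip = e->fixup_ip;
	return true;
}

/* Models the copy handler: redirect and also record why we faulted. */
static bool model_handler_copy(const struct model_extable_entry *e,
			       struct model_regs *regs, int trapnr)
{
	regs->ip = e->fixup_ip;
	regs->ax = (unsigned long)trapnr;
	return true;
}

/* Look up the faulting ip and run the handler registered for it. */
static bool model_fixup_exception(const struct model_extable_entry *table,
				  size_t n, struct model_regs *regs, int trapnr)
{
	size_t i;

	assert(regs != NULL);
	for (i = 0; i < n; i++) {
		if (table[i].fault_ip == regs->ip)
			return table[i].handler(&table[i], regs, trapnr);
	}
	return false;	/* no fixup registered for this address */
}

/* The copy tail can test this before touching the source again. */
static bool model_copy_hit_machine_check(const struct model_regs *regs)
{
	return regs->ax == (unsigned long)MODEL_TRAP_MC;
}
```

In this model, a copy loop whose fixup sees `model_copy_hit_machine_check()` return true would skip the byte-by-byte tail copy, which matches the stated goal of not hitting the poison a second time.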
[tip: ras/core] x86/mce: Pass pointer to saved pt_regs to severity calculation routines
The following commit has been merged into the ras/core branch of tip:

Commit-ID:     41ce0564bfe2e129d56730418d8c0a9f9f2d31b5
Gitweb:        https://git.kernel.org/tip/41ce0564bfe2e129d56730418d8c0a9f9f2d31b5
Author:        Youquan Song
AuthorDate:    Tue, 06 Oct 2020 14:09:05 -07:00
Committer:     Borislav Petkov
CommitterDate: Wed, 07 Oct 2020 10:51:42 +02:00

x86/mce: Pass pointer to saved pt_regs to severity calculation routines

New recovery features require additional information about processor
state when a machine check occurred. Pass pt_regs down to the routines
that need it. No functional change.

Signed-off-by: Youquan Song
Signed-off-by: Tony Luck
Signed-off-by: Borislav Petkov
Link: https://lkml.kernel.org/r/20201006210910.21062-2-tony.l...@intel.com
---
 arch/x86/kernel/cpu/mce/core.c     | 14 +++---
 arch/x86/kernel/cpu/mce/internal.h |  3 ++-
 arch/x86/kernel/cpu/mce/severity.c | 14 --
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b5b70f4..2d6caf0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -807,7 +807,7 @@ log_it:
 			goto clear_it;
 
 		mce_read_aux(&m, i);
-		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
+		m.severity = mce_severity(&m, NULL, mca_cfg.tolerant, NULL, false);
 		/*
 		 * Don't get the IP here because it's unlikely to
 		 * have anything to do with the actual error location.
@@ -856,7 +856,7 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
 			quirk_no_way_out(i, m, regs);
 
 		m->bank = i;
-		if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
+		if (mce_severity(m, regs, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) {
 			mce_read_aux(m, i);
 			*msg = tmp;
 			return 1;
@@ -956,7 +956,7 @@ static void mce_reign(void)
 	 */
 	if (m && global_worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
 		/* call mce_severity() to get "msg" for panic */
-		mce_severity(m, mca_cfg.tolerant, &msg, true);
+		mce_severity(m, NULL, mca_cfg.tolerant, &msg, true);
 		mce_panic("Fatal machine check", m, msg);
 	}
 
@@ -1167,7 +1167,7 @@ static noinstr bool mce_check_crashing_cpu(void)
 	return false;
 }
 
-static void __mc_scan_banks(struct mce *m, struct mce *final,
+static void __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
 			    unsigned long *toclear, unsigned long *valid_banks,
 			    int no_way_out, int *worst)
 {
@@ -1202,7 +1202,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
 		/* Set taint even when machine check was not enabled. */
 		add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
 
-		severity = mce_severity(m, cfg->tolerant, NULL, true);
+		severity = mce_severity(m, regs, cfg->tolerant, NULL, true);
 
 		/*
 		 * When machine check was for corrected/deferred handler don't
@@ -1354,7 +1354,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		order = mce_start(&no_way_out);
 	}
 
-	__mc_scan_banks(&m, final, toclear, valid_banks, no_way_out, &worst);
+	__mc_scan_banks(&m, regs, final, toclear, valid_banks, no_way_out, &worst);
 
 	if (!no_way_out)
 		mce_clear_state(toclear);
@@ -1376,7 +1376,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * make sure we have the right "msg".
 		 */
 		if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3) {
-			mce_severity(&m, cfg->tolerant, &msg, true);
+			mce_severity(&m, regs, cfg->tolerant, &msg, true);
 			mce_panic("Local fatal machine check!", &m, msg);
 		}
 	}
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index b122610..88dcc79 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -38,7 +38,8 @@ int mce_gen_pool_add(struct mce *mce);
 int mce_gen_pool_init(void);
 struct llist_node *mce_gen_pool_prepare_records(void);
 
-extern int (*mce_severity)(struct mce *a, int tolerant, char **msg, bool is_excp);
+extern int (*mce_severity)(struct mce *a, struct pt_regs *regs,
+			   int tolerant, char **msg, bool is_excp);
 struct dentry *mce_get_debugfs_dir(void);
 
 extern
[PATCH 14/24] x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
From: Ingo Molnar

(cherry picked from commit d72f4e29e6d84b7ec02ae93088aa459ac70e733b)

firmware_restrict_branch_speculation_*() recently started using
preempt_enable()/disable(), but those are relatively high level
primitives and cause build failures on some 32-bit builds.

Since we want to keep <asm/nospec-branch.h> low level, convert
them to macros to avoid header hell...

Cc: David Woodhouse
Cc: Thomas Gleixner
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: arjan.van.de@intel.com
Cc: b...@alien8.de
Cc: dave.han...@intel.com
Cc: jmatt...@google.com
Cc: karah...@amazon.de
Cc: k...@vger.kernel.org
Cc: pbonz...@redhat.com
Cc: rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Youquan Song [v4.4 backport]
---
 arch/x86/include/asm/nospec-branch.h | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 27582aa..4675f65 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,20 +214,22 @@ static inline void indirect_branch_prediction_barrier(void)
 /*
  * With retpoline, we must use IBRS to restrict branch prediction
  * before calling into firmware.
+ *
+ * (Implemented as CPP macros due to header hell.)
  */
-static inline void firmware_restrict_branch_speculation_start(void)
-{
-	preempt_disable();
-	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
-			      X86_FEATURE_USE_IBRS_FW);
-}
+#define firmware_restrict_branch_speculation_start()			\
+do {									\
+	preempt_disable();						\
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,	\
+			      X86_FEATURE_USE_IBRS_FW);			\
+} while (0)
 
-static inline void firmware_restrict_branch_speculation_end(void)
-{
-	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
-			      X86_FEATURE_USE_IBRS_FW);
-	preempt_enable();
-}
+#define firmware_restrict_branch_speculation_end()			\
+do {									\
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,			\
+			      X86_FEATURE_USE_IBRS_FW);			\
+	preempt_enable();						\
+} while (0)
 
 #endif /* __ASSEMBLY__ */
-- 
1.8.3.1
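
The patch above turns two inline functions into multi-statement macros wrapped in `do { ... } while (0)`. A minimal userspace sketch of why that wrapper matters follows. All `model_*` names are hypothetical, the preempt counter and the MSR are plain variables, and the IBRS bit value of 1 is an assumption of this model; none of it is the kernel's code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the preempt count and the SPEC_CTRL MSR. */
static int model_preempt_depth;
static unsigned long model_spec_ctrl;

#define MODEL_SPEC_CTRL_IBRS	1UL	/* assumed bit value for this model */

#define model_preempt_disable()	((void)model_preempt_depth++)
#define model_preempt_enable()	((void)model_preempt_depth--)

/*
 * Multi-statement macros. The do/while(0) wrapper makes each one behave
 * like a single statement, so it can sit in an unbraced if/else and
 * still needs the usual trailing semicolon.
 */
#define model_firmware_restrict_start()				\
do {								\
	model_preempt_disable();				\
	model_spec_ctrl = MODEL_SPEC_CTRL_IBRS;			\
} while (0)

#define model_firmware_restrict_end()				\
do {								\
	model_spec_ctrl = 0;					\
	model_preempt_enable();					\
} while (0)

/* Returns 0 after a (placeholder) firmware call, -1 if there is none. */
static int model_call_firmware(int firmware_present)
{
	if (firmware_present)
		model_firmware_restrict_start();
	else
		return -1;

	/* Both statements of the macro ran under the if. */
	assert(model_preempt_depth == 1);
	assert(model_spec_ctrl == MODEL_SPEC_CTRL_IBRS);

	/* A real caller would invoke firmware here. */

	model_firmware_restrict_end();
	return 0;
}
```

Without the wrapper, the `if`/`else` in `model_call_firmware()` would not compile with a brace-only body, and with no braces at all only the first statement would be conditional.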
[PATCH 14/23] x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
From: Ingo Molnar

(cherry picked from commit d72f4e29e6d84b7ec02ae93088aa459ac70e733b)

firmware_restrict_branch_speculation_*() recently started using
preempt_enable()/disable(), but those are relatively high level
primitives and cause build failures on some 32-bit builds.

Since we want to keep <asm/nospec-branch.h> low level, convert
them to macros to avoid header hell...

Cc: David Woodhouse
Cc: Thomas Gleixner
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: arjan.van.de@intel.com
Cc: b...@alien8.de
Cc: dave.han...@intel.com
Cc: jmatt...@google.com
Cc: karah...@amazon.de
Cc: k...@vger.kernel.org
Cc: pbonz...@redhat.com
Cc: rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
[Youquan Song: port to 4.4]
Signed-off-by: Youquan Song
---
 arch/x86/include/asm/nospec-branch.h | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 27582aa..4675f65 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,20 +214,22 @@ static inline void indirect_branch_prediction_barrier(void)
 /*
  * With retpoline, we must use IBRS to restrict branch prediction
  * before calling into firmware.
+ *
+ * (Implemented as CPP macros due to header hell.)
  */
-static inline void firmware_restrict_branch_speculation_start(void)
-{
-	preempt_disable();
-	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
-			      X86_FEATURE_USE_IBRS_FW);
-}
+#define firmware_restrict_branch_speculation_start()			\
+do {									\
+	preempt_disable();						\
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,	\
+			      X86_FEATURE_USE_IBRS_FW);			\
+} while (0)
 
-static inline void firmware_restrict_branch_speculation_end(void)
-{
-	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
-			      X86_FEATURE_USE_IBRS_FW);
-	preempt_enable();
-}
+#define firmware_restrict_branch_speculation_end()			\
+do {									\
+	alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,			\
+			      X86_FEATURE_USE_IBRS_FW);			\
+	preempt_enable();						\
+} while (0)
 
 #endif /* __ASSEMBLY__ */
-- 
1.9.1
[PATCH 1/3] dmar: Fix domain id not update to newly create
In domain_context_mapping_one(), if the domain has not yet been assigned
a domain id, a new domain id is allocated for it, but the newly
allocated id is never written back to the domain, so the domain keeps an
unknown domain id. This causes problems such as flushing the wrong
domain in iommu->flush.flush_iotlb and freeing/releasing the wrong
domain.

Tested-by: Zhiyuan Zhou
Signed-off-by: Youquan Song
---
 drivers/iommu/intel-iommu.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 43b9bfe..9cd522f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1625,6 +1625,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain, int segment,
 		}
 	}
 
+	domain->id = id;
 	context_set_domain_id(context, id);
 
 	if (translation != CONTEXT_TT_PASS_THROUGH) {
-- 
1.7.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
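
The bug fixed by this one-line patch can be modelled in userspace as follows. This is a sketch under assumptions: the `model_*` types, the fixed-size slot array, and the function name are hypothetical and only mirror the shape of the id lookup and allocation, not the real intel-iommu data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NDOMAINS 8	/* assumed number of domain id slots */

struct model_domain {
	int id;			/* id later used for flush and release */
};

struct model_iommu {
	struct model_domain *domains[MODEL_NDOMAINS];
};

/*
 * Find the slot already held by this domain on this iommu, or claim the
 * first free one. Returns the id in use, or -1 if no slot is free.
 */
static int model_context_map(struct model_iommu *iommu,
			     struct model_domain *domain)
{
	int num, id = -1;

	assert(iommu != NULL && domain != NULL);

	for (num = 0; num < MODEL_NDOMAINS; num++) {
		if (iommu->domains[num] == domain) {
			id = num;
			break;
		}
	}

	if (id < 0) {
		for (num = 0; num < MODEL_NDOMAINS; num++) {
			if (iommu->domains[num] == NULL) {
				iommu->domains[num] = domain;
				id = num;
				break;
			}
		}
		if (id < 0)
			return -1;
	}

	/*
	 * The step the patch adds: without it, domain->id keeps its old
	 * value and later users of domain->id act on the wrong slot.
	 */
	domain->id = id;
	return id;
}
```

After the call, `iommu->domains[domain->id]` points back at the same domain, which is the invariant that flushing and releasing by `domain->id` relies on in this model.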
[PATCH 2/3] dmar: Move the confuse comments to proper place
"found = 1" means "there are other devices owned by the domain". The
comment is in the wrong place, which makes the code confusing to review,
so move it to the correct place.

Signed-off-by: Youquan Song
---
 drivers/iommu/intel-iommu.c |    8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9cd522f..aa821fc 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3813,10 +3813,6 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
 			continue;
 		}
 
-		/* if there is no other devices under the same iommu
-		 * owned by this domain, clear this iommu in iommu_bmp
-		 * update iommu count and coherency
-		 */
 		if (iommu == device_to_iommu(info->segment, info->bus,
 					    info->devfn))
 			found = 1;
@@ -3824,6 +3820,10 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
 
 	spin_unlock_irqrestore(&device_domain_lock, flags);
 
+	/* if there is no other devices under the same iommu
+	 * owned by this domain, clear this iommu in iommu_bmp
+	 * update iommu count and coherency
+	 */
 	if (found == 0) {
 		unsigned long tmp_flags;
 		spin_lock_irqsave(&domain->iommu_lock, tmp_flags);
-- 
1.7.7.4
[PATCH 3/3] dmar: reduce loop to find multi-devices owned by IOMMU
When checking whether the iommu owns any device in the domain other
than the device being removed, the code loops over all devices in the
domain if the removed device is the first one in the domain's device
list. This patch makes the loop stop as soon as it has found both the
removed device and one other device, which saves loop time and makes
the code clearer.

Signed-off-by: Youquan Song
---
 drivers/iommu/intel-iommu.c |   15 ++-
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index aa821fc..9f3bf3f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3785,7 +3785,7 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
 	struct device_domain_info *info, *tmp;
 	struct intel_iommu *iommu;
 	unsigned long flags;
-	int found = 0;
+	int found = 0, del = 0;
 
 	iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
 				pdev->devfn);
@@ -3806,16 +3806,13 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain,
 			free_devinfo_mem(info);
 
 			spin_lock_irqsave(&device_domain_lock, flags);
-
-			if (found)
-				break;
-			else
-				continue;
-		}
-
-		if (iommu == device_to_iommu(info->segment, info->bus,
+			del = 1;
+		} else if (iommu == device_to_iommu(info->segment, info->bus,
 					    info->devfn))
 			found = 1;
+
+		if (found & del)
+			break;
 	}
 
 	spin_unlock_irqrestore(&device_domain_lock, flags);
-- 
1.7.7.4
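
The early-exit loop from this patch can be sketched in userspace as below. The `model_*` names, the array in place of the kernel's linked list, and the `visited` counter are hypothetical additions for illustration; locking is left out entirely. The patch itself tests `found & del` on 0/1 integers, which this sketch writes as `found && del` on booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device record: which device, behind which iommu. */
struct model_dev {
	int dev_id;
	int iommu_id;
	bool removed;
};

/*
 * Mark dev_id as removed and report whether any other device behind
 * iommu_id was seen. The loop stops once both facts are known;
 * *visited counts how many entries were looked at.
 */
static bool model_remove_dev(struct model_dev *devs, size_t n,
			     int dev_id, int iommu_id, size_t *visited)
{
	bool found = false, del = false;
	size_t i;

	assert(devs != NULL && visited != NULL);

	*visited = 0;
	for (i = 0; i < n; i++) {
		(*visited)++;

		if (devs[i].dev_id == dev_id) {
			devs[i].removed = true;
			del = true;
		} else if (devs[i].iommu_id == iommu_id) {
			found = true;
		}

		if (found && del)
			break;
	}

	return found;
}
```

When the removed device is first in the list and the second device shares its iommu, this loop inspects two entries instead of the whole list. When no other device shares the iommu, it still has to walk to the end, since `found` never becomes true.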
Re: [PATCH 1/2] dma: Add interface to calculate data transferred
On Sun, Oct 13, 2013 at 08:56:33PM +0530, Vinod Koul wrote:
> On Fri, Oct 11, 2013 at 06:33:43AM -0700, Greg KH wrote:
> > On Fri, Oct 11, 2013 at 05:42:17PM -0400, Youquan Song wrote:
> > > Currently, the DMA channel calculates its data transferred only at network
> > > device driver. When other devices like UART or SPI etc, transfers data by
> > > DMA
> > > mode, but it always shows 0 at /sys/class/dma/dma0chan*/bytes_transferred.
> >
> > Is that really a problem? I have never heard anyone complaining about
> > it. Where are the reports of this?
> Right, am not still getting the point on what is the problem that this series
> is
> trying to fix..

The issue is this: I use a UART to transfer data between two COM ports,
and the UART uses a Designware DMA controller channel. When I check that
DMA channel with "cat /sys/class/dma/dma0chan3/bytes_transferred", it
always shows "0". I have transferred data through the UART port, so why
does its DMA channel report "0" bytes transferred? My first guess was
that either the DMA device driver has a problem or the data does not go
through the Designware DMA channel at all.

After checking the code, I noticed that the amount of data transferred
is recorded only when the DMA channel is used by the network device
driver; other device drivers do not account for it. DMA channels are
used by other device drivers as well, so why is only networking
special? Because this is a common interface, the current
/sys/class/dma/dma0chan*/bytes_transferred is quite likely to mislead
the user.

Thanks
-Youquan
[PATCH 2/2] dma: calculate the data transferred by 8250
When a UART transfers data in DMA mode,
/sys/class/dma/dma0chan*/bytes_transferred always shows 0.

Call the new function to record how much data has been transferred in
DMA mode.

Signed-off-by: Youquan Song
---
 drivers/tty/serial/8250/8250_dma.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
index 7046769..b22ef80 100644
--- a/drivers/tty/serial/8250/8250_dma.c
+++ b/drivers/tty/serial/8250/8250_dma.c
@@ -83,7 +83,7 @@ int serial8250_tx_dma(struct uart_8250_port *p)
 	desc->callback = __dma_tx_complete;
 	desc->callback_param = p;
 
-	dma->tx_cookie = dmaengine_submit(desc);
+	dma->tx_cookie = dma_tx_submit_cal(desc, dma->txchan, dma->tx_size);
 
 	dma_sync_single_for_device(dma->txchan->device->dev, dma->tx_addr,
 				   UART_XMIT_SIZE, DMA_TO_DEVICE);
-- 
1.7.7.4
DMA: Calculate how many data transferred by DMA
Currently, the amount of data transferred on a DMA channel is accounted
only when the channel is used by the network device driver. When other
devices such as UART or SPI transfer data in DMA mode,
/sys/class/dma/dma0chan*/bytes_transferred always shows 0. This can
mislead the user into thinking that the DMA engine does not work.

This patch set adds a new function that records how much data has been
transferred in DMA mode. It can be used by other modules, and it also
simplifies the currently duplicated code.

It also adds the call where the UART transfers data through the
Designware DMA engine, so that the data transferred on that DMA channel
is accounted.

If the patches work, I will add the call to other modules as needed.
[PATCH 1/2] dma: Add interface to calculate data transferred
Currently, the amount of data transferred on a DMA channel is accounted
only when the channel is used by the network device driver. When other
devices such as UART or SPI transfer data in DMA mode,
/sys/class/dma/dma0chan*/bytes_transferred always shows 0.

This patch adds a new function that records how much data has been
transferred in DMA mode. It can be used by other modules, and it also
simplifies the currently duplicated code.

Signed-off-by: Youquan Song
---
 drivers/dma/dmaengine.c   |   35 +++
 include/linux/dmaengine.h |    3 +++
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 9162ac8..4356a7e 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -901,6 +901,23 @@ void dma_async_device_unregister(struct dma_device *device)
 }
 EXPORT_SYMBOL(dma_async_device_unregister);
 
+dma_cookie_t
+dma_tx_submit_cal(struct dma_async_tx_descriptor *tx,
+		struct dma_chan *chan, size_t len)
+{
+
+	dma_cookie_t cookie;
+	cookie = tx->tx_submit(tx);
+
+	preempt_disable();
+	__this_cpu_add(chan->local->bytes_transferred, len);
+	__this_cpu_inc(chan->local->memcpy_count);
+	preempt_enable();
+
+	return cookie;
+
+}
+
 /**
  * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
  * @chan: DMA channel to offload copy to
@@ -920,7 +937,6 @@ dma_async_memcpy_buf_to_buf(struct dma_chan *chan, void *dest,
 	struct dma_device *dev = chan->device;
 	struct dma_async_tx_descriptor *tx;
 	dma_addr_t dma_dest, dma_src;
-	dma_cookie_t cookie;
 	unsigned long flags;
 
 	dma_src = dma_map_single(dev->dev, src, len, DMA_TO_DEVICE);
@@ -937,14 +953,8 @@ dma_async_memcpy_buf_to_buf(struct dma_chan *chan, void *dest,
 	}
 
 	tx->callback = NULL;
-	cookie = tx->tx_submit(tx);
-
-	preempt_disable();
-	__this_cpu_add(chan->local->bytes_transferred, len);
-	__this_cpu_inc(chan->local->memcpy_count);
-	preempt_enable();
 
-	return cookie;
+	return dma_tx_submit_cal(tx, chan, len);
 }
 EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
 
@@ -968,7 +978,6 @@ dma_async_memcpy_buf_to_pg(struct dma_chan *chan, struct page *page,
 	struct dma_device *dev = chan->device;
 	struct dma_async_tx_descriptor *tx;
 	dma_addr_t dma_dest, dma_src;
-	dma_cookie_t cookie;
 	unsigned long flags;
 
 	dma_src = dma_map_single(dev->dev, kdata, len, DMA_TO_DEVICE);
@@ -983,14 +992,8 @@ dma_async_memcpy_buf_to_pg(struct dma_chan *chan, struct page *page,
 	}
 
 	tx->callback = NULL;
-	cookie = tx->tx_submit(tx);
 
-	preempt_disable();
-	__this_cpu_add(chan->local->bytes_transferred, len);
-	__this_cpu_inc(chan->local->memcpy_count);
-	preempt_enable();
-
-	return cookie;
+	return dma_tx_submit_cal(tx, chan, len);
 }
 EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
 
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 0bc7275..0025f8e 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -1084,4 +1084,7 @@ dma_cookie_t dma_memcpy_pg_to_iovec(struct dma_chan *chan, struct iovec *iov,
 			struct dma_pinned_list *pinned_list, struct page *page,
 			unsigned int offset, size_t len);
 
+dma_cookie_t dma_tx_submit_cal(struct dma_async_tx_descriptor *tx,
+			struct dma_chan *chan, size_t len);
+
 #endif /* DMAENGINE_H */
-- 
1.7.7.4
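
The refactoring in this patch, one helper that submits a descriptor and then updates the channel statistics, can be sketched in userspace as follows. The `model_*` types and the fake submit callback are hypothetical. The model also keeps plain counters on the channel, whereas the patch updates per-CPU counters with preemption disabled, so this shows only the shape of the helper, not its concurrency behaviour.

```c
#include <assert.h>
#include <stddef.h>

typedef int model_cookie_t;

/* Hypothetical channel with the two statistics the patch updates. */
struct model_chan {
	unsigned long bytes_transferred;
	unsigned long memcpy_count;
};

/* Hypothetical descriptor carrying the driver's submit callback. */
struct model_desc {
	model_cookie_t (*tx_submit)(struct model_desc *desc);
};

/*
 * Submit the descriptor, then account len bytes on the channel.
 * As in the patch, the accounting happens regardless of the cookie
 * value returned by the submit callback.
 */
static model_cookie_t model_tx_submit_cal(struct model_desc *tx,
					   struct model_chan *chan, size_t len)
{
	model_cookie_t cookie;

	assert(tx != NULL && tx->tx_submit != NULL && chan != NULL);

	cookie = tx->tx_submit(tx);

	chan->bytes_transferred += len;
	chan->memcpy_count++;

	return cookie;
}

/* Stand-in for a driver's submit routine; always returns cookie 42. */
static model_cookie_t model_fake_submit(struct model_desc *desc)
{
	(void)desc;
	return 42;
}
```

A caller such as the 8250 DMA path would, in this model, replace a bare submit call with `model_tx_submit_cal(desc, chan, size)` so that the statistics move for every user of the channel and not only for the memcpy helpers.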
Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native
> Firstly, please use the customary (multi-line) comment
> style:
>
>   /*
>    * Comment .
>    * .. goes here.
>    */
>
> specified in Documentation/CodingStyle.
>
> Secondly, please send a patch against a vanilla (e.g.
> v3.11-rc5) kernel, as I've already zapped your previous
> patch from tip:x86/apic per your request.

Hi Ingo,

The latest vanilla kernel does not include the patch yet, so I think it
is enough to drop it from the tip tree.

Thanks
-Youquan
Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native
> No problem - you might want to send another patch adding some comments to
> the code, explaining why we don't switch to physical mode, quoting from
> the SDM and so.

Here is the revert patch.

Subject: [PATCH] Revert "x86/apic: Enable x2APIC physical mode on native hardware too, when there are fewer than 256 CPUs"

x2APIC without interrupt remapping is not architectural, and there is no
guarantee that it will work in the future. The SDM, volume 3, section
10.12.7 "Initialization by System Software", says:

  Routing of device interrupts to local APIC units operating in x2APIC
  mode requires use of the interrupt-remapping architecture specified in
  the Intel Virtualization Technology for Directed I/O, Revision 1.3.
  Because of this, BIOS must enumerate support for and software must
  enable this interrupt remapping with Extended Interrupt Mode Enabled
  before it enabling x2APIC mode in the local APIC units.

This reverts commit 3d1acb49d22fbbae96524040e9e2d4cbbb3adbef: do not use
x2APIC physical mode if interrupt remapping is not enabled, even when
the number of CPUs is fewer than 256.

Signed-off-by: Youquan Song
---
 arch/x86/kernel/apic/apic.c |    7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index d9dd5a6..eca89c5 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1622,8 +1622,11 @@ void __init enable_IR_x2apic(void)
 		goto skip_x2apic;
 
 	if (ret < 0) {
-		/* IR is required if there is APIC ID > 255 */
-		if (max_physical_apicid > 255) {
+		/* IR is required if there is APIC ID > 255 even when running
+		 * under KVM
+		 */
+		if (max_physical_apicid > 255 ||
+		    !hypervisor_x2apic_available()) {
 			if (x2apic_preenabled)
 				disable_x2apic();
 			goto skip_x2apic;
-- 
1.6.4.2
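
The condition restored by the revert can be written as a small pure function. This is a sketch with hypothetical `model_*` names. The "physical mode without IR" outcome for the hypervisor case is an assumption about the code that follows the hunk, since the diff above only shows the branch that skips x2APIC.

```c
#include <stdbool.h>

enum model_x2apic_decision {
	MODEL_SKIP_X2APIC,		/* stay out of x2APIC mode */
	MODEL_X2APIC_PHYS_NO_IR,	/* assumed outcome: x2APIC without IR */
	MODEL_X2APIC_WITH_IR,		/* interrupt remapping is available */
};

/*
 * ir_ret models the result of enabling interrupt remapping (IR):
 * negative means IR could not be enabled.
 */
static enum model_x2apic_decision
model_decide(int ir_ret, unsigned int max_physical_apicid,
	     bool hypervisor_x2apic_available)
{
	if (ir_ret < 0) {
		/*
		 * Without IR, x2APIC is only kept when every APIC ID
		 * fits in 8 bits and a hypervisor vouches for x2APIC.
		 */
		if (max_physical_apicid > 255 || !hypervisor_x2apic_available)
			return MODEL_SKIP_X2APIC;
		return MODEL_X2APIC_PHYS_NO_IR;
	}
	return MODEL_X2APIC_WITH_IR;
}
```

On native hardware `hypervisor_x2apic_available` is false in this model, so a failed IR setup always leads to `MODEL_SKIP_X2APIC`, which is the behaviour the revert message asks for.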
Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native
> In order to make sure the patch without involving unexpected issues beyond
> I can understand, I will confirm with our expert about it.
>
> so please pend the patch going to mainline. If the patch can move on, I
> think I will also provide other patch changing, like direct EOI.

Hi Yinghai and Ingo,

I have confirmed this with our experts. x2APIC without interrupt
remapping is not architectural, and there is no guarantee that it will
work in the future. What's more, the SDM, volume 3, section 10.12.7
"Initialization by System Software", says:

  Routing of device interrupts to local APIC units operating in x2APIC
  mode requires use of the interrupt-remapping architecture specified in
  the Intel Virtualization Technology for Directed I/O, Revision 1.3.
  Because of this, BIOS must enumerate support for and software must
  enable this interrupt remapping with Extended Interrupt Mode Enabled
  before it enabling x2APIC mode in the local APIC units.

Ingo, please drop this patch from the -tip tree:

  3d1acb49d22fbbae96524040e9e2d4cbbb3adbef
  "x86/apic: Enable x2APIC physical mode on native hardware too, when
  there are fewer than 256 CPUs"

Sorry for the fuss; it is my fault.

Thanks
-Youquan
Re: RFC: revert request for cpuidle patches e11538d1 and 69a37bea
Hi Jeremy,

I tried to reproduce your result so that I could fix the issue, but I
have not been able to reproduce it yet.

I ran netperf-2.6.0 with netserver on one machine as the server, and on
the other machine:

  netperf -t TCP_RR -H $SERVER_IP -l 60

I tested with the target machine in both the client and the server
role, and did not see the performance drop. I also noticed that the
result is not stable: sometimes it is high, sometimes it is low. In
summary, it is hard to reach a definite result.

Can you tell me how to reproduce the issue? How do you get the C0 data?

What is your kernel config? Do you enable CONFIG_NO_HZ_FULL=y or only
CONFIG_NO_HZ=y?

Thanks
-Youquan
Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native
> Yes. It would be great, if Youquan can point out where is the intel doc
> about the change.
>
> Also if the patch can move on, hypervisor_x2apic_available() related
> declaration and define
> could be dropped.

Hi Yinghai,

Sorry, I do not know of a document change, but I also cannot find any wording or explanation saying that x2APIC physical mode needs interrupt remapping support when CPU < 256. Of course, x2APIC cluster mode must have interrupt remapping support.

I have tested many machines, both old and recent, from desktops to servers, and x2APIC physical mode works without interrupt remapping when CPU < 256.

Neither in theory nor in real tests have I found any issue with the patch. To make sure the patch does not introduce unexpected issues beyond what I understand, I will confirm it with our experts. So please hold the patch back from mainline for now. If the patch can move on, I think I will also provide other patches, such as direct EOI.

Thanks
-Youquan
Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native
> > Thanks Ingo!
> > The machines will be affected: CPU support x2APIC and CPU number < 256,
> > chipset does not support VT-d2 or VT-d is disabled in BIOS.
>
> I mean, can you guess what rough percentage of new systems
> shipping (or significant number of older systems already
> shipped) will be affected by this?
>
> My feeling is that this should be relatively rare (only
> when a user reconfigures the BIOS, etc.), but I might be
> wrong.

Sorry, I do not know what percentage of shipped systems will be affected. I have encountered one affected machine whose CPU supports x2APIC but whose BIOS does not support VT-d (the BIOS also has no option to enable it). After applying the patch, it works in x2APIC physical mode. Of course, most affected machines are those where VT-d is disabled by a BIOS option or where the intremap=off kernel option is used.

From what I understand, x2APIC physical mode should be compatible with legacy mode when CPU < 256, without interrupt remapping support.

I have tested many machines, both old and recent, from desktops to servers, and x2APIC physical mode works without interrupt remapping when CPU < 256.

Thanks
-Youquan
[tip:x86/apic] x86/apic: Enable x2APIC physical mode on native hardware too, when there are fewer than 256 CPUs
Commit-ID:  3d1acb49d22fbbae96524040e9e2d4cbbb3adbef
Gitweb:     http://git.kernel.org/tip/3d1acb49d22fbbae96524040e9e2d4cbbb3adbef
Author:     Youquan Song
AuthorDate: Thu, 11 Jul 2013 21:22:39 -0400
Committer:  Ingo Molnar
CommitDate: Tue, 23 Jul 2013 11:15:42 +0200

x86/apic: Enable x2APIC physical mode on native hardware too, when there are fewer than 256 CPUs

x2APIC extends the APIC ID from 8 bits to 32 bits, but device interrupts routed from the IOAPIC or delivered in MSI mode keep an 8-bit destination APIC ID. In order to support x2APIC, VT-d interrupt remapping was introduced to translate the destination APIC ID to 32 bits in x2APIC mode and keep the devices compatible in this way.

x2APIC supports both logical and physical destination modes. In logical destination mode, the 32-bit logical APIC ID has two sub-fields, a 16-bit cluster ID and a 16-bit logical ID within the cluster, and VT-d interrupt remapping is required in x2APIC cluster mode. In physical destination mode, the 8-bit physical ID is compatible with the 32-bit physical ID when the number of CPUs is < 256.

When interrupt remapping initialization fails on platforms with fewer than 256 CPUs, the current kernel only enables x2APIC physical mode in a virtualization environment, while we could also enable x2APIC physical mode in the native kernel in this situation.

In this case device interrupts will use an 8-bit destination APIC ID in physical mode and be compatible with x2APIC physical mode when there are < 256 CPUs.

So we can benefit from x2APIC vs xAPIC MMIO:

 - x2APIC MSR read/write is faster than xAPIC MMIO
 - x2APIC needs only an ICR write to deliver an interrupt, without polling
   the ICR delivery status bit, while xAPIC needs to poll the ICR delivery
   status bit.
 - x2APIC uses one 64-bit ICR access instead of xAPIC's two 32-bit accesses.

Signed-off-by: Youquan Song
Cc: Youquan Song
Cc: h...@linux.intel.com
Cc: ying...@kernel.org
Link: http://lkml.kernel.org/r/1373592159-459-1-git-send-email-youquan.s...@intel.com
Signed-off-by: Ingo Molnar
---
 arch/x86/kernel/apic/apic.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index eca89c5..d9dd5a6 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1622,11 +1622,8 @@ void __init enable_IR_x2apic(void)
 		goto skip_x2apic;

 	if (ret < 0) {
-		/* IR is required if there is APIC ID > 255 even when running
-		 * under KVM
-		 */
-		if (max_physical_apicid > 255 ||
-		    !hypervisor_x2apic_available()) {
+		/* IR is required if there is APIC ID > 255 */
+		if (max_physical_apicid > 255) {
 			if (x2apic_preenabled)
 				disable_x2apic();
 			goto skip_x2apic;
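The condition changed by this hunk can be modelled in user space as below. This is only a sketch under stated assumptions: x2apic_allowed_old() and x2apic_allowed_new() are hypothetical helper names, not kernel functions, and they mirror only the branch taken in enable_IR_x2apic() after interrupt remapping setup has failed (ret < 0).

```c
#include <stdbool.h>

/*
 * Hypothetical model of the check in enable_IR_x2apic() once interrupt
 * remapping (IR) initialization has failed. A return value of true means
 * the kernel would still go on to use x2APIC physical mode; false stands
 * for the "goto skip_x2apic" path.
 */

/* Before the patch: only allowed under a hypervisor with x2APIC support. */
static bool x2apic_allowed_old(unsigned int max_physical_apicid,
			       bool hypervisor_x2apic)
{
	if (max_physical_apicid > 255 || !hypervisor_x2apic)
		return false;
	return true;
}

/* After the patch: allowed whenever every APIC ID fits in 8 bits. */
static bool x2apic_allowed_new(unsigned int max_physical_apicid)
{
	if (max_physical_apicid > 255)
		return false;
	return true;
}
```

On native hardware with, say, a highest APIC ID of 63, the old check skips x2APIC and the new one keeps it. Note that elsewhere in this thread the author asks for the patch to be dropped, because the SDM requires interrupt remapping for x2APIC mode.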
Re: [PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native
On Tue, Jul 23, 2013 at 11:17:29AM +0200, Ingo Molnar wrote: > > * Youquan Song wrote: > > > x2APIC extends APICID from 8 bits to 32 bits, but the device interrupt > > routed from IOAPIC or delivered in MSI mode will keep 8 bits destination > > APICID. In order to support x2APIC, the VT-d interrupt remapping is > > introduced to translate the destination APICID to 32 bits in x2APIC mode > > and keep the device compatible in this way. > > > > x2APIC support both logical and physical mode in destination mode. In > > logical destination mode, the 32 bits Logical APICID has 2 sub-fields: > > 16 bits cluster ID and 16 bits logical ID within the cluster and it is > > required VT-d interrupt remapping in x2APIC cluster mode. In physical > > destination mode, the 8 bits physical id is compatible with 32 bits > > physical id when CPU number < 256. When interrupt remapping > > initialization fail on platform with CPU number < 256, current kernel > > only enables x2APIC physical mode in virutalization environment, while > > we also can enable x2APIC physcial mode in native kernel this situation, > > and the device interrupt will use 8 bits destination APICID in physical > > mode and be compatible with x2APIC physical when < 256 CPUs. > > > > So we can benefit from x2APIC vs xAPIC MMIO: > > - x2APIC MSR read/write is faster than xAPIC mmio > > - x2APIC only ICR write to deliver interrupt without polling ICR deliver > >status bit and xAPIC need poll to read ICR deliver status bit. > > - x2APIC 64 bits ICR access instead of xAPIC two 32 bits access. > > That looks interesting. How many systems are affected by this change in > practice? Have you tested it on affected hardware? Thanks Ingo! The machines will be affected: CPU support x2APIC and CPU number < 256, chipset does not support VT-d2 or VT-d is disabled in BIOS. I have tested on one of affected hardware, it works. 
Thanks
-Youquan
[PATCH] x86, apic: Enable x2APIC physical when cpu < 256 native
x2APIC extends the APIC ID from 8 bits to 32 bits, but device interrupts routed from the IOAPIC or delivered in MSI mode keep an 8-bit destination APIC ID. In order to support x2APIC, VT-d interrupt remapping was introduced to translate the destination APIC ID to 32 bits in x2APIC mode and keep the devices compatible in this way.

x2APIC supports both logical and physical destination modes. In logical destination mode, the 32-bit logical APIC ID has two sub-fields, a 16-bit cluster ID and a 16-bit logical ID within the cluster, and VT-d interrupt remapping is required in x2APIC cluster mode. In physical destination mode, the 8-bit physical ID is compatible with the 32-bit physical ID when the number of CPUs is < 256. When interrupt remapping initialization fails on a platform with fewer than 256 CPUs, the current kernel only enables x2APIC physical mode in a virtualization environment, while we can also enable x2APIC physical mode in the native kernel in this situation; device interrupts will then use an 8-bit destination APIC ID in physical mode and be compatible with x2APIC physical mode when there are < 256 CPUs.

So we can benefit from x2APIC vs xAPIC MMIO:

 - x2APIC MSR read/write is faster than xAPIC MMIO
 - x2APIC needs only an ICR write to deliver an interrupt, without polling
   the ICR delivery status bit, while xAPIC needs to poll the ICR delivery
   status bit.
 - x2APIC uses one 64-bit ICR access instead of xAPIC's two 32-bit accesses.

Signed-off-by: Youquan Song
---
 arch/x86/kernel/apic/apic.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 904611b..51a065a 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1603,11 +1603,8 @@ void __init enable_IR_x2apic(void)
 		goto skip_x2apic;

 	if (ret < 0) {
-		/* IR is required if there is APIC ID > 255 even when running
-		 * under KVM
-		 */
-		if (max_physical_apicid > 255 ||
-		    !hypervisor_x2apic_available()) {
+		/* IR is required if there is APIC ID > 255 */
+		if (max_physical_apicid > 255) {
 			if (x2apic_preenabled)
 				disable_x2apic();
 			goto skip_x2apic;
--
1.7.7.4
[PATCH] ata: Fix DVD not dectected at some platform with Wellsburg PCH
Patch b55f84e2d527182e7c611d466cd0bb6ddce201de "ata_piix: Fix DVD not dectected at some Haswell platforms" fixed an issue where the DVD drive was not recognized on Haswell Desktop platforms with Lynx Point.

Recently, the same issue has also been found on some platforms with the Wellsburg PCH. So deliver a similar patch that fixes it by disabling 32bit PIO in IDE mode.

Signed-off-by: Youquan Song
Cc: sta...@vger.kernel.org
---
 drivers/ata/ata_piix.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 9a8a674..424bcbe 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -330,7 +330,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
 	/* SATA Controller IDE (Wellsburg) */
 	{ 0x8086, 0x8d00, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
 	/* SATA Controller IDE (Wellsburg) */
-	{ 0x8086, 0x8d08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+	{ 0x8086, 0x8d08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
 	/* SATA Controller IDE (Wellsburg) */
 	{ 0x8086, 0x8d60, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
 	/* SATA Controller IDE (Wellsburg) */
--
1.6.4.2
Re: cpu hotplug: possible_cpus broken (again?) next-20130607
> > Interesting, you are changing long standing meaning of maxcpus=
> >
> > We always use maxcpus=1 to have one cpu up, and later in user space
> > to online other cpus like
> > echo 1 > /sys/devices/system/cpuX/online.
> >
> > aka maxcpus= is a soft limit or initial online nr.
> >
> > we already have nr_cpus= for hard limit.
> >
> > So need to drop
> > commit 3e275a5ba367ab74b3a4e49114307baed989fcac
> > Author: Youquan Song
> > Date: Fri Jun 7 10:07:08 2013 +1000
> >
> > drivers/base/cpu.c: fix maxcpus boot option
>
> Agreed.

Yes, I also agree to drop it; the fix needs more consideration.

I tried to use maxcpus to limit the number of CPUs in order to debug a well-known application, because it fails to run when the number of CPUs is larger than 69. However, when I use maxcpus to limit the number of CPUs at boot, udev automatically enables all of the CPUs on a 3.10 kernel. I also tried maxcpus on a 3.0 kernel, and it does not show the issue.

I recently dug into this: commit 8a25a2fd126c621f44f3aeaef80d51f00fc11639 "cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem" in the 3.2 kernel results in udev automatically enabling all of the CPUs even though maxcpus has been provided.

So next I need to look at why udev tries to enable all of the CPUs even though maxcpus is provided. Possibly this can also be fixed in the udev daemon.

Secondly, I think the description of the maxcpus= option in Documentation/kernel-parameters.txt is too confusing. The maxcpus and nr_cpus options need to switch their names.

Currently:

	maxcpus=	[SMP] Maximum number of processors that an SMP kernel
			should make use of. maxcpus=n : n >= 0 limits the
			kernel to using 'n' processors. n=0 is a special case,
			it is equivalent to "nosmp", which also disables
			the IO APIC.

How about changing it to:

	maxcpus=	[SMP] Maximum number of processors that an SMP kernel
			bring up during booting. maxcpus=n : n >= 0 limits the
			kernel to using 'n' processors. n=0 is a special case,
			it is equivalent to "nosmp", which also disables
			the IO APIC.

Thanks
-Youquan
Re: cpu hotplug: possible_cpus broken (again?) next-20130607
> On 06/12/2013 05:03 AM, Youquan Song wrote:
> > +#ifdef CONFIG_SMP
> > +	/* return when cpu number greater than maximum number of CPUs */
> > +	if (setup_max_cpus <= num_online_cpus() + 1) {
> > +		cpu_hotplug_driver_unlock();
> > +		return -EINVAL;
> > +	}
> > +#endif
> > 	from_nid = cpu_to_node(cpuid);
> > 	ret = cpu_up(cpuid);
>
> Your patch is line-wrapped.
>
> Also, the #ifdef is unnecessary. If CONFIG_SMP is off:
>
> 	static const unsigned int setup_max_cpus = NR_CPUS;
> 	#define num_online_cpus() 1U
>
> The compiler will take care of optimizing out the if() without the
> explicit #ifdef.
>
> Also, the +1 looks goofy to me. Doesn't this do the same thing (and
> isn't it much easier to read)?
>
> 	if (num_online_cpus() >= setup_max_cpus)
>

Thanks. Here is a formal patch for it. Please review and try it.

Subject: [PATCH] core: Fix maxcpus boot option broken

The maxcpus boot option is meant to limit the maximum number of CPUs on the system, but it is broken in recent kernels. Although maxcpus limits the number of CPUs brought up at boot, the current kernel registers all present CPUs in sysfs. udev enumerates every CPU registered in sysfs and brings up any CPU that is offline, so the maxcpus limit is not honored.

This patch keeps the number of online CPUs from exceeding the maxcpus limit. The limit then holds during udev enumeration and for any other attempt to bring up CPUs beyond it, for example:

	echo 1 > /sys/devices/system/cpu/online

Signed-off-by: Youquan Song
---
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 3d48fc8..e32fffa 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -60,6 +60,13 @@ static ssize_t __ref store_online(struct device *dev,
 			kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
 		break;
 	case '1':
+		/* Return when online cpu number equal or greater than
+		 * maximum number of CPUs */
+		if (num_online_cpus() >= setup_max_cpus) {
+			cpu_hotplug_driver_unlock();
+			return -EINVAL;
+		}
+
 		from_nid = cpu_to_node(cpuid);
 		ret = cpu_up(cpuid);
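The two forms of the limit check discussed in this thread can be compared in a small user-space sketch. The function names below are hypothetical and the CPU counts are passed in as plain integers; nothing here calls kernel code. Under this model the two conditions are close but not identical: the "+ 1" form refuses one CPU earlier than the form used in the formal patch.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the check added to store_online().
 * Each returns true when the request to online another CPU would be
 * refused with -EINVAL.
 */

/* First posted form: setup_max_cpus <= num_online_cpus() + 1 */
static bool online_refused_v1(unsigned int setup_max_cpus,
			      unsigned int online_cpus)
{
	return setup_max_cpus <= online_cpus + 1;
}

/* Form used in the formal patch: num_online_cpus() >= setup_max_cpus */
static bool online_refused_v2(unsigned int setup_max_cpus,
			      unsigned int online_cpus)
{
	return online_cpus >= setup_max_cpus;
}
```

With maxcpus=10 and 9 CPUs online, the first form already refuses the request while the second still allows the tenth CPU to come up; with 10 CPUs online both refuse.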
Re: cpu hotplug: possible_cpus broken (again?) next-20130607
On Tue, Jun 11, 2013 at 04:32:34PM -0600, Toshi Kani wrote: > On Wed, 2013-06-12 at 00:34 +0200, Rafael J. Wysocki wrote: > > On Tuesday, June 11, 2013 03:17:28 PM Dave Hansen wrote: > > > On 06/11/2013 03:05 PM, Rafael J. Wysocki wrote: > > > > On Tuesday, June 11, 2013 02:51:33 PM Dave Hansen wrote: > > > >> possible_cpus looks broken again. I'm booting with: > > > >> > > > >> maxcpus=10 possible_cpus=160 > > > >> > > > >> But I only get 0-9 in sysfs: > > > >> > > > >>> # ls /sys/devices/system/cpu/ > > > >>> cpu0 cpu2 cpu4 cpu6 cpu8 cpufreq kernel_max offline possible > > > >>> probeuevent > > > >>> cpu1 cpu3 cpu5 cpu7 cpu9 cpuidle modaliasonline present > > > >>> release > > > > > > > > Can you please test the acpi-hotplug branch of the linux-pm.git tree? > > > > > > That branch seems to work happily. > > > > In that case the problem may have been reintroduced by a merge conflict fix > > in > > linux-next. > > I believe the problem was introduced by the following change. From the > description, though, this is exactly what this patch was trying to > change... Adding Youguan to the list. > > commit 3e275a5ba367ab74b3a4e49114307baed989fcac > Author: Youquan Song > Date: Fri Jun 7 10:07:08 2013 +1000 > > drivers/base/cpu.c: fix maxcpus boot option > Hi Toshi, Thanks Thoshi for the information. please try the below patch to fix the issue by moving the code to store_online. 
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 3d48fc8..2378f42 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -60,6 +60,13 @@ static ssize_t __ref store_online(struct device *dev,
 			kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
 		break;
 	case '1':
+#ifdef CONFIG_SMP
+		/* return when cpu number greater than maximum number of CPUs */
+		if (setup_max_cpus <= num_online_cpus() + 1) {
+			cpu_hotplug_driver_unlock();
+			return -EINVAL;
+		}
+#endif
 		from_nid = cpu_to_node(cpuid);
 		ret = cpu_up(cpuid);

Thanks
-Youquan
[PATCH] core: Fix maxcpus boot option broken
The maxcpus boot option is meant to limit the maximum number of CPUs on the system, but it is broken in recent kernels. Although maxcpus limits the number of CPUs brought up at boot, the current kernel registers all present CPUs in sysfs. udev enumerates every CPU registered in sysfs and brings up any CPU that is offline, so the maxcpus limit is not honored.

This patch registers in sysfs only those CPUs that are within the maxcpus limit. The limit then holds during udev enumeration and for any other attempt to bring up CPUs beyond it, for example:

	echo 1 > /sys/devices/system/cpu/online

Signed-off-by: Youquan Song
---
 drivers/base/cpu.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 3d48fc8..c7d603a 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -272,6 +272,10 @@ int __cpuinit register_cpu(struct cpu *cpu, int num)
 {
 	int error;

+	/* return when cpu number greater than maximum number of CPUs */
+	if (num >= setup_max_cpus)
+		return 0;
+
 	cpu->node_id = cpu_to_node(num);
 	memset(&cpu->dev, 0x00, sizeof(struct device));
 	cpu->dev.id = num;
--
1.7.7.4
Re: [PATCH v3] ata: Fix DVD not dectected at some Haswell platforms
>
> Can you look at the patch which required by some Haswell platforms?
>

Hi Jeff,

What is your opinion of the patch? It blocks installation on some new platforms.

Thanks
-Youquan
[PATCH] perf: Fix parameter type mismatch
Building tools/perf fails with a blocking error:

cc1: warnings being treated as errors
util/scripting-engines/trace-event-perl.c: In function ‘perl_process_tracepoint’:
util/scripting-engines/trace-event-perl.c:285: error: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘__u64’
make: *** [util/scripting-engines/trace-event-perl.o] Error 1

Signed-off-by: Youquan Song
---
 .../perf/util/scripting-engines/trace-event-perl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c
index f80605e..b2b3bdb 100644
--- a/tools/perf/util/scripting-engines/trace-event-perl.c
+++ b/tools/perf/util/scripting-engines/trace-event-perl.c
@@ -282,7 +282,7 @@ static void perl_process_tracepoint(union perf_event *perf_event __maybe_unused,

 	event = find_cache_event(evsel);
 	if (!event)
-		die("ug! no event found for type %" PRIu64, evsel->attr.config);
+		die("ug! no event found for type %" PRIu64, (u64)(evsel->attr.config));

 	pid = raw_field_value(event, "common_pid", data);

--
1.7.7.4
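The same kind of mismatch can be shown outside perf with a short sketch. It assumes that the kernel's __u64 behaves like unsigned long long on the failing build, while PRIu64 expands to the conversion for uint64_t, which may be a different type. format_config() is a hypothetical helper, not a perf function; the cast plays the role of the (u64) cast in the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats a config value that arrives as
 * unsigned long long (standing in for __u64). The explicit cast to
 * uint64_t makes the argument type match what PRIu64 expects, so
 * -Wformat does not complain even when uint64_t is unsigned long.
 * Returns the number of characters snprintf() would write.
 */
static int format_config(char *buf, size_t len, unsigned long long config)
{
	return snprintf(buf, len, "no event found for type %" PRIu64,
			(uint64_t)config);
}
```

Passing a NULL buffer with length 0 is valid for snprintf() and yields only the required length, which is a convenient way to size the buffer before formatting.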
Re: [PATCH] x86,apic: Blacklist x2APIC on some platforms
>
> I found this patch after some googling and for the record, it makes my
> W520 boot with VT-d enabled and the discrete NVidia card.
> Is it still being considered?
>

Yes, I am still pushing the patch upstream. The patch is good and has been reviewed by Yinghai, but it depends on a patch from Yinghai which is not upstream yet:
http://git.kernel.org/cgit/linux/kernel/git/yinghai/linux-yinghai.git/diff/?id=de38757e964cfee20e6da1977572a2191d7f4aa0

Refer to https://bugzilla.kernel.org/show_bug.cgi?id=43054

Peter, will you take it?

Thanks
-Youquan
Re: [PATCH v3] ata: Fix DVD not dectected at some Haswell platforms
Hi Maintainer, Can you look at the patch which required by some Haswell platforms? Thanks -Youquan On Wed, Mar 06, 2013 at 10:49:05AM -0500, Youquan Song wrote: > There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d > "ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge > chipsets(v2)" fixing the 4 ports IDE controller 32bit PIO mode. > > We've hit a problem with DVD not recognized on Haswell Desktop platform which > includes Lynx Point 2-port SATA controller. > > This quirk patch disables 32bit PIO on this controller in IDE mode. > > v2: Change spelling error in statememnt pointed by Sergei Shtylyov. > v3: Change comment statememnt and spliting line over 80 characters pointed by > Libor Pechacek and also rebase the patch against 3.8-rc7 kernel. > > Tested-by: Lee, Chun-Yi > Signed-off-by: Youquan Song > Cc: sta...@vger.kernel.org > --- > drivers/ata/ata_piix.c | 14 +- > 1 files changed, 13 insertions(+), 1 deletions(-) > > diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c > index 174eca6..4aab550 100644 > --- a/drivers/ata/ata_piix.c > +++ b/drivers/ata/ata_piix.c > @@ -150,6 +150,7 @@ enum piix_controller_ids { > tolapai_sata, > piix_pata_vmw, /* PIIX4 for VMware, spurious DMA_ERR */ > ich8_sata_snb, > + ich8_2port_sata_snb, > }; > > struct piix_map_db { > @@ -304,7 +305,7 @@ static const struct pci_device_id piix_pci_tbl[] = { > /* SATA Controller IDE (Lynx Point) */ > { 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb }, > /* SATA Controller IDE (Lynx Point) */ > - { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata }, > + { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb }, > /* SATA Controller IDE (Lynx Point) */ > { 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata }, > /* SATA Controller IDE (Lynx Point-LP) */ > @@ -422,6 +423,7 @@ static const struct piix_map_db *piix_map_db_table[] = { > [ich8m_apple_sata] = &ich8m_apple_map_db, > [tolapai_sata] = 
&tolapai_map_db, > [ich8_sata_snb] = &ich8_map_db, > + [ich8_2port_sata_snb] = &ich8_2port_map_db, > }; > > static struct pci_bits piix_enable_bits[] = { > @@ -1225,6 +1227,16 @@ static struct ata_port_info piix_port_info[] = { > .udma_mask = ATA_UDMA6, > .port_ops = &piix_sata_ops, > }, > + > + [ich8_2port_sata_snb] = > + { > + .flags = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR > + | PIIX_FLAG_PIO16, > + .pio_mask = ATA_PIO4, > + .mwdma_mask = ATA_MWDMA2, > + .udma_mask = ATA_UDMA6, > + .port_ops = &piix_sata_ops, > + }, > }; > > #define AHCI_PCI_BAR 5 > -- > 1.7.7.4 > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v3] ata: Fix DVD not dectected at some Haswell platforms
There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d "ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge chipsets(v2)" fixing the 4 ports IDE controller 32bit PIO mode.

We've hit a problem with DVD not recognized on Haswell Desktop platform which includes Lynx Point 2-port SATA controller.

This quirk patch disables 32bit PIO on this controller in IDE mode.

v2: Fix a spelling error in the statement, pointed out by Sergei Shtylyov.
v3: Change the comment statement and split a line over 80 characters, both pointed out by Libor Pechacek, and also rebase the patch against the 3.8-rc7 kernel.

Tested-by: Lee, Chun-Yi
Signed-off-by: Youquan Song
Cc: sta...@vger.kernel.org
---
 drivers/ata/ata_piix.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 174eca6..4aab550 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -150,6 +150,7 @@ enum piix_controller_ids {
 	tolapai_sata,
 	piix_pata_vmw,			/* PIIX4 for VMware, spurious DMA_ERR */
 	ich8_sata_snb,
+	ich8_2port_sata_snb,
 };

 struct piix_map_db {
@@ -304,7 +305,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
 	/* SATA Controller IDE (Lynx Point) */
 	{ 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
 	/* SATA Controller IDE (Lynx Point) */
-	{ 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+	{ 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
 	/* SATA Controller IDE (Lynx Point) */
 	{ 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
 	/* SATA Controller IDE (Lynx Point-LP) */
@@ -422,6 +423,7 @@ static const struct piix_map_db *piix_map_db_table[] = {
 	[ich8m_apple_sata]	= &ich8m_apple_map_db,
 	[tolapai_sata]		= &tolapai_map_db,
 	[ich8_sata_snb]		= &ich8_map_db,
+	[ich8_2port_sata_snb]	= &ich8_2port_map_db,
 };

 static struct pci_bits piix_enable_bits[] = {
@@ -1225,6 +1227,16 @@ static struct ata_port_info piix_port_info[] = {
 		.udma_mask	= ATA_UDMA6,
 		.port_ops	= &piix_sata_ops,
 	},
+
+	[ich8_2port_sata_snb] =
+	{
+		.flags		= PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR
+					| PIIX_FLAG_PIO16,
+		.pio_mask	= ATA_PIO4,
+		.mwdma_mask	= ATA_MWDMA2,
+		.udma_mask	= ATA_UDMA6,
+		.port_ops	= &piix_sata_ops,
+	},
 };

 #define AHCI_PCI_BAR 5
--
1.7.7.4
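How the quirk takes effect can be sketched with a cut-down user-space model: the PCI ID table maps a device to a controller id, and that id indexes a port-info table whose flags carry the 16-bit PIO restriction. Everything below is an assumption made for illustration: the struct layouts, the flag value and the helper uses_16bit_pio() are hypothetical, and only the identifier names and the two Lynx Point device IDs are taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the ata_piix identifiers used in the patch. */
enum piix_controller_ids {
	ich8_2port_sata,
	ich8_2port_sata_snb,
};

#define PIIX_FLAG_PIO16 (1u << 0)	/* illustrative value only */

struct port_info {
	unsigned int flags;
};

/* Indexed by controller id, like piix_port_info[] in the driver. */
static const struct port_info port_info_tbl[] = {
	[ich8_2port_sata]     = { .flags = 0 },
	[ich8_2port_sata_snb] = { .flags = PIIX_FLAG_PIO16 },
};

struct pci_id {
	unsigned short vendor;
	unsigned short device;
	enum piix_controller_ids id;
};

/* Two Lynx Point entries as they look after the patch. */
static const struct pci_id pci_tbl[] = {
	{ 0x8086, 0x8c08, ich8_2port_sata_snb },	/* quirk applied */
	{ 0x8086, 0x8c09, ich8_2port_sata },		/* unchanged */
};

/* Returns true when the matching device is restricted to 16-bit PIO. */
static bool uses_16bit_pio(unsigned short vendor, unsigned short device)
{
	for (size_t i = 0; i < sizeof(pci_tbl) / sizeof(pci_tbl[0]); i++) {
		if (pci_tbl[i].vendor == vendor && pci_tbl[i].device == device)
			return port_info_tbl[pci_tbl[i].id].flags & PIIX_FLAG_PIO16;
	}
	return false;	/* unknown device: no quirk in this model */
}
```

In this model, switching a device's table entry from ich8_2port_sata to ich8_2port_sata_snb is all it takes to change the flags the port is set up with, which is why the patch only needs a new enum value, a new port-info entry and a one-line table change.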
Re: [PATCH v3] ata: Fix DVD not dectected at some Haswell platforms
Hi Maintainer, Can you take the patch which is needed by some new platforms? Thanks -Youquan On Mon, Feb 18, 2013 at 11:00:55AM -0500, Youquan Song wrote: > There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d > "ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge > chipsets(v2)" fixing the 4 ports IDE controller 32bit PIO mode. > > We've hit a problem with DVD not recognized on Haswell Desktop platform which > includes Lynx Point 2-port SATA controller. > > This quirk patch disables 32bit PIO on this controller in IDE mode. > > v2: Change spelling error in statememnt pointed by Sergei Shtylyov. > v3: Change comment statememnt and spliting line over 80 characters pointed by > Libor Pechacek and also rebase the patch against 3.8-rc7 kernel. > > Tested-by: Lee, Chun-Yi > Signed-off-by: Youquan Song > Cc: sta...@vger.kernel.org > --- > drivers/ata/ata_piix.c | 14 +- > 1 files changed, 13 insertions(+), 1 deletions(-) > > diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c > index 174eca6..4aab550 100644 > --- a/drivers/ata/ata_piix.c > +++ b/drivers/ata/ata_piix.c > @@ -150,6 +150,7 @@ enum piix_controller_ids { > tolapai_sata, > piix_pata_vmw, /* PIIX4 for VMware, spurious DMA_ERR */ > ich8_sata_snb, > + ich8_2port_sata_snb, > }; > > struct piix_map_db { > @@ -304,7 +305,7 @@ static const struct pci_device_id piix_pci_tbl[] = { > /* SATA Controller IDE (Lynx Point) */ > { 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb }, > /* SATA Controller IDE (Lynx Point) */ > - { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata }, > + { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb }, > /* SATA Controller IDE (Lynx Point) */ > { 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata }, > /* SATA Controller IDE (Lynx Point-LP) */ > @@ -422,6 +423,7 @@ static const struct piix_map_db *piix_map_db_table[] = { > [ich8m_apple_sata] = &ich8m_apple_map_db, > [tolapai_sata] = &tolapai_map_db, > 
[ich8_sata_snb] = &ich8_map_db, > + [ich8_2port_sata_snb] = &ich8_2port_map_db, > }; > > static struct pci_bits piix_enable_bits[] = { > @@ -1225,6 +1227,16 @@ static struct ata_port_info piix_port_info[] = { > .udma_mask = ATA_UDMA6, > .port_ops = &piix_sata_ops, > }, > + > + [ich8_2port_sata_snb] = > + { > + .flags = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR > + | PIIX_FLAG_PIO16, > + .pio_mask = ATA_PIO4, > + .mwdma_mask = ATA_MWDMA2, > + .udma_mask = ATA_UDMA6, > + .port_ops = &piix_sata_ops, > + }, > }; > > #define AHCI_PCI_BAR 5 > -- > 1.7.7.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ata: Fix DVD not detected on some Haswell platforms
> As to my understanding Sergei did not suggest citing the whole commit message.
> I also find the numerous references to Sandy Bridge confusing as this is a fix
> for Lynx Point chipset.
>
> How about rephrasing the commit message in a way similar to the following one?
> --8<-
> We've hit a problem with DVD not recognized on Haswell Desktop platform which
> includes Lynx Point 2-port SATA controller. This quirk patch disables 32bit
> PIO on the controller in IDE mode.
> -->8-

Thanks Libor! I have updated the commit comments and sent a v3 patch to LKML.

> > + .flags = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR | PIIX_FLAG_PIO16,
>
> The line might be worth splitting as it's over 80 characters.
>
> Otherwise the patch looks OK to me.

This is also changed in the v3 patch.

Thanks
-Youquan
[PATCH v3] ata: Fix DVD not detected on some Haswell platforms
There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d
"ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge
chipsets(v2)" fixing the 4 ports IDE controller 32bit PIO mode.

We've hit a problem with DVD not recognized on Haswell Desktop platform which
includes Lynx Point 2-port SATA controller.

This quirk patch disables 32bit PIO on this controller in IDE mode.

v2: Fix spelling errors in the statement, as pointed out by Sergei Shtylyov.
v3: Reword the commit statement and split a line over 80 characters, as
    pointed out by Libor Pechacek; also rebase the patch against the 3.8-rc7
    kernel.

Tested-by: Lee, Chun-Yi
Signed-off-by: Youquan Song
Cc: sta...@vger.kernel.org
---
 drivers/ata/ata_piix.c | 14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 174eca6..4aab550 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -150,6 +150,7 @@ enum piix_controller_ids {
     tolapai_sata,
     piix_pata_vmw, /* PIIX4 for VMware, spurious DMA_ERR */
     ich8_sata_snb,
+    ich8_2port_sata_snb,
 };

 struct piix_map_db {
@@ -304,7 +305,7 @@ static const struct pci_device_id piix_pci_tbl[] = {
     /* SATA Controller IDE (Lynx Point) */
     { 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb },
     /* SATA Controller IDE (Lynx Point) */
-    { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+    { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
     /* SATA Controller IDE (Lynx Point) */
     { 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
     /* SATA Controller IDE (Lynx Point-LP) */
@@ -422,6 +423,7 @@ static const struct piix_map_db *piix_map_db_table[] = {
     [ich8m_apple_sata] = &ich8m_apple_map_db,
     [tolapai_sata] = &tolapai_map_db,
     [ich8_sata_snb] = &ich8_map_db,
+    [ich8_2port_sata_snb] = &ich8_2port_map_db,
 };

 static struct pci_bits piix_enable_bits[] = {
@@ -1225,6 +1227,16 @@ static struct ata_port_info piix_port_info[] = {
         .udma_mask = ATA_UDMA6,
         .port_ops = &piix_sata_ops,
     },
+
+    [ich8_2port_sata_snb] =
+    {
+        .flags = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR
+                 | PIIX_FLAG_PIO16,
+        .pio_mask = ATA_PIO4,
+        .mwdma_mask = ATA_MWDMA2,
+        .udma_mask = ATA_UDMA6,
+        .port_ops = &piix_sata_ops,
+    },
 };

 #define AHCI_PCI_BAR 5
--
1.7.7.4
Re: [PATCH] ata: Fix DVD not detected on some Haswell platforms
>> +{ 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb },
>> /* SATA Controller IDE (Lynx Point) */
>> { 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
>
> Also, are you sure this one and the following Lynx Point controllers are
> not affected?

I am not sure. The 0x8c09 controller is possibly used on mobile PCs rather
than desktops. One of my machines includes the chipset, but its 2-port IDE
controller is not wired out for use; only 2 ports of the 4-port IDE controller
are wired out, so I cannot verify it. I think a notebook/mobile PC does not
need to wire out all of the IDE ports.

This patch only fixes the 0x8c08 2-port IDE controller, because that problem
blocks the installation. If an issue is reported for 0x8c09, we can fix it
later.

Thanks
-Youquan
Re: [PATCH] ata: Fix DVD not detected on some Haswell platforms
> On 30-01-2013 21:19, Youquan Song wrote:
>
>> There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d
>
>    Please also specify the summary of that patch in parens.
>
>> fix the 4 ports
>
>    s/fix/fixing/
>
>> IDE controller 32bit PIO mode.
>> Recently, the problem was showed
>
>    s/showed/shown/
>
>> at Haswell platform which includes 2 ports IDE controller.
>
>> So introduce a qurik
>
>    Quirk.
>
>> patch to disable 32bit PIO at this IDE controller.
>
>    s/at/on/
>
>> Signed-off-by: Youquan Song
>
> MBR, Sergei

Thanks a lot! I have sent out an updated patch that fixes these.

-Youquan
[PATCH v2] ata: Fix DVD not detected on some Haswell platforms
There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d "ata_piix: make DVD Drive recognisable on systems with Intel Sandybridge chipsets(v2) This quirk patch fixes one kind of bug inside some Intel Sandybridge chipsets, see reports from https://bugzilla.kernel.org/show_bug.cgi?id=40592. Many guys also have reported the problem before: https://bugs.launchpad.net/bugs/737388 https://bugs.launchpad.net/bugs/794642 https://bugs.launchpad.net/bugs/782389 .. With help from Tejun, the problem is found to be caused by 32bit PIO mode, so introduce the quirk patch to disable 32bit PIO on SATA piix for some Sandybridge CPT chipsets. Seth also tested the patch on all five affected chipsets (pci device ID: 0x1c00, 0x1c01, 0x1d00, 0x1e00, 0x1e01), and found the patch does fix the problem. " The above patch only fixing the 4 ports IDE controller 32bit PIO mode. Recently, the problem was shown at Haswell Desktop platform which includes 2 ports IDE controller. So introduce a quirk patch to disable 32bit PIO on this IDE controller. v2: Change spelling error in statememnt pointed by Sergei Shtylyov. 
Tested-by: Lee, Chun-Yi Signed-off-by: Youquan Song Cc: sta...@vger.kernel.org --- drivers/ata/ata_piix.c | 14 +- 1 files changed, 13 insertions(+), 1 deletions(-) diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c index ef773e1..1993e52 100644 --- a/drivers/ata/ata_piix.c +++ b/drivers/ata/ata_piix.c @@ -150,6 +150,7 @@ enum piix_controller_ids { tolapai_sata, piix_pata_vmw, /* PIIX4 for VMware, spurious DMA_ERR */ ich8_sata_snb, + ich8_2port_sata_snb, }; struct piix_map_db { @@ -326,7 +327,7 @@ static const struct pci_device_id piix_pci_tbl[] = { /* SATA Controller IDE (Lynx Point) */ { 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb }, /* SATA Controller IDE (Lynx Point) */ - { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata }, + { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb }, /* SATA Controller IDE (Lynx Point) */ { 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata }, /* SATA Controller IDE (Lynx Point-LP) */ @@ -502,6 +503,7 @@ static const struct piix_map_db *piix_map_db_table[] = { [ich8m_apple_sata] = &ich8m_apple_map_db, [tolapai_sata] = &tolapai_map_db, [ich8_sata_snb] = &ich8_map_db, + [ich8_2port_sata_snb] = &ich8_2port_map_db, }; static struct ata_port_info piix_port_info[] = { @@ -643,6 +645,16 @@ static struct ata_port_info piix_port_info[] = { .port_ops = &piix_sata_ops, }, + [ich8_2port_sata_snb] = + { + .flags = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR | PIIX_FLAG_PIO16, + .pio_mask = ATA_PIO4, + .mwdma_mask = ATA_MWDMA2, + .udma_mask = ATA_UDMA6, + .port_ops = &piix_sata_ops, + }, + + }; static struct pci_bits piix_enable_bits[] = { -- 1.6.4.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ata: Fix DVD not detected on some Haswell platforms
There is a quirk patch 5e5a4f5d5a08c9c504fe956391ac3dae2c66556d fix the 4 ports IDE controller 32bit PIO mode. Recently, the problem was showed at Haswell platform which includes 2 ports IDE controller. So introduce a qurik patch to disable 32bit PIO at this IDE controller. Signed-off-by: Youquan Song --- drivers/ata/ata_piix.c | 14 +- 1 files changed, 13 insertions(+), 1 deletions(-) diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c index ef773e1..1993e52 100644 --- a/drivers/ata/ata_piix.c +++ b/drivers/ata/ata_piix.c @@ -150,6 +150,7 @@ enum piix_controller_ids { tolapai_sata, piix_pata_vmw, /* PIIX4 for VMware, spurious DMA_ERR */ ich8_sata_snb, + ich8_2port_sata_snb, }; struct piix_map_db { @@ -326,7 +327,7 @@ static const struct pci_device_id piix_pci_tbl[] = { /* SATA Controller IDE (Lynx Point) */ { 0x8086, 0x8c01, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_snb }, /* SATA Controller IDE (Lynx Point) */ - { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata }, + { 0x8086, 0x8c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata_snb }, /* SATA Controller IDE (Lynx Point) */ { 0x8086, 0x8c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata }, /* SATA Controller IDE (Lynx Point-LP) */ @@ -502,6 +503,7 @@ static const struct piix_map_db *piix_map_db_table[] = { [ich8m_apple_sata] = &ich8m_apple_map_db, [tolapai_sata] = &tolapai_map_db, [ich8_sata_snb] = &ich8_map_db, + [ich8_2port_sata_snb] = &ich8_2port_map_db, }; static struct ata_port_info piix_port_info[] = { @@ -643,6 +645,16 @@ static struct ata_port_info piix_port_info[] = { .port_ops = &piix_sata_ops, }, + [ich8_2port_sata_snb] = + { + .flags = PIIX_SATA_FLAGS | PIIX_FLAG_SIDPR | PIIX_FLAG_PIO16, + .pio_mask = ATA_PIO4, + .mwdma_mask = ATA_MWDMA2, + .udma_mask = ATA_UDMA6, + .port_ops = &piix_sata_ops, + }, + + }; static struct pci_bits piix_enable_bits[] = { -- 1.6.4.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to 
majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:perf/urgent] x86/perf: Add IvyBridge EP support
Commit-ID: 923d8697e24847000490c187de1aeaca622611a3 Gitweb: http://git.kernel.org/tip/923d8697e24847000490c187de1aeaca622611a3 Author: Youquan Song AuthorDate: Tue, 18 Dec 2012 12:20:23 -0500 Committer: Ingo Molnar CommitDate: Thu, 24 Jan 2013 16:14:04 +0100 x86/perf: Add IvyBridge EP support Running the perf utility on a Ivybridge EP server we encounter "not supported" events: L1-dcache-loads L1-dcache-load-misses L1-dcache-stores L1-dcache-store-misses L1-dcache-prefetches L1-dcache-prefetch-misses This patch adds support for this processor. Signed-off-by: Youquan Song Cc: Andi Kleen Cc: Youquan Song Cc: Peter Zijlstra Cc: Stephane Eranian Link: http://lkml.kernel.org/r/1355851223-27705-1-git-send-email-youquan.s...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/kernel/cpu/perf_event_intel.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index cb313a5..4914e94 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -2087,6 +2087,7 @@ __init int intel_pmu_init(void) pr_cont("SandyBridge events, "); break; case 58: /* IvyBridge */ + case 62: /* IvyBridge EP */ memcpy(hw_cache_event_ids, snb_hw_cache_event_ids, sizeof(hw_cache_event_ids)); memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86,apic: Blacklist x2APIC on some platforms
On Tue, Dec 18, 2012 at 09:42:30AM -0800, Yinghai Lu wrote:
> On Tue, Dec 18, 2012 at 9:33 AM, H. Peter Anvin wrote:
> > On 12/18/2012 09:07 AM, Youquan Song wrote:
> >> Blacklist x2apic when Nivida graphics enabled on Lenovo ThinkPad T420.
> >> Also set blacklist x2apic for Lenovo ThinkPad W520 and L520.
> >
> > I thought we had gotten reports that the Nvidia correlation was false?
>
> that's T520.

Hi hpa,

Yinghai's T520 works with x2APIC enabled, so it does not need to be
blacklisted. Would you like to take the patch?

Thanks
-Youquan
Re: [PATCH] x86,perf: Add IvyBridge EP support
Would you like to take it? It is needed by Linux OSVs. Thanks -Youquan On Tue, Dec 18, 2012 at 12:20:23PM -0500, Youquan Song wrote: > Run in perf utility at Ivybridge EP server, encouter "not supported" event > > L1-dcache-loads > L1-dcache-load-misses > L1-dcache-stores > L1-dcache-store-misses > L1-dcache-prefetches > L1-dcache-prefetch-misses > > This patch add the support for this processor. > > Reviewed-by: Andi Kleen > Signed-off-by: Youquan Song > --- > arch/x86/kernel/cpu/perf_event_intel.c |1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c > b/arch/x86/kernel/cpu/perf_event_intel.c > index 324bb52..aea3503 100644 > --- a/arch/x86/kernel/cpu/perf_event_intel.c > +++ b/arch/x86/kernel/cpu/perf_event_intel.c > @@ -2075,6 +2075,7 @@ __init int intel_pmu_init(void) > pr_cont("SandyBridge events, "); > break; > case 58: /* IvyBridge */ > + case 62: /* IvyBridge EP */ > memcpy(hw_cache_event_ids, snb_hw_cache_event_ids, > sizeof(hw_cache_event_ids)); > memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs, > -- > 1.6.4.2 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] x86,idle: pr_debug information needs to be separated
When debugging the kernel, the following output is found:

intel_idle: unaware of model 0x1a MWAIT 4 please contact lenb@kernel.orgACPI: Device input0 -> No ACPI support

The pr_debug message lacks a trailing newline, so the next message is appended
to the same line. This patch adds the newline to separate them.

Signed-off-by: Youquan Song
---
 drivers/idle/intel_idle.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index b0f6b4c..eae6e3b 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -518,7 +518,7 @@ static int intel_idle_cpuidle_driver_init(void)
             if (*cpuidle_state_table[cstate].name == '\0')
                 pr_debug(PREFIX "unaware of model 0x%x"
                     " MWAIT %d please"
-                    " contact l...@kernel.org",
+                    " contact l...@kernel.org\n",
                 boot_cpu_data.x86_model, cstate);
             continue;
         }
--
1.6.4.2
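The effect can be reproduced outside the kernel. The following is a minimal
user-space sketch, not kernel code: join_log() is a hypothetical helper that
writes two messages back to back, the way a console shows consecutive writes,
and the message strings used with it are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): write two log messages back to
 * back into buf, as a console would display consecutive writes.
 * Returns the number of '\n' characters in the joined result.
 */
static int join_log(char *buf, size_t len, const char *first, const char *second)
{
    int newlines = 0;

    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s%s", first, second);
    for (const char *p = buf; *p != '\0'; p++)
        if (*p == '\n')
            newlines++;
    return newlines;
}
```

When the first message has no trailing "\n", both messages end up on one
line, which matches the concatenated output quoted above; with the newline,
each message gets its own line.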
[PATCH] x86,perf: Add IvyBridge EP support
Running the perf utility on an IvyBridge EP server, we encounter "not
supported" events:

L1-dcache-loads
L1-dcache-load-misses
L1-dcache-stores
L1-dcache-store-misses
L1-dcache-prefetches
L1-dcache-prefetch-misses

This patch adds support for this processor.

Reviewed-by: Andi Kleen
Signed-off-by: Youquan Song
---
 arch/x86/kernel/cpu/perf_event_intel.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 324bb52..aea3503 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2075,6 +2075,7 @@ __init int intel_pmu_init(void)
         pr_cont("SandyBridge events, ");
         break;
     case 58: /* IvyBridge */
+    case 62: /* IvyBridge EP */
         memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
                sizeof(hw_cache_event_ids));
         memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs,
--
1.6.4.2
[PATCH] x86,apic: Blacklist x2APIC on some platforms
Blacklist x2apic when Nvidia graphics is enabled on the Lenovo ThinkPad T420.
Also blacklist x2apic on the Lenovo ThinkPad W520 and L520.

There are 3 bug reports:
https://bugzilla.kernel.org/show_bug.cgi?id=43054
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/776999
https://bugs.launchpad.net/bugs/922037

The patch is based on
http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=patch;h=de38757e964cfee20e6da1977572a2191d7f4aa0

Reviewed-by: Yinghai Lu
Signed-off-by: Youquan Song
---
 arch/x86/include/asm/x86_init.h |    1 +
 arch/x86/kernel/apic/apic.c     |   51 +++
 arch/x86/kernel/early-quirks.c  |    9 +++
 3 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 38155f6..88e39e6 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -202,5 +202,6 @@ extern struct x86_msi_ops x86_msi;
 extern struct x86_io_apic_ops x86_io_apic_ops;
 extern void x86_init_noop(void);
 extern void x86_init_uint_noop(unsigned int unused);
+extern int early_found_nvidia_display_card;

 #endif
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 24deb30..0822fe9 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -170,6 +170,54 @@ static __init int setup_nox2apic(char *str)
     return 0;
 }
 early_param("nox2apic", setup_nox2apic);
+
+static __init int x2apic_set_blacklist_nvidia(const struct dmi_system_id *d)
+{
+    if (!early_found_nvidia_display_card)
+        return 1;
+
+    setup_nox2apic("");
+    pr_info("x2apic blacklisted when Nvidia graphics enabled on %s\n",
+        d->ident);
+    return 0;
+}
+
+static __init int x2apic_set_blacklist(const struct dmi_system_id *d)
+{
+    setup_nox2apic("");
+    pr_info("x2apic blacklisted because of broken SMI on %s\n",
+        d->ident);
+    return 0;
+}
+
+static const struct dmi_system_id x2apic_dmi_table[] = {
+    {
+        .callback = x2apic_set_blacklist_nvidia,
+        .ident = "Lenovo ThinkPad T420",
+        .matches = {
+            DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+            DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T420"),
+        },
+    },
+    {
+        .callback = x2apic_set_blacklist,
+        .ident = "Lenovo ThinkPad W520",
+        .matches = {
+            DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+            DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad W520"),
+        },
+    },
+    {
+        .callback = x2apic_set_blacklist,
+        .ident = "Lenovo ThinkPad L520",
+        .matches = {
+            DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+            DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad L520"),
+        },
+    },
+    {}
+};
+
 #endif

 unsigned long mp_lapic_addr;
@@ -1542,6 +1590,9 @@ void __init enable_IR_x2apic(void)
     int ret, x2apic_enabled = 0;
     int hardware_init_ret;

+    if (x2apic_supported())
+        dmi_check_system(x2apic_dmi_table);
+
     /* Make sure irq_remap_ops are initialized */
     setup_irq_remapping_ops();

diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index 7548932..852d7a0 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -19,6 +19,8 @@
 #include
 #include

+int early_found_nvidia_display_card __initdata;
+
 static void __init fix_hypertransport_config(int num, int slot, int func)
 {
     u32 htcfg;
@@ -192,6 +194,11 @@ static void __init ati_bugs_contd(int num, int slot, int func)
 }
 #endif

+static void __init nvidia_x2apic_bugs(int num, int slot, int func)
+{
+    early_found_nvidia_display_card = 1;
+}
+
 #define QFLAG_APPLY_ONCE 0x1
 #define QFLAG_APPLIED 0x2
 #define QFLAG_DONE (QFLAG_APPLY_ONCE|QFLAG_APPLIED)
@@ -221,6 +228,8 @@ static struct chipset early_qrk[] __initdata = {
       PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
     { PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
       PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
+    { PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
+      PCI_CLASS_DISPLAY_VGA, 0xff00, 0, nvidia_x2apic_bugs},
     {}
 };

--
1.7.1
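The blacklist decision in this patch can be modelled in user space. This is
only a sketch under stated assumptions: struct x2apic_quirk and
x2apic_blacklisted() are hypothetical names, the kernel's DMI table and
callbacks are collapsed into one flag per entry, and substring comparison is
used on the assumption that DMI_MATCH() matches substrings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one dmi_system_id entry of the patch. */
struct x2apic_quirk {
    const char *vendor;   /* compared against DMI_SYS_VENDOR */
    const char *product;  /* compared against DMI_PRODUCT_VERSION */
    bool needs_nvidia;    /* blacklist only if an Nvidia VGA device was seen */
};

static const struct x2apic_quirk x2apic_quirks[] = {
    { "LENOVO", "ThinkPad T420", true },
    { "LENOVO", "ThinkPad W520", false },
    { "LENOVO", "ThinkPad L520", false },
};

/* Returns true when x2apic would be disabled for the given machine. */
static bool x2apic_blacklisted(const char *vendor, const char *product,
                               bool nvidia_found)
{
    assert(vendor != NULL && product != NULL);
    for (size_t i = 0; i < sizeof(x2apic_quirks) / sizeof(x2apic_quirks[0]); i++) {
        const struct x2apic_quirk *q = &x2apic_quirks[i];

        /* Substring match, assumed to mirror DMI_MATCH() semantics. */
        if (strstr(vendor, q->vendor) && strstr(product, q->product))
            return !q->needs_nvidia || nvidia_found;
    }
    return false;
}
```

With this model a T420 is blacklisted only when the early quirk has seen an
Nvidia display device, while the W520 and L520 are blacklisted
unconditionally and a T520 is left alone.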
[PATCH V2 1/4] x86,idle: Quickly notice prediction failure for repeat mode
Predicting the future is difficult, and when the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing the
failure quickly therefore becomes important for power saving.

The cpuidle menu governor has a method to predict a repeating pattern: if the
last 8 C-state residencies are the same or very close, it predicts that the
next C-state residency will be the same.

There is a real case with the turbostat utility (tools/power/x86/turbostat) in
kernel 3.3 or earlier. turbostat reads 10 registers one by one on Sandybridge,
so it generates 10 IPIs to wake up idle CPUs. The menu governor then predicts
repeat mode and expects another IPI to wake the idle CPU soon, so it keeps the
idle CPU in the C1 state even though the CPU is totally idle. However, after
reading the 10 registers, turbostat sleeps for 5 seconds by default, so the
idle CPU stays in C1 for a long time, until a break event occurs, although it
is idle. On an idle Sandybridge system, run "./turbostat -v" and notice that
deep C-state residency dangles between 70% and 99%. With the patched kernel,
deep C-state residency stays at >99.98%.

In the patch, a timer is added when the menu governor detects repeat mode and
chooses a shallow C-state. The timer is set to a timeout value greater than the
predicted time, and we conclude that the repeat mode prediction failed if the
timer fires. When repeat mode happens as expected, the timer does not fire: the
CPU is woken from the C-state and cancels the timer proactively. When repeat
mode does not happen, the timer times out, and the menu governor quickly
notices that the repeat mode prediction failed and re-evaluates the possibility
of a deeper C-state.

Below is another case that clearly shows the benefit of the patch:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
    fprintf(stderr,
        "Usage: idle_predict [options]\n"
        " --help -h Print this help\n"
        " --thread -n Thread number\n"
        " --loop -l Loop times in shallow Cstate\n"
        " --delay -t Sleep time (uS)in shallow Cstate\n");
}

/* Sleep briefly loop-1 times, then sleep long once, and repeat. */
void *simple_loop(void *arg)
{
    int idle_num = 1;

    (void)arg;
    while (!(*shutdown)) {
        *count = *count + 1;
        if (idle_num % loop)
            usleep(delay);
        else {
            /* sleep 1 second */
            usleep(1000000);
            idle_num = 0;
        }
        idle_num++;
    }
    return NULL;
}

static void sighand(int sig)
{
    *shutdown = 1;
}

int main(int argc, char *argv[])
{
    sigset_t sigset;
    int signum = SIGALRM;
    int i, c, er = 0, thread_num = 8;
    pthread_t pt[1024];
    static char optstr[] = "n:l:t:h";

    while ((c = getopt(argc, argv, optstr)) != EOF)
        switch (c) {
        case 'n':
            thread_num = atoi(optarg);
            break;
        case 'l':
            loop = atoi(optarg);
            break;
        case 't':
            delay = atoi(optarg);
            break;
        case 'h':
        default:
            usage();
            exit(1);
        }

    printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
    count = malloc(sizeof(long));
    shutdown = malloc(sizeof(int));
    *count = 0;
    *shutdown = 0;

    sigemptyset(&sigset);
    sigaddset(&sigset, signum);
    sigprocmask(SIG_BLOCK, &sigset, NULL);
    signal(SIGINT, sighand);
    signal(SIGTERM, sighand);

    for (i = 0; i < thread_num; i++)
        pthread_create(&pt[i], NULL, simple_loop, NULL);

    for (i = 0; i < thread_num; i++)
        pthread_join(pt[i], NULL);

    exit(0);
}

Get powertop V2 from git://github.com/fenrus75/powertop and build it. After
building the above test application, run it. The test platform can be Intel
Sandybridge or another recent platform.

#./idle_predict -l 10 &
#./powertop

We will find that deep C-state residency dangles between 40% and 100% and that
much time is spent in the C1 state. This is because the menu governor wrongly
predicts that repeat mode continues, so it chooses the shallow C1 state even
though it has the chance to sleep 1 second in a deep C-state. While after
patched the kernel, we find that deep C-state will keep >99.
[PATCH V2 2/4] x86,idle: Quickly notice prediction failure in general case
Predicting the future is difficult, and when the cpuidle governor's prediction
fails, the governor may choose a shallower C-state than it should. Noticing the
failure quickly therefore becomes important for power saving.

This patch extends the approach to the general case, in which the prediction
logic produces a small predicted residency and so chooses a shallow C-state
even though the expected residency is large. Once the prediction fails, the CPU
stays in the shallow C-state for a long time, although it actually had the
chance to enter a deep C-state.

So when the expected residency is long enough but the governor chooses a
shallow C-state, a timer is added to monitor whether the prediction fails. When
the CPU is woken from the C-state before the timer expires, the timer is
cancelled proactively. When the timer fires, the menu governor quickly notices
the prediction failure and re-evaluates the possibility of a deeper C-state.

Signed-off-by: Youquan Song
Signed-off-by: Rik van Riel
---
 drivers/cpuidle/governors/menu.c | 34 +-
 1 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 37c0ff6..c824b4f 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -34,7 +34,7 @@ static DEFINE_PER_CPU(struct hrtimer, menu_hrtimer);
 static DEFINE_PER_CPU(int, hrtimer_status);

 /* menu hrtimer mode */
-enum {MENU_HRTIMER_STOP, MENU_HRTIMER_REPEAT};
+enum {MENU_HRTIMER_STOP, MENU_HRTIMER_REPEAT, MENU_HRTIMER_GENERAL};

 /*
  * Concepts and ideas behind the menu governor
@@ -116,6 +116,13 @@ enum {MENU_HRTIMER_STOP, MENU_HRTIMER_REPEAT};
  *
  */

+/*
+ * The C-state residency is so long that it is worthwhile to exit
+ * from the shallow C-state and re-enter into a deeper C-state.
+ */
+static unsigned int perfect_cstate_ms __read_mostly = 30;
+module_param(perfect_cstate_ms, uint, );
+
 struct menu_device {
     int last_state_idx;
     int needs_update;
@@ -216,7 +223,17 @@ EXPORT_SYMBOL_GPL(menu_hrtimer_cancel);
 static enum hrtimer_restart menu_hrtimer_notify(struct hrtimer *hrtimer)
 {
     int cpu = smp_processor_id();
+    struct menu_device *data = &per_cpu(menu_devices, cpu);

+    /* In general case, the expected residency is much larger than
+     * deepest C-state target residency, but prediction logic still
+     * predicts a small predicted residency, so the prediction
+     * history is totally broken if the timer is triggered.
+     * So reset the correction factor.
+     */
+    if (per_cpu(hrtimer_status, cpu) == MENU_HRTIMER_GENERAL)
+        data->correction_factor[data->bucket] = RESOLUTION * DECAY;
+
     per_cpu(hrtimer_status, cpu) = MENU_HRTIMER_STOP;

     return HRTIMER_NORESTART;
@@ -353,6 +370,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
     /* not deepest C-state chosen for low predicted residency */
     if (low_predicted) {
         unsigned int timer_us = 0;
+        unsigned int perfect_us = 0;

         /*
          * Set a timer to detect whether this sleep is much
@@ -363,12 +381,26 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
          */
         timer_us = 2 * (data->predicted_us + MAX_DEVIATION);

+        perfect_us = perfect_cstate_ms * 1000;
+
         if (repeat && (4 * timer_us < data->expected_us)) {
             hrtimer_start(hrtmr, ns_to_ktime(1000 * timer_us),
                 HRTIMER_MODE_REL_PINNED);
             /* In repeat case, menu hrtimer is started */
             per_cpu(hrtimer_status, cpu) = MENU_HRTIMER_REPEAT;
+        } else if (perfect_us < data->expected_us) {
+            /*
+             * The next timer is long. This could be because
+             * we did not make a useful prediction.
+             * In that case, it makes sense to re-enter
+             * into a deeper C-state after some time.
+             */
+            hrtimer_start(hrtmr, ns_to_ktime(1000 * timer_us),
+                HRTIMER_MODE_REL_PINNED);
+            /* In general case, menu hrtimer is started */
+            per_cpu(hrtimer_status, cpu) = MENU_HRTIMER_GENERAL;
         }
+
     }

     return data->last_state_idx;
--
1.7.7.4
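The timer decision made in menu_select() by this patch can be followed in
isolation with a small user-space sketch. It is not the kernel code:
pick_timer_mode() is a hypothetical function, and the per-CPU state,
MAX_DEVIATION and perfect_cstate_ms of the patch are passed in as plain
parameters so that no value has to be assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same three states as the enum in the patch. */
enum menu_hrtimer_mode {
    MENU_HRTIMER_STOP,
    MENU_HRTIMER_REPEAT,
    MENU_HRTIMER_GENERAL
};

/*
 * Hypothetical model of the low_predicted branch of menu_select():
 * decide whether a watchdog timer is armed and in which mode.
 * *timer_us receives the timeout in microseconds, or 0 if no timer is armed.
 */
static enum menu_hrtimer_mode pick_timer_mode(unsigned int predicted_us,
                                              unsigned int expected_us,
                                              bool repeat,
                                              unsigned int max_deviation_us,
                                              unsigned int perfect_cstate_ms,
                                              unsigned int *timer_us)
{
    unsigned int t = 2 * (predicted_us + max_deviation_us);
    unsigned int perfect_us = perfect_cstate_ms * 1000;

    assert(timer_us != NULL);
    *timer_us = 0;

    if (repeat && 4 * t < expected_us) {
        /* Repeat pattern predicted, yet the next timer event is far away. */
        *timer_us = t;
        return MENU_HRTIMER_REPEAT;
    }
    if (perfect_us < expected_us) {
        /* General case: expected residency is long enough to re-evaluate. */
        *timer_us = t;
        return MENU_HRTIMER_GENERAL;
    }
    return MENU_HRTIMER_STOP;
}
```

In both armed cases the timeout is the same value, twice the predicted
residency plus the allowed deviation; only the recorded mode differs, and the
mode is what lets the timer callback reset the correction factor in the
general case.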
[PATCH V2 4/4] x86,idle: Get typical recent sleep interval
The function detect_repeating_patterns was not very useful for workloads with alternating long and short pauses, for example virtual machines handling network requests for each other (say a web and database server).

Instead, try to find a recent sleep interval that is somewhere between the median and the mode sleep time, by discarding outliers to the up side and recalculating the average and standard deviation until that is no longer required.

This should do something sane with a sleep interval series like:

	200 180 210 1 30 1000 170 200

The current code would simply discard such a series, while the new code will guess a typical sleep interval just shy of 200.

The original patch comes from Rik van Riel.

Signed-off-by: Youquan Song
Signed-off-by: Rik van Riel
---
 drivers/cpuidle/governors/menu.c |   69 +
 1 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index c824b4f..2411c4c 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -245,36 +245,59 @@ static enum hrtimer_restart menu_hrtimer_notify(struct hrtimer *hrtimer)
  * of points is below a threshold. If it is... then use the
  * average of these 8 points as the estimated value.
  */
-static int detect_repeating_patterns(struct menu_device *data)
+static u32 get_typical_interval(struct menu_device *data)
 {
-	int i;
-	uint64_t avg = 0;
-	uint64_t stddev = 0; /* contains the square of the std deviation */
-	int ret = 0;
-
-	/* first calculate average and standard deviation of the past */
-	for (i = 0; i < INTERVALS; i++)
-		avg += data->intervals[i];
-	avg = avg / INTERVALS;
+	int i = 0, divisor = 0;
+	int64_t max = 0, avg = 0, stddev = 0;
+	int64_t thresh = LLONG_MAX; /* Discard outliers above this value. */
+	unsigned int ret = 0;
 
-	/* if the avg is beyond the known next tick, it's worthless */
-	if (avg > data->expected_us)
-		return 0;
-
-	for (i = 0; i < INTERVALS; i++)
-		stddev += (data->intervals[i] - avg) *
-			  (data->intervals[i] - avg);
+again:
 
-	stddev = stddev / INTERVALS;
+	/* first calculate average and standard deviation of the past */
+	max = avg = divisor = stddev = 0;
+	for (i = 0; i < INTERVALS; i++) {
+		int64_t value = data->intervals[i];
+		if (value <= thresh) {
+			avg += value;
+			divisor++;
+			if (value > max)
+				max = value;
+		}
+	}
+	do_div(avg, divisor);
+	for (i = 0; i < INTERVALS; i++) {
+		int64_t value = data->intervals[i];
+		if (value <= thresh) {
+			int64_t diff = value - avg;
+			stddev += diff * diff;
+		}
+	}
+	do_div(stddev, divisor);
+	stddev = int_sqrt(stddev);
 
 	/*
-	 * now.. if stddev is small.. then assume we have a
-	 * repeating pattern and predict we keep doing this.
+	 * If we have outliers to the upside in our distribution, discard
+	 * those by setting the threshold to exclude these outliers, then
+	 * calculate the average and standard deviation again. Once we get
+	 * down to the bottom 3/4 of our samples, stop excluding samples.
+	 *
+	 * This can deal with workloads that have long pauses interspersed
+	 * with sporadic activity with a bunch of short pauses.
+	 *
+	 * The typical interval is obtained when standard deviation is small
+	 * or standard deviation is small compared to the average interval.
 	 */
-
-	if (avg && stddev < STDDEV_THRESH) {
+	if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3))
+							|| stddev <= 20) {
 		data->predicted_us = avg;
 		ret = 1;
+		return ret;
+
+	} else if ((divisor * 4) > INTERVALS * 3) {
+		/* Exclude the max interval */
+		thresh = max - 1;
+		goto again;
 	}
 
 	return ret;
@@ -330,7 +353,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	data->predicted_us = div_round64(data->expected_us * data->correction_factor[data->bucket],
 					 RESOLUTION * DECAY);
 
-	repeat = detect_repeating_patterns(data);
+	repeat = get_typical_interval(data);
 
 	/*
 	 * We want to default to C1 (hlt), not to busy polling
-- 
1.7.7.4
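The outlier-discarding loop in get_typical_interval() can be tried out in user space. The following is only a minimal sketch, under stated assumptions: typical_interval() and isqrt_u64() are hypothetical names, isqrt_u64() stands in for the kernel's int_sqrt(), plain division replaces do_div(), and the sample series used in the note below is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define INTERVALS 8	/* number of remembered sleep intervals, as in the patch */

/* Floor of the square root; a simple stand-in for the kernel's int_sqrt(). */
static uint64_t isqrt_u64(uint64_t x)
{
	uint64_t r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/*
 * Returns 1 and stores the typical interval in *predicted when the kept
 * samples are tight enough; returns 0 when no prediction can be made.
 */
static int typical_interval(const int64_t *samples, int64_t *predicted)
{
	int64_t thresh = INT64_MAX;	/* discard samples above this value */

	for (;;) {
		int64_t max = 0, avg = 0, var = 0, stddev;
		int i, divisor = 0;

		/* average and maximum of the samples still under the threshold */
		for (i = 0; i < INTERVALS; i++) {
			if (samples[i] <= thresh) {
				avg += samples[i];
				divisor++;
				if (samples[i] > max)
					max = samples[i];
			}
		}
		assert(divisor > 0);
		avg /= divisor;

		/* standard deviation of the same samples */
		for (i = 0; i < INTERVALS; i++) {
			if (samples[i] <= thresh) {
				int64_t diff = samples[i] - avg;

				var += diff * diff;
			}
		}
		stddev = (int64_t)isqrt_u64((uint64_t)(var / divisor));

		if ((avg > stddev * 6 && divisor * 4 >= INTERVALS * 3) ||
		    stddev <= 20) {
			*predicted = avg;
			return 1;
		}
		if (divisor * 4 <= INTERVALS * 3)
			return 0;	/* down to 3/4 of the samples: give up */

		thresh = max - 1;	/* exclude the current maximum, retry */
	}
}
```

With the made-up series 200 180 210 190 205 1000 170 200, the first pass sees a large deviation, drops the 1000 outlier, and the second pass settles on 193.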
[PATCH V2 0/4]: x86,idle: Enhance menu governor C-state prediction
V2: Add menu timer status enums, as suggested by Rafael.

Predicting the future is difficult. When the cpuidle governor's prediction fails, the governor may choose a shallower C-state than it should. Noticing such a failure quickly is important for power saving.

The cpuidle menu governor predicts a repeating pattern when the last 8 C-state residencies are the same or very close, and then assumes that the next residency will be about the same.

This patchset adds a timer when the menu governor chooses a non-deepest C-state, so that the CPU wakes up quickly instead of staying in a shallow C-state for too long after a failed prediction. The timer is set to a timeout greater than the predicted time, so if it fires we can confidently conclude that the prediction failed. When the prediction succeeds, the CPU wakes from the C-state within the predicted time, the timer does not fire, and it is cancelled right after the CPU wakes up. When the prediction fails, the timer fires and wakes the CPU from the shallow C-state, so the menu governor quickly notices the failure and re-evaluates whether a deeper C-state is possible.

This patchset improves the cpuidle prediction process for both repeat mode and general mode.

The patchset integrates one patch from Rik van Riel, which tries to find a typical interval by cutting off the upside outliers in the historical sleep intervals. That patch tends to choose a shallow C-state to achieve better performance; the prediction failure enhancement then corrects it when the deepest C-state should have been chosen.

Testing result:
The whole patchset achieves good results after a good deal of testing and tuning. On a two-socket Sandybridge server, SPECPower2008 gets a 2%~5% increase in ssj_ops/watt. Benchmarks from phoronix-test-suite (compress-7zip, build-linux-kernel, apache, fio, etc.) also show improved performance per watt. The patchset not only boosts performance but also saves power.

Two cases clearly show the benefit of this patchset.

The first case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3 or earlier. turbostat reads 10 registers one by one on Sandybridge, so it generates 10 IPIs to wake up idle CPUs. The cpuidle menu governor therefore predicts repeat mode, expects another IPI to wake the idle CPU soon, and keeps the idle CPU in C1 even though the CPU is totally idle. However, after reading the 10 registers turbostat sleeps for 5 seconds by default, so the idle CPU stays in C1 for a long time, until a break event occurs. On an idle Sandybridge system, running "./turbostat -v" shows the deep C-state residency fluctuating between 70% and 99%. With the patched kernel, the deep C-state residency stays above 99.98%.
Below is another case which clearly shows the benefit of the patch:

/* The header names were lost in the archive; these are the ones the program needs. */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		" --help -h Print this help\n"
		" --thread -n Thread number\n"
		" --loop -l Loop times in shallow Cstate\n"
		" --delay -t Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop(void *arg)
{
	int idle_num = 1;

	while (!(*shutdown)) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			usleep(1000000);
			idle_num = 0;
		}
		idle_num++;
	}
	return NULL;
}

static void sighand(int sig)
{
	*shutdown = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, er = 0, thread_num = 8;
	pthread_t pt[1024];
	static char optstr[] = "n:l:t:h:";

	while ((c = getopt(argc, argv, optstr)) != EOF)
		switch (c) {
		case 'n':
			thread_num = atoi(optarg);
			break;
		case 'l':
			loop = atoi(optarg);
			break;
		case 't':
			delay = atoi(optarg);
			break;
		case 'h':
		default:
			usage();
			exit(1);
		}

	printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 0;

	sigemptyset(&sigset);
	sigadds
[PATCH V2 3/4] x86,idle: Set residency to 0 if target Cstate not enter
When the cpuidle governor has chosen a C-state for an idle CPU but then notices that a task is waiting to run, the idle CPU does not actually enter the target C-state and goes on to run the task instead. In this situation the governor uses the residency of the previously entered C-state, which is obviously not reasonable. This patch fixes it by setting the target C-state residency to 0.

Signed-off-by: Youquan Song
Signed-off-by: Rik van Riel
---
 drivers/cpuidle/cpuidle.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index e28f6ea..01dca54 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -144,6 +144,10 @@ int cpuidle_idle_call(void)
 	/* ask the governor for the next state */
 	next_state = cpuidle_curr_governor->select(drv, dev);
 	if (need_resched()) {
+		dev->last_residency = 0;
+		/* give the governor an opportunity to reflect on the outcome */
+		if (cpuidle_curr_governor->reflect)
+			cpuidle_curr_governor->reflect(dev, next_state);
 		local_irq_enable();
 		return 0;
 	}
-- 
1.7.7.4
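The effect of this change can be modelled outside the kernel. The sketch below is only an illustration under stated assumptions: struct idle_dev, struct idle_governor, demo_select(), demo_reflect() and idle_call() are hypothetical stand-ins, not the kernel's cpuidle types, and need_resched is passed in as a plain flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for struct cpuidle_device. */
struct idle_dev {
	int last_residency;	/* microseconds spent in the last state */
	int reflected_state;	/* state index last passed to reflect() */
};

/* Hypothetical stand-in for the governor callbacks used by the patch. */
struct idle_governor {
	int (*select)(struct idle_dev *dev);
	void (*reflect)(struct idle_dev *dev, int index);	/* optional */
};

static int demo_select(struct idle_dev *dev)
{
	(void)dev;
	return 2;	/* pretend a deep state was chosen */
}

static void demo_reflect(struct idle_dev *dev, int index)
{
	dev->reflected_state = index;
}

static const struct idle_governor demo_governor = { demo_select, demo_reflect };

/* Returns true when the chosen state was actually entered. */
static bool idle_call(const struct idle_governor *gov, struct idle_dev *dev,
		      bool need_resched, int measured_us)
{
	int next_state = gov->select(dev);

	if (need_resched) {
		/* state never entered: do not reuse the stale residency */
		dev->last_residency = 0;
		if (gov->reflect)
			gov->reflect(dev, next_state);
		return false;
	}

	dev->last_residency = measured_us;
	if (gov->reflect)
		gov->reflect(dev, next_state);
	return true;
}
```

Calling idle_call() with need_resched set on a device that still carries a residency of 500 from an earlier sleep leaves last_residency at 0, so the governor's history is not fed a value from a state that was never entered.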
[PATCH 2/5] x86,idle: Quickly notice prediction failure in general case
Predicting the future is difficult. When the cpuidle governor's prediction fails, the governor may choose a shallower C-state than it should. Noticing such a failure quickly is important for power saving.

This patch extends the approach to the general case, where the prediction logic produces a small predicted residency and therefore chooses a shallow C-state even though the expected residency is large. If that prediction is wrong, the CPU stays in the shallow C-state for a long time although it actually had a chance to enter a deep C-state. So when the expected residency is long enough but the governor chooses a shallow C-state, a timer is added to monitor whether the prediction fails. When the CPU wakes from the C-state before the timer expires, the timer is cancelled. When the timer fires, the menu governor quickly notices the prediction failure and re-evaluates whether a deeper C-state is possible.

Signed-off-by: Youquan Song
Signed-off-by: Rik van Riel
---
 drivers/cpuidle/governors/menu.c |   22 ++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index beeab6a..b34bf11 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -114,6 +114,13 @@ static DEFINE_PER_CPU(int, hrtimer_started);
  *
  */
 
+/*
+ * The C-state residency is so long that it is worthwhile to exit
+ * from the shallow C-state and re-enter into a deeper C-state.
+ */
+static unsigned int perfect_cstate_ms __read_mostly = 30;
+module_param(perfect_cstate_ms, uint, );
+
 struct menu_device {
 	int		last_state_idx;
 	int		needs_update;
@@ -351,6 +358,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	/* not deepest C-state chosen for low predicted residency */
 	if (low_predicted) {
 		unsigned int timer_us = 0;
+		unsigned int perfect_us = 0;
 
 		/*
 		 * Set a timer to detect whether this sleep is much
@@ -361,12 +369,26 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 		 */
 		timer_us = 2 * (data->predicted_us + MAX_DEVIATION);
 
+		perfect_us = perfect_cstate_ms * 1000;
+
 		if (repeat && (4 * timer_us < data->expected_us)) {
 			hrtimer_start(hrtmr, ns_to_ktime(1000 * timer_us),
 				HRTIMER_MODE_REL_PINNED);
 			/* menu hrtimer is started */
 			per_cpu(hrtimer_started, cpu) = 1;
+		} else if (perfect_us < data->expected_us) {
+			/*
+			 * The next timer is long. This could be because
+			 * we did not make a useful prediction.
+			 * In that case, it makes sense to re-enter
+			 * into a deeper C-state after some time.
+			 */
+			hrtimer_start(hrtmr, ns_to_ktime(1000 * timer_us),
+				HRTIMER_MODE_REL_PINNED);
+			/* menu hrtimer is started */
+			per_cpu(hrtimer_started, cpu) = 1;
 		}
+
 	}
 
 	return data->last_state_idx;
-- 
1.7.7.4
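The two conditions under which this patch arms the timer can be written as a small pure function. This is a sketch under stated assumptions rather than the governor code itself: plan_timer(), struct timer_plan and the TIMER_* names are hypothetical, the caller is assumed to be in the low_predicted branch already, and the MAX_DEVIATION value is passed in as a parameter because its definition is not part of this patch.

```c
#include <assert.h>

/* Hypothetical labels for the two branches that arm the timer. */
enum timer_kind { TIMER_NONE, TIMER_REPEAT, TIMER_GENERAL };

struct timer_plan {
	enum timer_kind kind;
	unsigned int timeout_us;	/* meaningful only when kind != TIMER_NONE */
};

/*
 * Decide whether a wake-up timer should be armed after a shallow C-state
 * was chosen because of a low predicted residency.
 */
static struct timer_plan plan_timer(int repeat, unsigned int predicted_us,
				    unsigned int expected_us,
				    unsigned int max_deviation_us,
				    unsigned int perfect_cstate_ms)
{
	struct timer_plan plan = { TIMER_NONE, 0 };
	unsigned int timer_us = 2 * (predicted_us + max_deviation_us);
	unsigned int perfect_us = perfect_cstate_ms * 1000;

	if (repeat && 4 * timer_us < expected_us) {
		/* repeat mode: the next event is expected well past the timeout */
		plan.kind = TIMER_REPEAT;
		plan.timeout_us = timer_us;
	} else if (perfect_us < expected_us) {
		/* general case: the next timer is far away, re-check later */
		plan.kind = TIMER_GENERAL;
		plan.timeout_us = timer_us;
	}
	return plan;
}
```

For example, with a predicted residency of 100 us, an assumed deviation of 50 us and the 30 ms default, an expected residency of 5 seconds arms a 300 us timer in either branch, while an expected residency of 20 ms without a repeat pattern arms nothing.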
[PATCH 5/5] x86,idle: Get typical recent sleep interval
The function detect_repeating_patterns was not very useful for workloads with alternating long and short pauses, for example virtual machines handling network requests for each other (say a web and database server).

Instead, try to find a recent sleep interval that is somewhere between the median and the mode sleep time, by discarding outliers to the up side and recalculating the average and standard deviation until that is no longer required.

This should do something sane with a sleep interval series like:

	200 180 210 1 30 1000 170 200

The current code would simply discard such a series, while the new code will guess a typical sleep interval just shy of 200.

The original patch comes from Rik van Riel.

Signed-off-by: Youquan Song
Signed-off-by: Rik van Riel
---
 drivers/cpuidle/governors/menu.c |   69 +
 1 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 7dbac97..dbb9e1c 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -237,36 +237,59 @@ static enum hrtimer_restart menu_hrtimer_notify(struct hrtimer *hrtimer)
  * of points is below a threshold. If it is... then use the
  * average of these 8 points as the estimated value.
  */
-static int detect_repeating_patterns(struct menu_device *data)
+static u32 get_typical_interval(struct menu_device *data)
 {
-	int i;
-	uint64_t avg = 0;
-	uint64_t stddev = 0; /* contains the square of the std deviation */
-	int ret = 0;
-
-	/* first calculate average and standard deviation of the past */
-	for (i = 0; i < INTERVALS; i++)
-		avg += data->intervals[i];
-	avg = avg / INTERVALS;
+	int i = 0, divisor = 0;
+	int64_t max = 0, avg = 0, stddev = 0;
+	int64_t thresh = LLONG_MAX; /* Discard outliers above this value. */
+	unsigned int ret = 0;
 
-	/* if the avg is beyond the known next tick, it's worthless */
-	if (avg > data->expected_us)
-		return 0;
-
-	for (i = 0; i < INTERVALS; i++)
-		stddev += (data->intervals[i] - avg) *
-			  (data->intervals[i] - avg);
+again:
 
-	stddev = stddev / INTERVALS;
+	/* first calculate average and standard deviation of the past */
+	max = avg = divisor = stddev = 0;
+	for (i = 0; i < INTERVALS; i++) {
+		int64_t value = data->intervals[i];
+		if (value <= thresh) {
+			avg += value;
+			divisor++;
+			if (value > max)
+				max = value;
+		}
+	}
+	do_div(avg, divisor);
+	for (i = 0; i < INTERVALS; i++) {
+		int64_t value = data->intervals[i];
+		if (value <= thresh) {
+			int64_t diff = value - avg;
+			stddev += diff * diff;
+		}
+	}
+	do_div(stddev, divisor);
+	stddev = int_sqrt(stddev);
 
 	/*
-	 * now.. if stddev is small.. then assume we have a
-	 * repeating pattern and predict we keep doing this.
+	 * If we have outliers to the upside in our distribution, discard
+	 * those by setting the threshold to exclude these outliers, then
+	 * calculate the average and standard deviation again. Once we get
+	 * down to the bottom 3/4 of our samples, stop excluding samples.
+	 *
+	 * This can deal with workloads that have long pauses interspersed
+	 * with sporadic activity with a bunch of short pauses.
+	 *
+	 * The typical interval is obtained when standard deviation is small
+	 * or standard deviation is small compared to the average interval.
 	 */
-
-	if (avg && stddev < STDDEV_THRESH) {
+	if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3))
+							|| stddev <= 20) {
 		data->predicted_us = avg;
 		ret = 1;
+		return ret;
+
+	} else if ((divisor * 4) > INTERVALS * 3) {
+		/* Exclude the max interval */
+		thresh = max - 1;
+		goto again;
 	}
 
 	return ret;
@@ -322,7 +345,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	data->predicted_us = div_round64(data->expected_us * data->correction_factor[data->bucket],
 					 RESOLUTION * DECAY);
 
-	repeat = detect_repeating_patterns(data);
+	repeat = get_typical_interval(data);
 
 	/*
 	 * We want to default to C1 (hlt), not to busy polling
-- 
1.7.7.4
[PATCH 4/5] x86,idle: Set residency to 0 if target Cstate not enter
When the cpuidle governor has chosen a C-state for an idle CPU but then notices that a task is waiting to run, the idle CPU does not actually enter the target C-state and goes on to run the task instead. In this situation the governor uses the residency of the previously entered C-state, which is obviously not reasonable. This patch fixes it by setting the target C-state residency to 0.

Signed-off-by: Youquan Song
Signed-off-by: Rik van Riel
---
 drivers/cpuidle/cpuidle.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index e28f6ea..01dca54 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -144,6 +144,10 @@ int cpuidle_idle_call(void)
 	/* ask the governor for the next state */
 	next_state = cpuidle_curr_governor->select(drv, dev);
 	if (need_resched()) {
+		dev->last_residency = 0;
+		/* give the governor an opportunity to reflect on the outcome */
+		if (cpuidle_curr_governor->reflect)
+			cpuidle_curr_governor->reflect(dev, next_state);
 		local_irq_enable();
 		return 0;
 	}
-- 
1.7.7.4
[PATCH 3/5] x86,idle: Reset correction factor
In the general case, the expected residency is much larger than the target residency of the deepest C-state, yet the prediction logic still predicts a small residency, so the prediction history is totally broken. In this situation, resetting the correction factor is the only choice.

Signed-off-by: Youquan Song
Signed-off-by: Rik van Riel
---
 drivers/cpuidle/governors/menu.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index b34bf11..7dbac97 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -221,6 +221,10 @@ EXPORT_SYMBOL_GPL(menu_hrtimer_cancel);
 static enum hrtimer_restart menu_hrtimer_notify(struct hrtimer *hrtimer)
 {
 	int cpu = smp_processor_id();
+	struct menu_device *data = &per_cpu(menu_devices, cpu);
+
+	if (per_cpu(hrtimer_started, cpu) == 2)
+		data->correction_factor[data->bucket] = RESOLUTION * DECAY;
 
 	per_cpu(hrtimer_started, cpu) = 0;
 
@@ -386,7 +390,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 			hrtimer_start(hrtmr, ns_to_ktime(1000 * timer_us),
 				HRTIMER_MODE_REL_PINNED);
 			/* menu hrtimer is started */
-			per_cpu(hrtimer_started, cpu) = 1;
+			per_cpu(hrtimer_started, cpu) = 2;
 		}
 
 	}
-- 
1.7.7.4
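Why RESOLUTION * DECAY is the natural reset value follows from how menu_select() scales the expected residency by the correction factor. The sketch below is illustrative only: predict_us() is a hypothetical helper, the values 1024 and 8 for RESOLUTION and DECAY are assumptions (the patch does not show their definitions), and div_round64() is written here as a simple round-to-nearest division.

```c
#include <assert.h>
#include <stdint.h>

#define RESOLUTION 1024	/* assumed value, not shown in the patch */
#define DECAY 8		/* assumed value, not shown in the patch */

/* Round-to-nearest division; assumed to match the governor's div_round64(). */
static uint64_t div_round64(uint64_t dividend, uint32_t divisor)
{
	return (dividend + divisor / 2) / divisor;
}

/* Scale the expected residency by a bucket's correction factor. */
static uint64_t predict_us(uint64_t expected_us, uint64_t correction_factor)
{
	return div_round64(expected_us * correction_factor, RESOLUTION * DECAY);
}
```

With the factor reset to RESOLUTION * DECAY the ratio is exactly 1, so the prediction equals the expected residency again; a factor that has decayed to a small value keeps the prediction far below it, which is the broken history the patch clears.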
[PATCH 1/5] x86,idle: Quickly notice prediction failure for repeat mode
Predicting the future is difficult. When the cpuidle governor's prediction fails, the governor may choose a shallower C-state than it should. Noticing such a failure quickly is important for power saving.

The cpuidle menu governor predicts a repeating pattern when the last 8 C-state residencies are the same or very close, and then assumes that the next residency will be about the same.

A real case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3 or earlier. turbostat reads 10 registers one by one on Sandybridge, so it generates 10 IPIs to wake up idle CPUs. The cpuidle menu governor therefore predicts repeat mode, expects another IPI to wake the idle CPU soon, and keeps the idle CPU in C1 even though the CPU is totally idle. However, after reading the 10 registers turbostat sleeps for 5 seconds by default, so the idle CPU stays in C1 for a long time, until a break event occurs. On an idle Sandybridge system, running "./turbostat -v" shows the deep C-state residency fluctuating between 70% and 99%. With the patched kernel, the deep C-state residency stays above 99.98%.

This patch adds a timer when the menu governor detects repeat mode and chooses a shallow C-state. The timer is set to a timeout greater than the predicted time, and if it fires we conclude that the repeat mode prediction failed. When repeat mode continues as expected, the CPU wakes from the C-state before the timer fires and cancels the timer. When repeat mode does not continue, the timer expires, and the menu governor quickly notices that the repeat mode prediction failed and re-evaluates whether a deeper C-state is possible.
Below is another case which clearly shows the benefit of the patch:

/* The header names were lost in the archive; these are the ones the program needs. */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		" --help -h Print this help\n"
		" --thread -n Thread number\n"
		" --loop -l Loop times in shallow Cstate\n"
		" --delay -t Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop(void *arg)
{
	int idle_num = 1;

	while (!(*shutdown)) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			usleep(1000000);
			idle_num = 0;
		}
		idle_num++;
	}
	return NULL;
}

static void sighand(int sig)
{
	*shutdown = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, er = 0, thread_num = 8;
	pthread_t pt[1024];
	static char optstr[] = "n:l:t:h:";

	while ((c = getopt(argc, argv, optstr)) != EOF)
		switch (c) {
		case 'n':
			thread_num = atoi(optarg);
			break;
		case 'l':
			loop = atoi(optarg);
			break;
		case 't':
			delay = atoi(optarg);
			break;
		case 'h':
		default:
			usage();
			exit(1);
		}

	printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 0;

	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask (SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for (i = 0; i < thread_num; i++)
		pthread_create(&pt[i], NULL, simple_loop, NULL);

	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);

	exit(0);
}

Get powertop V2 from git://github.com/fenrus75/powertop and build it. Build the test application above, then run it. The test platform can be Intel Sandybridge or another recent platform.

#./idle_predict -l 10 &
#./powertop

We will find that the deep C-state residency fluctuates between 40% and 100%, with much time spent in C1. This is because the menu governor wrongly predicts that repeat mode continues, so it chooses the shallow C1 state even though the CPU has a chance to sleep for 1 second in a deep C-state. With the patched kernel, we find that the deep C-state residency stays at >99.
[PATCH 0/5] x86,idle: Enhance menu governor C-state prediction
Predicting the future is difficult. When the cpuidle governor's prediction fails, the governor may choose a shallower C-state than it should. Noticing such a failure quickly is important for power saving.

The cpuidle menu governor predicts a repeating pattern when the last 8 C-state residencies are the same or very close, and then assumes that the next residency will be about the same.

This patchset adds a timer when the menu governor chooses a non-deepest C-state, so that the CPU wakes up quickly instead of staying in a shallow C-state for too long after a failed prediction. The timer is set to a timeout greater than the predicted time, so if it fires we can confidently conclude that the prediction failed. When the prediction succeeds, the CPU wakes from the C-state within the predicted time, the timer does not fire, and it is cancelled right after the CPU wakes up. When the prediction fails, the timer fires and wakes the CPU from the shallow C-state, so the menu governor quickly notices the failure and re-evaluates whether a deeper C-state is possible.

This patchset improves the cpuidle prediction process for both repeat mode and general mode.

The patchset integrates one patch from Rik van Riel, which tries to find a typical interval by cutting off the upside outliers in the historical sleep intervals. That patch tends to choose a shallow C-state to achieve better performance; the prediction failure enhancement then corrects it when the deepest C-state should have been chosen.

Testing result:
The whole patchset achieves good results after a good deal of testing and tuning. On a two-socket Sandybridge server, SPECPower2008 gets a 2%~5% increase in ssj_ops/watt. Benchmarks from phoronix-test-suite (compress-7zip, build-linux-kernel, apache, fio, etc.) also show improved performance per watt. The patchset not only boosts performance but also saves power.
Two cases clearly show the benefit of this patchset.

The first case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3 or earlier. turbostat reads 10 registers one by one on Sandybridge, so it generates 10 IPIs to wake up idle CPUs. The cpuidle menu governor therefore predicts repeat mode, expects another IPI to wake the idle CPU soon, and keeps the idle CPU in C1 even though the CPU is totally idle. However, after reading the 10 registers turbostat sleeps for 5 seconds by default, so the idle CPU stays in C1 for a long time, until a break event occurs. On an idle Sandybridge system, running "./turbostat -v" shows the deep C-state residency fluctuating between 70% and 99%. With the patched kernel, the deep C-state residency stays above 99.98%.

Below is another case which clearly shows the benefit of the patch:

/* The header names were lost in the archive; these are the ones the program needs. */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		" --help -h Print this help\n"
		" --thread -n Thread number\n"
		" --loop -l Loop times in shallow Cstate\n"
		" --delay -t Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop(void *arg)
{
	int idle_num = 1;

	while (!(*shutdown)) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			usleep(1000000);
			idle_num = 0;
		}
		idle_num++;
	}
	return NULL;
}

static void sighand(int sig)
{
	*shutdown = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, er = 0, thread_num = 8;
	pthread_t pt[1024];
	static char optstr[] = "n:l:t:h:";

	while ((c = getopt(argc, argv, optstr)) != EOF)
		switch (c) {
		case 'n':
			thread_num = atoi(optarg);
			break;
		case 'l':
			loop = atoi(optarg);
			break;
		case 't':
			delay = atoi(optarg);
			break;
		case 'h':
		default:
			usage();
			exit(1);
		}

	printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 0;

	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask (SIG_BLOCK, &sigset,
Re: [PATCH V2 0/3] x86,idle: Enhance cpuidle prediction to handle its failure
> > One case is turbostat utility (tools/power/x86/turbostat) at kernel 3.3 or
> > early. turbostat utility will read 10 registers one by one at Sandybridge,
> > so it will generate 10 IPIs to wake up idle CPUs. So cpuidle menu governor
> > will predict it is repeat mode and there is another IPI wake up idle CPU
> > soon, so it keeps idle CPU stay at C1 state even though CPU is totally idle.
> > However, in the turbostat, following 10 registers reading is sleep 5 seconds
> > by default, so the idle CPU will keep at C1 for a long time though it is
> > idle until break event occurs.
> > In a idle Sandybridge system, run "./turbostat -v", we will notice that deep
> > C-state dangles between "70% ~ 99%". After patched the kernel, we will
> > notice deep C-state stays at >99.98%.
>
> Is there an impact on performances ?

In this case, turbostat is a utility that measures CPU idle status, and it is itself also a workload on the system. Its purpose is to show CPU C-state information every 5 seconds. With the patched kernel it does the same thing as before, so I think there is little or no performance impact. I did not find any performance impact in my tests. If you have cases where performance might be affected, or other suggestions, I will be very glad to try them.

Thanks
-Youquan
Re: [PATCH V2 1/3] x86,idle: Quickly notice prediction failure for repeat mode
> Could I convince you to try out my variation on
> detect_repeating_intervals? :)
>
> http://people.redhat.com/riel/cstate/cstate-stddev-converge.patch
>
> I suspect that small change might help your code adapt to changed
> conditions even faster.

Yes, of course. Your cstate-stddev-converge patch makes a good point: filter out some noise first, then calculate further. I will try to integrate it into my patchset and ask you to review it tomorrow.

Thanks
-Youquan
[PATCH V2 3/3] x86,idle: Set residency to 0 if target Cstate not really enter
When the cpuidle governor has chosen a C-state for an idle CPU but then notices that a task is waiting to run, the idle CPU does not actually enter the target C-state and goes on to run the task instead. In this situation the governor uses the residency of the previously entered C-state, which is obviously not reasonable. This patch fixes it by setting the target C-state residency to 0.

Signed-off-by: Youquan Song
---
 drivers/cpuidle/cpuidle.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 2f0083a..7992417 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -136,6 +136,10 @@ int cpuidle_idle_call(void)
 	/* ask the governor for the next state */
 	next_state = cpuidle_curr_governor->select(drv, dev);
 	if (need_resched()) {
+		dev->last_residency = 0;
+		/* give the governor an opportunity to reflect on the outcome */
+		if (cpuidle_curr_governor->reflect)
+			cpuidle_curr_governor->reflect(dev, next_state);
 		local_irq_enable();
 		return 0;
 	}
[PATCH V2 2/3] x86,idle: Quickly notice prediction failure in general case
Predicting the future is difficult, and when the cpuidle governor's prediction fails, the governor may choose a shallower C-state than it should. Quickly noticing the failure is therefore important for power saving.

This patch extends the previous patch, which enhanced prediction for repeat mode, by adding a timer whenever the menu governor chooses a shallow C-state. The timer is set to time out in 50 milliseconds by default. The special twist is that there are no further power-saving gains from sleeping longer than that. When the CPU wakes up from the C-state before the timer expires, the timer is cancelled. When the timer fires, the menu governor quickly notices the prediction failure and re-evaluates the possibility of deeper C-states.

Signed-off-by: Youquan Song
---
 drivers/cpuidle/governors/menu.c | 48 ++
 1 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 8c23fbd..9f92dd4 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -113,6 +113,13 @@ static DEFINE_PER_CPU(int, hrtimer_started);
  * represented in the system load average.
  *
  */
+
+/*
+ * Default set to 50 milliseconds based on special twist mentioned above that
+ * there are no power gains sleep longer than it.
+ */
+static unsigned int perfect_cstate_ms __read_mostly = 50;
+module_param(perfect_cstate_ms, uint, );
 
 struct menu_device {
 	int		last_state_idx;
@@ -343,26 +350,37 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 			data->exit_us = s->exit_latency;
 		}
 	}
-
+
+	/* not deepest C-state chosen */
 	if (data->last_state_idx < drv->state_count - 1) {
+		unsigned int repeat_us = 0;
+		unsigned int perfect_us = 0;
+
+		/*
+		 * Set enough timer to recognize the repeat mode broken.
+		 * If the timer is time out, the repeat mode prediction
+		 * fails,then re-evaluate deeper C-states possibility.
+		 * If the timer is not triggered, the timer will be
+		 * cancelled when CPU waken up.
+		 */
+		repeat_us =
+			(repeat ? (2 * data->predicted_us + MAX_DEVIATION) : 0);
+		perfect_us = perfect_cstate_ms * 1000;
 		/* Repeat mode detected */
-		if (repeat) {
-			unsigned int repeat_us = 0;
-			/*
-			 * Set enough timer to recognize the repeat mode broken.
-			 * If the timer is time out, the repeat mode prediction
-			 * fails,then re-evaluate deeper C-states possibility.
-			 * If the timer is not triggered, the timer will be
-			 * cancelled when CPU waken up.
-			 */
-			repeat_us = 2 * data->predicted_us + MAX_DEVIATION;
-			hrtimer_start(hrtmr, ns_to_ktime(1000 * repeat_us),
-				HRTIMER_MODE_REL_PINNED);
+		if (repeat && (repeat_us < perfect_us)) {
+			hrtimer_start(hrtmr, ns_to_ktime(1000 * repeat_us),
+				HRTIMER_MODE_REL_PINNED);
+			/* menu hrtimer is started */
+			per_cpu(hrtimer_started, cpu) = 1;
+		} else if (perfect_us < data->expected_us) {
+			/* expected time is larger than adding timer time */
+			hrtimer_start(hrtmr, ns_to_ktime(1000 * perfect_us),
+				HRTIMER_MODE_REL_PINNED);
 			/* menu hrtimer is started */
 			per_cpu(hrtimer_started, cpu) = 1;
-		}
-	}
+		}
+	}
 
 	return data->last_state_idx;
 }
-- 
1.6.4.2
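The timeout choice made by this patch can be restated as a small user-space sketch. It is only an illustration of the if / else-if logic in the menu_select() hunk, for the case where a non-deepest C-state was chosen. The function name guard_timeout_us() and the value of MAX_DEVIATION_US are assumptions made for this example; the real MAX_DEVIATION is defined in patch 1/3 and is not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value for illustration; the real MAX_DEVIATION is in patch 1/3. */
#define MAX_DEVIATION_US 50u

/*
 * Return the guard-timer timeout in microseconds, or 0 when no timer
 * would be armed. Mirrors the if / else-if in the menu_select() hunk.
 */
static unsigned int guard_timeout_us(bool repeat, unsigned int predicted_us,
				     unsigned int expected_us,
				     unsigned int perfect_cstate_ms)
{
	unsigned int repeat_us;
	unsigned int perfect_us;

	/* Keep perfect_cstate_ms * 1000 within 32 bits. */
	assert(perfect_cstate_ms <= 4294967u);

	repeat_us = repeat ? 2 * predicted_us + MAX_DEVIATION_US : 0;
	perfect_us = perfect_cstate_ms * 1000;

	/* Repeat mode: use the short timer if it beats the default cap. */
	if (repeat && repeat_us < perfect_us)
		return repeat_us;

	/* General case: cap the shallow sleep only if it is expected to last longer. */
	if (perfect_us < expected_us)
		return perfect_us;

	return 0;
}
```

Under these assumptions, a repeat-mode prediction of 20 us yields a 90 us timer, while a non-repeat selection with a 1 second expected sleep yields the 50 ms default.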
[PATCH V2 1/3] x86,idle: Quickly notice prediction failure for repeat mode
Predicting the future is difficult, and when the cpuidle governor's prediction fails, the governor may choose a shallower C-state than it should. Quickly noticing the failure is therefore important for power saving.

The cpuidle menu governor has a method for predicting repeat patterns: if the last 8 C-state residencies are consecutive and the same or very close, it predicts that the next C-state residency will be the same.

A real case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3 or earlier. On Sandybridge, turbostat reads 10 registers one by one, so it generates 10 IPIs to wake up idle CPUs. The menu governor then predicts repeat mode, expecting another IPI to wake the idle CPU soon, so it keeps the idle CPU in the C1 state even though the CPU is totally idle. However, after reading the 10 registers turbostat sleeps for 5 seconds by default, so the idle CPU stays in C1 for a long time, although it is idle, until a break event occurs. On an idle Sandybridge system running "./turbostat -v", deep C-state residency fluctuates between 70% and 99%. With the patched kernel, deep C-state residency stays above 99.98%.

In this patch, a timer is added when the menu governor detects repeat mode and chooses a shallow C-state. The timer is set to a timeout value greater than the predicted time, and we conclude that the repeat-mode prediction has failed if the timer fires. When repeat mode happens as expected, the timer does not fire; the CPU wakes up from the C-state and cancels the timer. When repeat mode does not happen, the timer times out, and the menu governor quickly notices that the repeat-mode prediction has failed and re-evaluates the possibility of deeper C-states.
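To make the "8 residencies that are the same or very close" idea concrete, here is a minimal user-space sketch of such a check. It is a simplified closeness test chosen for illustration, not the statistic the menu governor actually computes; detect_repeat() and the CLOSE_ENOUGH_US tolerance are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INTERVALS 8		/* the 8 residencies from the description */
#define CLOSE_ENOUGH_US 20u	/* hypothetical tolerance, not from the patch */

/*
 * Return true and store the mean in *predicted_us when the last
 * INTERVALS idle residencies all lie within CLOSE_ENOUGH_US of their mean.
 */
static bool detect_repeat(const uint32_t residency_us[INTERVALS],
			  uint32_t *predicted_us)
{
	uint64_t sum = 0;
	uint32_t avg;
	size_t i;

	assert(predicted_us != NULL);

	for (i = 0; i < INTERVALS; i++)
		sum += residency_us[i];
	avg = (uint32_t)(sum / INTERVALS);

	for (i = 0; i < INTERVALS; i++) {
		uint32_t d = residency_us[i] > avg ? residency_us[i] - avg
						   : avg - residency_us[i];
		if (d > CLOSE_ENOUGH_US)
			return false;	/* one outlier breaks the pattern */
	}

	*predicted_us = avg;
	return true;
}
```

Eight wakeups spaced about 20 us apart pass this check, which is the situation the turbostat case creates. A long sleep that follows them is then mispredicted as another short interval, and that is the failure the timer is meant to catch.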
Below is another case that clearly shows the benefit of the patch:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		"  --help    -h  Print this help\n"
		"  --thread  -n  Thread number\n"
		"  --loop    -l  Loop times in shallow Cstate\n"
		"  --delay   -t  Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop(void *arg)
{
	int idle_num = 1;

	(void)arg;
	while (!(*shutdown)) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			sleep(1);
			idle_num = 0;
		}
		idle_num++;
	}
	return NULL;
}

static void sighand(int sig)
{
	*shutdown = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, thread_num = 8;
	pthread_t pt[1024];
	static char optstr[] = "n:l:t:h";

	while ((c = getopt(argc, argv, optstr)) != -1)
		switch (c) {
		case 'n':
			thread_num = atoi(optarg);
			break;
		case 'l':
			loop = atoi(optarg);
			break;
		case 't':
			delay = atoi(optarg);
			break;
		case 'h':
		default:
			usage();
			exit(1);
		}

	printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 0;

	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask(SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for (i = 0; i < thread_num; i++)
		pthread_create(&pt[i], NULL, simple_loop, NULL);

	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);

	exit(0);
}

Get powertop V2 from git://github.com/fenrus75/powertop and build it. Build the test application above and run it. The test platform can be Intel Sandybridge or another recent platform.

#./idle_predict -l 10 &
#./powertop

We will find that deep C-state residency fluctuates between 40% and 100%, with much of the time spent in the C1 state.
This is because the menu governor wrongly predicts that the repeat mode continues, so it chooses the shallow C1 state even though it has the chance to sleep for 1 second in a deep C-state. After patching the kernel, we find that deep C-state residency stays at >99.6
[PATCH V2 0/3] x86,idle: Enhance cpuidle prediction to handle its failure
Predicting the future is difficult, and when the cpuidle governor's prediction fails, the governor may choose a shallower C-state than it should. Quickly noticing the failure is therefore important for power saving.

The cpuidle menu governor has a method for predicting repeat patterns: if the last 8 C-state residencies are consecutive and the same or very close, it predicts that the next C-state residency will be the same.

This patchset adds a timer when the menu governor chooses a non-deepest C-state, so that the CPU wakes up quickly from a shallow C-state instead of staying there too long after a prediction failure. The timer is set to a timeout value greater than the predicted time, and if the timer fires we can confidently conclude that the prediction has failed. When the prediction succeeds, the CPU wakes up from the C-state within the predicted time, the timer does not fire, and it is cancelled right after the CPU wakes up. When the prediction fails, the timer fires and wakes the CPU from the shallow C-state, so the menu governor quickly notices the prediction failure and re-evaluates the possibility of deeper C-states.

This patchset improves the cpuidle prediction process for both repeat mode and general mode.

Two cases clearly show the benefit of this patchset. One is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3 or earlier. On Sandybridge, turbostat reads 10 registers one by one, so it generates 10 IPIs to wake up idle CPUs. The menu governor then predicts repeat mode, expecting another IPI to wake the idle CPU soon, so it keeps the idle CPU in the C1 state even though the CPU is totally idle. However, after reading the 10 registers turbostat sleeps for 5 seconds by default, so the idle CPU stays in C1 for a long time, although it is idle, until a break event occurs. On an idle Sandybridge system running "./turbostat -v", deep C-state residency fluctuates between 70% and 99%.
With the patched kernel, deep C-state residency stays above 99.98%.

Below is another case that clearly shows the benefit of the patch:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		"  --help    -h  Print this help\n"
		"  --thread  -n  Thread number\n"
		"  --loop    -l  Loop times in shallow Cstate\n"
		"  --delay   -t  Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop(void *arg)
{
	int idle_num = 1;

	(void)arg;
	while (!(*shutdown)) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			sleep(1);
			idle_num = 0;
		}
		idle_num++;
	}
	return NULL;
}

static void sighand(int sig)
{
	*shutdown = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, thread_num = 8;
	pthread_t pt[1024];
	static char optstr[] = "n:l:t:h";

	while ((c = getopt(argc, argv, optstr)) != -1)
		switch (c) {
		case 'n':
			thread_num = atoi(optarg);
			break;
		case 'l':
			loop = atoi(optarg);
			break;
		case 't':
			delay = atoi(optarg);
			break;
		case 'h':
		default:
			usage();
			exit(1);
		}

	printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 0;

	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask(SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for (i = 0; i < thread_num; i++)
		pthread_create(&pt[i], NULL, simple_loop, NULL);

	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);

	exit(0);
}

Get powertop v2 from git://github.com/fenrus75/powertop and build it. Build the test application above and run it. The test platform can be Intel Sandybridge or another recent platform.

#./idle_predict -l 10 &
#./powertop

We will find that deep C-state residency fluctuates between 40% and 100%, with much of the time spent in the C1 state.
This is because the menu governor wrongly predicts that the repeat mode continues, so it chooses the shallow C1 state even though it has c
Re: KS/Plumbers: c-state governor BOF
> Your patches could make a lot of sense when integrated with my
> patches:
>
> http://people.redhat.com/riel/cstate/
> However, we should probably get the tracepoint upstream first,
> so we can know for sure :)

I cannot access the patches in this directory. Can you send them to me? I will look at your patches and then integrate them with mine to see what happens tomorrow. Do you have a test case to share, or ideas on how to show the benefit?

I have done many tests with my patches. They show some benefit, big or small, in various cases, and at least no negative effect has shown up. I have two convincing test cases that show a great benefit:
1. turbostat v1 (before 3.5)
2. A simple test application I wrote, which also shows a great benefit. Run it with
#./idle_predict -l 8

I wrote a simple application using usleep that makes clear how greatly a repeat-mode prediction failure affects an application with such a repeat pattern.
---
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

volatile int *shutdown;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		"  --help    -h  Print this help\n"
		"  --thread  -n  Thread number\n"
		"  --loop    -l  Loop times in shallow Cstate\n"
		"  --delay   -t  Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop(void *arg)
{
	int idle_num = 1;

	(void)arg;
	/* In this version *shutdown is 1 while running and 0 to stop. */
	while (*shutdown) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			sleep(1);
			idle_num = 0;
		}
		idle_num++;
	}
	return NULL;
}

static void sighand(int sig)
{
	*shutdown = 0;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, thread_num = 8;
	pthread_t pt[1024];
	static char optstr[] = "n:l:t:h";

	while ((c = getopt(argc, argv, optstr)) != -1)
		switch (c) {
		case 'n':
			thread_num = atoi(optarg);
			break;
		case 'l':
			loop = atoi(optarg);
			break;
		case 't':
			delay = atoi(optarg);
			break;
		case 'h':
		default:
			usage();
			exit(1);
		}

	printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 1;

	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask(SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for (i = 0; i < thread_num; i++)
		pthread_create(&pt[i], NULL, simple_loop, NULL);

	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);

	exit(0);
}
Re: KS/Plumbers: c-state governor BOF
> After talking about my RFC patches to the c-state governor with
> Matthew and Arjan, it is clear that the whole concept of how
> things are done could use some more discussion.
>
> Since a good number of us will be in San Diego next week, at
> Kernel Summit / Plumbers / etc, I will organize a c-state
> governor BOF for those who are interested.
>
> Things to think about:
> - what should the c-state governor do?
> - how to best predict the future?
> - what kinds of odd workloads do we need to accomodate?

Hi Rik,

I just noticed there is a topic to discuss the menu governor at Kernel Summit. Actually, I posted a patchset on May 11 2012 to bring up this topic. At that time, I only had one convincing, proven application, turbostat v1, to show that my patches are useful. I tried to find other workloads to prove that the patchset is solidly useful, but I got stuck on other high-priority tasks, so I have moved slowly on it.

Since you have brought up the issue, I guess that you already have a real workload that shows it. My patchset improves not only repeat-mode prediction failure but also general prediction failure. Let's discuss it.

Here is the patchset posted on May 11 2012.
http://lwn.net/Articles/496919/
"x86,idle: Enhance cpuidle prediction to handle its failure"
http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02267.html
"[PATCH 1/3] x86,idle: Quickly notice prediction failure for repeat mode"
http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02268.html
"[PATCH 2/3] x86,idle: Quickly notice prediction failure in general case"
http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02269.html
"[PATCH 3/3] x86,idle: Set residency to 0 if target Cstate not really enter"

Thanks
-Youquan