ps never exits and can't be killed when displaying process information
Sorry for using the main mailing list; I don't know which specific list is appropriate for this problem.

We're running a Linux box on the Gx platform from Tilera. The kernel carries some vendor-specific patches, but most of it is the same as the standard kernel.

We occasionally encounter a problem that I'm trying to resolve. But when I ran 'ps' to get process information, the newly launched ps printed nothing and would not exit; ^C doesn't work. I found its pid under /proc, and it is in the RUNNING state:

# cat status
Name: ps
State: R (running)
Tgid: 1298
Pid: 1298
PPid: 1
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 64
Groups: 0 1 2 3 4 6 10 489
VmPeak: 3776 kB
VmSize: 3712 kB
VmLck: 0 kB
VmHWM: 2624 kB
VmRSS: 2624 kB
VmData: 832 kB
VmStk: 256 kB
VmExe: 192 kB
VmLib: 2176 kB
VmPTE: 6 kB
VmSwap: 0 kB
Threads: 1
SigQ: 7/8113
SigPnd: 0100
ShdPnd: 000a0103
SigBlk:
SigIgn: 0004
SigCgt: 73d3fef9
CapInh:
CapPrm:
CapEff:
CapBnd:
Cpus_allowed: f,
Cpus_allowed_list: 0-35
Mems_allowed: 3
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 1
nonvoluntary_ctxt_switches: 0

It can't be killed, even with SIGKILL. Since it is in the *RUNNING* state, its stack can't be dumped. Is there any existing mechanism that can be used to get its stack, or other information, to help me figure out why ps hangs in the *RUNNING* state?

System information:
# uname -a
Linux localhost 2.6.38.8-MDE-4.0.0.141101 #7 SMP Fri Sep 28 21:46:08 CST 2012 tilegx GNU/Linux

Best regards.
--
Cyberman Wu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
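One thing the status output in this report already carries is the set of pending signals. Below is a minimal sketch for reading it, assuming that SigPnd and ShdPnd are hexadecimal bitmasks in which bit n-1 stands for signal n, and that the archive only dropped leading zeros from the values. The helper name decode_sigmask() is made up for illustration.

```c
#include <stdlib.h>

/*
 * Decode a hexadecimal signal mask as printed in /proc/<pid>/status
 * (SigPnd, ShdPnd, ...). Bit (n - 1) set means signal n is in the set.
 * Stores up to max signal numbers in out and returns how many were stored.
 * decode_sigmask() is a hypothetical helper, not an existing API.
 */
int decode_sigmask(const char *hex, int *out, int max)
{
	unsigned long long mask = strtoull(hex, NULL, 16);
	int count = 0;

	for (int sig = 1; sig <= 64 && count < max; sig++) {
		if (mask & (1ULL << (sig - 1)))
			out[count++] = sig;
	}
	return count;
}
```

Fed the values from the report, SigPnd 0100 decodes to signal 9 and ShdPnd 000a0103 to signals 1, 2, 9, 18 and 20. If tilegx uses the generic numbering (2 = SIGINT, 9 = SIGKILL), the ^C and the kill were queued but never acted on, which would be consistent with the task spinning inside the kernel and never reaching a point where signals are handled.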
Re: [PATCH 2/3] dmaengine: dw_dmac: Enhance device tree support
On Fri, Oct 12, 2012 at 11:14 AM, Viresh Kumar wrote: > dw_dmac driver already supports device tree but it used to have its platform > data passed the non-DT way. > > This patch does following changes: > - pass platform data via DT, non-DT way still takes precedence if both are > used. > - create generic filter routine > - Earlier slave information was made available by slave specific filter > routines > in chan->private field. Now, this information would be passed from within > dmac > DT node. Slave drivers would now be required to pass bus_id (a string) as > parameter to this generic filter(), which would be compared against the > slave > data passed from DT, by the generic filter routine. > - Update binding document DT parsing of this patch can be tested with following non-official patch :) dmaengine: dw_dmac: Add dt params debug routine Signed-off-by: Viresh Kumar --- drivers/dma/dw_dmac.c | 42 +- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/drivers/dma/dw_dmac.c b/drivers/dma/dw_dmac.c index 05c1dff..569914d 100644 --- a/drivers/dma/dw_dmac.c +++ b/drivers/dma/dw_dmac.c @@ -1504,6 +1504,43 @@ static inline int dw_dma_parse_dt(struct platform_device *pdev) } #endif +static void dw_dma_parse_dt_debug(struct dw_dma_platform_data *pdata) +{ + int i = -1; + + if (!pdata) { + printk(KERN_ERR "dw_dma: unable to read info from DT\n"); + return; + } + + printk(KERN_ERR "\nPrinting dw_dma DT info\n"); + + printk(KERN_ERR "nr_channels: %x\n", pdata->nr_channels); + printk(KERN_ERR "is_private: %x\n", pdata->is_private); + printk(KERN_ERR "chan_allocation_order: %x\n", + pdata->chan_allocation_order); + + printk(KERN_ERR "chan_priority: %x\n", pdata->chan_priority); + printk(KERN_ERR "block_size: %x\n", pdata->block_size); + + printk(KERN_ERR "nr_masters: %x\n", pdata->nr_masters); + printk(KERN_ERR "data_width: %d %d %d %d\n", pdata->data_width[0], + pdata->data_width[1], pdata->data_width[2], + pdata->data_width[3]); + + /* parse slave data */ + 
printk(KERN_ERR "slave_info\n"); + + while (++i < pdata->sd_count) { + printk(KERN_INFO "bus_id: %s\n", pdata->sd[i].bus_id); + printk(KERN_INFO "cfg_hi: %x\n", pdata->sd[i].cfg_hi); + printk(KERN_INFO "cfg_lo: %x\n", pdata->sd[i].cfg_lo); + printk(KERN_INFO "src_master: %x\n", + pdata->sd[i].src_master); + printk(KERN_INFO "dst_master: %x\n", + pdata->sd[i].dst_master); + } +} static int __devinit dw_probe(struct platform_device *pdev) { struct dw_dma_platform_data *pdata; @@ -1515,9 +1552,12 @@ static int __devinit dw_probe(struct platform_device *pdev) int i; pdata = dev_get_platdata(>dev); - if (!pdata) + if (!pdata) { pdata = dw_dma_parse_dt(pdev); + dw_dma_parse_dt_debug(pdata); + } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] dmaengine: dw_dmac: Enhance device tree support
dw_dmac driver already supports device tree but it used to have its platform data passed the non-DT way. This patch does following changes: - pass platform data via DT, non-DT way still takes precedence if both are used. - create generic filter routine - Earlier slave information was made available by slave specific filter routines in chan->private field. Now, this information would be passed from within dmac DT node. Slave drivers would now be required to pass bus_id (a string) as parameter to this generic filter(), which would be compared against the slave data passed from DT, by the generic filter routine. - Update binding document Signed-off-by: Viresh Kumar --- Documentation/devicetree/bindings/dma/snps-dma.txt | 44 ++ drivers/dma/dw_dmac.c | 147 + drivers/dma/dw_dmac_regs.h | 4 + include/linux/dw_dmac.h| 43 +++--- 4 files changed, 221 insertions(+), 17 deletions(-) diff --git a/Documentation/devicetree/bindings/dma/snps-dma.txt b/Documentation/devicetree/bindings/dma/snps-dma.txt index c0d85db..5bb3dfb 100644 --- a/Documentation/devicetree/bindings/dma/snps-dma.txt +++ b/Documentation/devicetree/bindings/dma/snps-dma.txt @@ -6,6 +6,26 @@ Required properties: - interrupt-parent: Should be the phandle for the interrupt controller that services interrupts for this device - interrupt: Should contain the DMAC interrupt number +- nr_channels: Number of channels supported by hardware +- is_private: The device channels should be marked as private and not for by the + general purpose DMA channel allocator. False if not passed. +- chan_allocation_order: order of allocation of channel, 0 (default): ascending, + 1: descending +- chan_priority: priority of channels. 
0 (default): increase from chan 0->n, 1: + increase from chan n->0 +- block_size: Maximum block size supported by the controller +- nr_masters: Number of AHB masters supported by the controller +- data_width: Maximum data width supported by hardware per AHB master + (0 - 8bits, 1 - 16bits, ..., 5 - 256bits) +- slave_info: + - bus_id: name of this device channel, not just a device name since + devices may have more than one channel e.g. "foo_tx". For using the + dw_generic_filter(), slave drivers must pass exactly this string as + param to filter function. + - cfg_hi: Platform-specific initializer for the CFG_HI register + - cfg_lo: Platform-specific initializer for the CFG_LO register + - src_master: src master for transfers on allocated channel. + - dst_master: dest master for transfers on allocated channel. Example: @@ -14,4 +34,28 @@ Example: reg = <0xfc00 0x1000>; interrupt-parent = <>; interrupts = <12>; + + nr_channels = <8>; + chan_allocation_order = <1>; + chan_priority = <1>; + block_size = <0xfff>; + nr_masters = <2>; + data_width = <3 3 0 0>; + + slave_info { + uart0-tx { + bus_id = "uart0-tx"; + cfg_hi = <0x4000>; /* 0x8 << 11 */ + cfg_lo = <0>; + src_master = <0>; + dst_master = <1>; + }; + spi0-tx { + bus_id = "spi0-tx"; + cfg_hi = <0x2000>; /* 0x4 << 11 */ + cfg_lo = <0>; + src_master = <0>; + dst_master = <0>; + }; + }; }; diff --git a/drivers/dma/dw_dmac.c b/drivers/dma/dw_dmac.c index c4b0eb3..9a7d084 100644 --- a/drivers/dma/dw_dmac.c +++ b/drivers/dma/dw_dmac.c @@ -1179,6 +1179,58 @@ static void dwc_free_chan_resources(struct dma_chan *chan) dev_vdbg(chan2dev(chan), "%s: done\n", __func__); } +bool dw_generic_filter(struct dma_chan *chan, void *param) +{ + struct dw_dma *dw = to_dw_dma(chan->device); + static struct dw_dma *last_dw; + static char *last_bus_id; + int found = 0, i = -1; + + /* +* dmaengine framework calls this routine for all channels of all dma +* controller, until true is returned. 
If 'param' bus_id is not +* registered with a dma controller (dw), then there is no need of +* running below function for all channels of dw. +* +* This block of code does this by saving the parameters of last +* failure. If dw and param are same, i.e. trying on same dw with +* different channel, return false. +*/ + if (last_dw) { + if ((last_bus_id == param) && (last_dw == dw)) + return false; + } + + /* +*
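The bus_id matching described in the changelog can be pictured with a small userspace model. This is only a sketch under assumptions: the struct mirrors the slave_info fields from the binding text, while find_slave_by_bus_id() and the strcmp-based comparison are illustrative guesses, since the body of the filter is cut off in this copy of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for one slave_info entry from the DT binding. */
struct slave_info {
	const char *bus_id;
	unsigned int cfg_hi;
	unsigned int cfg_lo;
	unsigned int src_master;
	unsigned int dst_master;
};

/*
 * Return the entry whose bus_id equals the string passed by the slave
 * driver, or NULL when this controller has no such slave.
 * Hypothetical helper; the real filter may compare differently.
 */
const struct slave_info *find_slave_by_bus_id(const struct slave_info *sd,
					       size_t sd_count,
					       const char *bus_id)
{
	for (size_t i = 0; i < sd_count; i++) {
		if (strcmp(sd[i].bus_id, bus_id) == 0)
			return &sd[i];
	}
	return NULL;
}
```

A NULL result corresponds to the "bus_id is not registered with this dma controller" case that the comment in the patch wants to short-circuit for the remaining channels.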
[PATCH 3/3] ARM: SPEAr13xx: Pass DW DMAC platform data from DT
This patch adds dw_dmac's platform data to DT node. It also creates slave info node for SPEAr13xx, for the devices which were using dw_dmac. Signed-off-by: Viresh Kumar --- arch/arm/boot/dts/spear1340.dtsi | 19 ++ arch/arm/boot/dts/spear13xx.dtsi | 38 arch/arm/mach-spear13xx/include/mach/spear.h | 2 -- arch/arm/mach-spear13xx/spear1310.c | 4 +-- arch/arm/mach-spear13xx/spear1340.c | 27 +++--- arch/arm/mach-spear13xx/spear13xx.c | 54 ++-- 6 files changed, 65 insertions(+), 79 deletions(-) diff --git a/arch/arm/boot/dts/spear1340.dtsi b/arch/arm/boot/dts/spear1340.dtsi index d71fe2a..8ea3f66 100644 --- a/arch/arm/boot/dts/spear1340.dtsi +++ b/arch/arm/boot/dts/spear1340.dtsi @@ -24,6 +24,25 @@ status = "disabled"; }; + dma@ea80 { + slave_info { + uart1_tx { + bus_id = "uart1_tx"; + cfg_hi = <0x6000>; /* 0xC << 11 */ + cfg_lo = <0>; + src_master = <0>; + dst_master = <1>; + }; + uart1_tx { + bus_id = "uart1_tx"; + cfg_hi = <0x680>; /* 0xD << 7 */ + cfg_lo = <0>; + src_master = <1>; + dst_master = <0>; + }; + }; + }; + spi1: spi@5d40 { compatible = "arm,pl022", "arm,primecell"; reg = <0x5d40 0x1000>; diff --git a/arch/arm/boot/dts/spear13xx.dtsi b/arch/arm/boot/dts/spear13xx.dtsi index f7b84ac..f06bb50 100644 --- a/arch/arm/boot/dts/spear13xx.dtsi +++ b/arch/arm/boot/dts/spear13xx.dtsi @@ -91,6 +91,37 @@ reg = <0xea80 0x1000>; interrupts = <0 19 0x4>; status = "disabled"; + + nr_channels = <8>; + chan_allocation_order = <1>; + chan_priority = <1>; + block_size = <0xfff>; + nr_masters = <2>; + data_width = <3 3 0 0>; + + slave_info { + ssp0_tx { + bus_id = "ssp0_tx"; + cfg_hi = <0x2000>; /* 0x4 << 11 */ + cfg_lo = <0>; + src_master = <0>; + dst_master = <0>; + }; + ssp0_rx { + bus_id = "ssp0_rx"; + cfg_hi = <0x280>; /* 0x5 << 7 */ + cfg_lo = <0>; + src_master = <0>; + dst_master = <0>; + }; + cf { + bus_id = "cf"; + cfg_hi = <0>; + cfg_lo = <0>; + src_master = <0>; + dst_master = <0>; + }; + }; }; dma@eb00 { @@ -98,6 +129,13 @@ reg = <0xeb00 0x1000>; interrupts = <0 59 
0x4>; status = "disabled"; + + nr_channels = <8>; + chan_allocation_order = <1>; + chan_priority = <1>; + block_size = <0xfff>; + nr_masters = <2>; + data_width = <3 3 0 0>; }; fsmc: flash@b000 { diff --git a/arch/arm/mach-spear13xx/include/mach/spear.h b/arch/arm/mach-spear13xx/include/mach/spear.h index 07d90ac..71bf5b6 100644 --- a/arch/arm/mach-spear13xx/include/mach/spear.h +++ b/arch/arm/mach-spear13xx/include/mach/spear.h @@ -43,8 +43,6 @@ #define VA_L2CC_BASE IOMEM(UL(0xFB00)) /* others */ -#define DMAC0_BASE UL(0xEA80) -#define DMAC1_BASE UL(0xEB00) #define MCIF_CF_BASE UL(0xB280) /* Devices present in SPEAr1310 */ diff --git a/arch/arm/mach-spear13xx/spear1310.c b/arch/arm/mach-spear13xx/spear1310.c index 9fbbfc5..0e60195
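The cfg_hi comments in these nodes follow one pattern: entries for tx channels shift the request line by 11, entries for rx channels by 7. A minimal sketch of that arithmetic follows. The field positions are inferred from the comments alone, and the macro and function names are invented for this example rather than taken from dw_dmac_regs.h.

```c
/* Assumed field positions, inferred from the "<< 11" and "<< 7" comments. */
#define EX_CFGH_DST_PER(x)	((unsigned int)(x) << 11)
#define EX_CFGH_SRC_PER(x)	((unsigned int)(x) << 7)

/* tx: the peripheral is the destination of the transfer. */
unsigned int cfg_hi_for_tx(unsigned int request_line)
{
	return EX_CFGH_DST_PER(request_line);
}

/* rx: the peripheral is the source of the transfer. */
unsigned int cfg_hi_for_rx(unsigned int request_line)
{
	return EX_CFGH_SRC_PER(request_line);
}
```

Checked against the values in the patch: 0xC << 11 is 0x6000, 0x4 << 11 is 0x2000, 0xD << 7 is 0x680 and 0x5 << 7 is 0x280. The second uart1_tx node in spear1340.dtsi uses the << 7 form with src_master = <1>, which suggests it may have been meant as an rx entry.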
[PATCH 1/3] dmaengine: dw_dmac: Update documentation style comments for dw_dma_platform_data
Documentation style comments were missing for a few fields in struct dw_dma_platform_data. Add these.

Signed-off-by: Viresh Kumar
---
 include/linux/dw_dmac.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/dw_dmac.h b/include/linux/dw_dmac.h
index e1c8c9e..62a6190 100644
--- a/include/linux/dw_dmac.h
+++ b/include/linux/dw_dmac.h
@@ -19,6 +19,8 @@
  * @nr_channels: Number of channels supported by hardware (max 8)
  * @is_private: The device channels should be marked as private and not for
  * by the general purpose DMA channel allocator.
+ * @chan_allocation_order: Allocate channels starting from 0 or 7
+ * @chan_priority: Set channel priority increasing from 0 to 7 or 7 to 0.
  * @block_size: Maximum block size supported by the controller
  * @nr_masters: Number of AHB masters supported by the controller
  * @data_width: Maximum data width supported by hardware per AHB master
--
1.7.12.rc2.18.g61b472e
Re: [git pull] signal.git, pile 2 (was Re: [RFC][CFT][CFReview] execve and kernel_thread unification work)
On Fri, Oct 12, 2012 at 02:09:58AM +0100, Al Viro wrote:
>
> How granular are you planning to make that? I mean, we are talking about
> 3 objects here - init/main.o, kernel/kthread.o and kernel/kmod.o. Do they
> get TOC separate from that of arch/powerpc/kernel/entry_64.o?

Potentially, yes, it would be up to the linker.

> Anyway, if ppc folks can live with that stuff in its current form for now,

Yes, we can.

Paul.
Re: [RFC PATCH 2/2] sched:Pick the apt busy sched group during load balancing
Hi everyone,

The figures SCHED_GRP1:3200 and SCHED_GRP2:1156 shown below in the changelog are the probable figures as calculated with the per-entity-load-tracking metric for the runqueue load.

> If a sched group has passed the test for sufficient load in
> update_sg_lb_stats,to qualify for load balancing,then PJT's
> metrics has to be used to qualify the right sched group as the busiest group.
>
> The scenario which led to this patch is shown below:
> Consider Task1 and Task2 to be a long running task
> and Tasks 3,4,5,6 to be short running tasks
>
>                Task3
>                Task4
> Task1          Task5
> Task2          Task6
> ------         ------
> SCHED_GRP1     SCHED_GRP2
>
> Normal load calculator would qualify SCHED_GRP2 as
> the candidate for sd->busiest due to the following loads
> that it calculates.
>
> SCHED_GRP1:2048
> SCHED_GRP2:4096
>
> Load calculator would probably qualify SCHED_GRP1 as the candidate
> for sd->busiest due to the following loads that it calculates
>
> SCHED_GRP1:3200
> SCHED_GRP2:1156
>

Regards
Preeti
Re: [PATCH 1/4] module: add syscall to load module from fd
Rusty,

On Fri, Oct 12, 2012 at 12:16 AM, Rusty Russell wrote:
> "H. Peter Anvin" writes:
>
>> On 10/10/2012 06:03 AM, Michael Kerrisk (man-pages) wrote:
>>> Good point. A "whole hog" openat()-style interface is worth thinking about
>>> too.
>>
>> *Although* you could argue that you can always simply open the module
>> file first, and that finit_module() is really what we should have had in
>> the first place. Then you don't need the flags since those would come
>> from openat().
>
> There's no fundamental reason that modules have to be in a file. I'm
> thinking of compressed modules, or an initrd which simply includes all
> the modules it wants to load in one linear file.
>
> Also, --force options manipulate the module before loading (as did the
> now-obsolete module rename option).

Sure. But my point that started this subthread was: should we take the opportunity now to add a 'flags' argument to the new finit_module() system call, so as to allow flexibility in extending the behavior in future? There have been so many cases of revised system calls in the past few years that replaced calls without a 'flags' argument that it seems worth at least some thought before the API is cast in stone.

Thanks,

Michael
Re: [RFC PATCH 01/06] input/rmi4: Public header and documentation
On Thu, Oct 11, 2012 at 03:56:22AM +, Christopher Heiny wrote:

Fix your mailer to word wrap within paragraphs.

> If this feature is a deal-breaker, we can take it out. In the absence
> of a generic GPIO implementation for CS, though, I'd much rather leave
> it in. Once generic GPIO CS arrives, we'll remove it pretty quickly.

Why not just implement this at an appropriate level in the SPI subsystem? One of the great things about Linux is that you can change the core code...
linux-next: build warnings after merge of the tip tree
Hi all,

After merging the tip tree, today's linux-next build (powerpc allnoconfig) produced these warnings:

kernel/sched/fair.c:801:22: warning: 'task_h_load' declared 'static' but never defined [-Wunused-function]
kernel/sched/fair.c:1013:13: warning: 'account_offnode_enqueue' defined but not used [-Wunused-function]

Introduced by commit 4ae834f767c5 ("sched/numa: Implement NUMA home-node selection code").

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
linux-next: build warning after merge of the vfs tree
Hi Al,

After merging the vfs tree, today's linux-next build (powerpc allnoconfig) produced this warning:

fs/namespace.c: In function 'do_mount':
fs/namespace.c:2219:8: warning: passing argument 3 of 'security_sb_mount' discards 'const' qualifier from pointer target type [enabled by default]
include/linux/security.h:1967:19: note: expected 'char *' but argument is of type 'const char *'

Introduced by commit 5804bc88667e ("consitify do_mount() arguments"). The prototype of the !CONFIG_SECURITY version was not completely updated.

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
Re: [PATCH v6 1/6] tracing,x86: Add a TSC trace_clock
On Fri, Oct 12, 2012 at 1:27 AM, David Sharp wrote:
> +#include

Please use the Kbuild infrastructure ("generic-y += ..." in arch/*/include/asm/Kbuild) instead of adding wrappers around the asm-generic version.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that.
 -- Linus Torvalds
Re: [PATCH] efivarfs: Implement exclusive access for {get,set}_variable
Hi Greg,

> Should this be backported to the stable kernels?

No, the efivarfs code that this touches was only recently committed; it won't be in any of the stable series.

Cheers,

Jeremy
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 07/10] thp: implement splitting pmd for huge zero page
On 10/12/2012 12:13 PM, Kirill A. Shutemov wrote:
> On Fri, Oct 12, 2012 at 11:23:37AM +0800, Ni zhan Chen wrote:
>> On 10/02/2012 11:19 PM, Kirill A. Shutemov wrote:
>>> From: "Kirill A. Shutemov"
>>>
>>> We can't split huge zero page itself, but we can split the pmd which points to
>>> it. On splitting the pmd we create a table with all ptes set to normal zero page.
>>>
>>> Signed-off-by: Kirill A. Shutemov
>>> Reviewed-by: Andrea Arcangeli
>>> ---
>>>  mm/huge_memory.c | 32
>>>  1 files changed, 32 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 95032d3..3f1c59c 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -1600,6 +1600,7 @@ int split_huge_page(struct page *page)
>>>  	struct anon_vma *anon_vma;
>>>  	int ret = 1;
>>>
>>> +	BUG_ON(is_huge_zero_pfn(page_to_pfn(page)));
>>>  	BUG_ON(!PageAnon(page));
>>>  	anon_vma = page_lock_anon_vma(page);
>>>  	if (!anon_vma)
>>> @@ -2503,6 +2504,32 @@ static int khugepaged(void *none)
>>>  	return 0;
>>>  }
>>>
>>> +static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
>>> +		unsigned long haddr, pmd_t *pmd)
>>> +{
>>> +	pgtable_t pgtable;
>>> +	pmd_t _pmd;
>>> +	int i;
>>> +
>>> +	pmdp_clear_flush_notify(vma, haddr, pmd);
>>
>> Why can't I find the function pmdp_clear_flush_notify in the kernel source
>> code? Do you mean pmdp_clear_flush_young_notify or something like that?
>
> It was changed recently. See commit 2ec74c3 mm: move all mmu notifier
> invocations to be done outside the PT lock

Oh, thanks!
[RFC PATCH 2/2] sched:Pick the apt busy sched group during load balancing
If a sched group has passed the test for sufficient load in update_sg_lb_stats,to qualify for load balancing,then PJT's metrics has to be used to qualify the right sched group as the busiest group. The scenario which led to this patch is shown below: Consider Task1 and Task2 to be a long running task and Tasks 3,4,5,6 to be short running tasks Task3 Task4 Task1 Task5 Task2 Task6 -- -- SCHED_GRP1 SCHED_GRP2 Normal load calculator would qualify SCHED_GRP2 as the candidate for sd->busiest due to the following loads that it calculates. SCHED_GRP1:2048 SCHED_GRP2:4096 Load calculator would probably qualify SCHED_GRP1 as the candidate for sd->busiest due to the following loads that it calculates SCHED_GRP1:3200 SCHED_GRP2:1156 This patch aims to strike a balance between the loads of the group and the number of tasks running on the group to decide the busiest group in the sched_domain. This means we will need to use the PJT's metrics but with an additional constraint. Signed-off-by: Preeti U Murthy --- kernel/sched/fair.c | 22 +++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index dd0fb28..d45b7b4 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -165,7 +165,8 @@ void sched_init_granularity(void) #else # define WMULT_CONST (1UL << 32) #endif - +#define NR_THRESHOLD 2 +#define LOAD_THRESHOLD 1 #define WMULT_SHIFT32 /* @@ -4169,6 +4170,7 @@ struct sd_lb_stats { /* Statistics of the busiest group */ unsigned int busiest_idle_cpus; unsigned long max_load; + u64 max_sg_load; /* Equivalent of max_load but calculated using pjt's metric*/ unsigned long busiest_load_per_task; unsigned long busiest_nr_running; unsigned long busiest_group_capacity; @@ -4628,8 +4630,21 @@ static bool update_sd_pick_busiest(struct lb_env *env, struct sched_group *sg, struct sg_lb_stats *sgs) { - if (sgs->avg_load <= sds->max_load) - return false; + /* Use PJT's metrics to qualify a sched_group as busy +* But a low load sched 
group may be queueing up many tasks +* +* So before dismissing a sched group with lesser load,ensure +* that the number of processes on it is checked if it is +* not too less loaded than the max load so far +*/ + if (sgs->avg_cfs_runnable_load <= sds->max_sg_load) { + if (sgs->avg_cfs_runnable_load > LOAD_THRESHOLD * sds->max_sg_load) { + if (sgs->sum_nr_running <= (NR_THRESHOLD + sds->busiest_nr_running)) + return false; + } else { + return false; + } + } if (sgs->sum_nr_running > sgs->group_capacity) return true; @@ -4708,6 +4723,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, sds->this_idle_cpus = sgs.idle_cpus; } else if (update_sd_pick_busiest(env, sds, sg, )) { sds->max_load = sgs.avg_load; + sds->max_sg_load = sgs.avg_cfs_runnable_load; sds->busiest = sg; sds->busiest_nr_running = sgs.sum_nr_running; sds->busiest_idle_cpus = sgs.idle_cpus; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
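The early-exit test this patch adds to update_sd_pick_busiest() can be tried out in isolation. What follows is a userspace model only: the two thresholds are copied from the patch, while dismissed_by_load_filter() and its flat parameter list are stand-ins for the sgs and sds fields and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_THRESHOLD	2
#define LOAD_THRESHOLD	1

/*
 * Model of the check added at the top of update_sd_pick_busiest().
 * Returns true when the candidate group is dismissed (the patch returns
 * false at that point) and false when the remaining checks of the real
 * function would still run. Hypothetical helper for illustration only.
 */
bool dismissed_by_load_filter(uint64_t avg_cfs_runnable_load,
			      uint64_t max_sg_load,
			      unsigned long sum_nr_running,
			      unsigned long busiest_nr_running)
{
	if (avg_cfs_runnable_load <= max_sg_load) {
		if (avg_cfs_runnable_load > LOAD_THRESHOLD * max_sg_load) {
			if (sum_nr_running <= NR_THRESHOLD + busiest_nr_running)
				return true;
		} else {
			return true;
		}
	}
	return false;
}
```

With the loads from the changelog, a group at 1156 is dismissed when the busiest so far sits at 3200. In this model, with LOAD_THRESHOLD set to 1 the inner comparison cannot hold once the outer one does, so the task-count branch is never reached; a fractional threshold would need a different representation than an integer multiplier.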
[RFC PATCH 1/2] sched:Prevent movement of short running tasks during load balancing
Prevent sched groups with low load as tracked by PJT's metrics from being candidates of the load balance routine.This metric is chosen to be 1024+15%*1024.But using PJT's metrics it has been observed that even when three 10% tasks are running,the load sometimes does not exceed this threshold.The call should be taken if the tasks can afford to be throttled. This is why an additional metric has been included,which can determine how long we can tolerate tasks not being moved even if the load is low. Signed-off-by: Preeti U Murthy --- kernel/sched/fair.c | 16 1 file changed, 16 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index dbddcf6..dd0fb28 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4188,6 +4188,7 @@ struct sd_lb_stats { */ struct sg_lb_stats { unsigned long avg_load; /*Avg load across the CPUs of the group */ + u64 avg_cfs_runnable_load; /* Equivalent of avg_load but calculated using pjt's metric */ unsigned long group_load; /* Total load over the CPUs of the group */ unsigned long sum_nr_running; /* Nr tasks running in the group */ unsigned long sum_weighted_load; /* Weighted load of group's tasks */ @@ -4504,6 +4505,7 @@ static inline void update_sg_lb_stats(struct lb_env *env, unsigned long load, max_cpu_load, min_cpu_load; unsigned int balance_cpu = -1, first_idle_cpu = 0; unsigned long avg_load_per_task = 0; + u64 group_load = 0; /* computed using PJT's metric */ int i; if (local_group) @@ -4548,6 +4550,7 @@ static inline void update_sg_lb_stats(struct lb_env *env, if (idle_cpu(i)) sgs->idle_cpus++; + group_load += cpu_rq(i)->cfs.runnable_load_avg; update_sg_numa_stats(sgs, rq); } @@ -4572,6 +4575,19 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->avg_load = (sgs->group_load*SCHED_POWER_SCALE) / group->sgp->power; /* +* Check if the sched group has not crossed the threshold. 
+* +* Also check if the sched_group although being within the threshold,is not +* queueing too many tasks.If yes to both,then make it an +* invalid candidate for load balancing +* +* The below condition is included as a tunable to meet performance and power needs +*/ + sgs->avg_cfs_runnable_load = (group_load * SCHED_POWER_SCALE) / group->sgp->power; + if (sgs->avg_cfs_runnable_load <= 1178 && sgs->sum_nr_running <= 2) + sgs->avg_cfs_runnable_load = 0; + + /* * Consider the group unbalanced when the imbalance is larger * than the average weight of a task. * -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
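The constant 1178 in the new condition is the "1024+15%*1024" from the changelog, rounded up. A minimal sketch of that arithmetic and of the zeroing rule follows, assuming SCHED_POWER_SCALE is 1024 and a nonzero group power; both function names are hypothetical and the code is a userspace model, not the kernel implementation.

```c
#include <stdint.h>

#define EX_SCHED_POWER_SCALE	1024ULL	/* assumed value of SCHED_POWER_SCALE */

/* 1024 plus pct percent of 1024, rounded up: 15 gives 1178. */
uint64_t load_margin_threshold(unsigned int pct)
{
	return (EX_SCHED_POWER_SCALE * (100 + pct) + 99) / 100;
}

/*
 * Scale the summed runnable load by the group power, then zero it when
 * the group is both lightly loaded and running at most two tasks, as the
 * patch does. power must be nonzero.
 */
uint64_t filtered_avg_load(uint64_t group_load, unsigned long power,
			   unsigned long sum_nr_running)
{
	uint64_t avg = group_load * EX_SCHED_POWER_SCALE / power;

	if (avg <= load_margin_threshold(15) && sum_nr_running <= 2)
		avg = 0;
	return avg;
}
```

A group with a summed load of 900 and two tasks is zeroed and drops out as a candidate, while the same load spread over three tasks is kept.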
[RFC PATCH 0/2] sched: Load Balancing using Per-entity-Load-tracking
Hi everyone,

This patchset builds on the per-entity-load-tracking patchset, which will soon be available in the kernel. It is based on the tip/master tree; only the first 8 latest patches of sched:per-entity-load-tracking have been imported to the tree, to avoid the complexities of task groups and to hold back the optimizations of this patch for now.

This patchset is an attempt to begin integrating the per-entity-load metric for the cfs_rq, henceforth referred to as PJT's metric, with the load balancer in a stepwise fashion, and to progress based on the consequences.

The following issues have been considered towards this:

[NOTE: an x% task referred to in the logs and below is calculated over a duty cycle of 10ms.]

1. Consider a scenario where two 10% tasks are running on a cpu. The present code will consider the load on this queue to be 2048, while with PJT's metric the load is calculated to be <1000, rarely exceeding this limit. Although the tasks are not contributing much to the cpu load, the scheduler decides to move them.

   But one could argue that 'not moving one of these tasks could throttle them. If there was an idle cpu, perhaps we could have moved them'. While the power save mode would have been fine with not moving the task, the performance mode would prefer not to throttle the tasks. We could strive to strike a balance by making this decision tunable with certain parameters. This patchset includes such tunables. This issue is addressed in Patch[1/2].

2. We need to be able to do this cautiously, as the scheduler code is very complex. This is why the integration of PJT's metric with the load balancer proceeds stepwise, progressing based on the consequences. I don't intend to vary the parameters used by the load balancer. Some parameters are, however, newly included to make decisions about including a sched group as a candidate for load balancing.

   This patchset therefore has two primary aims.

   Patch[1/2]: This patch aims at detecting short running tasks and preventing their movement. In update_sg_lb_stats, a sched group is dismissed as a candidate for load balancing if the load calculated by PJT's metric says that the average load on the sched_group <= 1024+(.15*1024). This is a tunable, which can be varied after sufficient experiments.

   Patch[2/2]: In the current scheduler, a greater load is analogous to a larger number of tasks. Therefore, when the busiest group is picked from the sched domain in update_sd_lb_stats, only the loads of the groups are compared. With PJT's metric, a higher load does not necessarily mean more tasks. This patch addresses this issue.

3. The next step towards integration should be to use PJT's metric for comparing the loads of the busy sched group and the sched group which has to pull the tasks, which happens in find_busiest_group.
---

Preeti U Murthy (2):
      sched:Prevent movement of short running tasks during load balancing
      sched:Pick the apt busy sched group during load balancing

 kernel/sched/fair.c | 38 +++---
 1 file changed, 35 insertions(+), 3 deletions(-)

--
Regards,
Preeti U Murthy
Re: [PATCH] uprobe: fix misleading log entry
> On Wed, Jul 18, 2012 at 5:22 PM, Srikar Dronamraju wrote:
> > * Jovi Zhang [2012-07-18 11:08:42]:
> >
> >> From 68232ef2decae95b807f2f3763e8ea99c1a3b2ae Mon Sep 17 00:00:00 2001
> >> From: Jovi Zhang
> >> Date: Wed, 18 Jul 2012 17:51:26 +0800
> >> Subject: [PATCH] uprobe: fix misleading log entry
> >>
> >> There don't have any 'r' prefix in uprobe event naming, remove it.
> >>
> >> Signed-off-by: Jovi Zhang
> >> ---
> >>  kernel/trace/trace_uprobe.c |2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> >> index cf382de..852a584 100644
> >> --- a/kernel/trace/trace_uprobe.c
> >> +++ b/kernel/trace/trace_uprobe.c
> >> @@ -191,7 +191,7 @@ static int create_trace_uprobe(int argc, char **argv)
> >>  	if (argv[0][0] == '-')
> >>  		is_delete = true;
> >>  	else if (argv[0][0] != 'p') {
> >> -		pr_info("Probe definition must be started with 'p', 'r' or"
> >> 			" '-'.\n");
> >> +		pr_info("Probe definition must be started with 'p' or '-'.\n");
> >>  		return -EINVAL;
> >>  	}
> >>
> >
> > Yes, uprobes doesnt support return probes. So we should not have
> > mentioned about r.
> >
> > Acked-by: Srikar Dronamraju
> >
>

Ingo/Andrew, can you please pick this up?

--
Thanks and Regards
Srikar
Re: [PATCH] efivarfs: Implement exclusive access for {get,set}_variable
On Thu, Oct 11, 2012 at 02:42:36PM +0100, Matt Fleming wrote:
> On Thu, 2012-10-11 at 21:19 +0800, Jeremy Kerr wrote:
> > Currently, efivarfs does not enforce exclusion over the get_variable and
> > set_variable operations. Section 7.1 of UEFI requires us to only allow a
> > single processor to enter {get,set}_variable services at once.
> >
> > This change acquires the efivars->lock over calls to these operations
> > from the efivarfs paths.
> >
> > Signed-off-by: Jeremy Kerr
> >
> > ---
> >  drivers/firmware/efivars.c | 68 +++--
> >  1 file changed, 43 insertions(+), 25 deletions(-)
>
> Thanks, applied to 'next'.

Should this be backported to the stable kernels?

thanks,

greg k-h
Re: [PATCH] backlight: lm3639: Return proper error in lm3639_bled_mode_store error paths
On 2012-10-11 14:11, Axel Lin wrote:

Signed-off-by: Axel Lin
---
 drivers/video/backlight/lm3639_bl.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/video/backlight/lm3639_bl.c b/drivers/video/backlight/lm3639_bl.c
index c6915c6..585949b 100644
--- a/drivers/video/backlight/lm3639_bl.c
+++ b/drivers/video/backlight/lm3639_bl.c
@@ -206,11 +206,11 @@ static ssize_t lm3639_bled_mode_store(struct device *dev,
 out:
 	dev_err(pchip->dev, "%s:i2c access fail to register\n", __func__);
-	return size;
+	return ret;

 out_input:
 	dev_err(pchip->dev, "%s:input conversion fail\n", __func__);
-	return size;
+	return ret;

 }

Thank you, Axel, for fixing the code.

Acked-by: G.Shark Jeong
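The point of the patch is that a store callback which returns 'size' on its error paths reports a successful write to user space. A minimal user-space sketch of the intended convention is shown below; mode_store_sketch() is a hypothetical stand-in and not the driver's function, and it assumes that buf is NUL-terminated.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical user-space model of a sysfs-style store callback.
 * On success it returns the number of bytes consumed; on failure it
 * returns a negative errno value rather than 'size', so the writer
 * sees the error instead of a seemingly successful write.
 */
static ssize_t mode_store_sketch(const char *buf, size_t size,
				 unsigned int *mode)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -EINVAL;	/* input conversion failed */

	*mode = (unsigned int)val;
	return (ssize_t)size;	/* whole buffer consumed */
}
```

With this shape, a caller can tell a rejected value from an accepted one purely from the sign of the return value.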
Re: [PATCH v2 0/4] DMA: PL330: Fix mem leaks and balance probe/remove
Hello,

On 5 October 2012 06:17, Inderpal Singh wrote:
> The first 2 patches of this series fix memory leaks because the memory
> allocated for peripheral channels and DMA descriptors was not getting
> freed.
>
> The last 2 patches balance the module's remove function.
>
> This series depends on "61c6e7531d3b66b3 DMA: PL330: Check the
> pointer returned by kzalloc" which is on slave-dma's "fixes" branch.
> Hence slave-dma tree's "next" branch was merged with "fixes" and
> applied patch at [1] to fix the build error.
>
> [1] http://permalink.gmane.org/gmane.linux.kernel.next/24274
>
> Changes since v1:
> - Protect only list_add_tail with spin_locks
> - Return EBUSY from remove if channel is in use
> - unregister dma_device in remove
>
> Inderpal Singh (4):
>   DMA: PL330: Free memory allocated for peripheral channels
>   DMA: PL330: Change allocation method to properly free DMA descriptors
>   DMA: PL330: Balance module remove function with probe
>   DMA: PL330: unregister dma_device in module's remove function
>
>  drivers/dma/pl330.c | 53 ---
>  1 file changed, 38 insertions(+), 15 deletions(-)
>

Any comments on this v2 version of the patchset?

Thanks,
Inder

> --
> 1.7.9.5
>
Re: [PATCH v2] acpi : acpi_bus_trim() stops removing devices when failing to remove the device
Hi Toshi,

2012/10/11 22:58, Toshi Kani wrote:
On Thu, 2012-10-11 at 19:12 +0900, Yasuaki Ishimatsu wrote:

acpi_bus_trim() stops removing devices when acpi_bus_remove() returns an error number. But acpi_bus_remove() cannot return an error number correctly: it only returns -EINVAL, when the dev argument is NULL. Thus, even if a device cannot be removed correctly, acpi_bus_trim() ignores the failure and continues to remove devices. acpi_bus_hot_remove_device() uses acpi_bus_trim() for removing devices. Therefore acpi_bus_hot_remove_device() can send "_EJ0" to firmware even if the device is still running on the system. In this case, the system cannot work well.

Vasilis hit the bug with memory hotplug and reported it as follows:
https://lkml.org/lkml/2012/9/26/318

So acpi_bus_trim() should check whether the device was removed correctly. The patch adds error checks to some of the functions that remove the device. With the patch applied, acpi_bus_trim() stops removing devices when it fails to remove a device. But I think there is no impact outside the CPU and memory hotplug paths, because other devices fail only in irregular cases, such as the device being NULL.

v1->v2
- add a rollback for reinstalling a notify handler.

Signed-off-by: Yasuaki Ishimatsu

Thanks for the update. Looks good.

Reviewed-by: Toshi Kani

Thank you for reviewing.
Thanks, Yasauaki Ishimatsu -Toshi --- drivers/acpi/scan.c| 21 ++--- drivers/base/dd.c | 22 +- include/linux/device.h |2 +- 3 files changed, 36 insertions(+), 9 deletions(-) Index: linux-3.6/drivers/acpi/scan.c === --- linux-3.6.orig/drivers/acpi/scan.c 2012-10-11 18:31:40.189019503 +0900 +++ linux-3.6/drivers/acpi/scan.c 2012-10-11 18:42:35.669041641 +0900 @@ -445,18 +445,29 @@ static int acpi_device_remove(struct dev { struct acpi_device *acpi_dev = to_acpi_device(dev); struct acpi_driver *acpi_drv = acpi_dev->driver; + int ret; if (acpi_drv) { if (acpi_drv->ops.notify) acpi_device_remove_notify_handler(acpi_dev); - if (acpi_drv->ops.remove) - acpi_drv->ops.remove(acpi_dev, acpi_dev->removal_type); + if (acpi_drv->ops.remove) { + ret = acpi_drv->ops.remove(acpi_dev, + acpi_dev->removal_type); + if (ret) + goto rollback; + } } acpi_dev->driver = NULL; acpi_dev->driver_data = NULL; put_device(dev); return 0; + +rollback: + if (acpi_drv->ops.notify) + acpi_device_install_notify_handler(acpi_dev); + + return ret; } struct bus_type acpi_bus_type = { @@ -1226,11 +1237,15 @@ static int acpi_device_set_context(struc static int acpi_bus_remove(struct acpi_device *dev, int rmdevice) { + int ret; + if (!dev) return -EINVAL; dev->removal_type = ACPI_BUS_REMOVAL_EJECT; - device_release_driver(>dev); + ret = device_release_driver(>dev); + if (ret) + return ret; if (!rmdevice) return 0; Index: linux-3.6/drivers/base/dd.c === --- linux-3.6.orig/drivers/base/dd.c2012-10-11 18:31:40.191019505 +0900 +++ linux-3.6/drivers/base/dd.c 2012-10-11 18:31:46.873020548 +0900 @@ -475,9 +475,10 @@ EXPORT_SYMBOL_GPL(driver_attach); * __device_release_driver() must be called with @dev lock held. * When called for a USB interface, @dev->parent lock must be held as well. 
*/ -static void __device_release_driver(struct device *dev) +static int __device_release_driver(struct device *dev) { struct device_driver *drv; + int ret = 0; drv = dev->driver; if (drv) { @@ -493,9 +494,11 @@ static void __device_release_driver(stru pm_runtime_put_sync(dev); if (dev->bus && dev->bus->remove) - dev->bus->remove(dev); + ret = dev->bus->remove(dev); else if (drv->remove) - drv->remove(dev); + ret = drv->remove(dev); + if (ret) + goto rollback; devres_release_all(dev); dev->driver = NULL; dev_set_drvdata(dev, NULL); @@ -506,6 +509,12 @@ static void __device_release_driver(stru dev); } + + return ret; + +rollback: + driver_sysfs_add(dev); + return ret; } /** @@ -515,16 +524,19 @@ static void __device_release_driver(stru * Manually detach device from driver. * When called for a USB interface, @dev->parent lock must be held. */ -void
Re: [PATCH v3 07/10] thp: implement splitting pmd for huge zero page
On Fri, Oct 12, 2012 at 11:23:37AM +0800, Ni zhan Chen wrote:
> On 10/02/2012 11:19 PM, Kirill A. Shutemov wrote:
> >From: "Kirill A. Shutemov"
> >
> >We can't split huge zero page itself, but we can split the pmd which
> >points to it.
> >
> >On splitting the pmd we create a table with all ptes set to normal zero
> >page.
> >
> >Signed-off-by: Kirill A. Shutemov
> >Reviewed-by: Andrea Arcangeli
> >---
> > mm/huge_memory.c | 32
> > 1 files changed, 32 insertions(+), 0 deletions(-)
> >
> >diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >index 95032d3..3f1c59c 100644
> >--- a/mm/huge_memory.c
> >+++ b/mm/huge_memory.c
> >@@ -1600,6 +1600,7 @@ int split_huge_page(struct page *page)
> > 	struct anon_vma *anon_vma;
> > 	int ret = 1;
> >+	BUG_ON(is_huge_zero_pfn(page_to_pfn(page)));
> > 	BUG_ON(!PageAnon(page));
> > 	anon_vma = page_lock_anon_vma(page);
> > 	if (!anon_vma)
> >@@ -2503,6 +2504,32 @@ static int khugepaged(void *none)
> > 	return 0;
> > }
> >+static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
> >+		unsigned long haddr, pmd_t *pmd)
> >+{
> >+	pgtable_t pgtable;
> >+	pmd_t _pmd;
> >+	int i;
> >+
> >+	pmdp_clear_flush_notify(vma, haddr, pmd);
>
> why I can't find function pmdp_clear_flush_notify in kernel source
> code? Do you mean pmdp_clear_flush_young_notify or something like
> that?

It was changed recently. See commit 2ec74c3 ("mm: move all mmu notifier invocations to be done outside the PT lock").

--
 Kirill A. Shutemov
linux-next: manual merge of the signal tree with the vfs tree
Hi Al,

Today's linux-next merge of the signal tree got a conflict in arch/powerpc/kernel/sys_ppc32.c between commit 2a5e5beb88c5 ("vfs: define struct filename and have getname() return it") from the vfs tree and commit be6abfa769fa ("powerpc: switch to generic sys_execve()/kernel_execve()") from the signal tree.

The latter removed the function that was modified by the former, so I did that and can carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
linux-next: manual merge of the signal tree with the vfs tree
Hi Al,

Today's linux-next merge of the signal tree got a conflict in arch/powerpc/kernel/process.c between commit 2a5e5beb88c5 ("vfs: define struct filename and have getname() return it") from the vfs tree and commit be6abfa769fa ("powerpc: switch to generic sys_execve()/kernel_execve()") from the signal tree.

The latter rewrote the function modified by the former, so I used the latter and can carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
linux-next: manual merge of the signal tree with the vfs tree
Hi Al,

Today's linux-next merge of the signal tree got a conflict in arch/mn10300/kernel/process.c between commit 2a5e5beb88c5 ("vfs: define struct filename and have getname() return it") from the vfs tree and commit 8f1597e959a3 ("mn10300: switch to generic sys_execve()") from the signal tree.

The latter removed the function modified by the former, so I did that and can carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
linux-next: manual merge of the signal tree with the vfs tree
Hi Al,

Today's linux-next merge of the signal tree got a conflict in arch/m68k/kernel/process.c between commit 2a5e5beb88c5 ("vfs: define struct filename and have getname() return it") from the vfs tree and commit d878d6dacee2 ("m68k: switch to generic sys_execve()/kernel_execve()") from the signal tree.

The latter removed the function that the former modified, so I did that and can carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
linux-next: manual merge of the signal tree with the vfs tree
Hi Al,

Today's linux-next merge of the signal tree got a conflict in arch/frv/kernel/process.c between commit 2a5e5beb88c5 ("vfs: define struct filename and have getname() return it") from the vfs tree and commit 460dabab73f2 ("frv: switch to generic sys_execve()") from the signal tree.

The latter removed the function updated by the former, so I just did that and can carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
linux-next: manual merge of the signal tree with the vfs tree
Hi Al,

Today's linux-next merge of the signal tree got a conflict in arch/c6x/kernel/process.c between commit 2a5e5beb88c5 ("vfs: define struct filename and have getname() return it") from the vfs tree and commit 680a14535c33 ("c6x: switch to generic sys_execve") from the signal tree.

The latter removes the function modified by the former, so I did that and can carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
Re: linux-next: build failure after merge of the l2-mtd tree
Hi Stephen,

On Fri, 2012-10-12 at 11:14 +1100, Stephen Rothwell wrote:
> Hi Artem,
>
> After merging the l2-mtd tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> ERROR: "denali_init" [drivers/mtd/nand/denali_pci.ko] undefined!
> ERROR: "denali_remove" [drivers/mtd/nand/denali_pci.ko] undefined!
>
> Probably caused by commit 305b1ee29c8e ("mtd: denali: split the generic
> driver and PCI layer").
>
> I have used the l2-mtd tree from next-20121011 for today.

Sorry about that. I just sent a patch to fix this. Please disregard the first email; I forgot to include Artem in it.

Thanks,
Dinh
RE: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register
> > Load and write operations account for about 35% and 10% of
> > instructions, respectively, in most industry benchmarks. A fetched,
> > 16-byte-aligned block of code includes about 4 instructions, implying
> > 1.4 (0.35 * 4) loads and 0.4 writes.
> > Modern CPUs support 2 loads and 1 write per cycle, so write throughput
> > is the bottleneck for memcpy or copy_page, and some low-end CPUs
> > support only one memory operation per cycle. So it is enough to issue
> > one read and one write instruction per cycle, and we can save registers.
>
> So is that also true for AMD CPUs?

Although Bulldozer puts 32 bytes of instructions into decoupled 16-byte entry buffers, it still decodes 4 instructions per cycle, so 4 instructions will be fed into the execution units and 2 loads and 1 write will be issued per cycle.

Thanks
Ling
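The per-fetch figures in the quoted text follow from multiplying each instruction share by the 4 instructions per 16-byte fetch: 0.35 * 4 = 1.4 loads and 0.10 * 4 = 0.4 writes. A small sketch of that arithmetic, using integer hundredths to avoid floating-point rounding, is given below; ops_per_fetch_x100() is a hypothetical helper introduced only for this illustration.

```c
/*
 * Expected memory operations per fetch window, in hundredths, given the
 * share of instructions (in percent) that are of that kind and the
 * number of instructions per fetch window.
 * Example: 35% loads, 4 instructions per fetch -> 140, i.e. 1.40 loads.
 */
static unsigned int ops_per_fetch_x100(unsigned int share_pct,
				       unsigned int insns_per_fetch)
{
	return share_pct * insns_per_fetch;
}
```

Compared against the stated capability of 2 loads and 1 write per cycle, both values sit below the per-cycle limits, which is the basis for the claim that issuing one read and one write per cycle is sufficient.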
[PATCH] genirq: for edge interrupt IRQS_ONESHOT support with irq thread
In our system there is one edge interrupt that we want handled by an irq thread with IRQS_ONESHOT. We found that in handle_edge_irq(), even with IRQS_ONESHOT set, the irq is still left unmasked without regard to the IRQS_ONESHOT flag.

As a result, IRQS_ONESHOT does not work well for edge interrupts. In addition, after an irq thread with IRQS_ONESHOT finishes, the irq may be unmasked again, which confuses the mask/unmask logic.

Signed-off-by: liu chuansheng
---
 kernel/irq/chip.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 57d86d0..f23f524 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -497,7 +497,13 @@ handle_edge_irq(unsigned int irq, struct irq_desc *desc)
 	kstat_incr_irqs_this_cpu(irq, desc);

 	/* Start handling the irq */
-	desc->irq_data.chip->irq_ack(&desc->irq_data);
+	if (desc->istate & IRQS_ONESHOT) {
+		mask_ack_irq(desc);
+		handle_irq_event(desc);
+		cond_unmask_irq(desc);
+		goto out_unlock;
+	} else
+		desc->irq_data.chip->irq_ack(&desc->irq_data);

 	do {
 		if (unlikely(!desc->action)) {
--
1.7.0.4
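The behaviour the patch is after can be pictured with a very simplified user-space model: a one-shot line is masked on entry and stays masked until the threaded handler has finished. Everything below is hypothetical (struct irq_line_sketch and both functions are invented for illustration) and it ignores locking, pending edges and the real genirq state machine.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of one interrupt line. */
struct irq_line_sketch {
	bool oneshot;		/* requested with one-shot semantics */
	bool masked;		/* line currently masked at the chip */
	bool thread_pending;	/* threaded handler woken, not yet done */
};

/*
 * Hard-irq entry. wake_thread says whether the primary handler asked
 * for the threaded handler to run.
 */
static void edge_irq_entry_sketch(struct irq_line_sketch *l, bool wake_thread)
{
	if (!l->oneshot)
		return;			/* ack only; line stays unmasked */

	l->masked = true;		/* mask and ack before handling */
	l->thread_pending = wake_thread;
	if (!l->thread_pending)
		l->masked = false;	/* nothing deferred: unmask now */
}

/* Called when the threaded handler has finished its work. */
static void irq_thread_done_sketch(struct irq_line_sketch *l)
{
	l->thread_pending = false;
	if (l->oneshot)
		l->masked = false;	/* one-shot line is unmasked only here */
}
```

In this model the unmask happens in exactly one place per interrupt, which is the property the commit message says is violated when the edge handler leaves the line unmasked.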
linux-next: manual merge of the pinctrl tree with Linus' tree
Hi Linus, Today's linux-next merge of the pinctrl tree got a conflict in drivers/mtd/nand/atmel_nand.c between commit 28446acb1f82 ("mtd: atmel nand: fix gpio missing request") from Linus' tree and commit 08695153170c ("MTD: atmel nand: fix gpio missing request") from the pinctrl tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc drivers/mtd/nand/atmel_nand.c index 9144557,f48587b..000 --- a/drivers/mtd/nand/atmel_nand.c +++ b/drivers/mtd/nand/atmel_nand.c @@@ -1414,12 -585,19 +1416,20 @@@ static int __init atmel_nand_probe(stru nand_chip->IO_ADDR_W = host->io_base; nand_chip->cmd_ctrl = atmel_nand_cmd_ctrl; + pinctrl = devm_pinctrl_get_select_default(>dev); + if (IS_ERR(pinctrl)) { + dev_err(host->dev, "Failed to request pinctrl\n"); + res = PTR_ERR(pinctrl); + goto err_ecc_ioremap; + } + if (gpio_is_valid(host->board.rdy_pin)) { - res = gpio_request(host->board.rdy_pin, "nand_rdy"); + res = devm_gpio_request(>dev, + host->board.rdy_pin, "nand_rdy"); if (res < 0) { dev_err(>dev, - "can't request rdy gpio %d\n", host->board.rdy_pin); + "can't request rdy gpio %d\n", + host->board.rdy_pin); goto err_ecc_ioremap; } @@@ -1435,11 -612,11 +1444,12 @@@ } if (gpio_is_valid(host->board.enable_pin)) { - res = gpio_request(host->board.enable_pin, "nand_enable"); + res = devm_gpio_request(>dev, + host->board.enable_pin, "nand_enable"); if (res < 0) { dev_err(>dev, - "can't request enable gpio %d\n", host->board.enable_pin); + "can't request enable gpio %d\n", + host->board.enable_pin); goto err_ecc_ioremap; } @@@ -1465,11 -664,11 +1475,12 @@@ atmel_nand_enable(host); if (gpio_is_valid(host->board.det_pin)) { - res = gpio_request(host->board.det_pin, "nand_det"); + res = devm_gpio_request(>dev, + host->board.det_pin, "nand_det"); if (res < 0) { dev_err(>dev, - "can't request det gpio %d\n", host->board.det_pin); + "can't request det gpio %d\n", + host->board.det_pin); goto 
err_no_card; } pgpImxNCqOazi.pgp Description: PGP signature
Re: [PATCH v3 07/10] thp: implement splitting pmd for huge zero page
On 10/02/2012 11:19 PM, Kirill A. Shutemov wrote: From: "Kirill A. Shutemov" We can't split huge zero page itself, but we can split the pmd which points to it. On splitting the pmd we create a table with all ptes set to normal zero page. Signed-off-by: Kirill A. Shutemov Reviewed-by: Andrea Arcangeli --- mm/huge_memory.c | 32 1 files changed, 32 insertions(+), 0 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 95032d3..3f1c59c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1600,6 +1600,7 @@ int split_huge_page(struct page *page) struct anon_vma *anon_vma; int ret = 1; + BUG_ON(is_huge_zero_pfn(page_to_pfn(page))); BUG_ON(!PageAnon(page)); anon_vma = page_lock_anon_vma(page); if (!anon_vma) @@ -2503,6 +2504,32 @@ static int khugepaged(void *none) return 0; } +static void __split_huge_zero_page_pmd(struct vm_area_struct *vma, + unsigned long haddr, pmd_t *pmd) +{ + pgtable_t pgtable; + pmd_t _pmd; + int i; + + pmdp_clear_flush_notify(vma, haddr, pmd); why I can't find function pmdp_clear_flush_notify in kernel source code? Do you mean pmdp_clear_flush_young_notify or something like that? 
+ /* leave pmd empty until pte is filled */ + + pgtable = get_pmd_huge_pte(vma->vm_mm); + pmd_populate(vma->vm_mm, &_pmd, pgtable); + + for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) { + pte_t *pte, entry; + entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot); + entry = pte_mkspecial(entry); + pte = pte_offset_map(&_pmd, haddr); + VM_BUG_ON(!pte_none(*pte)); + set_pte_at(vma->vm_mm, haddr, pte, entry); + pte_unmap(pte); + } + smp_wmb(); /* make pte visible before pmd */ + pmd_populate(vma->vm_mm, pmd, pgtable); +} + void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address, pmd_t *pmd) { @@ -2516,6 +2543,11 @@ void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address, spin_unlock(>vm_mm->page_table_lock); return; } + if (is_huge_zero_pmd(*pmd)) { + __split_huge_zero_page_pmd(vma, haddr, pmd); + spin_unlock(>vm_mm->page_table_lock); + return; + } page = pmd_page(*pmd); VM_BUG_ON(!page_count(page)); get_page(page); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
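The ordering in __split_huge_zero_page_pmd() (fill every pte in a private table, issue a write barrier, then populate the real pmd) is the usual "initialise, then publish" pattern. The sketch below shows the same idea in portable C11, with a release store standing in for smp_wmb() followed by pmd_populate(); this is a substituted technique for illustration, not the kernel's implementation, and all names and the table size of 512 are assumptions.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_ENTRIES_SKETCH 512	/* assumed number of entries per table */

struct pte_table_sketch {
	unsigned long entry[NR_ENTRIES_SKETCH];
};

/* Slot that concurrent readers follow; stands in for the pmd. */
static _Atomic(struct pte_table_sketch *) pmd_slot_sketch;

/*
 * Fill every entry of a private table first, then publish the table
 * pointer with release ordering, so that a reader which observes the
 * pointer also observes the filled entries.
 */
static void publish_zero_table_sketch(struct pte_table_sketch *table,
				      unsigned long zero_entry)
{
	for (size_t i = 0; i < NR_ENTRIES_SKETCH; i++)
		table->entry[i] = zero_entry;

	atomic_store_explicit(&pmd_slot_sketch, table, memory_order_release);
}

/* Reader side: acquire pairs with the release store above. */
static struct pte_table_sketch *lookup_table_sketch(void)
{
	return atomic_load_explicit(&pmd_slot_sketch, memory_order_acquire);
}
```

Publishing the pointer before the loop had completed would let a reader walk a partially filled table, which is what the "make pte visible before pmd" comment in the patch guards against.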
RE: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register
> > Load and write operations account for about 35% and 10% of
> > instructions, respectively, in most industry benchmarks. A fetched,
> > 16-byte-aligned block of code includes about 4 instructions, implying
> > 1.4 (0.35 * 4) loads and 0.4 writes.
> > Modern CPUs support 2 loads and 1 write per cycle, so write throughput
> > is the bottleneck for memcpy or copy_page, and some low-end CPUs
> > support only one memory operation per cycle. So it is enough to issue
> > one read and one write instruction per cycle, and we can save registers.
>
> I don't think "saving registers" is a useful goal here.

Ling: issuing one read and one write op per cycle is enough for copy_page or memcpy performance, so we can avoid the register save and restore operations.

> > In this patch we also re-arrange the instruction sequence to improve
> > performance. The performance on Atom is improved by about 11% and 9%
> > in the hot-cache and cold-cache cases, respectively.
>
> That's great, but the question is what happened to the older CPUs that
> also use this sequence. It may be safer to add a new variant for Atom,
> unless you can benchmark those too.

Ling: I tested the new and original versions on Core 2; the patch improved performance by about 9%. Although Core 2 has an out-of-order pipeline, which weakens the instruction-sequence requirement, its ROB size is limited, so the new patch issues write operations earlier, gets more opportunity for parallelism between paired write and load ops, and achieves a better result. Attached is core2-cpu-info (I have no older machine).

Thanks
Ling
linux-next: manual merge of the pinctrl tree with Linus' tree
Hi Linus, Today's linux-next merge of the pinctrl tree got a conflict in arch/arm/mach-at91/at91sam9x5.c between commits af2a5f09fb6d ("Replace clk_lookup.con_id with clk_lookup.dev_id entries for twi clk") and f7d19b906556 ("ARM: at91: add clocks for I2C DT entries") from Linus' tree and commit 5c70cd3c7c69 ("arm: at91: dt: at91sam9 add pinctrl support") from the pinctrl tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc arch/arm/mach-at91/at91sam9x5.c index e503538,79b5c52..000 --- a/arch/arm/mach-at91/at91sam9x5.c +++ b/arch/arm/mach-at91/at91sam9x5.c @@@ -231,13 -231,10 +231,13 @@@ static struct clk_lookup periph_clocks_ CLKDEV_CON_DEV_ID("t0_clk", "f800c000.timer", _clk), CLKDEV_CON_DEV_ID("dma_clk", "ec00.dma-controller", _clk), CLKDEV_CON_DEV_ID("dma_clk", "ee00.dma-controller", _clk), + CLKDEV_CON_DEV_ID(NULL, "f801.i2c", _clk), + CLKDEV_CON_DEV_ID(NULL, "f8014000.i2c", _clk), + CLKDEV_CON_DEV_ID(NULL, "f8018000.i2c", _clk), - CLKDEV_CON_ID("pioA", _clk), - CLKDEV_CON_ID("pioB", _clk), - CLKDEV_CON_ID("pioC", _clk), - CLKDEV_CON_ID("pioD", _clk), + CLKDEV_CON_DEV_ID(NULL, "f400.gpio", _clk), + CLKDEV_CON_DEV_ID(NULL, "f600.gpio", _clk), + CLKDEV_CON_DEV_ID(NULL, "f800.gpio", _clk), + CLKDEV_CON_DEV_ID(NULL, "fa00.gpio", _clk), /* additional fake clock for macb_hclk */ CLKDEV_CON_DEV_ID("hclk", "f802c000.ethernet", _clk), CLKDEV_CON_DEV_ID("hclk", "f803.ethernet", _clk), pgpY5DUDpSqEJ.pgp Description: PGP signature
linux-next: manual merge of the pinctrl tree with Linus' tree
Hi Linus, Today's linux-next merge of the pinctrl tree got a conflict in arch/arm/mach-at91/at91sam9n12.c between commit f7d19b906556 ("ARM: at91: add clocks for I2C DT entries") from the tree and commit 5c70cd3c7c69 ("arm: at91: dt: at91sam9 add pinctrl support") from the pinctrl tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc arch/arm/mach-at91/at91sam9n12.c index 732d3d3,a4676a5..000 --- a/arch/arm/mach-at91/at91sam9n12.c +++ b/arch/arm/mach-at91/at91sam9n12.c @@@ -169,12 -169,10 +169,12 @@@ static struct clk_lookup periph_clocks_ CLKDEV_CON_DEV_ID("t0_clk", "f8008000.timer", _clk), CLKDEV_CON_DEV_ID("t0_clk", "f800c000.timer", _clk), CLKDEV_CON_DEV_ID("dma_clk", "ec00.dma-controller", _clk), + CLKDEV_CON_DEV_ID(NULL, "f801.i2c", _clk), + CLKDEV_CON_DEV_ID(NULL, "f8014000.i2c", _clk), - CLKDEV_CON_ID("pioA", _clk), - CLKDEV_CON_ID("pioB", _clk), - CLKDEV_CON_ID("pioC", _clk), - CLKDEV_CON_ID("pioD", _clk), + CLKDEV_CON_DEV_ID(NULL, "f400.gpio", _clk), + CLKDEV_CON_DEV_ID(NULL, "f600.gpio", _clk), + CLKDEV_CON_DEV_ID(NULL, "f800.gpio", _clk), + CLKDEV_CON_DEV_ID(NULL, "fa00.gpio", _clk), /* additional fake clock for macb_hclk */ CLKDEV_CON_DEV_ID("hclk", "50.ohci", _clk), CLKDEV_CON_DEV_ID("ohci_clk", "50.ohci", _clk), pgpmiJ0ujUCxY.pgp Description: PGP signature
linux-next: manual merge of the pinctrl tree with Linus' tree
Hi Linus, Today's linux-next merge of the pinctrl tree got a conflict in arch/arm/mach-at91/at91sam9263.c between commit f7d19b906556 ("ARM: at91: add clocks for I2C DT entries") from Linus' tree and commit 5c70cd3c7c69 ("arm: at91: dt: at91sam9 add pinctrl support") from the pinctrl tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc arch/arm/mach-at91/at91sam9263.c index 6a01d03,2edf813..000 --- a/arch/arm/mach-at91/at91sam9263.c +++ b/arch/arm/mach-at91/at91sam9263.c @@@ -211,7 -210,11 +211,12 @@@ static struct clk_lookup periph_clocks_ CLKDEV_CON_DEV_ID("hclk", "a0.ohci", _clk), CLKDEV_CON_DEV_ID("spi_clk", "fffa4000.spi", _clk), CLKDEV_CON_DEV_ID("spi_clk", "fffa8000.spi", _clk), + CLKDEV_CON_DEV_ID(NULL, "fff88000.i2c", _clk), + CLKDEV_CON_DEV_ID(NULL, "f200.gpio", _clk), + CLKDEV_CON_DEV_ID(NULL, "f400.gpio", _clk), + CLKDEV_CON_DEV_ID(NULL, "f600.gpio", _clk), + CLKDEV_CON_DEV_ID(NULL, "f800.gpio", _clk), + CLKDEV_CON_DEV_ID(NULL, "fa00.gpio", _clk), }; static struct clk_lookup usart_clocks_lookups[] = { pgp4jSIBefZjP.pgp Description: PGP signature
linux-next: manual merge of the pinctrl tree with Linus' tree
Hi Linus, Today's linux-next merge of the pinctrl tree got a conflict in arch/arm/boot/dts/at91sam9g25ek.dts between commit fbc1871511ed ("ARM: dts: add twi nodes for atmel boards") from Linus' tree and commit 77ccddbdc0c9 ("arm: at91: dt: at91sam9 add serial pinctrl support") from the pinctrl tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc arch/arm/boot/dts/at91sam9g25ek.dts index 877c08f,c5ab16f..000 --- a/arch/arm/boot/dts/at91sam9g25ek.dts +++ b/arch/arm/boot/dts/at91sam9g25ek.dts @@@ -13,49 -13,4 +13,20 @@@ / { model = "Atmel AT91SAM9G25-EK"; compatible = "atmel,at91sam9g25ek", "atmel,at91sam9x5ek", "atmel,at91sam9x5", "atmel,at91sam9"; + - chosen { - bootargs = "console=ttyS0,115200 root=/dev/mtdblock1 rw rootfstype=ubifs ubi.mtd=1 root=ubi0:rootfs"; - }; - + ahb { + apb { - dbgu: serial@f200 { - status = "okay"; - }; - - usart0: serial@f801c000 { - status = "okay"; - }; - - macb0: ethernet@f802c000 { - phy-mode = "rmii"; - status = "okay"; - }; - + i2c0: i2c@f801 { + status = "okay"; + }; + + i2c1: i2c@f8014000 { + status = "okay"; + }; + + i2c2: i2c@f8018000 { + status = "okay"; + }; + }; - - usb0: ohci@0060 { - status = "okay"; - num-ports = <2>; - atmel,vbus-gpio = < 19 1 - 20 1 - >; - }; - - usb1: ehci@0070 { - status = "okay"; - }; + }; }; pgpMJVF6zX0tn.pgp Description: PGP signature
Re: [PATCH 1/2] tracing: trivial cleanup
On Thu, 2012-10-11 at 19:31 -0700, David Sharp wrote:
> On Thu, Oct 11, 2012 at 6:56 PM, Steven Rostedt wrote:
> > Sorry, I know this is late, but it was pushed down in my todo list
> > (never off, but something I probably wouldn't have seen for a few more
> > months).
> >
> > On Thu, 2012-06-07 at 16:46 -0700, Vaibhav Nagarnaik wrote:
> >> From: David Sharp
> >
> > If this is from David it needs his SOB.
>
> Is that true even though we are working for the same company?
>

Yes. I would never push a patch from Ingo without his Signed-off-by even though he and I work for the same company ;-)

Although, the GPL would let you. But it's best not to do it.

-- Steve
Re: [PATCH v6 6/6] tracing: Fix maybe-uninitialized warning in ftrace_function_set_regexp
On Thu, Oct 11, 2012 at 6:36 PM, Steven Rostedt wrote:
> On Thu, 2012-10-11 at 16:27 -0700, David Sharp wrote:
>> Compiler warning:
>>
>> kernel/trace/trace_events_filter.c: In function 'ftrace_function_set_filter_cb':
>> kernel/trace/trace_events_filter.c:2074:8: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>>
>> Signed-off-by: David Sharp
>> Cc: Steven Rostedt
>> ---
>>  kernel/trace/trace_events_filter.c |3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
>> index 431dba8..ef36953 100644
>> --- a/kernel/trace/trace_events_filter.c
>> +++ b/kernel/trace/trace_events_filter.c
>> @@ -2002,9 +2002,10 @@ static int ftrace_function_set_regexp(struct ftrace_ops *ops, int filter,
>>  static int __ftrace_function_set_filter(int filter, char *buf, int len,
>>  					struct function_filter_data *data)
>>  {
>> -	int i, re_cnt, ret;
>> +	int i, re_cnt;
>>  	int *reset;
>>  	char **re;
>> +	int ret = 0;
>>
>>  	reset = filter ? &data->first_filter : &data->first_notrace;
>>
>
> It has already been fixed in mainline:
>
> 92d8d4a8b0f4c6eba70f6e62b48e38bd005a56e6
>
> http://marc.info/?l=linux-kernel&m=134012157512078

Okay, drop this one then. I haven't been rebasing.
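The warning discussed in this thread arises whenever a return value is assigned only inside a loop that may run zero times. A reduced, hypothetical example of the pattern and of the "initialise to 0" fix is sketched below; apply_all_sketch() and reject_negative_sketch() are invented for illustration and are not the tracing code.

```c
#include <stddef.h>

/* Hypothetical callback: fails with -1 on negative input. */
static int reject_negative_sketch(int value)
{
	return value < 0 ? -1 : 0;
}

/*
 * Hypothetical reduction of the pattern behind the warning: 'ret' is
 * only assigned inside the loop, so with count == 0 it would be
 * returned uninitialized unless it starts out as 0.
 */
static int apply_all_sketch(const int *items, size_t count,
			    int (*apply)(int))
{
	int ret = 0;	/* defined result when the loop body never runs */

	for (size_t i = 0; i < count; i++) {
		ret = apply(items[i]);
		if (ret)
			break;	/* stop at the first failure */
	}
	return ret;
}
```

Whether 0 is the right default depends on what an empty input should mean to the caller; here it is assumed that "nothing to do" counts as success.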
[PATCH] binfmt_script: do not leave interp on stack
When binfmt_script's load_script ran, it would manipulate bprm->buf and leave bprm->interp pointing to the local stack. If a series of scripts are executed, the final one will have leaked kernel stack contents into the cmdline. For a proof of concept, see DoTest.sh from: http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/ Largely based on a patch by halfdog. Cleaned up various style issues, including those reported by Randy Dunlap and scripts/checkpatch.pl. Cc: halfdog Cc: sta...@vger.kernel.org Signed-off-by: Kees Cook --- For more background, see the earlier thread: https://lkml.org/lkml/2012/8/18/75 --- fs/binfmt_script.c | 126 +--- 1 file changed, 101 insertions(+), 25 deletions(-) diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c index d3b8c1f..15fe9e8 100644 --- a/fs/binfmt_script.c +++ b/fs/binfmt_script.c @@ -14,12 +14,22 @@ #include #include +/* + * Check if this handler is suitable to load the interpreter identified + * by first BINPRM_BUF_SIZE bytes in bprm->buf following "#!". + * + * Returns: + * 0: success; the new executable is ready in bprm->mm. + * -ve: interpreter not found, or other binfmts failed to find a + *suitable binary. + */ static int load_script(struct linux_binprm *bprm,struct pt_regs *regs) { const char *i_arg, *i_name; char *cp; struct file *file; - char interp[BINPRM_BUF_SIZE]; + char bprm_buf_copy[BINPRM_BUF_SIZE]; + const char *bprm_old_interp_name; int retval; if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') || @@ -30,34 +40,57 @@ static int load_script(struct linux_binprm *bprm,struct pt_regs *regs) * Sorta complicated, but hopefully it will work. -TYT */ - bprm->recursion_depth++; - allow_write_access(bprm->file); - fput(bprm->file); - bprm->file = NULL; + /* +* Keep bprm unchanged until we known that this is a script +* to be handled by this loader. Copy bprm->buf for sure, +* otherwise returning -ENOEXEC will make other handlers see +* modified data. 
+*/ + memcpy(bprm_buf_copy, bprm->buf, BINPRM_BUF_SIZE); - bprm->buf[BINPRM_BUF_SIZE - 1] = '\0'; - if ((cp = strchr(bprm->buf, '\n')) == NULL) - cp = bprm->buf+BINPRM_BUF_SIZE-1; + /* Locate and truncate end of string. */ + bprm_buf_copy[BINPRM_BUF_SIZE - 1] = '\0'; + cp = strchr(bprm_buf_copy, '\n'); + if (cp == NULL) + cp = bprm_buf_copy + BINPRM_BUF_SIZE - 1; *cp = '\0'; - while (cp > bprm->buf) { + /* Truncate trailing white-space. */ + while (cp > bprm_buf_copy) { cp--; if ((*cp == ' ') || (*cp == '\t')) *cp = '\0'; else break; } - for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++); + /* Skip leading white-space. */ + for (cp = bprm_buf_copy + 2; (*cp == ' ') || (*cp == '\t'); cp++) + /* nothing */ ; + + /* +* No interpreter name found? No problem to let other handlers +* retry, we did not change anything. +*/ if (*cp == '\0') - return -ENOEXEC; /* No interpreter name found */ + return -ENOEXEC; + i_name = cp; i_arg = NULL; + /* Find start of first argument. */ for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++) /* nothing */ ; + /* Truncate and skip leading white-space. */ while ((*cp == ' ') || (*cp == '\t')) *cp++ = '\0'; if (*cp) i_arg = cp; - strcpy (interp, i_name); + + /* +* So this is our point-of-no-return: modification of bprm +* will be irreversible, so if we fail to setup execution +* using the new interpreter name (i_name), we have to make +* sure that no other handler tries again. +*/ + /* * OK, we've parsed out the interpreter name and * (optional) argument. @@ -68,34 +101,77 @@ static int load_script(struct linux_binprm *bprm,struct pt_regs *regs) * This is done in reverse order, because of how the * user environment and arguments are stored. */ + + /* +* Ugly: we store pointer to local stack frame in bprm, +* so make sure to clean this up before returning. 
+*/ + bprm_old_interp_name = bprm->interp; + bprm->interp = i_name; + retval = remove_arg_zero(bprm); if (retval) - return retval; - retval = copy_strings_kernel(1, &bprm->interp, bprm); - if (retval < 0) return retval; + goto out; + + /* +* copy_strings_kernel is ok here, even when racy: since no +* user can be attached to new mm, there is nobody to race +*
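The pattern described in the changelog (parse a private copy of bprm->buf, point bprm->interp at it only temporarily, and put the old value back before returning) can be sketched in user space. This is a minimal illustration under stated assumptions, not the kernel code: struct fake_binprm, use_interp() and load_script_sketch() are hypothetical names, BUF_SIZE stands in for BINPRM_BUF_SIZE, and whitespace trimming and argument parsing are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 128	/* stand-in for BINPRM_BUF_SIZE (assumed value) */

/* Hypothetical, cut-down stand-in for struct linux_binprm. */
struct fake_binprm {
	char buf[BUF_SIZE];	/* first bytes of the file being executed */
	const char *interp;	/* name of the interpreter in use */
};

/* Hypothetical stand-in for the work done while interp is overridden. */
static size_t use_interp(const struct fake_binprm *bprm)
{
	assert(bprm->interp != NULL);
	return strlen(bprm->interp);
}

/*
 * Parse "#!<interp>" from a private copy of bprm->buf, point bprm->interp
 * at that copy only for the duration of the call, and restore it on exit.
 * Returns the interpreter name length, or -1 if this is not a script.
 */
static long load_script_sketch(struct fake_binprm *bprm)
{
	char copy[BUF_SIZE];
	const char *old_interp = bprm->interp;
	char *cp;
	long ret;

	if (bprm->buf[0] != '#' || bprm->buf[1] != '!')
		return -1;	/* bprm untouched, other handlers may retry */

	/* Work on a copy so bprm->buf is never modified. */
	memcpy(copy, bprm->buf, BUF_SIZE);
	copy[BUF_SIZE - 1] = '\0';
	cp = strchr(copy, '\n');
	if (cp)
		*cp = '\0';

	bprm->interp = copy + 2;	/* points into this stack frame */
	ret = (long)use_interp(bprm);
	bprm->interp = old_interp;	/* never leave a stack pointer behind */

	return ret;
}
```

The point of the sketch is only the save/restore around the temporary pointer: after the call returns, nothing in the structure refers to the dead stack frame, and a non-script input leaves the structure exactly as it was.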
Re: [PATCH 1/2] tracing: trivial cleanup
On Thu, Oct 11, 2012 at 6:56 PM, Steven Rostedt wrote: > Sorry, I know this is late, but it was pushed down in my todo list > (never off, but something I probably wouldn't have seen for a few more > months). > > On Thu, 2012-06-07 at 16:46 -0700, Vaibhav Nagarnaik wrote: >> From: David Sharp > > If this is from David it needs his SOB. Is that true even though we are working for the same company? > -- Steve > >> >> Remove ftrace_format_syscall() declaration; it is neither defined nor >> used. Also update a comment and formatting. >> >> Signed-off-by: Vaibhav Nagarnaik Signed-off-by: David Sharp >> --- >> include/trace/syscall.h|2 -- >> kernel/trace/ring_buffer.c |6 +++--- >> 2 files changed, 3 insertions(+), 5 deletions(-) >> >> diff --git a/include/trace/syscall.h b/include/trace/syscall.h >> index 31966a4..0c95796 100644 >> --- a/include/trace/syscall.h >> +++ b/include/trace/syscall.h >> @@ -39,8 +39,6 @@ extern int reg_event_syscall_enter(struct >> ftrace_event_call *call); >> extern void unreg_event_syscall_enter(struct ftrace_event_call *call); >> extern int reg_event_syscall_exit(struct ftrace_event_call *call); >> extern void unreg_event_syscall_exit(struct ftrace_event_call *call); >> -extern int >> -ftrace_format_syscall(struct ftrace_event_call *call, struct trace_seq *s); >> enum print_line_t print_syscall_enter(struct trace_iterator *iter, int >> flags, >> struct trace_event *event); >> enum print_line_t print_syscall_exit(struct trace_iterator *iter, int flags, >> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c >> index 1d0f6a8..96c2dd1 100644 >> --- a/kernel/trace/ring_buffer.c >> +++ b/kernel/trace/ring_buffer.c >> @@ -1816,7 +1816,7 @@ rb_add_time_stamp(struct ring_buffer_event *event, u64 >> delta) >> } >> >> /** >> - * ring_buffer_update_event - update event type and data >> + * rb_update_event - update event type and data >> * @event: the even to update >> * @type: the type of event >> * @length: the size of the event field 
in the ring buffer >> @@ -2716,8 +2716,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_discard_commit); >> * and not the length of the event which would hold the header. >> */ >> int ring_buffer_write(struct ring_buffer *buffer, >> - unsigned long length, >> - void *data) >> + unsigned long length, >> + void *data) >> { >> struct ring_buffer_per_cpu *cpu_buffer; >> struct ring_buffer_event *event;
Re: [Request for review] Revised delete_module(2) manual page
Lucas De Marchi writes:
> What do you think? Mark as deprecated now and remove when kernel
> removes it? Or remove now?

Complain now, and I'll queue the removal in two merge windows. That
gives us a chance just in case someone actually uses this; if so I want
to talk to them about what it is they want!

Thanks,
Rusty.
Re: [PATCH v2 8/8] x86/lguest: Use __pa_symbol instead of __pa on C visible symbols
Alexander Duyck writes:
> The function lguest_write_cr3 is using __pa to convert swapper_pg_dir and
> initial_page_table from virtual addresses to physical. The correct function
> to use for these values is __pa_symbol since they are C visible symbols.
>
> Cc: Rusty Russell
> Signed-off-by: Alexander Duyck

Acked-by: Rusty Russell

Thanks,
Rusty.
Re: [PATCH 0/3] virtio-net: inline header support
"Michael S. Tsirkin" writes:
> On Thu, Oct 11, 2012 at 10:33:31AM +1030, Rusty Russell wrote:
>> OK. Well, Anthony wants qemu to be robust in this regard, so I am
>> tempted to rework all the qemu drivers to handle arbitrary layouts.
>> They could use a good audit anyway.
>
> I agree here. Still trying to understand whether we can agree to use
> a feature bit for this, or not.

I'd *like* to imply it by the new PCI layout, but if it doesn't work
we'll add a new feature bit.

I'm resisting a feature bit, since it constrains future implementations
which could otherwise assume it.

>> This would become a glaring exception, but I'm tempted to fix it to 32
>> bytes at the same time as we get the new pci layout (ie. for the virtio
>> 1.0 spec).
>
> But this isn't a virtio-pci only issue, is it?
> qemu has s390 bus with same limitation.
> How can we tie it to pci layout?

They can use a transport feature if they need to, of course. But
perhaps the timing with ccw will coincide with the fix, in which case
they don't need to, but it might be a bit late.

Cornelia?

Cheers,
Rusty.
Re: [PATCH] [SCSI] virtio-scsi: Add real 2-clause BSD license to header
Paolo Bonzini writes:
> On 11/10/2012 08:41, Bryan Venteicher wrote:
>> This is analogous to commit a1b383870a made by Rusty Russell to all
>> the VirtIO headers at the time. This eases the use of the header as
>> is by other OSes.
>>
>> Signed-off-by: Bryan Venteicher
>
> Acked-by: Paolo Bonzini

Applied!

Thanks,
Rusty.
Re: [PATCH 1/4] module: add syscall to load module from fd
"H. Peter Anvin" writes:
> On 10/10/2012 06:03 AM, Michael Kerrisk (man-pages) wrote:
>> Good point. A "whole hog" openat()-style interface is worth thinking about
>> too.
>
> *Although* you could argue that you can always simply open the module
> file first, and that finit_module() is really what we should have had in
> the first place. Then you don't need the flags since those would come
> from openat().

There's no fundamental reason that modules have to be in a file. I'm
thinking of compressed modules, or an initrd which simply includes all
the modules it wants to load in one linear file.

Also, --force options manipulate the module before loading (as did the
now-obsolete module rename option).

Cheers,
Rusty.
linux-next: manual merge of the kvm-ppc tree with the kvm tree
Hi Alexander,

Today's linux-next merge of the kvm-ppc tree got a conflict in
arch/powerpc/kvm/44x_emulate.c between commits from the kvm tree and the
same patches plus another commit from the kvm-ppc tree.

I just used the kvm-ppc tree version.

This happened because
1) you rebased/rewrote your tree before it was merged into the kvm tree
2) left the old version in your kvm-ppc-next branch
3) added some more commits to your kvm-ppc-next branch

Don't do that! Unfortunately, the best thing you can do now is to
rebase your kvm-ppc-next branch on top of the kvm tree. :-(

--
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au
Re: [PATCH 1/2] tracing: trivial cleanup
Sorry, I know this is late, but it was pushed down in my todo list (never off, but something I probably wouldn't have seen for a few more months). On Thu, 2012-06-07 at 16:46 -0700, Vaibhav Nagarnaik wrote: > From: David Sharp If this is from David it needs his SOB. -- Steve > > Remove ftrace_format_syscall() declaration; it is neither defined nor > used. Also update a comment and formatting. > > Signed-off-by: Vaibhav Nagarnaik > --- > include/trace/syscall.h|2 -- > kernel/trace/ring_buffer.c |6 +++--- > 2 files changed, 3 insertions(+), 5 deletions(-) > > diff --git a/include/trace/syscall.h b/include/trace/syscall.h > index 31966a4..0c95796 100644 > --- a/include/trace/syscall.h > +++ b/include/trace/syscall.h > @@ -39,8 +39,6 @@ extern int reg_event_syscall_enter(struct ftrace_event_call > *call); > extern void unreg_event_syscall_enter(struct ftrace_event_call *call); > extern int reg_event_syscall_exit(struct ftrace_event_call *call); > extern void unreg_event_syscall_exit(struct ftrace_event_call *call); > -extern int > -ftrace_format_syscall(struct ftrace_event_call *call, struct trace_seq *s); > enum print_line_t print_syscall_enter(struct trace_iterator *iter, int flags, > struct trace_event *event); > enum print_line_t print_syscall_exit(struct trace_iterator *iter, int flags, > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > index 1d0f6a8..96c2dd1 100644 > --- a/kernel/trace/ring_buffer.c > +++ b/kernel/trace/ring_buffer.c > @@ -1816,7 +1816,7 @@ rb_add_time_stamp(struct ring_buffer_event *event, u64 > delta) > } > > /** > - * ring_buffer_update_event - update event type and data > + * rb_update_event - update event type and data > * @event: the even to update > * @type: the type of event > * @length: the size of the event field in the ring buffer > @@ -2716,8 +2716,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_discard_commit); > * and not the length of the event which would hold the header. 
> */ > int ring_buffer_write(struct ring_buffer *buffer, > - unsigned long length, > - void *data) > + unsigned long length, > + void *data) > { > struct ring_buffer_per_cpu *cpu_buffer; > struct ring_buffer_event *event;
Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()
On Fri, Oct 12, 2012 at 2:33 AM, Shea Levy wrote:
>
> FWIW (and probably that's not much), the NixOS[0] distro doesn't currently
> use /lib/firmware. There is no /lib directory by default on NixOS, instead
> we create a new symlink tree representing the current system on each system
> change and symlink /run/current-system to that tree. We currently build
> udev/systemd with the --with-firmware-path=/run/current-system/firmware
> configuration-time option, but we also patch module-init-tools and kmod to
> respect the $MODULE_DIR env var and may do the same for firmware in the
> future. The way we do things has significant advantages (or at least we like
> to think so), but we already have exceptions for /bin/sh and /usr/bin/env,
> so I suspect we'll probably add in /lib/firmware if this functionality moves
> into the kernel.

A kernel parameter for customizing the firmware search path will be
added, so you can pass your search path on the kernel command line too.

Thanks,
--
Ming Lei
Re: [PATCH 00/33] AutoNUMA27
Hi Mel,

On Thu, Oct 11, 2012 at 10:34:32PM +0100, Mel Gorman wrote:
> So after getting through the full review of it, there wasn't anything
> I could not stand. I think it's *very* heavy on some of the paths like
> the idle balancer which I was not keen on and the fault paths are also
> quite heavy. I think the weight on some of these paths can be reduced
> but not to 0 if the objectives to autonuma are to be met.
>
> I'm not fully convinced that the task exchange is actually necessary or
> beneficial because it somewhat assumes that there is a symmetry between CPU
> and memory balancing that may not be true. The fact that it only considers

The problem is that without an active task exchange and without an
explicit call to stop_one_cpu*, there's no way to migrate a currently
running task, and clearly we need that. We could wait indefinitely,
hoping the task goes to sleep and leaves the CPU idle, or that a couple
of other tasks start and trigger load balance events.

We must move tasks even if all cpus are in a steady rq->nr_running == 1
state and there's no other scheduler balance event that could possibly
attempt to move tasks around in such a steady state.

Of course one could hack the active idle balancing so that it does the
active NUMA balancing action, but that would be a purely artificial
complication: it would add unnecessary delay and it would provide no
benefit whatsoever.

Why don't we dump the active idle balancing too, and hack the load
balancing to do the active idle balancing as well? Of course then the
two will be more integrated. But it'll be a mess and slower, and there's
a good reason why they exist as totally separated pieces of code working
in parallel.

We can integrate it more, but in my view the result would be worse and
more complicated. Last but not least, messing with the idle balancing
code to do an active NUMA balancing action (somehow invoking
stop_one_cpu* in the steady state described above) would force even
cellphones and UP kernels to deal with NUMA code somehow.

> tasks that are currently running feels a bit random but examining all tasks
> that recently ran on the node would be far too expensive so there is no

So far this seems a good tradeoff. Nothing will prevent us from scanning
deeper into the runqueues later if we find a way to do that efficiently.

> good answer. You are caught between a rock and a hard place and either
> direction you go is wrong for different reasons. You need something more

I think you described the problem perfectly ;).

> frequent than scans (because it'll converge too slowly) but doing it from
> the balancer misses some tasks and may run too frequently and it's unclear
> how it affects the current load balancer decisions. I don't have a good
> alternative solution for this but ideally it would be better integrated with
> the existing scheduler when there is more data on what those scheduling
> decisions should be. That will only come from a wide range of testing and
> the inevitable bug reports.
>
> That said, this is concentrating on the problems without considering the
> situations where it would work very well. I think it'll come down to HPC
> and anything jitter-sensitive will hate this while workloads like JVM,
> virtualisation or anything that uses a lot of memory without caring about
> placement will love it. It's not perfect but it's better than incurring
> the cost of remote access unconditionally.

Full agreement.

Your detailed full review was much appreciated, thanks!

Andrea
Re: [Patch 3/7] smpboot: Provide infrastructure for percpu hotplug threads
On Wed, Sep 19, 2012 at 5:47 PM, Sasha Levin wrote: > Hi Thomas, > > On 07/16/2012 12:42 PM, Thomas Gleixner wrote: >> Provide a generic interface for setting up and tearing down percpu >> threads. >> >> On registration the threads for already online cpus are created and >> started. On deregistration (modules) the threads are stoppped. >> >> During hotplug operations the threads are created, started, parked and >> unparked. The datastructure for registration provides a pointer to >> percpu storage space and optional setup, cleanup, park, unpark >> functions. These functions are called when the thread state changes. >> >> Each implementation has to provide a function which is queried and >> returns whether the thread should run and the thread function itself. >> >> The core code handles all state transitions and avoids duplicated code >> in the call sites. >> >> Signed-off-by: Thomas Gleixner >> --- > > This patch seems to cause the following BUG() on KVM guests with large amount > of > VCPUs: > > [0.511760] [ cut here ] > [0.511761] kernel BUG at kernel/smpboot.c:134! 
> [0.511764] invalid opcode: [#3] PREEMPT SMP DEBUG_PAGEALLOC > [0.511779] CPU 0 > [0.511780] Pid: 70, comm: watchdog/10 Tainted: G D W > 3.6.0-rc6-next-20120919-sasha-1-gb54aafe #365 > [0.511783] RIP: 0010:[] [] > smpboot_thread_fn+0x196/0x2e0 > [0.511785] RSP: 0018:88000cf4bdd0 EFLAGS: 00010206 > [0.511786] RAX: RBX: 88000cf58000 RCX: > > [0.511787] RDX: RSI: 0001 RDI: > 0001 > [0.511788] RBP: 88000cf4be30 R08: R09: > 0001 > [0.511789] R10: R11: R12: > 88000cdb9ff0 > [0.511790] R13: 84c60920 R14: 000a R15: > 88000cf58000 > [0.511792] FS: () GS:88000d20() > knlGS: > [0.511794] CS: 0010 DS: ES: CR0: 8005003b > [0.511795] CR2: CR3: 04c26000 CR4: > 000406f0 > [0.511801] DR0: DR1: DR2: > > [0.511805] DR3: DR6: 0ff0 DR7: > 0400 > [0.511807] Process watchdog/10 (pid: 70, threadinfo 88000cf4a000, task > 88000cf58000) > [0.511808] Stack: > [0.511822] 88000cf4bfd8 88000cf4bfd8 > > [0.511833] 88000cf4be00 839eace5 88000cf4be30 > 88000cdd1c68 > [0.511844] 88000cdb9ff0 811414e0 > > [0.511845] Call Trace: > [0.511852] [] ? schedule+0x55/0x60 > [0.511857] [] ? __smpboot_create_thread+0xf0/0xf0 > [0.511863] [] kthread+0xe3/0xf0 > [0.511867] [] ? wait_for_common+0x143/0x180 > [0.511873] [] kernel_thread_helper+0x4/0x10 > [0.511878] [] ? retint_restore_args+0x13/0x13 > [0.511883] [] ? insert_kthread_work+0x90/0x90 > [0.511888] [] ? gs_change+0x13/0x13 > [0.511916] Code: 24 04 02 00 00 00 0f 1f 80 00 00 00 00 e8 b3 46 ff ff e9 > b6 > fe ff ff 66 0f 1f 44 00 00 45 8b 34 24 e8 ff 72 8a 00 41 39 c6 74 0a <0f> 0b > 0f > 1f 84 00 00 00 00 00 41 8b 44 24 04 85 c0 74 0f 83 f8 > [0.511919] RIP [] smpboot_thread_fn+0x196/0x2e0 > [0.511920] RSP > [0.511922] ---[ end trace 127920ef70923ae1 ]--- > > I'm starting the guest with numa=fake=10, so vcpu 0 ends up on the same (fake) > node as vcpu 10, and while digging into the bug, it seems that the issue is > that > vcpu10's thread gets scheduled on vcpu0. > > Beyond that I don't really understand what's wrong... Ping? 
Still seeing that with linux-next.
Re: [PATCH v6 6/6] tracing: Fix maybe-uninitialized warning in ftrace_function_set_regexp
On Thu, 2012-10-11 at 16:27 -0700, David Sharp wrote:
> Compiler warning:
>
> kernel/trace/trace_events_filter.c: In function 'ftrace_function_set_filter_cb':
> kernel/trace/trace_events_filter.c:2074:8: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>
> Signed-off-by: David Sharp
> Cc: Steven Rostedt
> ---
>  kernel/trace/trace_events_filter.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> index 431dba8..ef36953 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -2002,9 +2002,10 @@ static int ftrace_function_set_regexp(struct ftrace_ops *ops, int filter,
>  static int __ftrace_function_set_filter(int filter, char *buf, int len,
>  					struct function_filter_data *data)
>  {
> -	int i, re_cnt, ret;
> +	int i, re_cnt;
>  	int *reset;
>  	char **re;
> +	int ret = 0;
>
>  	reset = filter ? &data->first_filter : &data->first_notrace;
>

It has already been fixed in mainline:

92d8d4a8b0f4c6eba70f6e62b48e38bd005a56e6

http://marc.info/?l=linux-kernel&m=134012157512078

-- Steve
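For readers unfamiliar with this class of warning, here is a minimal user-space sketch of the shape of code that triggers it, assuming the value is only assigned inside a loop that may run zero times. The names apply_one() and apply_all() are hypothetical stand-ins, not functions from trace_events_filter.c.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-entry call made inside the loop. */
static int apply_one(int value)
{
	return value < 0 ? -1 : 0;
}

/*
 * Apply every entry and return the first error, or 0 on success.
 *
 * Without the "= 0" initializer, ret would be read uninitialized
 * whenever count == 0, because the loop body never assigns it. That
 * is the situation -Wmaybe-uninitialized reports.
 */
static int apply_all(const int *values, int count)
{
	int ret = 0;
	int i;

	assert(count >= 0);

	for (i = 0; i < count; i++) {
		ret = apply_one(values[i]);
		if (ret)
			break;
	}

	return ret;
}
```

Initializing at the declaration, as the quoted patch does, makes the zero-iteration case well defined and also silences the compiler.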
[RFC][PATCH] perf: Add a few generic stalled-cycles events
>From 89cb6a25b9f714e55a379467a832ee015014ed11 Mon Sep 17 00:00:00 2001 From: Sukadev Bhattiprolu Date: Tue, 18 Sep 2012 10:59:01 -0700 Subject: [PATCH] perf: Add a few generic stalled-cycles events The existing generic event 'stalled-cycles-backend' corresponds to PM_CMPLU_STALL event in Power7. While this event is useful, detailed performance analysis often requires us to find more specific reasons for the stalled cycle. For instance, stalled cycles in Power7 can occur due to, among others: - instruction fetch unit (IFU), - Load-store-unit (LSU), - Fixed point unit (FXU) - Branch unit (BRU) While it is possible to use raw codes to monitor these events, it quickly becomes cumbersome with performance analysis frequently requiring mapping the raw event codes in reports to their symbolic names. This patch is a proposal to try and generalize such perf events. Since the code changes are quite simple, I bunched all the 4 events together. I am not familiar with how readily these events would map to other architectures. Here is some information on the events for Power7: stalled-cycles-fixed-point (PM_CMPLU_STALL_FXU) Following a completion stall, the last instruction to finish before completion resumes was from the Fixed Point Unit. Completion stall is any period when no groups completed and the completion table was not empty for that thread. stalled-cycles-load-store (PM_CMPLU_STALL_LSU) Following a completion stall, the last instruction to finish before completion resumes was from the Load-Store Unit. stalled-cycles-instruction-fetch (PM_CMPLU_STALL_IFU) Following a completion stall, the last instruction to finish before completion resumes was from the Instruction Fetch Unit. stalled-cycles-branch (PM_CMPLU_STALL_BRU) Following a completion stall, the last instruction to finish before completion resumes was from the Branch Unit. Looking for feedback on this approach and if this can be further extended. 
Power7 has 530 events[2] out of which a "CPI stack analysis"[1] uses about 26 events. [1] CPI Stack analysis https://www.power.org/documentation/commonly-used-metrics-for-performance-analysis [2] Power7 events: https://www.power.org/documentation/comprehensive-pmu-event-reference-power7/ Signed-off-by: Sukadev Bhattiprolu --- arch/powerpc/perf/power7-pmu.c |4 include/linux/perf_event.h |4 tools/perf/builtin-stat.c |4 tools/perf/util/evsel.c|4 tools/perf/util/parse-events.l |4 tools/perf/util/python.c |4 6 files changed, 24 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c index 1251e4d..813e7c7 100644 --- a/arch/powerpc/perf/power7-pmu.c +++ b/arch/powerpc/perf/power7-pmu.c @@ -304,6 +304,10 @@ static int power7_generic_events[] = { [PERF_COUNT_HW_CACHE_MISSES] = 0x400f0, /* LD_MISS_L1 */ [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x10068, /* BRU_FIN */ [PERF_COUNT_HW_BRANCH_MISSES] = 0x400f6,/* BR_MPRED */ + [PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT] = 0x20014,/* CMPLU_STALL_FXU */ + [PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE] = 0x20012,/* CMPLU_STALL_LSU */ + [PERF_COUNT_HW_STALLED_CYCLES_INSTRUCTION_FETCH] = 0x4004c,/* CMPLU_STALL_IFU */ + [PERF_COUNT_HW_STALLED_CYCLES_BRANCH] = 0x4004e,/* CMPLU_STALL_BRU */ }; #define C(x) PERF_COUNT_HW_CACHE_##x diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index bdb4161..ff9f0a6 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -55,6 +55,10 @@ enum perf_hw_id { PERF_COUNT_HW_STALLED_CYCLES_FRONTEND = 7, PERF_COUNT_HW_STALLED_CYCLES_BACKEND= 8, PERF_COUNT_HW_REF_CPU_CYCLES= 9, + PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT = 10, + PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE = 11, + PERF_COUNT_HW_STALLED_CYCLES_INSTRUCTION_FETCH = 12, + PERF_COUNT_HW_STALLED_CYCLES_BRANCH = 13, PERF_COUNT_HW_MAX, /* non-ABI */ }; diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 861f0ae..6275dbb 100644 --- 
a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -77,6 +77,10 @@ static struct perf_event_attr default_attrs[] = { { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS }, { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS }, { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES }, + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT }, + { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE }, +
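The generic-ID to raw-code mapping that the power7-pmu.c hunk extends can be sketched as a small standalone table. The numeric IDs and raw codes below are copied from the patch; the identifiers hw_id, power7_raw_code and raw_code_for() are hypothetical names chosen for this illustration, and returning 0 for an unmapped ID is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Generic event IDs proposed by the patch (values as in the patch). */
enum hw_id {
	HW_STALLED_CYCLES_FIXED_POINT		= 10,
	HW_STALLED_CYCLES_LOAD_STORE		= 11,
	HW_STALLED_CYCLES_INSTRUCTION_FETCH	= 12,
	HW_STALLED_CYCLES_BRANCH		= 13,
	HW_MAX
};

/* Power7 raw event codes for the four new IDs; other slots stay 0. */
static const uint32_t power7_raw_code[HW_MAX] = {
	[HW_STALLED_CYCLES_FIXED_POINT]		= 0x20014, /* CMPLU_STALL_FXU */
	[HW_STALLED_CYCLES_LOAD_STORE]		= 0x20012, /* CMPLU_STALL_LSU */
	[HW_STALLED_CYCLES_INSTRUCTION_FETCH]	= 0x4004c, /* CMPLU_STALL_IFU */
	[HW_STALLED_CYCLES_BRANCH]		= 0x4004e, /* CMPLU_STALL_BRU */
};

/* Return the raw code for a generic ID, or 0 if it is not mapped here. */
static uint32_t raw_code_for(unsigned int id)
{
	return id < HW_MAX ? power7_raw_code[id] : 0;
}
```

Designated initializers keep each symbolic name next to its raw code, which is the readability gain the changelog is after compared with passing raw codes around.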
Re: [PATCH 2.6.32.y, 2.6.34.y] ALSA: hda - ALSA HD Audio patch for Intel Panther Point DeviceIDs
On Tue, 2012-10-09 at 10:56 -0400, Paul Gortmaker wrote: > On 12-10-08 09:55 PM, Ben Hutchings wrote: > > On Mon, 2012-10-08 at 18:07 -0700, Jonathan Nieder wrote: > >> From: Seth Heasley > >> Date: Wed, 20 Apr 2011 10:59:57 -0700 > >> > >> commit d2edeb7c6f1dada8ca7d5c23e42d604e92ae0c76 upstream. > >> > >> This patch adds the HD Audio Controller DeviceIDs for the Intel Panther > >> Point PCH. > >> > >> [jn: backported for 2.6.32.y by Ana Guerrero] > >> > >> Signed-off-by: Seth Heasley > >> Signed-off-by: Takashi Iwai > >> Tested-by: Ana Guerrero # EliteBook 8570w > >> Signed-off-by: Jonathan Nieder > >> --- > >> Hi Willy and Paul, > >> > >> Please consider > >> > >> d2edeb7c6f1d ALSA: hda - ALSA HD Audio patch for Intel Panther Point > >>DeviceIDs > >> > >> for application to the 2.6.32.y and 2.6.34.y trees. > >> > >> It does what it says on the cover. The patch was merged in the 3.0 > >> cycle, so newer stable kernels don't need it. Backported and > >> tested[1] against Debian's 2.6.32.y-based kernel by Ana (cc-ed) -- > >> thanks! > >> > >> Thoughts of all kinds welcome, as always. > > [...] > > > > I queued up a whole series of device ID updates for Debian stable, as > > they didn't obviously depend on other changes: > > > > cea310e ALSA: hda_intel: ALSA HD Audio patch for Intel Patsburg DeviceIDs > > e35d4b1 ALSA: hda: add Vortex86MX PCI ids > > 0f0714c ALSA: hda - Add support for VMware controller > > d2edeb7 ALSA: hda - ALSA HD Audio patch for Intel Panther Point DeviceIDs > > I found that d2edeb7c6f depends (only trivially, on context) on b686453543f, > which makes a blanket class for intel IDs so that "...the driver will work > with any new control chips in future." In that respect, I guess it too > (b68645) is in the same class as the other "add more IDs" patches? Yes, it seems like this would also be worth adding. (Always assuming that any functional changes it depends on are also small enough for stable.) 
It's not a clean cherry-pick as the line above has changed.

Ben.

> Paul.
> --
>
> >
> > (They can be cherry-picked cleanly in the above order.) But if I've
> > missed some post-2.6.32 dependencies then I would like to know.
> >
> > Ben.
> >

--
Ben Hutchings
Kids! Bringing about Armageddon can be dangerous. Do not attempt it in
your own home. - Terry Pratchett and Neil Gaiman, `Good Omens'
[PATCH RT 1/8] random: Make it work on rt
From: Thomas Gleixner Delegate the random insertion to the forced threaded interrupt handler. Store the return IP of the hard interrupt handler in the irq descriptor and feed it into the random generator as a source of entropy. Signed-off-by: Thomas Gleixner Cc: stable...@vger.kernel.org Signed-off-by: Steven Rostedt --- drivers/char/random.c | 10 ++ include/linux/irqdesc.h |1 + include/linux/random.h |2 +- kernel/irq/handle.c |7 +-- kernel/irq/manage.c |6 ++ 5 files changed, 19 insertions(+), 7 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index d38af32..66c8a0f 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -767,18 +767,16 @@ EXPORT_SYMBOL_GPL(add_input_randomness); static DEFINE_PER_CPU(struct fast_pool, irq_randomness); -void add_interrupt_randomness(int irq, int irq_flags) +void add_interrupt_randomness(int irq, int irq_flags, __u64 ip) { struct entropy_store*r; struct fast_pool*fast_pool = &__get_cpu_var(irq_randomness); - struct pt_regs *regs = get_irq_regs(); unsigned long now = jiffies; __u32 input[4], cycles = get_cycles(); input[0] = cycles ^ jiffies; input[1] = irq; - if (regs) { - __u64 ip = instruction_pointer(regs); + if (ip) { input[2] = ip; input[3] = ip >> 32; } @@ -792,7 +790,11 @@ void add_interrupt_randomness(int irq, int irq_flags) fast_pool->last = now; r = nonblocking_pool.initialized ? 
&input_pool : &nonblocking_pool; +#ifndef CONFIG_PREEMPT_RT_FULL __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool), NULL); +#else + mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool), NULL); +#endif /* * If we don't have a valid cycle counter, and we see * back-to-back timer interrupts, then skip giving credit for diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h index f1e2527..5f4f091 100644 --- a/include/linux/irqdesc.h +++ b/include/linux/irqdesc.h @@ -53,6 +53,7 @@ struct irq_desc { unsigned intirq_count; /* For detecting broken IRQs */ unsigned long last_unhandled; /* Aging timer for unhandled count */ unsigned intirqs_unhandled; + u64 random_ip; raw_spinlock_t lock; struct cpumask *percpu_enabled; #ifdef CONFIG_SMP diff --git a/include/linux/random.h b/include/linux/random.h index 29e217a..3995b33 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -53,7 +53,7 @@ extern void rand_initialize_irq(int irq); extern void add_device_randomness(const void *, unsigned int); extern void add_input_randomness(unsigned int type, unsigned int code, unsigned int value); -extern void add_interrupt_randomness(int irq, int irq_flags); +extern void add_interrupt_randomness(int irq, int irq_flags, __u64 ip); extern void get_random_bytes(void *buf, int nbytes); extern void get_random_bytes_arch(void *buf, int nbytes); diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c index a768885..f6b91bc 100644 --- a/kernel/irq/handle.c +++ b/kernel/irq/handle.c @@ -116,6 +116,8 @@ static void irq_wake_thread(struct irq_desc *desc, struct irqaction *action) irqreturn_t handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action) { + struct pt_regs *regs = get_irq_regs(); + u64 ip = regs ?
instruction_pointer(regs) : 0; irqreturn_t retval = IRQ_NONE; unsigned int flags = 0, irq = desc->irq_data.irq; @@ -157,8 +159,9 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action) } while (action); #ifndef CONFIG_PREEMPT_RT_FULL - /* FIXME: Can we unbreak that ? */ - add_interrupt_randomness(irq, flags); + add_interrupt_randomness(irq, flags, ip); +#else + desc->random_ip = ip; #endif if (!noirqdebug) diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 87dc053..2204340 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -816,6 +816,12 @@ static int irq_thread(void *data) action_ret = handler_fn(desc, action); if (!noirqdebug) note_interrupt(action->irq, desc, action_ret); +#ifdef CONFIG_PREEMPT_RT_FULL + migrate_disable(); + add_interrupt_randomness(action->irq, 0, +desc->random_ip ^ (u64) action); + migrate_enable(); +#endif } wake = atomic_dec_and_test(>threads_active); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at
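A minimal userspace sketch of the split this patch describes, under stated assumptions: the hard interrupt side only remembers the interrupted IP, and the threaded handler later feeds it (mixed with a per-action value) into the entropy input words. The names `irq_desc_sketch`, `hardirq_store_ip`, `fill_ip_words`, `irq_thread_feed` and `action_cookie` are hypothetical; only the `input[2]`/`input[3]` split and the XOR come from the diff above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the single irq_desc field the patch adds. */
struct irq_desc_sketch {
	uint64_t random_ip;
};

/* Hard interrupt side on RT: only remember the interrupted IP. */
static void hardirq_store_ip(struct irq_desc_sketch *desc, uint64_t ip)
{
	assert(desc);
	desc->random_ip = ip;
}

/*
 * Mirrors the input[2]/input[3] handling in add_interrupt_randomness():
 * a zero ip leaves both words untouched.
 */
static void fill_ip_words(uint32_t input[4], uint64_t ip)
{
	if (ip) {
		input[2] = (uint32_t)ip;         /* low 32 bits */
		input[3] = (uint32_t)(ip >> 32); /* high 32 bits */
	}
}

/* Threaded handler side: mix the stored IP with a per-action value. */
static void irq_thread_feed(const struct irq_desc_sketch *desc,
			    uint64_t action_cookie, uint32_t input[4])
{
	fill_ip_words(input, desc->random_ip ^ action_cookie);
}
```

In the real patch the per-action value is the address of the irqaction, and the call in the thread is wrapped in migrate_disable()/migrate_enable(); neither is modelled here.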
[PATCH RT 6/8] sched: Better debug output for might sleep
From: Thomas Gleixner might sleep can tell us where interrupts have been disabled, but we have no idea what disabled preemption. Add some debug infrastructure. Cc: stable...@vger.kernel.org Signed-off-by: Thomas Gleixner Signed-off-by: Steven Rostedt --- include/linux/sched.h |4 kernel/sched.c| 23 +-- 2 files changed, 25 insertions(+), 2 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 03498cc..12317b6 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1612,6 +1612,10 @@ struct task_struct { int kmap_idx; pte_t kmap_pte[KM_TYPE_NR]; #endif + +#ifdef CONFIG_DEBUG_PREEMPT + unsigned long preempt_disable_ip; +#endif }; #ifdef CONFIG_PREEMPT_RT_FULL diff --git a/kernel/sched.c b/kernel/sched.c index 87654e6..cdf9484 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -4487,8 +4487,13 @@ void __kprobes add_preempt_count(int val) DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >= PREEMPT_MASK - 10); #endif - if (preempt_count() == val) - trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1)); + if (preempt_count() == val) { + unsigned long ip = get_parent_ip(CALLER_ADDR1); +#ifdef CONFIG_DEBUG_PREEMPT + current->preempt_disable_ip = ip; +#endif + trace_preempt_off(CALLER_ADDR0, ip); + } } EXPORT_SYMBOL(add_preempt_count); @@ -4530,6 +4535,13 @@ static noinline void __schedule_bug(struct task_struct *prev) print_modules(); if (irqs_disabled()) print_irqtrace_events(prev); +#ifdef DEBUG_PREEMPT + if (in_atomic_preempt_off()) { + pr_err("Preemption disabled at:"); + print_ip_sym(current->preempt_disable_ip); + pr_cont("\n"); + } +#endif if (regs) show_regs(regs); @@ -8912,6 +8924,13 @@ void __might_sleep(const char *file, int line, int preempt_offset) debug_show_held_locks(current); if (irqs_disabled()) print_irqtrace_events(current); +#ifdef DEBUG_PREEMPT + if (!preempt_count_equals(preempt_offset)) { + pr_err("Preemption disabled at:"); + print_ip_sym(current->preempt_disable_ip); + pr_cont("\n"); + } +#endif 
dump_stack(); } EXPORT_SYMBOL(__might_sleep); -- 1.7.10.4
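The bookkeeping added to add_preempt_count() can be sketched in userspace as follows. This is an illustration only: `task_sketch` and the `*_sketch` functions are hypothetical, and the caller address is passed in explicitly instead of being derived with get_parent_ip(CALLER_ADDR1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two task fields involved. */
struct task_sketch {
	int preempt_count;
	const void *preempt_disable_ip;
};

static void add_preempt_count_sketch(struct task_sketch *t, int val,
				     const void *caller_ip)
{
	t->preempt_count += val;
	/*
	 * Only the 0 -> val transition actually disables preemption,
	 * so nested calls must not overwrite the recorded address.
	 */
	if (t->preempt_count == val)
		t->preempt_disable_ip = caller_ip;
}

static void sub_preempt_count_sketch(struct task_sketch *t, int val)
{
	assert(t->preempt_count >= val);
	t->preempt_count -= val;
}
```

With this, a later "scheduling while atomic" style report can print the outermost disable site rather than the most recent nested one.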
[PATCH RT 8/8] Linux 3.2.31-rt47-rc1
From: Steven Rostedt

---
 localversion-rt |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index 2721581..8a02d38 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt46
+-rt47-rc1
--
1.7.10.4
[PATCH RT 4/8] mm: page_alloc: Use local_lock_on() instead of plain spinlock
From: Thomas Gleixner

The plain spinlock while sufficient does not update the local_lock
internals. Use a proper local_lock function instead to ease debugging.

Signed-off-by: Thomas Gleixner
Cc: stable...@vger.kernel.org
Signed-off-by: Steven Rostedt
---
 include/linux/locallock.h |   11 +++
 mm/page_alloc.c           |    4 ++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index 0161fbb..f1804a3 100644
--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -137,6 +137,12 @@ static inline int __local_lock_irqsave(struct local_irq_lock *lv)
 		_flags = __get_cpu_var(lvar).flags;			\
 	} while (0)
 
+#define local_lock_irqsave_on(lvar, _flags, cpu)			\
+	do {								\
+		__local_lock_irqsave(&per_cpu(lvar, cpu));		\
+		_flags = per_cpu(lvar, cpu).flags;			\
+	} while (0)
+
 static inline int __local_unlock_irqrestore(struct local_irq_lock *lv,
 					    unsigned long flags)
 {
@@ -156,6 +162,11 @@ static inline int __local_unlock_irqrestore(struct local_irq_lock *lv,
 		put_local_var(lvar);					\
 	} while (0)
 
+#define local_unlock_irqrestore_on(lvar, flags, cpu)			\
+	do {								\
+		__local_unlock_irqrestore(&per_cpu(lvar, cpu), flags);	\
+	} while (0)
+
 #define local_spin_trylock_irq(lvar, lock)				\
 	({								\
 		int __locked;						\
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 67202bc..8678a7f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -227,9 +227,9 @@ static DEFINE_LOCAL_IRQ_LOCK(pa_lock);
 
 #ifdef CONFIG_PREEMPT_RT_BASE
 # define cpu_lock_irqsave(cpu, flags)		\
-	spin_lock_irqsave(&per_cpu(pa_lock, cpu).lock, flags)
+	local_lock_irqsave_on(pa_lock, flags, cpu)
 # define cpu_unlock_irqrestore(cpu, flags)		\
-	spin_unlock_irqrestore(&per_cpu(pa_lock, cpu).lock, flags)
+	local_unlock_irqrestore_on(pa_lock, flags, cpu)
 #else
 # define cpu_lock_irqsave(cpu, flags)		local_irq_save(flags)
 # define cpu_unlock_irqrestore(cpu, flags)	local_irq_restore(flags)
--
1.7.10.4
[PATCH RT 5/8] rt: rwsem/rwlock: lockdep annotations
From: Thomas Gleixner rwlocks and rwsems on RT do not allow multiple readers. Annotate the lockdep acquire functions accordingly. Signed-off-by: Thomas Gleixner Cc: stable...@vger.kernel.org Signed-off-by: Steven Rostedt --- kernel/rt.c | 46 +- 1 file changed, 25 insertions(+), 21 deletions(-) diff --git a/kernel/rt.c b/kernel/rt.c index 092d6b3..aa10504 100644 --- a/kernel/rt.c +++ b/kernel/rt.c @@ -216,15 +216,17 @@ int __lockfunc rt_read_trylock(rwlock_t *rwlock) * write locked. */ migrate_disable(); - if (rt_mutex_owner(lock) != current) + if (rt_mutex_owner(lock) != current) { ret = rt_mutex_trylock(lock); - else if (!rwlock->read_depth) + if (ret) + rwlock_acquire(>dep_map, 0, 1, _RET_IP_); + } else if (!rwlock->read_depth) { ret = 0; + } - if (ret) { + if (ret) rwlock->read_depth++; - rwlock_acquire_read(>dep_map, 0, 1, _RET_IP_); - } else + else migrate_enable(); return ret; @@ -242,13 +244,13 @@ void __lockfunc rt_read_lock(rwlock_t *rwlock) { struct rt_mutex *lock = >lock; - rwlock_acquire_read(>dep_map, 0, 0, _RET_IP_); - /* * recursive read locks succeed when current owns the lock */ - if (rt_mutex_owner(lock) != current) + if (rt_mutex_owner(lock) != current) { + rwlock_acquire(>dep_map, 0, 0, _RET_IP_); __rt_spin_lock(lock); + } rwlock->read_depth++; } @@ -264,11 +266,11 @@ EXPORT_SYMBOL(rt_write_unlock); void __lockfunc rt_read_unlock(rwlock_t *rwlock) { - rwlock_release(>dep_map, 1, _RET_IP_); - /* Release the lock only when read_depth is down to 0 */ - if (--rwlock->read_depth == 0) + if (--rwlock->read_depth == 0) { + rwlock_release(>dep_map, 1, _RET_IP_); __rt_spin_unlock(>lock); + } } EXPORT_SYMBOL(rt_read_unlock); @@ -315,9 +317,10 @@ EXPORT_SYMBOL(rt_up_write); void rt_up_read(struct rw_semaphore *rwsem) { - rwsem_release(>dep_map, 1, _RET_IP_); - if (--rwsem->read_depth == 0) + if (--rwsem->read_depth == 0) { + rwsem_release(>dep_map, 1, _RET_IP_); rt_mutex_unlock(>lock); + } } EXPORT_SYMBOL(rt_up_read); @@ -366,15 +369,16 @@ int 
rt_down_read_trylock(struct rw_semaphore *rwsem) * but not when read_depth == 0 which means that the rwsem is * write locked. */ - if (rt_mutex_owner(lock) != current) + if (rt_mutex_owner(lock) != current) { ret = rt_mutex_trylock(>lock); - else if (!rwsem->read_depth) + if (ret) + rwsem_acquire(>dep_map, 0, 1, _RET_IP_); + } else if (!rwsem->read_depth) { ret = 0; + } - if (ret) { + if (ret) rwsem->read_depth++; - rwsem_acquire(>dep_map, 0, 1, _RET_IP_); - } return ret; } EXPORT_SYMBOL(rt_down_read_trylock); @@ -383,10 +387,10 @@ static void __rt_down_read(struct rw_semaphore *rwsem, int subclass) { struct rt_mutex *lock = >lock; - rwsem_acquire_read(>dep_map, subclass, 0, _RET_IP_); - - if (rt_mutex_owner(lock) != current) + if (rt_mutex_owner(lock) != current) { + rwsem_acquire(>dep_map, subclass, 0, _RET_IP_); rt_mutex_lock(>lock); + } rwsem->read_depth++; } -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
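The shape of the change in rt_read_lock()/rt_read_unlock() can be sketched in userspace like this. It is a simplified, single-threaded model and not the kernel code: `rt_rwlock_sketch` and its counters are hypothetical, the counters stand in for the lockdep acquire/release calls, and setting `owner` stands in for taking the underlying rt_mutex.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an RT rwlock with a single (recursive) reader. */
struct rt_rwlock_sketch {
	const void *owner;    /* task that holds the underlying lock */
	int read_depth;       /* recursion depth of that reader */
	int lockdep_acquires; /* how often lockdep was told "acquired" */
	int lockdep_releases; /* how often lockdep was told "released" */
};

static void rt_read_lock_sketch(struct rt_rwlock_sketch *l, const void *self)
{
	/* Recursive read locks succeed when the caller already owns the lock. */
	if (l->owner != self) {
		l->lockdep_acquires++; /* annotate only the real acquisition */
		l->owner = self;
	}
	l->read_depth++;
}

static void rt_read_unlock_sketch(struct rt_rwlock_sketch *l)
{
	assert(l->read_depth > 0);
	/* Release, and annotate the release, only when depth drops to 0. */
	if (--l->read_depth == 0) {
		l->lockdep_releases++;
		l->owner = NULL;
	}
}
```

The point of the patch is visible in the counters: a nested read lock by the owner produces one acquire and one release annotation, not one per nesting level.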
[PATCH RT 2/8] softirq: Init softirq local lock after per cpu section is set up
From: Steven Rostedt I discovered this bug when booting 3.4-rt on my powerpc box. It crashed with the following report: [ cut here ] kernel BUG at /work/rt/stable-rt.git/kernel/rtmutex_common.h:75! Oops: Exception in kernel mode, sig: 5 [#1] PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient Modules linked in: NIP: c04aa03c LR: c04aa01c CTR: c009b2ac REGS: c0003e8d7950 TRAP: 0700 Not tainted (3.4.11-test-rt19) MSR: 90029032 CR: 2482 XER: 2000 SOFTE: 0 TASK = c0003e8fdcd0[11] 'ksoftirqd/1' THREAD: c0003e8d4000 CPU: 1 GPR00: 0001 c0003e8d7bd0 c0d6cbb0 GPR04: c0003e8fdcd0 24004082 c0011454 GPR08: 8001 c0003e8fdcd1 GPR12: 2484 cfff0280 3ad8 GPR16: 0072c798 0060 GPR20: 00642741 0072c858 3af0 0417 GPR24: 0072dcd0 c0003e7ff990 0001 GPR28: c0792340 c0ccec78 c1182338 NIP [c04aa03c] .wakeup_next_waiter+0x44/0xb8 LR [c04aa01c] .wakeup_next_waiter+0x24/0xb8 Call Trace: [c0003e8d7bd0] [c04aa01c] .wakeup_next_waiter+0x24/0xb8 (unreliable) [c0003e8d7c60] [c04a0320] .rt_spin_lock_slowunlock+0x8c/0xe4 [c0003e8d7ce0] [c04a07cc] .rt_spin_unlock+0x54/0x64 [c0003e8d7d60] [c00636bc] .__thread_do_softirq+0x130/0x174 [c0003e8d7df0] [c006379c] .run_ksoftirqd+0x9c/0x1a4 [c0003e8d7ea0] [c0080b68] .kthread+0xa8/0xb4 [c0003e8d7f90] [c001c2f8] .kernel_thread+0x54/0x70 Instruction dump: 6000 e86d01c8 38630730 4bff7061 6000 ebbf0008 7c7c1b78 e81d0040 7fe00278 7c74 7800d182 6801 <0b00> e88d01c8 387d0010 38840738 The rtmutex_common.h:75 is: rt_mutex_top_waiter(struct rt_mutex *lock) { struct rt_mutex_waiter *w; w = plist_first_entry(>wait_list, struct rt_mutex_waiter, list_entry); BUG_ON(w->lock != lock); return w; } Where the waiter->lock is corrupted. I saw various other random bugs that all had to with the softirq lock and plist. As plist needs to be initialized before it is used I investigated how this lock is initialized. 
It's initialized with: void __init softirq_early_init(void) { local_irq_lock_init(local_softirq_lock); } Where: #define local_irq_lock_init(lvar) \ do {\ int __cpu; \ for_each_possible_cpu(__cpu)\ spin_lock_init(_cpu(lvar, __cpu).lock); \ } while (0) As the softirq lock is a local_irq_lock, which is a per_cpu lock, the initialization is done to all per_cpu versions of the lock. But lets look at where the softirq_early_init() is called from. In init/main.c: start_kernel() /* * Interrupts are still disabled. Do necessary setups, then * enable them */ softirq_early_init(); tick_init(); boot_cpu_init(); page_address_init(); printk(KERN_NOTICE "%s", linux_banner); setup_arch(_line); mm_init_owner(_mm, _task); mm_init_cpumask(_mm); setup_command_line(command_line); setup_nr_cpu_ids(); setup_per_cpu_areas(); smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */ One of the first things that is called is the initialization of the softirq lock. But if you look further down, we see the per_cpu areas have not been set up yet. Thus initializing a local_irq_lock() before the per_cpu section is set up, may not work as it is initializing the per cpu locks before the per cpu exists. By moving the softirq_early_init() right after setup_per_cpu_areas(), the kernel boots fine. Signed-off-by: Steven Rostedt Cc: Clark Williams Cc: John Kacur Cc: Carsten Emde Cc: voml...@texas.net Cc: stable...@vger.kernel.org Link: http://lkml.kernel.org/r/1349362924.6755.18.ca...@gandalf.local.home Signed-off-by: Thomas Gleixner --- init/main.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/init/main.c b/init/main.c index d432bea..6f96224 100644 --- a/init/main.c +++ b/init/main.c @@ -490,7 +490,6 @@ asmlinkage void __init start_kernel(void) * Interrupts are still disabled. 
Do necessary setups, then * enable them */ - softirq_early_init(); tick_init(); boot_cpu_init(); page_address_init(); @@ -501,6 +500,7 @@ asmlinkage void __init start_kernel(void) setup_command_line(command_line); setup_nr_cpu_ids(); setup_per_cpu_areas(); + softirq_early_init(); smp_prepare_boot_cpu(); /*
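One way such an ordering problem can corrupt a list-based lock is sketched below in userspace. This is an assumed model, not something the patch description states: it supposes that the per-CPU areas are created by copying a static template, so a self-referential list head initialized in the template ends up pointing back at the template in every copy. All names (`list_head_sketch`, `template_head`, `cpu_area`, the `boot_*` functions) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS_SKETCH 4

/* Minimal self-referential list head, like the ones inside a plist. */
struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

static struct list_head_sketch template_head;            /* static template */
static struct list_head_sketch cpu_area[NR_CPUS_SKETCH]; /* per-CPU copies */

static void init_head(struct list_head_sketch *h)
{
	assert(h);
	h->next = h;
	h->prev = h;
}

/* An empty list head must point at itself. */
static int head_is_sane(const struct list_head_sketch *h)
{
	return h->next == h && h->prev == h;
}

/* Assumed model of per-CPU area setup: copy the template for each CPU. */
static void setup_areas_sketch(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		memcpy(&cpu_area[cpu], &template_head, sizeof(template_head));
}

static int count_sane(void)
{
	int sane = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sane += head_is_sane(&cpu_area[cpu]);
	return sane;
}

/* Old order: initialize first, then create the per-CPU areas. */
static int boot_init_before_areas(void)
{
	init_head(&template_head);
	setup_areas_sketch();
	return count_sane();
}

/* New order: create the per-CPU areas, then initialize each copy. */
static int boot_init_after_areas(void)
{
	setup_areas_sketch();
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		init_head(&cpu_area[cpu]);
	return count_sane();
}
```

In this model the old order leaves every per-CPU head pointing into the template, while the new order gives each CPU a correctly initialized empty list.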
[PATCH RT 7/8] stomp_machine: Use mutex_trylock when called from inactive cpu
From: Thomas Gleixner If the stop machinery is called from inactive CPU we cannot use mutex_lock, because some other stomp machine invokation might be in progress and the mutex can be contended. We cannot schedule from this context, so trylock and loop. Signed-off-by: Thomas Gleixner Cc: stable...@vger.kernel.org Signed-off-by: Steven Rostedt --- kernel/stop_machine.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index 561ba3a..e98c70b 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -158,7 +158,7 @@ static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work); static void queue_stop_cpus_work(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg, -struct cpu_stop_done *done) +struct cpu_stop_done *done, bool inactive) { struct cpu_stop_work *work; unsigned int cpu; @@ -175,7 +175,12 @@ static void queue_stop_cpus_work(const struct cpumask *cpumask, * Make sure that all work is queued on all cpus before we * any of the cpus can execute it. */ - mutex_lock(_lock); + if (!inactive) { + mutex_lock(_lock); + } else { + while (!mutex_trylock(_lock)) + cpu_relax(); + } for_each_cpu(cpu, cpumask) cpu_stop_queue_work(_cpu(cpu_stopper, cpu), _cpu(stop_cpus_work, cpu)); @@ -188,7 +193,7 @@ static int __stop_cpus(const struct cpumask *cpumask, struct cpu_stop_done done; cpu_stop_init_done(, cpumask_weight(cpumask)); - queue_stop_cpus_work(cpumask, fn, arg, ); + queue_stop_cpus_work(cpumask, fn, arg, , false); wait_for_stop_done(); return done.executed ? done.ret : -ENOENT; } @@ -601,7 +606,7 @@ int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data, set_state(, STOPMACHINE_PREPARE); cpu_stop_init_done(, num_active_cpus()); queue_stop_cpus_work(cpu_active_mask, stop_machine_cpu_stop, , -); +, true); ret = stop_machine_cpu_stop(); /* Busy wait for completion. 
*/ -- 1.7.10.4
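The "trylock and loop" pattern from this patch, sketched in userspace under stated assumptions: `toy_mutex` is a hypothetical test-and-set lock standing in for the kernel mutex, and `cpu_relax_sketch()` is only a compiler barrier (GCC/Clang syntax) standing in for cpu_relax().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock standing in for the kernel mutex; it never sleeps. */
struct toy_mutex {
	atomic_flag locked;
};

static bool toy_trylock(struct toy_mutex *m)
{
	assert(m);
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(&m->locked,
						  memory_order_acquire);
}

static void toy_unlock(struct toy_mutex *m)
{
	atomic_flag_clear_explicit(&m->locked, memory_order_release);
}

/* Placeholder for cpu_relax(): a compiler barrier only. */
static inline void cpu_relax_sketch(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/*
 * Acquire from a context that must not schedule: never block, just retry.
 * Returns the number of failed attempts, which is useful only for the demo.
 */
static unsigned long lock_without_sleeping(struct toy_mutex *m)
{
	unsigned long spins = 0;

	while (!toy_trylock(m)) {
		spins++;
		cpu_relax_sketch();
	}
	return spins;
}
```

The trade-off is the usual one for spinning: the caller burns CPU while the lock is contended, which is acceptable here only because the inactive CPU has nothing else it is allowed to do.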
[PATCH RT 3/8] mm: slab: Fix potential deadlock
From: Thomas Gleixner = [ INFO: possible recursive locking detected ] 3.6.0-rt1+ #49 Not tainted - swapper/0/1 is trying to acquire lock: lock_slab_on+0x72/0x77 but task is already holding lock: __local_lock_irq+0x24/0x77 other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(_cpu(slab_lock, __cpu).lock); lock(_cpu(slab_lock, __cpu).lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by swapper/0/1: kmem_cache_create+0x33/0x89 __local_lock_irq+0x24/0x77 stack backtrace: Pid: 1, comm: swapper/0 Not tainted 3.6.0-rt1+ #49 Call Trace: __lock_acquire+0x9a4/0xdc4 ? __local_lock_irq+0x24/0x77 ? lock_slab_on+0x72/0x77 lock_acquire+0xc4/0x108 ? lock_slab_on+0x72/0x77 ? unlock_slab_on+0x5b/0x5b rt_spin_lock+0x36/0x3d ? lock_slab_on+0x72/0x77 ? migrate_disable+0x85/0x93 lock_slab_on+0x72/0x77 do_ccupdate_local+0x19/0x44 slab_on_each_cpu+0x36/0x5a do_tune_cpucache+0xc1/0x305 enable_cpucache+0x8c/0xb5 setup_cpu_cache+0x28/0x182 __kmem_cache_create+0x34b/0x380 ? shmem_mount+0x1a/0x1a kmem_cache_create+0x4a/0x89 ? shmem_mount+0x1a/0x1a shmem_init+0x3e/0xd4 kernel_init+0x11c/0x214 kernel_thread_helper+0x4/0x10 ? retint_restore_args+0x13/0x13 ? start_kernel+0x3bc/0x3bc ? gs_change+0x13/0x13 It's not a missing annotation. It's simply wrong code and needs to be fixed. Instead of nesting the local and the remote cpu lock simply acquire only the remote cpu lock, which is sufficient protection for this procedure. 
Signed-off-by: Thomas Gleixner Cc: stable...@vger.kernel.org Signed-off-by: Steven Rostedt --- include/linux/locallock.h |8 mm/slab.c | 10 ++ 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/include/linux/locallock.h b/include/linux/locallock.h index 8fbc393..0161fbb 100644 --- a/include/linux/locallock.h +++ b/include/linux/locallock.h @@ -96,6 +96,9 @@ static inline void __local_lock_irq(struct local_irq_lock *lv) #define local_lock_irq(lvar) \ do { __local_lock_irq(_local_var(lvar)); } while (0) +#define local_lock_irq_on(lvar, cpu) \ + do { __local_lock_irq(_cpu(lvar, cpu)); } while (0) + static inline void __local_unlock_irq(struct local_irq_lock *lv) { LL_WARN(!lv->nestcnt); @@ -111,6 +114,11 @@ static inline void __local_unlock_irq(struct local_irq_lock *lv) put_local_var(lvar);\ } while (0) +#define local_unlock_irq_on(lvar, cpu) \ + do {\ + __local_unlock_irq(_cpu(lvar, cpu));\ + } while (0) + static inline int __local_lock_irqsave(struct local_irq_lock *lv) { if (lv->owner != current) { diff --git a/mm/slab.c b/mm/slab.c index 411c545..6a8fd1b 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -747,18 +747,12 @@ slab_on_each_cpu(void (*func)(void *arg, int this_cpu), void *arg) static void lock_slab_on(unsigned int cpu) { - if (cpu == smp_processor_id()) - local_lock_irq(slab_lock); - else - local_spin_lock_irq(slab_lock, _cpu(slab_lock, cpu).lock); + local_lock_irq_on(slab_lock, cpu); } static void unlock_slab_on(unsigned int cpu) { - if (cpu == smp_processor_id()) - local_unlock_irq(slab_lock); - else - local_spin_unlock_irq(slab_lock, _cpu(slab_lock, cpu).lock); + local_unlock_irq_on(slab_lock, cpu); } #endif -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH RT 0/8] [ANNOUNCE] 3.2.31-rt47-rc1 stable review
Dear RT Folks,

This is the RT stable review cycle of patch 3.2.31-rt47-rc1.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release
candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 10/16/2012.

Enjoy,

-- Steve

To build 3.2.31-rt47-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.2.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.2.31.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.2/patch-3.2.31-rt47-rc1.patch.xz

You can also build from 3.2.31-rt46 by applying the incremental patch:

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.2/incr/patch-3.2.31-rt46-rt47-rc1.patch.xz

Changes from 3.2.31-rt46:

---

Steven Rostedt (2):
      softirq: Init softirq local lock after per cpu section is set up
      Linux 3.2.31-rt47-rc1

Thomas Gleixner (6):
      random: Make it work on rt
      mm: slab: Fix potential deadlock
      mm: page_alloc: Use local_lock_on() instead of plain spinlock
      rt: rwsem/rwlock: lockdep annotations
      sched: Better debug output for might sleep
      stomp_machine: Use mutex_trylock when called from inactive cpu

 drivers/char/random.c     |   10 ++
 include/linux/irqdesc.h   |    1 +
 include/linux/locallock.h |   19 +++
 include/linux/random.h    |    2 +-
 include/linux/sched.h     |    4
 init/main.c               |    2 +-
 kernel/irq/handle.c       |    7 +--
 kernel/irq/manage.c       |    6 ++
 kernel/rt.c               |   46 -
 kernel/sched.c            |   23 +--
 kernel/stop_machine.c     |   13 +
 localversion-rt           |    2 +-
 mm/page_alloc.c           |    4 ++--
 mm/slab.c                 |   10 ++
 14 files changed, 103 insertions(+), 46 deletions(-)
[git pull] vfs.git, pile 2
Stuff in this one - assorted fixes, lglock tidy-up, death to
lock_super(). There'll be a VFS pile tomorrow (with patches from Jeff
Layton, sanitizing getname() and related parts of audit and preparing
for ESTALE fixes), but I'd rather push the stuff in this one ASAP - some
of the bugs closed here are quite unpleasant.

Please, pull from the usual place -
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

Shortlog:
Al Viro (2):
      MAX_LFS_FILESIZE definition for 64bit needs LL...
      consitify do_mount() arguments

Arnd Bergmann (1):
      vfs: bogus warnings in fs/namei.c

Hugh Dickins (1):
      tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking

Lai Jiangshan (3):
      lglock: remove unused DEFINE_LGLOCK_LOCKDEP()
      lglock: make the per_cpu locks static
      lglock: add DEFINE_STATIC_LGLOCK()

Marco Stornelli (7):
      exofs: drop lock/unlock super
      ext3: drop lock/unlock super
      fat: drop lock/unlock super
      hpfs: drop lock/unlock super
      sysv: drop lock/unlock super
      ufs: drop lock/unlock super
      vfs: drop lock/unlock super

Richard W.M. Jones (1):
      dup3: Return an error when oldfd == newfd.

Sasha Levin (2):
      fs: prevent use after free in auditing when symlink following was denied
      fs: handle failed audit_log_start properly

Diffstat:
 fs/ceph/export.c           |   18 ++
 fs/exofs/super.c           |    4
 fs/ext3/super.c            |    6 --
 fs/fat/dir.c               |    4 ++--
 fs/fat/fat.h               |    5 +++--
 fs/fat/inode.c             |    5 +++--
 fs/fat/namei_msdos.c       |   26 +-
 fs/fat/namei_vfat.c        |   30 +++---
 fs/file.c                  |    3 +++
 fs/file_table.c            |    2 +-
 fs/gfs2/export.c           |    4
 fs/hpfs/super.c            |    3 ---
 fs/isofs/export.c          |    2 +-
 fs/namei.c                 |    3 ++-
 fs/namespace.c             |   12 ++--
 fs/reiserfs/inode.c        |    6 +-
 fs/super.c                 |   23 ---
 fs/sysv/balloc.c           |   18 +-
 fs/sysv/ialloc.c           |   14 +++---
 fs/sysv/inode.c            |    4 ++--
 fs/sysv/super.c            |    1 +
 fs/sysv/sysv.h             |    1 +
 fs/ufs/balloc.c            |   30 +++---
 fs/ufs/ialloc.c            |   16
 fs/ufs/super.c             |   21 +++--
 fs/ufs/ufs.h               |    1 +
 fs/xfs/xfs_export.c        |    3 +++
 include/linux/fs.h         |    5 ++---
 include/linux/lglock.h     |   19 ---
 include/linux/security.h   |   12 ++--
 kernel/audit.c             |    2 ++
 mm/shmem.c                 |    6 --
 security/capability.c      |    4 ++--
 security/security.c        |    4 ++--
 security/selinux/hooks.c   |    4 ++--
 security/smack/smack_lsm.c |    4 ++--
 security/tomoyo/common.h   |    2 +-
 security/tomoyo/mount.c    |    5 +++--
 security/tomoyo/tomoyo.c   |    4 ++--
 39 files changed, 166 insertions(+), 170 deletions(-)
[git pull] signal.git, pile 2 (was Re: [RFC][CFT][CFReview] execve and kernel_thread unification work)
On Fri, Oct 12, 2012 at 11:16:33AM +1100, Paul Mackerras wrote:
> On Thu, Oct 11, 2012 at 01:53:06PM +0100, Al Viro wrote:
>
> > Umm... Maybe, but let's do that as subsequent cleanup. Again,
> > we almost certainly don't need to mess with TOC at all - the callbacks
> > are in the main kernel, there are very few of them and they really are
> > low-level details of exported mechanisms (i.e. kthread_create/run/etc.
> > in kthread.h and call_usermode... in kmod.h). Again, we are talking
> > about out-of-tree modules, they had better mechanism for at least
> > 6 years and conversion to it is bloody trivial. Hell, it was even
> > in late unlamented feature-removal-schedule.txt - since 2006. If that's
> > not enough to retire an export, what is?
>
> OK... yes we can fix things up in a subsequent cleanup.
>
> We will need to fix the TOC handling when we go to using multiple TOCs
> in the main kernel, with the linker managing the transitions between
> TOCs. Our toolchain guys have been pushing us to do that for years,
> because it should make things run faster, but first we'll have to stop
> using ld -r to combine objects in subdirectories.

How granular are you planning to make that? I mean, we are talking about
3 objects here - init/main.o, kernel/kthread.o and kernel/kmod.o. Do they
get TOC separate from that of arch/powerpc/kernel/entry_64.o?

Anyway, if ppc folks can live with that stuff in its current form for now,
here's the second signal.git pull request. Stuff in there: kernel_thread/
kernel_execve/sys_execve conversions for several more architectures plus
assorted signal fixes and cleanups. There'll be more (in particular, real
fixes for alpha do_notify_resume() irq mess)...

Linus, could you pull that queue?
It's in the usual place - git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal for-linus Shortlog: Al Viro (38): powerpc: split ret_from_fork powerpc: switch to generic sys_execve()/kernel_execve() m68k: split ret_from_fork(), simplify kernel_thread() m68k: switch to generic sys_execve()/kernel_execve() frv: split ret_from_fork, simplify kernel_thread() a lot frv: switch to generic sys_execve() frv: switch to generic kernel_execve frv: switch to generic kernel_thread() mn10300: split ret_from_fork, simplify kernel_thread() mn10300: switch to generic sys_execve() mn10300: switch to generic kernel_execve() mn10300: convert to generic kernel_thread() c6x: switch to generic kernel_thread() xtensa: can't get to do_notify_resume() when user_mode(regs) is not true mn10300: get rid of calling do_notify_resume() when returning to kernel mode score: fix bogus restarts on sigreturn() ia64: can't reach do_signal() when returning to kernel mode mips: prevent hitting do_notify_resume() with !user_mode(regs) mips: unobfuscate _TIF..._MASK mips: merge the identical "return from syscall" per-ABI code mips: NOTIFY_RESUME is not needed in TIF masks unicore32: unobfuscate _TIF_WORK_MASK bury _TIF_RESTORE_SIGMASK sanitize tsk_is_polling() bury the rest of TIF_IRET parisc: fix double restarts parisc: don't bother looping in do_signal() parisc: decide whether to go to slow path (tracesys) based on thread flags h8300: trim _TIF_WORK_MASK unicore32: remove pointless test x86: get rid of duplicate code in case of CONFIG_VM86 frv: no need to raise SIGTRAP in setup_frame() mn10300: don't bother with SIGTRAP in setup_frame() microblaze: don't bother with SIGTRAP in setup_rt_frame() tile: don't bother with SIGTRAP in setup_frame avr32: trim masks m32r: trim masks alpha: don't open-code trace_report_syscall_{enter,exit} Greg Ungerer (1): m68k: always set stack frame format for ColdFire on thread start Mark Salter (3): c6x: add ret_from_kernel_thread(), simplify kernel_thread() c6x: switch 
to generic kernel_execve c6x: switch to generic sys_execve Richard Weinberger (1): Uninclude linux/freezer.h Diffstat: arch/alpha/include/asm/thread_info.h |3 +- arch/alpha/kernel/entry.S | 13 ++-- arch/alpha/kernel/ptrace.c| 32 - arch/arm/include/asm/thread_info.h|2 - arch/arm/kernel/signal.c |1 - arch/avr32/include/asm/thread_info.h | 18 ++--- arch/avr32/kernel/signal.c|1 - arch/blackfin/include/asm/thread_info.h |4 - arch/blackfin/kernel/signal.c |1 - arch/c6x/Kconfig |1 + arch/c6x/include/asm/processor.h |2 - arch/c6x/include/asm/syscalls.h |5 -- arch/c6x/include/asm/thread_info.h|1 - arch/c6x/include/asm/unistd.h |3 + arch/c6x/kernel/asm-offsets.c |1 - arch/c6x/kernel/entry.S | 56
Re: [PATCH 07/33] autonuma: mm_autonuma and task_autonuma data structures
Hi Christoph, On Fri, Oct 12, 2012 at 12:23:17AM +, Christoph Lameter wrote: > On Thu, 11 Oct 2012, Rik van Riel wrote: > > > These statistics are updated at page fault time, I > > believe while holding the page table lock. > > > > In other words, they are in code paths where updating > > the stats should not cause issues. > > The per cpu counters in the VM were introduced because of > counter contention caused at page fault time. This is the same code path > where you think that there cannot be contention. There's no contention at all in autonuma27. I changed it in autonuma28, to get real time updates in mm_autonuma from migration events. There is no lock taken though (the spinlock below is taken once every pass, very rarely). It's a few liner change shown in detail below. The only contention point is this: + ACCESS_ONCE(mm_numa_fault[access_nid]) += numpages; + ACCESS_ONCE(mm_autonuma->mm_numa_fault_tot) += numpages; autonuma28 is much more experimental than autonuma27 :) I wouldn't focus on >1024 CPU systems for this though. The bigger the system the more costly any automatic placement logic will become, no matter which algorithm and which computation complexity the algorithm has, and chances are those will use NUMA hard bindings anyway considering how much they're expensive to setup and maintain. The diff looks like this, I can consider undoing it. Comments welcome. (but real time stats updates, converge faster in autonuma28) --- a/mm/autonuma.c +++ b/mm/autonuma.c static struct knuma_scand_data { struct list_head mm_head; /* entry: mm->mm_autonuma->mm_node */ struct mm_struct *mm; unsigned long address; - unsigned long *mm_numa_fault_tmp; } knuma_scand_data = { .mm_head = LIST_HEAD_INIT(knuma_scand_data.mm_head), }; + unsigned long tot; + + /* +* Set the task's fault_pass equal to the new +* mm's fault_pass, so new_pass will be false +* on the next fault by this thread in this +* same pass. 
+*/ + p->task_autonuma->task_numa_fault_pass = mm_numa_fault_pass; + /* If a new pass started, degrade the stats by a factor of 2 */ for_each_node(nid) task_numa_fault[nid] >>= 1; task_autonuma->task_numa_fault_tot >>= 1; + + if (mm_numa_fault_pass == + ACCESS_ONCE(mm_autonuma->mm_numa_fault_last_pass)) + return; + + spin_lock(_autonuma->mm_numa_fault_lock); + if (unlikely(mm_numa_fault_pass == +mm_autonuma->mm_numa_fault_last_pass)) { + spin_unlock(_autonuma->mm_numa_fault_lock); + return; + } + mm_autonuma->mm_numa_fault_last_pass = mm_numa_fault_pass; + + tot = 0; + for_each_node(nid) { + unsigned long fault = ACCESS_ONCE(mm_numa_fault[nid]); + fault >>= 1; + ACCESS_ONCE(mm_numa_fault[nid]) = fault; + tot += fault; + } + mm_autonuma->mm_numa_fault_tot = tot; + spin_unlock(_autonuma->mm_numa_fault_lock); } task_numa_fault[access_nid] += numpages; task_autonuma->task_numa_fault_tot += numpages; + ACCESS_ONCE(mm_numa_fault[access_nid]) += numpages; + ACCESS_ONCE(mm_autonuma->mm_numa_fault_tot) += numpages; + local_bh_enable(); } @@ -310,28 +355,35 @@ static void numa_hinting_fault_cpu_follow_memory(struct task_struct *p, @@ -593,35 +628,26 @@ static int knuma_scand_pmd(struct mm_struct *mm, goto out; if (pmd_trans_huge_lock(pmd, vma) == 1) { - int page_nid; - unsigned long *fault_tmp; ret = HPAGE_PMD_NR; VM_BUG_ON(address & ~HPAGE_PMD_MASK); - if (autonuma_mm_working_set() && pmd_numa(*pmd)) { + if (pmd_numa(*pmd)) { spin_unlock(>page_table_lock); goto out; } - page = pmd_page(*pmd); - /* only check non-shared pages */ if (page_mapcount(page) != 1) { spin_unlock(>page_table_lock); goto out; } - - page_nid = page_to_nid(page); - fault_tmp = knuma_scand_data.mm_numa_fault_tmp; - fault_tmp[page_nid] += ret; - if (pmd_numa(*pmd)) { spin_unlock(>page_table_lock); goto out; } - set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd)); + /* defer TLB flush to lower the overhead */ spin_unlock(>page_table_lock); goto out; @@ -636,10 +662,9 @@ static int knuma_scand_pmd(struct 
mm_struct *mm, for (_address = address, _pte = pte; _address < end; _pte++, _address += PAGE_SIZE) { pte_t pteval = *_pte; - unsigned
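The per-pass decay in the diff above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: NR_NODES, struct fault_stats, stats_new_pass() and stats_account() are hypothetical names, and the spinlock and ACCESS_ONCE() handling of the real patch is left out.

```c
#include <stddef.h>

#define NR_NODES 4 /* hypothetical node count, for illustration only */

struct fault_stats {
	unsigned long fault[NR_NODES]; /* per-node hinting fault counts */
	unsigned long tot;             /* sum of fault[] */
	unsigned long last_pass;       /* pass in which fault[] was last decayed */
};

/* Halve every counter once per scan pass so that old faults fade out. */
static void stats_new_pass(struct fault_stats *s, unsigned long pass)
{
	unsigned long tot = 0;
	size_t nid;

	if (s->last_pass == pass)
		return; /* already decayed in this pass */
	s->last_pass = pass;

	for (nid = 0; nid < NR_NODES; nid++) {
		s->fault[nid] >>= 1;
		tot += s->fault[nid];
	}
	s->tot = tot; /* recomputed from the halved counters, so it equals their sum */
}

/* Account numpages faults against the node that was accessed. */
static void stats_account(struct fault_stats *s, int nid, unsigned long numpages)
{
	s->fault[nid] += numpages;
	s->tot += numpages;
}
```

Calling stats_new_pass() twice with the same pass number decays the counters only once, which mirrors the mm_numa_fault_last_pass check in the diff.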
Re: [PATCH 00/33] AutoNUMA27
On Thu, Oct 11, 2012 at 04:35:03PM +0100, Mel Gorman wrote:
> If System CPU time really does go down as this converges then that
> should be obvious from monitoring vmstat over time for a test. Early on
> - high usage with that dropping as it converges. If that doesn't happen
> then the tasks are not converging, the phases change constantly or
> something unexpected happened that needs to be identified.

Yes, all measurable kernel cost should be in the memory copies
(migration and khugepaged, the latter is going to be optimized
away). The migrations must stop after the workload converges. Either
migrations are used to reach convergence or they shouldn't happen in
the first place (not in any measurable amount).

> Ok. Are they separate STREAM instances or threads running on the same
> arrays?

My understanding is separate instances. I think it's a single threaded
benchmark and you run many copies. It was modified to run for 5min
(otherwise upstream does not have enough time to get it wrong, as a
result of background scheduling jitter).

Thanks!
Re: [PATCH 10/33] autonuma: CPU follows memory algorithm
On Thu, Oct 11, 2012 at 03:58:05PM +0100, Mel Gorman wrote: > On Thu, Oct 04, 2012 at 01:50:52AM +0200, Andrea Arcangeli wrote: > > This algorithm takes as input the statistical information filled by the > > knuma_scand (mm->mm_autonuma) and by the NUMA hinting page faults > > (p->task_autonuma), evaluates it for the current scheduled task, and > > compares it against every other running process to see if it should > > move the current task to another NUMA node. > > > > That sounds expensive if there are a lot of running processes in the > system. How often does this happen? Mention it here even though I > realised much later that it's obvious from the patch itself. Ok I added: == This algorithm will run once every ~100msec, and can be easily slowed down further. Its computational complexity is O(nr_cpus) and it's executed by all CPUs. The number of running threads and processes is not going to alter the cost of this algorithm, only the online number of CPUs is. However practically this will very rarely hit on all CPUs runqueues. Most of the time it will only compute on local data in the task_autonuma struct (for example if convergence has been reached). Even if no convergence has been reached yet, it'll only scan the CPUs in the NUMA nodes where the local task_autonuma data is showing that they are worth migrating to. == It's configurable through sysfs, 100msec is the default. > > + * there is no affinity set for the task). > > + */ > > +static bool inline task_autonuma_cpu(struct task_struct *p, int cpu) > > +{ > > nit, but elsewhere you have > > static inline TYPE and here you have > static TYPE inline Fixed. 
> > > + int task_selected_nid; > > + struct task_autonuma *task_autonuma = p->task_autonuma; > > + > > + if (!task_autonuma) > > + return true; > > + > > + task_selected_nid = ACCESS_ONCE(task_autonuma->task_selected_nid); > > + if (task_selected_nid < 0 || task_selected_nid == cpu_to_node(cpu)) > > + return true; > > + else > > + return false; > > +} > > no need for else. Removed. > > > + > > +static inline void sched_autonuma_balance(void) > > +{ > > + struct task_autonuma *ta = current->task_autonuma; > > + > > + if (ta && current->mm) > > + __sched_autonuma_balance(); > > +} > > + > > Ok, so this could do with a comment explaining where it is called from. > It is called during idle balancing at least so potentially this is every > scheduler tick. It'll be run from softirq context so the cost will not > be obvious to a process but the overhead will be there. What happens if > this takes longer than a scheduler tick to run? Is that possible? softirqs can run for huge amount of time so it won't harm. Nested IRQs could even run on top of the softirq, and they could take milliseconds too if they're hyper inefficient and we must still run perfectly rock solid (with horrible latency, but still stable). I added: /* * This is called in the context of the SCHED_SOFTIRQ from * run_rebalance_domains(). */ > > +/* > > + * This function __sched_autonuma_balance() is responsible for > > This function is far too shot and could do with another few pages :P :) I tried to split it once already but gave up in the middle. > > + * "Full convergence" is achieved when all memory accesses by a task > > + * are 100% local to the CPU it is running on. A task's "best node" is > > I think this is the first time you defined convergence in the series. > The explanation should be included in the documentation. Ok. It's not too easy concept to explain with words. 
Here a try: * * A workload converges when all the memory of a thread or a process * has been placed in the NUMA node of the CPU where the process or * thread is running on. * > > + * other_diff: how much the current task is closer to fully converge > > + * on the node of the other CPU than the other task that is currently > > + * running in the other CPU. > > In the changelog you talked about comparing a process with every other > running process but here it looks like you intent to examine every > process that is *currently running* on a remote node and compare that. > What if the best process to swap with is not currently running? Do we > miss it? Correct, only currently running processes are being checked. If a task in R state goes to sleep immediately, it's not relevant where it runs. We focus on "long running" compute tasks, so tasks that are in R state most frequently. > > + * If both checks succeed it guarantees that we found a way to > > + * multilaterally improve the system wide NUMA > > + * convergence. Multilateral here means that the same checks will not > > + * succeed again on those same two tasks, after the task exchange, so > > + * there is no risk of ping-pong. > > + * > > At least not in that instance of time. A new CPU binding or change in > behaviour (such as a computation finishing and a reduce step starting) > might change that scoring. Yes. > > +
Re: uprobe: checking probe event include directory
On Wed, Jul 18, 2012 at 7:45 PM, Srikar Dronamraju wrote: > * Jovi Zhang [2012-07-18 19:38:27]: > >> On Wed, Jul 18, 2012 at 7:07 PM, Srikar Dronamraju >> wrote: >> > The patch looks good, >> > >> > Can you modify the description a bit. However you are free to ignore >> > these comments. After knowing your response, I will ack the patch. >> > >> > I would probably put this as: >> > >> > The subject could be >> > tracing: Verify target file before registering a uprobe event >> > >> > Description: >> > Without this patch, we can register a uprobe event for a directory. >> > Enabling such a uprobe event would anyway fail. >> > >> > Example: >> > >> > $ echo 'p /bin:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events >> > >> > However directories cannot be valid targets for uprobe. >> > Hence verify if the target is a regular file during the probe >> > registration. >> >> Thanks srikar, your description is more clear than mine. >> >> >> From fd5077196038cc271e2116e1fca359a0011e1669 Mon Sep 17 00:00:00 2001 >> From: Jovi Zhang >> Date: Wed, 18 Jul 2012 18:16:44 +0800 >> Subject: [PATCH] tracing: Verify target file before registering a uprobe >> event >> >> Without this patch, we can register a uprobe event for a directory. >> Enabling such a uprobe event would anyway fail. >> >> Example: >> $ echo 'p /bin:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events >> >> However dirctories cannot be valid targets for uprobe. >> Hence verify if the target is a regular file during the probe >> registration. 
>> >> Signed-off-by: Jovi Zhang >> Reviewed-by: Srikar Dronamraju >> --- >> kernel/trace/trace_uprobe.c |6 +- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c >> index 85158fa..3b5f646 100644 >> --- a/kernel/trace/trace_uprobe.c >> +++ b/kernel/trace/trace_uprobe.c >> @@ -259,6 +259,10 @@ static int create_trace_uprobe(int argc, char **argv) >> goto fail_address_parse; >> >> inode = igrab(path.dentry->d_inode); >> + if (!S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)) { >> + ret = -EINVAL; >> + goto fail_address_parse; >> + } >> >> argc -= 2; >> argv += 2; >> @@ -358,7 +362,7 @@ fail_address_parse: >> if (inode) >> iput(inode); >> >> - pr_info("Failed to parse address.\n"); >> + pr_info("Failed to parse address or file.\n"); >> >> return ret; >> } > > Looks good. > > Acked-by: Srikar Dronamraju > Hi Andrew, Is this patch ok to go through your tree? .jovi
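For readers outside the kernel, the same "regular file only" test can be tried from user space with stat(2). The sketch below is not the kernel code: path_is_regular_file() is a hypothetical helper, and it works on a path rather than an inode. It also shows that a directory never satisfies S_ISREG(), so a single S_ISREG() test already rejects directories.

```c
#include <sys/stat.h>

/*
 * Return 1 if path names a regular file, 0 otherwise (including on error).
 * path_is_regular_file() is a hypothetical helper, not a kernel function.
 */
static int path_is_regular_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return 0;
	/* A directory can never satisfy S_ISREG(), so no S_ISDIR() test is needed. */
	return S_ISREG(st.st_mode) ? 1 : 0;
}
```

With this helper, a directory such as / is rejected in the same way the patched create_trace_uprobe() rejects /bin.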
Re: [RFC][CFT][CFReview] execve and kernel_thread unification work
On Thu, Oct 11, 2012 at 01:53:06PM +0100, Al Viro wrote:

> Umm... Maybe, but let's do that as subsequent cleanup. Again,
> we almost certainly don't need to mess with TOC at all - the callbacks
> are in the main kernel, there are very few of them and they really are
> low-level details of exported mechanisms (i.e. kthread_create/run/etc.
> in kthread.h and call_usermode... in kmod.h). Again, we are talking
> about out-of-tree modules, they had better mechanism for at least
> 6 years and conversion to it is bloody trivial. Hell, it was even
> in late unlamented feature-removal-schedule.txt - since 2006. If that's
> not enough to retire an export, what is?

OK... yes we can fix things up in a subsequent cleanup.

We will need to fix the TOC handling when we go to using multiple TOCs
in the main kernel, with the linker managing the transitions between
TOCs. Our toolchain guys have been pushing us to do that for years,
because it should make things run faster, but first we'll have to stop
using ld -r to combine objects in subdirectories.

Paul.
linux-next: build failure after merge of the l2-mtd tree
Hi Artem,

After merging the l2-mtd tree, today's linux-next build (x86_64
allmodconfig) failed like this:

ERROR: "denali_init" [drivers/mtd/nand/denali_pci.ko] undefined!
ERROR: "denali_remove" [drivers/mtd/nand/denali_pci.ko] undefined!

Probably caused by commit 305b1ee29c8e ("mtd: denali: split the generic
driver and PCI layer").

I have used the l2-mtd tree from next-20121011 for today.
--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
Re: [PATCH 2/2]suppress "Device nodeX does not have a release() function" warning
2012/10/12 5:31, David Rientjes wrote:
> On Thu, 11 Oct 2012, Yasuaki Ishimatsu wrote:
>
>> When calling unregister_node(), the function shows following message at
>> device_release().
>>
>> "Device 'node2' does not have a release() function, it is broken and must
>> be fixed."
>>
>> The reason is node's device struct does not have a release() function.
>>
>> So the patch registers node_device_release() to the device's release()
>> function for suppressing the warning message. Additionally, the patch adds
>> memset() to initialize a node struct into register_node(). Because the node
>> struct is part of node_devices[] array and it cannot be freed by
>> node_device_release(). So if system reuses the node struct, it has a garbage.
>>
>
> Nice catch on reuse of the statically allocated node_devices[] for node
> hotplug.
>
>> CC: David Rientjes
>> CC: Jiang Liu
>> Cc: Minchan Kim
>> CC: Andrew Morton
>> CC: KOSAKI Motohiro
>> Signed-off-by: Yasuaki Ishimatsu
>> Signed-off-by: Wen Congyang
>
> Can register_node() be made static in drivers/base/node.c and its
> declaration removed from linux/node.h?

Yah. I'll fix it.

Thanks,
Yasuaki Ishimatsu

> Acked-by: David Rientjes
Re: [GIT] TPM bugfixes
On 10/11/2012 04:00 PM, Andrew Morton wrote: > On Thu, 11 Oct 2012 15:49:36 -0700 > Andrew Morton wrote: > >> On Fri, 12 Oct 2012 00:45:06 +0200 >> richard -rw- weinberger wrote: >> >>> On Fri, Oct 12, 2012 at 12:19 AM, Andrew Morton >>> wrote: On Thu, 11 Oct 2012 21:54:18 +1100 (EST) James Morris wrote: > Please pull these fixes for the TPM code. > > The following changes since commit > 12250d843e8489ee00b5b7726da855e51694e792: > > Merge branch 'i2c-embedded/for-next' of > git://git.pengutronix.de/git/wsa/linux (2012-10-11 10:27:51 +0900) > > are available in the git repository at: > > > git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git > for-linus Gargh. Is it possible to add a human-readable http URL to these things so that people can actually look at the patches without hoop-jumping? >>> >>> rw@mantary:~> echo >>> "git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git >>> for-linus" | sed -e >>> 's/git\:\/\//http\:\/\//;s/\/pub\/scm\//\?p=/g;s/\.git >>> /\.git\;a=shortlog;h=refs\/heads\//g' >>> >>> http://git.kernel.org?p=linux/kernel/git/jmorris/linux-security.git;a=shortlog;h=refs/heads/for-linus >>> >> >> Geeze. >> >> Thanks. Followed by ^F, copy-n-paste, then hope it's on the first page? > > http://git.kernel.org/?p=linux/kernel/git/jmorris/linux-security.git;a=commit;h=abce9ac292e13da367bbd22c1f7669f988d931ac > > Which can be shortened to something like > > http://git.kernel.org/?p=linux/kernel/git/jmorris/linux-security.git;h=abce9ac2 > > Simply including the commit IDs would help a lot. Including the full > URL to each commit would be nicer. absolutely (for full URL). > I assume Junio owns that script? > -- -- ~Randy
Re: [PATCH] uprobe: fix misleading log entry
On Wed, Jul 18, 2012 at 5:22 PM, Srikar Dronamraju wrote:
> * Jovi Zhang [2012-07-18 11:08:42]:
>
>> From 68232ef2decae95b807f2f3763e8ea99c1a3b2ae Mon Sep 17 00:00:00 2001
>> From: Jovi Zhang
>> Date: Wed, 18 Jul 2012 17:51:26 +0800
>> Subject: [PATCH] uprobe: fix misleading log entry
>>
>> There don't have any 'r' prefix in uprobe event naming, remove it.
>>
>> Signed-off-by: Jovi Zhang
>> ---
>>  kernel/trace/trace_uprobe.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
>> index cf382de..852a584 100644
>> --- a/kernel/trace/trace_uprobe.c
>> +++ b/kernel/trace/trace_uprobe.c
>> @@ -191,7 +191,7 @@ static int create_trace_uprobe(int argc, char **argv)
>>  	if (argv[0][0] == '-')
>>  		is_delete = true;
>>  	else if (argv[0][0] != 'p') {
>> -		pr_info("Probe definition must be started with 'p', 'r' or" "
>> '-'.\n");
>> +		pr_info("Probe definition must be started with 'p' or '-'.\n");
>>  		return -EINVAL;
>>  	}
>>
>
> Yes, uprobes doesnt support return probes. So we should not have
> mentioned about r.
>
> Acked-by: Srikar Dronamraju
>

Hi Andrew,

Is this patch ok to go through your mm tree? Mainline still hasn't merged it.

.jovi
[PATCH] kvm/ftrace: change the diplaying format of vector in trace_msi_set_irq
This patch changes the way the vector is displayed in trace_msi_set_irq
from %x to %u. Currently, it does not match other ftrace output such as
kvm_msi_set_irq and kvm_inj_virq, which use %u.

Signed-off-by: Koki Sanagi
---
 include/trace/events/kvm.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 7ef9e75..0a83632 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -109,7 +109,7 @@ TRACE_EVENT(kvm_msi_set_irq,
 		__entry->data		= data;
 	),
 
-	TP_printk("dst %u vec %x (%s|%s|%s%s)",
+	TP_printk("dst %u vec %u (%s|%s|%s%s)",
 		  (u8)(__entry->address >> 12), (u8)__entry->data,
 		  __print_symbolic((__entry->data >> 8 & 0x7), kvm_deliver_mode),
 		  (__entry->address & (1<<2)) ? "logical" : "physical",
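To see why mixing the two formats is confusing, here is a small user-space sketch (format_vector() is a hypothetical helper, not part of the patch) that prints the same 8-bit vector with %x and with %u. Vector 0x31 comes out as "31" in hex and "49" in decimal, so a reader comparing it against events that print decimal could easily take the hex value for a different vector.

```c
#include <stddef.h>
#include <stdio.h>

/* Format an 8-bit vector the old way (hex) and the new way (decimal). */
static void format_vector(unsigned char vec, char *hex, char *dec, size_t len)
{
	snprintf(hex, len, "vec %x", (unsigned int)vec);
	snprintf(dec, len, "vec %u", (unsigned int)vec);
}
```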
[PATCH] gen_init_cpio: avoid stack overflow when expanding
Fix possible overflow of the buffer used for expanding environment variables when building file list. $ cat usr/crash.list file foo ${BIG}${BIG}${BIG}${BIG}${BIG}${BIG} 0755 0 0 $ BIG=$(perl -e 'print "A" x 4096;') ./usr/gen_init_cpio usr/crash.list *** buffer overflow detected ***: ./usr/gen_init_cpio terminated This also replaces the space-indenting with tabs. Patch based on existing fix extracted from grsecurity. Cc: Michal Marek Cc: Gene Sally Cc: Brad Spengler Cc: PaX Team Cc: sta...@vger.kernel.org Signed-off-by: Kees Cook --- usr/gen_init_cpio.c | 43 +++ 1 file changed, 23 insertions(+), 20 deletions(-) diff --git a/usr/gen_init_cpio.c b/usr/gen_init_cpio.c index af0f22f..aca6edc 100644 --- a/usr/gen_init_cpio.c +++ b/usr/gen_init_cpio.c @@ -303,7 +303,7 @@ static int cpio_mkfile(const char *name, const char *location, int retval; int rc = -1; int namesize; - int i; + unsigned int i; mode |= S_IFREG; @@ -381,25 +381,28 @@ error: static char *cpio_replace_env(char *new_location) { - char expanded[PATH_MAX + 1]; - char env_var[PATH_MAX + 1]; - char *start; - char *end; - - for (start = NULL; (start = strstr(new_location, "${")); ) { - end = strchr(start, '}'); - if (start < end) { - *env_var = *expanded = '\0'; - strncat(env_var, start + 2, end - start - 2); - strncat(expanded, new_location, start - new_location); - strncat(expanded, getenv(env_var), PATH_MAX); - strncat(expanded, end + 1, PATH_MAX); - strncpy(new_location, expanded, PATH_MAX); - } else - break; - } - - return new_location; + char expanded[PATH_MAX + 1]; + char env_var[PATH_MAX + 1]; + char *start; + char *end; + + for (start = NULL; (start = strstr(new_location, "${")); ) { + end = strchr(start, '}'); + if (start < end) { + *env_var = *expanded = '\0'; + strncat(env_var, start + 2, end - start - 2); + strncat(expanded, new_location, start - new_location); + strncat(expanded, getenv(env_var), + PATH_MAX - strlen(expanded)); + strncat(expanded, end + 1, + PATH_MAX - strlen(expanded)); + 
strncpy(new_location, expanded, PATH_MAX); + new_location[PATH_MAX] = 0; + } else + break; + } + + return new_location; } -- 1.7.9.5 -- Kees Cook Chrome OS Security
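The core of the fix is that the third argument of strncat() limits how many bytes are appended, not the total size of the destination, so the bound has to shrink as the buffer fills. A minimal stand-alone sketch of that idea follows; bounded_append() is a hypothetical helper written for illustration and is not part of gen_init_cpio.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string in dst without writing past
 * dst[cap - 1]. bounded_append() is a hypothetical helper for illustration.
 * Returns the resulting string length.
 */
static size_t bounded_append(char *dst, size_t cap, const char *src)
{
	size_t used = strlen(dst);

	/* strncat() counts only the bytes it may add, excluding the final NUL. */
	if (used + 1 < cap)
		strncat(dst, src, cap - 1 - used);
	return strlen(dst);
}
```

With cap equal to PATH_MAX + 1 the bound becomes PATH_MAX - strlen(dst), which is the expression the patch passes to strncat().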
[PATCH] MM: Support more pagesizes for MAP_HUGETLB/SHM_HUGETLB v4
From: Andi Kleen There was some desire in large applications using MAP_HUGETLB/SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on others. This is useful together with NUMA policy: use 2MB interleaving on some mappings, but 1GB on local mappings. This patch extends the IPC/SHM syscall interfaces slightly to allow specifying the page size. It borrows some upper bits in the existing flag arguments and allows encoding the log of the desired page size in addition to the *_HUGETLB flag. When 0 is specified the default size is used, this makes the change fully compatible. Extending the internal hugetlb code to handle this is straight forward. Instead of a single mount it just keeps an array of them and selects the right mount based on the specified page size. When no page size is specified it uses the mount of the default page size. The change is not visible in /proc/mounts because internal mounts don't appear there. It also has very little overhead: the additional mounts just consume a super block, but not more memory when not used. I also exported the new flags to the user headers (they were previously under __KERNEL__). Right now only symbols for x86 and some other architecture for 1GB and 2MB are defined. The interface should already work for all other architectures though. Only architectures that define multiple hugetlb sizes actually need it (that is currently x86, tile, powerpc). However tile and powerpc have user configurable hugetlb sizes, so it's not easy to add defines. A program on those architectures would need to query sysfs and use the appropiate log2. v2: Port to new tree. Fix unmount. v3: Ported to latest tree. v4: Ported to latest tree. Minor changes for review feedback. Updated description. 
Acked-by: Rik van Riel Acked-by: KAMEZAWA Hiroyuki Signed-off-by: Andi Kleen --- arch/x86/include/asm/mman.h |3 ++ fs/hugetlbfs/inode.c| 63 +++ include/linux/hugetlb.h | 12 ++- include/linux/shm.h | 19 include/uapi/asm-generic/mman.h | 13 ipc/shm.c |3 +- mm/mmap.c |5 ++- 7 files changed, 100 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h index 593e51d..513b05f 100644 --- a/arch/x86/include/asm/mman.h +++ b/arch/x86/include/asm/mman.h @@ -3,6 +3,9 @@ #define MAP_32BIT 0x40/* only give out 32bit addresses */ +#define MAP_HUGE_2MB(21 << MAP_HUGE_SHIFT) +#define MAP_HUGE_1GB(30 << MAP_HUGE_SHIFT) + #include #endif /* _ASM_X86_MMAN_H */ diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index c5bc355..d34bb56 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -923,7 +923,7 @@ static struct file_system_type hugetlbfs_fs_type = { .kill_sb= kill_litter_super, }; -static struct vfsmount *hugetlbfs_vfsmount; +static struct vfsmount *hugetlbfs_vfsmount[HUGE_MAX_HSTATE]; static int can_do_hugetlb_shm(void) { @@ -932,9 +932,22 @@ static int can_do_hugetlb_shm(void) return capable(CAP_IPC_LOCK) || in_group_p(shm_group); } +static int get_hstate_idx(int page_size_log) +{ + struct hstate *h; + + if (!page_size_log) + return default_hstate_idx; + h = size_to_hstate(1 << page_size_log); + if (!h) + return -1; + return h - hstates; +} + struct file *hugetlb_file_setup(const char *name, unsigned long addr, size_t size, vm_flags_t acctflag, - struct user_struct **user, int creat_flags) + struct user_struct **user, + int creat_flags, int page_size_log) { int error = -ENOMEM; struct file *file; @@ -944,9 +957,14 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr, struct qstr quick_string; struct hstate *hstate; unsigned long num_pages; + int hstate_idx; + + hstate_idx = get_hstate_idx(page_size_log); + if (hstate_idx < 0) + return ERR_PTR(-ENODEV); *user = NULL; - if (!hugetlbfs_vfsmount) + if 
(!hugetlbfs_vfsmount[hstate_idx]) return ERR_PTR(-ENOENT); if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) { @@ -963,7 +981,7 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr, } } - root = hugetlbfs_vfsmount->mnt_root; + root = hugetlbfs_vfsmount[hstate_idx]->mnt_root; quick_string.name = name; quick_string.len = strlen(quick_string.name); quick_string.hash = 0; @@ -971,7 +989,7 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr, if (!path.dentry) goto out_shm_unlock; - path.mnt = mntget(hugetlbfs_vfsmount); + path.mnt =
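The flag encoding described in the changelog (the log2 of the page size stored in otherwise unused upper bits, with 0 meaning "default size") can be sketched as plain bit arithmetic. The constants and helper names below are assumptions for illustration only: the excerpt above does not show the value of MAP_HUGE_SHIFT or its mask, so EX_HUGE_SHIFT = 26 and EX_HUGE_MASK = 0x3f are stand-ins, and ex_encode_huge(), ex_decode_huge() and ex_huge_bytes() are hypothetical.

```c
#include <stddef.h>

/*
 * Illustrative constants only. The patch text above does not show the
 * values of MAP_HUGE_SHIFT or its mask; 26 and 0x3f are assumptions here.
 */
#define EX_HUGE_SHIFT 26
#define EX_HUGE_MASK  0x3fu

/* Encode log2(page size) into the upper bits of a flags word. */
static unsigned int ex_encode_huge(unsigned int flags, unsigned int page_size_log)
{
	return flags | ((page_size_log & EX_HUGE_MASK) << EX_HUGE_SHIFT);
}

/* Recover log2(page size); 0 means "use the default huge page size". */
static unsigned int ex_decode_huge(unsigned int flags)
{
	return (flags >> EX_HUGE_SHIFT) & EX_HUGE_MASK;
}

/* Convert the decoded log2 back to a size in bytes. */
static unsigned long long ex_huge_bytes(unsigned int page_size_log)
{
	return 1ULL << page_size_log;
}
```

Under these assumptions 21 decodes to a 2MB page and 30 to a 1GB page, matching the MAP_HUGE_2MB and MAP_HUGE_1GB defines in the x86 hunk, and flags without any size bits decode to 0, the default size.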
Re: [PATCH] MM: Support more pagesizes for MAP_HUGETLB/SHM_HUGETLB v3
> Alas, include/asm-generic/mman.h doesn't exist now.

git resolved it automagically

>
> Does this change touch all the hugetlb-capable architectures?

I took a look at this again. So not every hugetlb capable architecture
needs it, only architectures with multiple hugetlb page sizes.
This is only x86, tile, powerpc.

I looked at tile and powerpc and they both have configurable
hugetlb page sizes. So it's somewhat awkward to add defines for them.

One disadvantage of this is also that user programs would need to
know the page sizes that are configured. That is definitely awkward,
but I don't know of any way around that. Luckily there's a way in /sys
to query this.

-Andi

>
> z:/usr/src/linux-3.6> grep -rl MAP_HUGETLB arch
> arch/alpha/include/asm/mman.h
> arch/xtensa/include/asm/mman.h
> arch/parisc/include/asm/mman.h
> arch/tile/include/asm/mman.h
> arch/sparc/include/asm/mman.h
> arch/powerpc/include/asm/mman.h
> arch/mips/include/asm/mman.h
Re: amd64, v3.6.0: Kernel panic + BUG at net/netfilter/nf_conntrack_core.c:220!
On Thu, Oct 11, 2012 at 11:27:33PM +0200, Borislav Petkov wrote: > On Thu, Oct 11, 2012 at 12:13:33PM -0700, Ian Applegate wrote: > > On machines serving mainly http traffic we are seeing the following > > panic, which is not yet reproducible. > > Must be this BUG_ON: > > if (!nf_ct_is_confirmed(ct)) { > > BUG_ON(hlist_nulls_unhashed(>tuplehash[IP_CT_DIR_ORIGINAL].hnnode)); > hlist_nulls_del_rcu(>tuplehash[IP_CT_DIR_ORIGINAL].hnnode); > } At quick glance, I think we're hitting a memory corruption, I don't see by now any sane code path to reach that bugtrap. More comments below: > Spamming some more lists and leaving the rest for reference. > > > > > > > [180926.566743] [ cut here ] > > [180926.572034] kernel BUG at net/netfilter/nf_conntrack_core.c:220! > > [180926.578873] invalid opcode: [#1] SMP > > [180926.583594] Modules linked in: xfs exportfs ipmi_devintf ipmi_si > > ipmi_msghandler dm_mod md_mod nf_conntr > > ack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw ip6_tables > > nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp x > > t_conntrack xt_multiport iptable_filter xt_NOTRACK nf_conntrack > > iptable_raw ip_tables x_tables nfsv4 auth_rp > > cgss fuse nfsv3 nfs_acl nfs fscache lockd sunrpc sfc mtd i2c_algo_bit > > i2c_core mdio igb dca uhci_hcd coretem > > p acpi_cpufreq kvm_intel kvm crc32c_intel aesni_intel ablk_helper > > cryptd aes_x86_64 aes_generic evdev sd_mod > > crc_t10dif mperf snd_pcm ahci snd_timer tpm_tis microcode snd tpm > > libahci tpm_bios soundcore libata snd_pag > > e_alloc pcspkr ehci_hcd lpc_ich usbcore mfd_core hpsa scsi_mod > > usb_common button processor thermal_sys > > [180926.657762] CPU 12 > > [180926.660008] Pid: 5948, comm: nginx-fl Not tainted 3.6.0-cloudflare > > #1 HP ProLiant DL180 G6 > > [180926.669820] RIP: 0010:[] [] > > destroy_conntrack+0x55/0xa9 [nf_conntrack] > > [180926.680871] RSP: 0018:8805bd73fbb8 EFLAGS: 00010246 > > [180926.686930] RAX: RBX: 8806b6f56c30 RCX: > > 8805bd73fc48 > > [180926.695055] RDX: RSI: 0006 RDI: 
> > 8806b6f56c30 > > [180926.703179] RBP: 81651780 R08: 000172e0 R09: > > 812cef91 > > [180926.711304] R10: dead00200200 R11: dead00100100 R12: > > 8806b6f56c30 > > [180926.727451] R13: R14: a02d6030 R15: > > > > [180926.735575] FS: 7f382cdb2710() GS:880627cc() > > knlGS: > > [180926.744766] CS: 0010 DS: ES: CR0: 80050033 > > [180926.751312] CR2: ff600400 CR3: 0005bd8d3000 CR4: > > 07e0 > > [180926.759436] DR0: DR1: DR2: > > > > [180926.767560] DR3: DR6: 0ff0 DR7: > > 0400 > > [180926.775686] Process nginx-fl (pid: 5948, threadinfo > > 8805bd73e000, task 8805c9755960) > > [180926.785265] Stack: > > [180926.787634] 81651780 8802720c2900 > > a02cde78 > > [180926.796087] 81651ec0 a02d6030 bff0efab > > 8805 > > [180926.804532] 000288050002 00030014 00140003 > > 06ff88060002 > > [180926.812985] Call Trace: > > [180926.815845] [] ? nf_conntrack_in+0x4ed/0x5bc > > [nf_conntrack] Here below the trace shows the output path to close a tcp socket. But the line above refers to a conntrack function that is called in the input path. If this process is just acting as plain http server, this backtrace doesn't seem consistent to me. > > [180926.824069] [] ? nf_iterate+0x41/0x77 > > [180926.830131] [] ? ip_options_echo+0x2ed/0x2ed > > [180926.836873] [] ? nf_hook_slow+0x68/0xfd > > [180926.843127] [] ? ip_options_echo+0x2ed/0x2ed > > [180926.849866] [] ? __ip_local_out+0x98/0x9d > > [180926.856315] [] ? ip_local_out+0x9/0x19 > > [180926.862465] [] ? tcp_transmit_skb+0x7ae/0x7f1 > > [180926.869305] [] ? virt_to_head_page+0x9/0x2c > > [180926.875949] [] ? tcp_send_active_reset+0xd5/0x101 > > [180926.883175] [] ? tcp_close+0x118/0x354 > > [180926.889334] [] ? inet_release+0x75/0x7b > > [180926.895591] [] ? sock_release+0x19/0x73 > > [180926.901845] [] ? sock_close+0x22/0x27 > > [180926.907906] [] ? __fput+0xe9/0x1ae > > [180926.913677] [] ? task_work_run+0x53/0x67 > > [180926.920031] [] ? do_notify_resume+0x79/0x8d > > [180926.926673] [] ? 
int_signal+0x12/0x17 > > [180926.932732] Code: 05 48 89 df ff d0 48 c7 c7 30 66 2d a0 e8 11 b0 > > 07 e1 48 89 df e8 72 25 00 00 48 8b 43 > > 78 a8 08 75 2a 48 8b 53 10 48 85 d2 75 04 <0f> 0b eb fe 48 8b 43 08 > > 48 89 02 a8 01 75 04 48 89 50 08 48 be > > [180926.954788] RIP [] destroy_conntrack+0x55/0xa9 > > [nf_conntrack] > > [180926.963217] RSP > > [180926.967700] ---[ end trace 54a660a52afd5820 ]--- > > [180926.973038] Kernel panic - not syncing: Fatal exception in interrupt > > > > --- > > Ian Applegate > >
Re: [PATCH v5 1/3] tracing,x86: Add a TSC trace_clock
On Mon, Oct 8, 2012 at 7:08 PM, Yoshihiro YUNOMAE wrote:
> Hi David,
>
> This is a nice patch set.
>
> I just have found something should be fixed, which related to
> your work. I'll send it following this mail.
>
> Would you mind adding these patches as your patch series?

Thanks for noticing the stats issue. Added them to my series.

>
> Thanks,
>
> Yoshihiro YUNOMAE
>
> (2012/10/02 12:31), David Sharp wrote:
>> In order to promote interoperability between userspace tracers and ftrace,
>> add a trace_clock that reports raw TSC values which will then be recorded
>> in the ring buffer. Userspace tracers that also record TSCs are then on
>> exactly the same time base as the kernel and events can be unambiguously
>> interlaced.
>>
>
> --
> Yoshihiro YUNOMAE
> Software Platform Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: yoshihiro.yunomae...@hitachi.com
[PATCH v6 3/6] tracing: Format non-nanosec times from tsc clock without a decimal point.
With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ cat trace
-0 [000] 6330.52: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
sleep-29964 [000] 6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
...
$ echo 1 > options/latency-format
$ cat trace
-0 0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
sleep-29964 0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ echo 0 > options/latency-format
$ cat trace
-0 [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
...
$ echo 1 > options/latency-format
$ cat trace
-0 0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
sleep-31128 0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
...

v2: Move arch-specific bits out of generic code.
v4: Fix x86_32 build due to 64-bit division.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp
Cc: Steven Rostedt
Cc: Masami Hiramatsu
---
 arch/x86/include/asm/trace_clock.h |  2 +-
 include/linux/ftrace_event.h       |  6 +++
 kernel/trace/trace.c               | 15 +-
 kernel/trace/trace.h               |  4 --
 kernel/trace/trace_output.c        | 80 ---
 5 files changed, 74 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
index 7ee0d8c..45e17f5 100644
--- a/arch/x86/include/asm/trace_clock.h
+++ b/arch/x86/include/asm/trace_clock.h
@@ -9,7 +9,7 @@
 extern u64 notrace trace_clock_x86_tsc(void);

 # define ARCH_TRACE_CLOCKS \
-	{ trace_clock_x86_tsc,	"x86-tsc" },
+	{ trace_clock_x86_tsc,	"x86-tsc", .in_ns = 0 },

 #endif

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..c760670 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -86,6 +86,12 @@ struct trace_iterator {
 	cpumask_var_t started;
 };

+enum trace_iter_flags {
+	TRACE_FILE_LAT_FMT = 1,
+	TRACE_FILE_ANNOTATE = 2,
+	TRACE_FILE_TIME_IN_NS = 4,
+};
+
 struct trace_event;

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4e26df3..cff3427 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -476,10 +476,11 @@ static const char *trace_options[] = {
 static struct {
 	u64 (*func)(void);
 	const char *name;
+	int in_ns; /* is this clock in nanoseconds? */
 } trace_clocks[] = {
-	{ trace_clock_local,	"local" },
-	{ trace_clock_global,	"global" },
-	{ trace_clock_counter,	"counter" },
+	{ trace_clock_local,	"local",	1 },
+	{ trace_clock_global,	"global",	1 },
+	{ trace_clock_counter,	"counter",	0 },
 	ARCH_TRACE_CLOCKS
 };

@@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
 	if (ring_buffer_overruns(iter->tr->buffer))
 		iter->iter_flags |= TRACE_FILE_ANNOTATE;

+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	/* stop the trace while dumping */
 	tracing_stop();

@@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_LATENCY_FMT)
 		iter->iter_flags |= TRACE_FILE_LAT_FMT;

+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	iter->cpu_file = cpu_file;
 	iter->tr = &global_trace;
 	mutex_init(&iter->mutex);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..84fefed 100644
--- a/kernel/trace/trace.h
+++
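The formatting rule described in this patch can be illustrated with a small userspace sketch. It is not the kernel code: format_trace_ts() and its signature are invented here, and the nanosecond branch truncates when converting to microseconds, which may differ from the kernel's ns2usecs() helper.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative userspace helper, not kernel code. The name
 * format_trace_ts() and its signature are made up for this sketch.
 *
 * in_ns != 0: ts is in nanoseconds, print "seconds.microseconds".
 * in_ns == 0: ts is a raw counter (for example TSC cycles), print it
 *             as a plain decimal integer.
 */
int format_trace_ts(char *buf, size_t len, uint64_t ts, int in_ns)
{
	if (in_ns) {
		uint64_t usecs = ts / 1000;	/* ns -> us, truncating */
		uint64_t secs = usecs / 1000000;
		unsigned long rem = (unsigned long)(usecs % 1000000);

		return snprintf(buf, len, "%" PRIu64 ".%06lu", secs, rem);
	}
	return snprintf(buf, len, "%" PRIu64, ts);
}
```

With in_ns set, a value such as 6330555628000 would be rendered with a decimal point; with in_ns clear, a value such as 16490053398357 is printed unchanged, which matches the shape of the "tsc" output in the test transcript.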
[PATCH v6 1/6] tracing,x86: Add a TSC trace_clock
In order to promote interoperability between userspace tracers and ftrace, add a trace_clock that reports raw TSC values which will then be recorded in the ring buffer. Userspace tracers that also record TSCs are then on exactly the same time base as the kernel and events can be unambiguously interlaced. Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large timestamp values. v2: Move arch-specific bits out of generic code. v3: Rename "x86-tsc", cleanups Google-Bug-Id: 6980623 Signed-off-by: David Sharp Cc: Steven Rostedt Cc: Masami Hiramatsu Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "H. Peter Anvin" Acked-by: Ingo Molnar --- arch/alpha/include/asm/trace_clock.h |6 ++ arch/arm/include/asm/trace_clock.h|6 ++ arch/avr32/include/asm/trace_clock.h |6 ++ arch/blackfin/include/asm/trace_clock.h |6 ++ arch/c6x/include/asm/trace_clock.h|6 ++ arch/cris/include/asm/trace_clock.h |6 ++ arch/frv/include/asm/trace_clock.h|6 ++ arch/h8300/include/asm/trace_clock.h |6 ++ arch/hexagon/include/asm/trace_clock.h|6 ++ arch/ia64/include/asm/trace_clock.h |6 ++ arch/m32r/include/asm/trace_clock.h |6 ++ arch/m68k/include/asm/trace_clock.h |6 ++ arch/microblaze/include/asm/trace_clock.h |6 ++ arch/mips/include/asm/trace_clock.h |6 ++ arch/mn10300/include/asm/trace_clock.h|6 ++ arch/openrisc/include/asm/trace_clock.h |6 ++ arch/parisc/include/asm/trace_clock.h |6 ++ arch/powerpc/include/asm/trace_clock.h|6 ++ arch/s390/include/asm/trace_clock.h |6 ++ arch/score/include/asm/trace_clock.h |6 ++ arch/sh/include/asm/trace_clock.h |6 ++ arch/sparc/include/asm/trace_clock.h |6 ++ arch/tile/include/asm/trace_clock.h |6 ++ arch/um/include/asm/trace_clock.h |6 ++ arch/unicore32/include/asm/trace_clock.h |6 ++ arch/x86/include/asm/trace_clock.h| 16 arch/x86/kernel/Makefile |1 + arch/x86/kernel/trace_clock.c | 21 + arch/xtensa/include/asm/trace_clock.h |6 ++ include/asm-generic/trace_clock.h | 16 include/linux/trace_clock.h |2 ++ kernel/trace/trace.c |1 + 32 files changed, 213 
insertions(+), 0 deletions(-) create mode 100644 arch/alpha/include/asm/trace_clock.h create mode 100644 arch/arm/include/asm/trace_clock.h create mode 100644 arch/avr32/include/asm/trace_clock.h create mode 100644 arch/blackfin/include/asm/trace_clock.h create mode 100644 arch/c6x/include/asm/trace_clock.h create mode 100644 arch/cris/include/asm/trace_clock.h create mode 100644 arch/frv/include/asm/trace_clock.h create mode 100644 arch/h8300/include/asm/trace_clock.h create mode 100644 arch/hexagon/include/asm/trace_clock.h create mode 100644 arch/ia64/include/asm/trace_clock.h create mode 100644 arch/m32r/include/asm/trace_clock.h create mode 100644 arch/m68k/include/asm/trace_clock.h create mode 100644 arch/microblaze/include/asm/trace_clock.h create mode 100644 arch/mips/include/asm/trace_clock.h create mode 100644 arch/mn10300/include/asm/trace_clock.h create mode 100644 arch/openrisc/include/asm/trace_clock.h create mode 100644 arch/parisc/include/asm/trace_clock.h create mode 100644 arch/powerpc/include/asm/trace_clock.h create mode 100644 arch/s390/include/asm/trace_clock.h create mode 100644 arch/score/include/asm/trace_clock.h create mode 100644 arch/sh/include/asm/trace_clock.h create mode 100644 arch/sparc/include/asm/trace_clock.h create mode 100644 arch/tile/include/asm/trace_clock.h create mode 100644 arch/um/include/asm/trace_clock.h create mode 100644 arch/unicore32/include/asm/trace_clock.h create mode 100644 arch/x86/include/asm/trace_clock.h create mode 100644 arch/x86/kernel/trace_clock.c create mode 100644 arch/xtensa/include/asm/trace_clock.h create mode 100644 include/asm-generic/trace_clock.h diff --git a/arch/alpha/include/asm/trace_clock.h b/arch/alpha/include/asm/trace_clock.h new file mode 100644 index 000..f35fab8 --- /dev/null +++ b/arch/alpha/include/asm/trace_clock.h @@ -0,0 +1,6 @@ +#ifndef _ASM_TRACE_CLOCK_H +#define _ASM_TRACE_CLOCK_H + +#include + +#endif diff --git a/arch/arm/include/asm/trace_clock.h 
b/arch/arm/include/asm/trace_clock.h new file mode 100644 index 000..f35fab8 --- /dev/null +++ b/arch/arm/include/asm/trace_clock.h @@ -0,0 +1,6 @@ +#ifndef _ASM_TRACE_CLOCK_H +#define _ASM_TRACE_CLOCK_H + +#include + +#endif diff --git a/arch/avr32/include/asm/trace_clock.h b/arch/avr32/include/asm/trace_clock.h new file mode 100644 index 000..f35fab8 --- /dev/null +++ b/arch/avr32/include/asm/trace_clock.h @@ -0,0 +1,6 @@ +#ifndef
[PATCH v6 6/6] tracing: Fix maybe-uninitialized warning in ftrace_function_set_regexp
Compiler warning:

kernel/trace/trace_events_filter.c: In function 'ftrace_function_set_filter_cb':
kernel/trace/trace_events_filter.c:2074:8: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]

Signed-off-by: David Sharp
Cc: Steven Rostedt
---
 kernel/trace/trace_events_filter.c | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 431dba8..ef36953 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -2002,9 +2002,10 @@ static int ftrace_function_set_regexp(struct ftrace_ops *ops, int filter,
 static int __ftrace_function_set_filter(int filter, char *buf, int len,
 					struct function_filter_data *data)
 {
-	int i, re_cnt, ret;
+	int i, re_cnt;
 	int *reset;
 	char **re;
+	int ret = 0;

 	reset = filter ? &data->first_filter : &data->first_notrace;

--
1.7.7.3
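The diff above does not show why the compiler could not prove that ret is always assigned. One common cause, assumed here purely for illustration, is a loop that may run zero times. The sketch below uses hypothetical names (apply_one, apply_all) and is not taken from trace_events_filter.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-item operation that can fail. */
int apply_one(int value)
{
	return value < 0 ? -1 : 0;
}

/*
 * Apply apply_one() to each element and stop at the first failure.
 * Initializing ret gives a defined result when count is 0; without
 * the initializer the return value would be indeterminate in that
 * case, which is the pattern -Wmaybe-uninitialized reports.
 */
int apply_all(const int *values, size_t count)
{
	int ret = 0;	/* defined result even when count == 0 */
	size_t i;

	for (i = 0; i < count; i++) {
		ret = apply_one(values[i]);
		if (ret)
			break;
	}
	return ret;
}
```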
[PATCH v6 5/6] ftrace: Show raw time stamp on stats per cpu using counter or tsc mode for trace_clock
From: Yoshihiro YUNOMAE

Show raw time stamp values for stats per cpu if you choose counter or tsc mode
for trace_clock. Although a unit of tracing time stamp is nsec in local or
global mode, the units in counter and TSC mode are tracing counter and cycles
respectively.

Signed-off-by: Yoshihiro YUNOMAE
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Signed-off-by: David Sharp
---
 kernel/trace/trace.c | 23 +--
 1 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index cff3427..8bfa3b7 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4396,13 +4396,24 @@ tracing_stats_read(struct file *filp, char __user *ubuf,
 	cnt = ring_buffer_bytes_cpu(tr->buffer, cpu);
 	trace_seq_printf(s, "bytes: %ld\n", cnt);

-	t = ns2usecs(ring_buffer_oldest_event_ts(tr->buffer, cpu));
-	usec_rem = do_div(t, USEC_PER_SEC);
-	trace_seq_printf(s, "oldest event ts: %5llu.%06lu\n", t, usec_rem);
+	if (trace_clocks[trace_clock_id].in_ns) {
+		/* local or global for trace_clock */
+		t = ns2usecs(ring_buffer_oldest_event_ts(tr->buffer, cpu));
+		usec_rem = do_div(t, USEC_PER_SEC);
+		trace_seq_printf(s, "oldest event ts: %5llu.%06lu\n",
+				 t, usec_rem);
+
+		t = ns2usecs(ring_buffer_time_stamp(tr->buffer, cpu));
+		usec_rem = do_div(t, USEC_PER_SEC);
+		trace_seq_printf(s, "now ts: %5llu.%06lu\n", t, usec_rem);
+	} else {
+		/* counter or tsc mode for trace_clock */
+		trace_seq_printf(s, "oldest event ts: %llu\n",
+				 ring_buffer_oldest_event_ts(tr->buffer, cpu));

-	t = ns2usecs(ring_buffer_time_stamp(tr->buffer, cpu));
-	usec_rem = do_div(t, USEC_PER_SEC);
-	trace_seq_printf(s, "now ts: %5llu.%06lu\n", t, usec_rem);
+		trace_seq_printf(s, "now ts: %llu\n",
+				 ring_buffer_time_stamp(tr->buffer, cpu));
+	}

 	count = simple_read_from_buffer(ubuf, count, ppos, s->buffer, s->len);

--
1.7.7.3
[PATCH v6 2/6] tracing: Reset ring buffer when changing trace_clocks
Because the "tsc" clock isn't in nanoseconds, the ring buffer must be reset
when changing clocks so that incomparable timestamps don't end up in the same
trace.

Tested: Confirmed switching clocks resets the trace buffer.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp
Cc: Steven Rostedt
Cc: Masami Hiramatsu
---
 kernel/trace/trace.c | 8
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 92fb08e..4e26df3 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4012,6 +4012,14 @@ static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf,
 	if (max_tr.buffer)
 		ring_buffer_set_clock(max_tr.buffer, trace_clocks[i].func);

+	/*
+	 * New clock may not be consistent with the previous clock.
+	 * Reset the buffer so that it doesn't have incomparable timestamps.
+	 */
+	tracing_reset_online_cpus(&global_trace);
+	if (max_tr.buffer)
+		tracing_reset_online_cpus(&max_tr);
+
 	mutex_unlock(&trace_types_lock);

 	*fpos += cnt;
--
1.7.7.3
[PATCH v6 0/6] TSC trace_clock
Added Yoshihiro Yonomae's patches to change the per_cpu stats to show raw timestamps when the clock is not in nanoseconds. Also added a small patch to fix a warning. David Sharp (4): tracing,x86: Add a TSC trace_clock tracing: Reset ring buffer when changing trace_clocks tracing: Format non-nanosec times from tsc clock without a decimal point. tracing: Fix maybe-uninitialized warning in ftrace_function_set_regexp Yoshihiro YUNOMAE (2): ftrace: Change unsigned long type of ring_buffer_oldest_event_ts() to u64 ftrace: Show raw time stamp on stats per cpu using counter or tsc mode for trace_clock arch/alpha/include/asm/trace_clock.h |6 ++ arch/arm/include/asm/trace_clock.h|6 ++ arch/avr32/include/asm/trace_clock.h |6 ++ arch/blackfin/include/asm/trace_clock.h |6 ++ arch/c6x/include/asm/trace_clock.h|6 ++ arch/cris/include/asm/trace_clock.h |6 ++ arch/frv/include/asm/trace_clock.h|6 ++ arch/h8300/include/asm/trace_clock.h |6 ++ arch/hexagon/include/asm/trace_clock.h|6 ++ arch/ia64/include/asm/trace_clock.h |6 ++ arch/m32r/include/asm/trace_clock.h |6 ++ arch/m68k/include/asm/trace_clock.h |6 ++ arch/microblaze/include/asm/trace_clock.h |6 ++ arch/mips/include/asm/trace_clock.h |6 ++ arch/mn10300/include/asm/trace_clock.h|6 ++ arch/openrisc/include/asm/trace_clock.h |6 ++ arch/parisc/include/asm/trace_clock.h |6 ++ arch/powerpc/include/asm/trace_clock.h|6 ++ arch/s390/include/asm/trace_clock.h |6 ++ arch/score/include/asm/trace_clock.h |6 ++ arch/sh/include/asm/trace_clock.h |6 ++ arch/sparc/include/asm/trace_clock.h |6 ++ arch/tile/include/asm/trace_clock.h |6 ++ arch/um/include/asm/trace_clock.h |6 ++ arch/unicore32/include/asm/trace_clock.h |6 ++ arch/x86/include/asm/trace_clock.h| 16 ++ arch/x86/kernel/Makefile |1 + arch/x86/kernel/trace_clock.c | 21 arch/xtensa/include/asm/trace_clock.h |6 ++ include/asm-generic/trace_clock.h | 16 ++ include/linux/ftrace_event.h |6 ++ include/linux/ring_buffer.h |2 +- include/linux/trace_clock.h |2 + kernel/trace/ring_buffer.c|4 +- 
kernel/trace/trace.c | 47 ++--- kernel/trace/trace.h |4 -- kernel/trace/trace_events_filter.c|3 +- kernel/trace/trace_output.c | 80 - 38 files changed, 316 insertions(+), 42 deletions(-) create mode 100644 arch/alpha/include/asm/trace_clock.h create mode 100644 arch/arm/include/asm/trace_clock.h create mode 100644 arch/avr32/include/asm/trace_clock.h create mode 100644 arch/blackfin/include/asm/trace_clock.h create mode 100644 arch/c6x/include/asm/trace_clock.h create mode 100644 arch/cris/include/asm/trace_clock.h create mode 100644 arch/frv/include/asm/trace_clock.h create mode 100644 arch/h8300/include/asm/trace_clock.h create mode 100644 arch/hexagon/include/asm/trace_clock.h create mode 100644 arch/ia64/include/asm/trace_clock.h create mode 100644 arch/m32r/include/asm/trace_clock.h create mode 100644 arch/m68k/include/asm/trace_clock.h create mode 100644 arch/microblaze/include/asm/trace_clock.h create mode 100644 arch/mips/include/asm/trace_clock.h create mode 100644 arch/mn10300/include/asm/trace_clock.h create mode 100644 arch/openrisc/include/asm/trace_clock.h create mode 100644 arch/parisc/include/asm/trace_clock.h create mode 100644 arch/powerpc/include/asm/trace_clock.h create mode 100644 arch/s390/include/asm/trace_clock.h create mode 100644 arch/score/include/asm/trace_clock.h create mode 100644 arch/sh/include/asm/trace_clock.h create mode 100644 arch/sparc/include/asm/trace_clock.h create mode 100644 arch/tile/include/asm/trace_clock.h create mode 100644 arch/um/include/asm/trace_clock.h create mode 100644 arch/unicore32/include/asm/trace_clock.h create mode 100644 arch/x86/include/asm/trace_clock.h create mode 100644 arch/x86/kernel/trace_clock.c create mode 100644 arch/xtensa/include/asm/trace_clock.h create mode 100644 include/asm-generic/trace_clock.h -- 1.7.7.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v6 4/6] ftrace: Change unsigned long type of ring_buffer_oldest_event_ts() to u64
From: Yoshihiro YUNOMAE

ring_buffer_oldest_event_ts() should return a value of u64 type, because
ring_buffer_per_cpu->buffer_page->buffer_data_page->time_stamp is u64 type.

Signed-off-by: Yoshihiro YUNOMAE
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Vaibhav Nagarnaik
Signed-off-by: David Sharp
---
 include/linux/ring_buffer.h | 2 +-
 kernel/trace/ring_buffer.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 6c8835f..c68a09a 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -159,7 +159,7 @@ int ring_buffer_record_is_on(struct ring_buffer *buffer);
 void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);

-unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu);
+u64 ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu);
 unsigned long ring_buffer_bytes_cpu(struct ring_buffer *buffer, int cpu);
 unsigned long ring_buffer_entries(struct ring_buffer *buffer);
 unsigned long ring_buffer_overruns(struct ring_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 49491fa..db3806e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2925,12 +2925,12 @@ rb_num_of_entries(struct ring_buffer_per_cpu *cpu_buffer)
  * @buffer: The ring buffer
  * @cpu: The per CPU buffer to read from.
  */
-unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu)
+u64 ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu)
 {
 	unsigned long flags;
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct buffer_page *bpage;
-	unsigned long ret;
+	u64 ret;

 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return 0;
--
1.7.7.3
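The commit message only states the type mismatch. A plausible consequence, assumed here rather than taken from the patch, is truncation on 32-bit builds, where unsigned long is 32 bits wide on Linux. The sketch below models that with fixed-width types; both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Models the old return type on a 32-bit build, where unsigned long
 * is assumed to be 32 bits wide: the upper half of the 64-bit time
 * stamp is silently discarded.
 */
uint32_t oldest_ts_as_ulong32(uint64_t time_stamp)
{
	return (uint32_t)time_stamp;
}

/* Models the new u64 return type: the value is passed through intact. */
uint64_t oldest_ts_as_u64(uint64_t time_stamp)
{
	return time_stamp;
}
```

Any time stamp at or above 2^32 loses information through the first function, which matters for raw TSC values because they are typically far larger than that.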
Re: [PATCH 03/31] sections: Fix section conflicts in arch/frv
> Unfortunately __pminitconst isn't defined at this point:
> arch/frv/kernel/setup.c:187:47: error: expected '=', ',', ';', 'asm'
> or '__attribute__' before '*' token
> arch/frv/kernel/setup.c:386:2: error: 'clock_cmodes' undeclared (first
> use in this function)
> arch/frv/kernel/setup.c:571:6: error: 'clock_cmodes' undeclared (first
> use in this function)
> make[2]: *** [arch/frv/kernel/setup.o] Error 1
>
> http://kisskb.ellerman.id.au/kisskb/buildresult/7344691/
>
> It seems the __pminit* variants are frv-specific, and don't cover all possible
> combinations?

Thanks for reporting. Does this fix it?

-Andi

---

frv: Fix const sections change

Add __pminitconst to fix the build again.

Reported by: Geert Uytterhoeven
Signed-off-by: Andi Kleen

diff --git a/arch/frv/kernel/setup.c b/arch/frv/kernel/setup.c
index 1f1e5ef..b8993c8 100644
--- a/arch/frv/kernel/setup.c
+++ b/arch/frv/kernel/setup.c
@@ -112,9 +112,11 @@ char __initdata redboot_command_line[COMMAND_LINE_SIZE];
 #ifdef CONFIG_PM
 #define __pminit
 #define __pminitdata
+#define __pminitconst
 #else
 #define __pminit __init
 #define __pminitdata __initdata
+#define __pminitconst __initconst
 #endif

 struct clock_cmode {
Re: [PATCH 03/10] x86, mm: get early page table from BRK
On 10/12/2012 06:55 AM, Yinghai Lu wrote:
> On Thu, Oct 11, 2012 at 7:41 AM, Konrad Rzeszutek Wilk
> wrote:
>> On Wed, Oct 10, 2012 at 08:49:05AM -0700, Yinghai Lu wrote:
>>>
>>> yes that is some extreme case:
>>> assume that 2M range is [2T-2M, 2T),
>>
>> What is T in here? Terabyte? Is the '[' vs ')' a significance in your
>> explanation? Should it be '[2T-2M, 2T]' ?
>
> yes, T is terabyte
>
> [2T-2M, 2T) is equal to [2T-2M, 2T-1]
>
> ) mean the boundary is not included.
>

Right... and just to defend Yinghai here, this is a standard
mathematical notation.

	-hpa
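The interval notation discussed above maps directly onto the usual start/end comparison in C. The following is a minimal sketch, assuming binary units (2M as 2 MiB, 2T as 2 TiB); the macro and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIZE_2M (2ULL << 20)	/* 2 MiB, assumed meaning of "2M" */
#define SIZE_2T (2ULL << 40)	/* 2 TiB, assumed meaning of "2T" */

/*
 * True when addr lies in the half-open range [start, end):
 * start is included, end is not.
 */
bool in_half_open_range(uint64_t start, uint64_t end, uint64_t addr)
{
	return addr >= start && addr < end;
}
```

For the range [2T-2M, 2T), the last address inside is 2T-1 and the address 2T itself is outside, which is the equivalence with the closed range [2T-2M, 2T-1] stated in the thread.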
mmotm 2012-10-11-16-14 uploaded
The mm-of-the-moment snapshot 2012-10-11-16-14 has been uploaded to http://www.ozlabs.org/~akpm/mmotm/ mmotm-readme.txt says README for mm-of-the-moment: http://www.ozlabs.org/~akpm/mmotm/ This is a snapshot of my -mm patch queue. Uploaded at random hopefully more than once a week. You will need quilt to apply these patches to the latest Linus release (3.x or 3.x-rcY). The series file is in broken-out.tar.gz and is duplicated in http://ozlabs.org/~akpm/mmotm/series The file broken-out.tar.gz contains two datestamp files: .DATE and .DATE--mm-dd-hh-mm-ss. Both contain the string -mm-dd-hh-mm-ss, followed by the base kernel version against which this patch series is to be applied. This tree is partially included in linux-next. To see which patches are included in linux-next, consult the `series' file. Only the patches within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in linux-next. A git tree which contains the memory management portion of this tree is maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git by Michal Hocko. It contains the patches which are between the "#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series file, http://www.ozlabs.org/~akpm/mmotm/series. A full copy of the full kernel tree with the linux-next and mmotm patches already applied is available through git within an hour of the mmotm release. Individual mmotm releases are tagged. The master branch always points to the latest release, so it's constantly rebasing. http://git.cmpxchg.org/?p=linux-mmotm.git;a=summary To develop on top of mmotm git: $ git remote add mmotm git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git $ git remote update mmotm $ git checkout -b topic mmotm/master $ git send-email mmotm/master.. [...] 
To rebase a branch with older patches to a new mmotm release: $ git remote update mmotm $ git rebase --onto mmotm/master topic The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second) contains daily snapshots of the -mm tree. It is updated more frequently than mmotm, and is untested. A git copy of this tree is available at http://git.cmpxchg.org/?p=linux-mmots.git;a=summary and use of this tree is similar to http://git.cmpxchg.org/?p=linux-mmotm.git, described above. This mmotm tree contains the following patches against 3.6: (patches marked "*" will be included in linux-next) origin.patch linux-next.patch linux-next-git-rejects.patch i-need-old-gcc.patch arch-alpha-kernel-systblss-remove-debug-check.patch * linux-coredumph-needs-asm-siginfoh.patch * memstick-remove-unused-field-from-state-struct.patch * memstick-ms_block-fix-complile-issue.patch * memstick-use-after-free-in-msb_disk_release.patch * memstick-memory-leak-on-error-in-msb_ftl_scan.patch * kernel-sysc-fix-stack-memory-content-leak-via-uname26.patch * drivers-video-backlight-lm3639_blc-return-proper-error-in-lm3639_bled_mode_store-error-paths.patch * pidns-remove-recursion-from-free_pid_ns-v5.patch * pidns-remove-recursion-from-free_pid_ns-v5-fix.patch * cris-fix-i-o-macros.patch * selinux-fix-sel_netnode_insert-suspicious-rcu-dereference.patch * mm-slab-release-slab_mutex-earlier-in-kmem_cache_destroy.patch * nohz-fix-idle-ticks-in-cpu-summary-line-of-proc-stat.patch * vfs-d_obtain_alias-needs-to-use-as-default-name.patch * cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch * cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch * acpi_memhotplugc-fix-memory-leak-when-memory-device-is-unbound-from-the-module-acpi_memhotplug.patch * acpi_memhotplugc-free-memory-device-if-acpi_memory_enable_device-failed.patch * acpi_memhotplugc-remove-memory-info-from-list-before-freeing-it.patch * acpi_memhotplugc-dont-allow-to-eject-the-memory-device-if-it-is-being-used.patch * 
acpi_memhotplugc-bind-the-memory-device-when-the-driver-is-being-loaded.patch * acpi_memhotplugc-auto-bind-the-memory-device-which-is-hotplugged-before-the-driver-is-loaded.patch * arch-x86-platform-iris-irisc-register-a-platform-device-and-a-platform-driver.patch * x86-numa-dont-check-if-node-is-numa_no_node.patch * arch-x86-tools-insn_sanityc-identify-source-of-messages.patch * uv-fix-incorrect-tlb-flush-all-issue.patch * olpc-fix-olpc-xo1-scic-build-errors.patch * fs-debugsfs-remove-unnecessary-inode-i_private-initialization.patch * pcmcia-move-unbind-rebind-into-dev_pm_opscomplete.patch * drm-i915-optimize-div_round_closest-call.patch * drm-fix-radeon-printk-format-warnings.patch cyber2000fb-avoid-palette-corruption-at-higher-clocks.patch * timeconstpl-remove-deprecated-defined-array.patch * time-dont-inline-export_symbol-functions.patch * h8300-select-generic-atomic64_t-support.patch * cciss-cleanup-bitops-usage.patch * cciss-use-check_signature.patch * block-store-partition_meta_infouuid-as-a-string.patch * init-reduce-partuuid-min-length-to-1-from-36.patch *
Re: Linux 2.6.32.60
On 10/11/2012 07:31 PM, Willy Tarreau wrote:
> On Thu, Oct 11, 2012 at 07:58:04PM +0900, Greg KH wrote:
>> On Thu, Oct 11, 2012 at 08:29:16AM +0200, Willy Tarreau wrote:
>>> If you think these patches constitute a regression, I can revert them.
>>> However I'd like convincing arguments since they're here to help address
>>> a real issue.
>>
>> If I missed these when doing the random number generation backport for
>> 3.0, and I should add them there as well, please let me know.
>
> At least I think they should not be in 2.6.32 without being in 3.0.
> Probably that Peter's opinion will help us decide whether they should
> go into 3.0 or 2.6.32 should revert them.
>

I would strongly argue for at least one of the RDRAND-enabling versions
being in all supported kernels; the second (with Ted Ts'o's changes) is
better, but touches a *lot* of subsystems; the plain one is
self-contained but only helps RDRAND-enabled hardware.

Without these patches the random subsystem has a critical security flaw,
which puts it into the scope for stable.

	-hpa
Re: [Q] Default SLAB allocator
David Rientjes writes:

> On Thu, 11 Oct 2012, Andi Kleen wrote:
>
>> > While I've always thought SLUB was the default and recommended allocator,
>> > I'm surprise to find that it's not always the case:
>>
>> iirc the main performance reasons for slab over slub have mostly
>> disappeared, so in theory slab could be finally deprecated now.
>>
>
> SLUB is a non-starter for us and incurs a >10% performance degradation in
> netperf TCP_RR.

When did you last test? Our regressions had disappeared a few kernels
ago.

-Andi

--
a...@linux.intel.com -- Speaking for myself only
[PATCH] compat: VIDEO_SET_SPU_PALETTE missing error check
The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check
while converting ioctl arguments. This could lead to leaking kernel
stack contents into userspace.

Patch extracted from existing fix in grsecurity.

Cc: David Miller
Cc: Brad Spengler
Cc: PaX Team
Cc: sta...@vger.kernel.org
Signed-off-by: Kees Cook
---
 fs/compat_ioctl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index f505402..4c6285f 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -210,6 +210,8 @@ static int do_video_set_spu_palette(unsigned int fd, unsigned int cmd,

 	err = get_user(palp, &up->palette);
 	err |= get_user(length, &up->length);
+	if (err)
+		return -EFAULT;

 	up_native = compat_alloc_user_space(sizeof(struct video_spu_palette));
 	err = put_user(compat_ptr(palp), &up_native->palette);
--
1.7.9.5

--
Kees Cook
Chrome OS Security
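The pattern this patch restores (accumulate the fetch results, then bail out before any fetched value is used) can be sketched in userspace C. fetch_int() is a hypothetical stand-in for get_user() that treats NULL as a faulting pointer, and convert_args() is an invented caller; neither exists in the kernel.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for get_user(): returns 0 on success and
 * -EFAULT when the source pointer is unusable (modelled as NULL).
 * On failure *dst is left untouched.
 */
int fetch_int(int *dst, const int *src)
{
	if (!src)
		return -EFAULT;
	*dst = *src;
	return 0;
}

/*
 * Fetch two arguments and only use them if both fetches succeeded.
 * In the kernel case the locals are uninitialized stack slots, so
 * continuing after a failed fetch would forward stale stack data.
 * They are zeroed here only to keep the sketch warning-free.
 */
int convert_args(const int *user_a, const int *user_b, int *out_sum)
{
	int a = 0, b = 0;
	int err;

	err = fetch_int(&a, user_a);
	err |= fetch_int(&b, user_b);
	if (err)
		return -EFAULT;	/* stop before a or b is consumed */

	*out_sum = a + b;
	return 0;
}
```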
Re: [PATCH -v3 0/7] x86: Use BRK to pre mapping page table to make xen happy
On Wed, Oct 10, 2012 at 11:13 PM, Yinghai Lu wrote:
> On Wed, Oct 10, 2012 at 9:40 AM, Stefano Stabellini
> wrote:
>>
>> So you are missing the Xen patches entirely in this iteration of the
>> series?
>
> please check updated for-x86-mm branch.
>
> [PATCH -v4 00/15] x86: Use BRK to pre mapping page table to make xen happy
>
> could be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-mm

Stefano,

Can you try -v4 to see if xen works with the changes?

Thanks

Yinghai
Re: [GIT] TPM bugfixes
On Thu, 11 Oct 2012 15:49:36 -0700 Andrew Morton wrote:

> On Fri, 12 Oct 2012 00:45:06 +0200
> richard -rw- weinberger wrote:
>
> > On Fri, Oct 12, 2012 at 12:19 AM, Andrew Morton
> > wrote:
> > > On Thu, 11 Oct 2012 21:54:18 +1100 (EST)
> > > James Morris wrote:
> > >
> > >> Please pull these fixes for the TPM code.
> > >>
> > >> The following changes since commit
> > >> 12250d843e8489ee00b5b7726da855e51694e792:
> > >>
> > >> Merge branch 'i2c-embedded/for-next' of
> > >> git://git.pengutronix.de/git/wsa/linux (2012-10-11 10:27:51 +0900)
> > >>
> > >> are available in the git repository at:
> > >>
> > >>
> > >> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git
> > >> for-linus
> > >
> > > Gargh. Is it possible to add a human-readable http URL to these things
> > > so that people can actually look at the patches without hoop-jumping?
> >
> > rw@mantary:~> echo
> > "git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git
> > for-linus" | sed -e
> > 's/git\:\/\//http\:\/\//;s/\/pub\/scm\//\?p=/g;s/\.git
> > /\.git\;a=shortlog;h=refs\/heads\//g'
> >
> > http://git.kernel.org?p=linux/kernel/git/jmorris/linux-security.git;a=shortlog;h=refs/heads/for-linus
> >
> > Geeze.
>
> Thanks.

Followed by ^F, copy-n-paste, then hope it's on the first page?

http://git.kernel.org/?p=linux/kernel/git/jmorris/linux-security.git;a=commit;h=abce9ac292e13da367bbd22c1f7669f988d931ac

Which can be shortened to something like

http://git.kernel.org/?p=linux/kernel/git/jmorris/linux-security.git;h=abce9ac2

Simply including the commit IDs would help a lot. Including the full
URL to each commit would be nicer. I assume Junio owns that script?
Re: [Q] Default SLAB allocator
On Thu, 11 Oct 2012, Andi Kleen wrote:

> > While I've always thought SLUB was the default and recommended allocator,
> > I'm surprise to find that it's not always the case:
>
> iirc the main performance reasons for slab over slub have mostly
> disappeared, so in theory slab could be finally deprecated now.
>

SLUB is a non-starter for us and incurs a >10% performance degradation in
netperf TCP_RR.
Re: [PATCH v2 0/8] Improve performance of VM translation on x86_64
On 10/12/2012 06:40 AM, Andi Kleen wrote: > > Patch series looks good to me. Thanks for doing this properly. > Reviewed-by: Andi Kleen > Agreed. Acked-by: H. Peter Anvin I will pick this up after the merge window closes unless Ingo beats me to it. (I'm currently traveling.) -hpa
Re: [PATCH v5 3/3] tracing: Format non-nanosec times from tsc clock without a decimal point.
On Thu, Oct 11, 2012 at 1:43 PM, Steven Rostedt wrote: > On Mon, 2012-10-01 at 20:31 -0700, David Sharp wrote: > >> static int >> -lat_print_timestamp(struct trace_seq *s, u64 abs_usecs, >> - unsigned long rel_usecs) >> +lat_print_timestamp(struct trace_iterator *iter, u64 next_ts) >> { >> - return trace_seq_printf(s, " %4lldus%c: ", abs_usecs, >> - rel_usecs > preempt_mark_thresh ? '!' : >> - rel_usecs > 1 ? '+' : ' '); >> + unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE; >> + unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS; >> + unsigned long long abs_ts = iter->ts - iter->tr->time_start; >> + unsigned long long rel_ts = next_ts - iter->ts; >> + struct trace_seq *s = &iter->seq; >> + unsigned long mark_thresh; >> + int ret; >> + >> + if (in_ns) { >> + abs_ts = ns2usecs(abs_ts); >> + rel_ts = ns2usecs(rel_ts); >> + } >> + >> + if (verbose && in_ns) { >> + unsigned long abs_msec = abs_ts; >> + unsigned long abs_usec = do_div(abs_msec, USEC_PER_MSEC); >> + unsigned long rel_msec = rel_ts; >> + unsigned long rel_usec = do_div(rel_msec, USEC_PER_MSEC); >> + >> + ret = trace_seq_printf( >> + s, "[%08llx] %ld.%03ldms (+%ld.%03ldms): ", >> + ns2usecs(iter->ts), >> + abs_msec, abs_usec, >> + rel_msec, rel_usec); >> + } else if (verbose && !in_ns) { >> + ret = trace_seq_printf( >> + s, "[%016llx] %lld (+%lld): ", >> + iter->ts, abs_ts, rel_ts); >> + } else if (!verbose && in_ns) { >> + ret = trace_seq_printf( >> + s, " %4lldus: ", > > Missing %c. > >> + abs_ts, >> + rel_ts > preempt_mark_thresh_us ? '!' : >> + rel_ts > 1 ? '+' : ' '); >> + } else { /* !verbose && !in_ns */ >> + ret = trace_seq_printf(s, " %4lld%s%c: ", abs_ts); > > Um, "%s%c" with no matching arguments. Sorry for being so sloppy on these patches... Wonder why I didn't see the printf warnings. I'll send out a new patchset with this fixed, and include Yoshihiro Yunomae's patches. 
> > -- Steve > >> + } >> + return ret; >> } >> > >
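To make the two review comments concrete (one branch has an argument with no %c to consume it, the other has "%s%c" with nothing to feed it), here is a small user-space sketch of the two non-verbose branches with conversions and arguments matched one to one. It is not the fixed kernel patch: fmt_lat_timestamp and MARK_THRESH_US are made-up names, snprintf() stands in for trace_seq_printf(), and the threshold value is arbitrary.

```c
#include <stdio.h>

/* Hypothetical threshold in microseconds; stands in for
 * preempt_mark_thresh_us from the quoted patch. */
#define MARK_THRESH_US 100ULL

/*
 * Format a non-verbose latency timestamp into buf.
 * in_ns: nonzero when abs_ts/rel_ts are microseconds derived from a
 *        nanosecond clock, zero when they are raw counter values.
 * Returns whatever snprintf() returns.
 * fmt_lat_timestamp is an illustrative name, not a kernel function.
 */
static int fmt_lat_timestamp(char *buf, size_t len, int in_ns,
                             unsigned long long abs_ts,
                             unsigned long long rel_ts)
{
    /* Delay marker: '!' for long gaps, '+' for gaps above 1, else ' '. */
    char mark = rel_ts > MARK_THRESH_US ? '!' :
                rel_ts > 1 ? '+' : ' ';

    if (in_ns)
        /* Two conversions (%4llu, %c) and two arguments. */
        return snprintf(buf, len, " %4lluus%c: ", abs_ts, mark);

    /* Raw counter values get no unit suffix: again two conversions
     * and two arguments, with no stray %s. */
    return snprintf(buf, len, " %4llu%c: ", abs_ts, mark);
}
```

On the missing warnings: gcc's -Wformat only checks calls to functions that are declared with the printf format attribute, so whether a mismatch like the ones above gets flagged depends on how the callee is declared and on the warning flags used for the build.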