ps never exits when displaying process information, and can't be killed

2012-10-11 Thread Cyberman Wu
Sorry for posting to the main list; I don't know which more specific
mailing list should be used for this problem.

We're running a Linux box on the Gx platform from Tilera. The kernel carries
some vendor-specific patches, but most of it is the same as the
standard kernel.

We occasionally hit a problem that I'm trying to resolve.
But when I ran 'ps' to get process information, the newly launched ps
printed nothing and would not exit; ^C doesn't work.
I found its pid under /proc, and it's in the RUNNING state:
# cat status
Name:   ps
State:  R (running)
Tgid:   1298
Pid:    1298
PPid:   1
TracerPid:  0
Uid:    0   0   0   0
Gid:    0   0   0   0
FDSize: 64
Groups: 0 1 2 3 4 6 10 489
VmPeak: 3776 kB
VmSize: 3712 kB
VmLck: 0 kB
VmHWM:  2624 kB
VmRSS:  2624 kB
VmData:  832 kB
VmStk:   256 kB
VmExe:   192 kB
VmLib:  2176 kB
VmPTE:  6 kB
VmSwap: 0 kB
Threads:    1
SigQ:   7/8113
SigPnd: 0100
ShdPnd: 000a0103
SigBlk: 
SigIgn: 0004
SigCgt: 73d3fef9
CapInh: 
CapPrm: 
CapEff: 
CapBnd: 
Cpus_allowed:   f,
Cpus_allowed_list:  0-35
Mems_allowed:   3
Mems_allowed_list:  0-1
voluntary_ctxt_switches:    1
nonvoluntary_ctxt_switches: 0

And it can't be killed, even with SIGKILL.

Since it is in the *RUNNING* state, its stack can't be dumped. Is there
any existing mechanism I can use to get its stack, or other information,
to help me figure out why ps is stuck in *RUNNING*?
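
The SigPnd and ShdPnd fields in the status output above are hexadecimal
bitmasks in which bit N-1 stands for signal N, so a pending SIGKILL
(signal 9) shows up as bit 8. A minimal sketch in C of decoding such a
mask follows. The helper name decode_sigmask is made up for illustration,
and the masks in this report look truncated by the archive, so the sketch
does not try to interpret them.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a /proc/<pid>/status signal mask (SigPnd, ShdPnd, ...).
 * Bit (n - 1) of the mask corresponds to signal number n.
 * Writes up to 'cap' signal numbers into 'out' and returns how many
 * bits were set in total. Hypothetical helper, for illustration only.
 */
static size_t decode_sigmask(uint64_t mask, int *out, size_t cap)
{
	size_t count = 0;

	for (int bit = 0; bit < 64; bit++) {
		if (mask & (UINT64_C(1) << bit)) {
			if (count < cap)
				out[count] = bit + 1;
			count++;
		}
	}
	return count;
}
```

For example, a mask of 0x100 decodes to the single signal number 9.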


System information:
# uname -a
Linux localhost 2.6.38.8-MDE-4.0.0.141101 #7 SMP Fri Sep 28 21:46:08 CST 2012 tilegx GNU/Linux



Best regards.

-- 
Cyberman Wu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] dmaengine: dw_dmac: Enhance device tree support

2012-10-11 Thread viresh kumar
On Fri, Oct 12, 2012 at 11:14 AM, Viresh Kumar  wrote:
> dw_dmac driver already supports device tree but it used to have its platform
> data passed the non-DT way.
>
> This patch does following changes:
> - pass platform data via DT, non-DT way still takes precedence if both are
>   used.
> - create generic filter routine
> - Earlier slave information was made available by slave specific filter
>   routines in chan->private field. Now, this information would be passed from
>   within dmac DT node. Slave drivers would now be required to pass bus_id (a
>   string) as parameter to this generic filter(), which would be compared
>   against the slave data passed from DT, by the generic filter routine.
> - Update binding document

DT parsing of this patch can be tested with the following unofficial patch :)


dmaengine: dw_dmac: Add dt params debug routine

Signed-off-by: Viresh Kumar 
---
 drivers/dma/dw_dmac.c | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/dw_dmac.c b/drivers/dma/dw_dmac.c
index 05c1dff..569914d 100644
--- a/drivers/dma/dw_dmac.c
+++ b/drivers/dma/dw_dmac.c
@@ -1504,6 +1504,43 @@ static inline int dw_dma_parse_dt(struct platform_device *pdev)
 }
 #endif

+static void dw_dma_parse_dt_debug(struct dw_dma_platform_data *pdata)
+{
+   int i = -1;
+
+   if (!pdata) {
+   printk(KERN_ERR "dw_dma: unable to read info from DT\n");
+   return;
+   }
+
+   printk(KERN_ERR "\nPrinting dw_dma DT info\n");
+
+   printk(KERN_ERR "nr_channels: %x\n", pdata->nr_channels);
+   printk(KERN_ERR "is_private: %x\n", pdata->is_private);
+   printk(KERN_ERR "chan_allocation_order: %x\n",
+   pdata->chan_allocation_order);
+
+   printk(KERN_ERR "chan_priority: %x\n", pdata->chan_priority);
+   printk(KERN_ERR "block_size: %x\n", pdata->block_size);
+
+   printk(KERN_ERR "nr_masters: %x\n", pdata->nr_masters);
+   printk(KERN_ERR "data_width: %d %d %d %d\n", pdata->data_width[0],
+   pdata->data_width[1], pdata->data_width[2],
+   pdata->data_width[3]);
+
+   /* parse slave data */
+   printk(KERN_ERR "slave_info\n");
+
+   while (++i < pdata->sd_count) {
+   printk(KERN_INFO "bus_id: %s\n", pdata->sd[i].bus_id);
+   printk(KERN_INFO "cfg_hi: %x\n", pdata->sd[i].cfg_hi);
+   printk(KERN_INFO "cfg_lo: %x\n", pdata->sd[i].cfg_lo);
+   printk(KERN_INFO "src_master: %x\n",
+   pdata->sd[i].src_master);
+   printk(KERN_INFO "dst_master: %x\n",
+   pdata->sd[i].dst_master);
+   }
+}
 static int __devinit dw_probe(struct platform_device *pdev)
 {
struct dw_dma_platform_data *pdata;
@@ -1515,9 +1552,12 @@ static int __devinit dw_probe(struct platform_device *pdev)
int i;

pdata = dev_get_platdata(&pdev->dev);
-   if (!pdata)
+   if (!pdata) {
pdata = dw_dma_parse_dt(pdev);
+   dw_dma_parse_dt_debug(pdata);
+   }


[PATCH 2/3] dmaengine: dw_dmac: Enhance device tree support

2012-10-11 Thread Viresh Kumar
dw_dmac driver already supports device tree but it used to have its platform
data passed the non-DT way.

This patch does following changes:
- pass platform data via DT, non-DT way still takes precedence if both are used.
- create generic filter routine
- Earlier slave information was made available by slave specific filter routines
  in chan->private field. Now, this information would be passed from within dmac
  DT node. Slave drivers would now be required to pass bus_id (a string) as
  parameter to this generic filter(), which would be compared against the slave
  data passed from DT, by the generic filter routine.
- Update binding document
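
The matching step described in the list above can be sketched in plain C
outside the kernel. This is only an illustration under stated assumptions:
struct slave_entry and find_slave() are hypothetical names, not the
driver's types, and the table stands in for the slave data parsed from the
DT node.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one slave_info entry parsed from DT. */
struct slave_entry {
	const char *bus_id;
	unsigned int cfg_hi;
	unsigned int cfg_lo;
	unsigned int src_master;
	unsigned int dst_master;
};

/*
 * Return the entry whose bus_id equals 'param' exactly, or NULL if the
 * controller has no such slave. A generic filter would accept the
 * channel only when this lookup succeeds.
 */
static const struct slave_entry *find_slave(const struct slave_entry *sd,
					    size_t count, const char *param)
{
	if (!param)
		return NULL;

	for (size_t i = 0; i < count; i++) {
		if (sd[i].bus_id && strcmp(sd[i].bus_id, param) == 0)
			return &sd[i];
	}
	return NULL;
}
```

Comparing string contents means the slave driver only has to pass a string
equal to the DT entry, which is why the binding asks for exactly the same
bus_id.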

Signed-off-by: Viresh Kumar 
---
 Documentation/devicetree/bindings/dma/snps-dma.txt |  44 ++
 drivers/dma/dw_dmac.c  | 147 +
 drivers/dma/dw_dmac_regs.h |   4 +
 include/linux/dw_dmac.h|  43 +++---
 4 files changed, 221 insertions(+), 17 deletions(-)

diff --git a/Documentation/devicetree/bindings/dma/snps-dma.txt b/Documentation/devicetree/bindings/dma/snps-dma.txt
index c0d85db..5bb3dfb 100644
--- a/Documentation/devicetree/bindings/dma/snps-dma.txt
+++ b/Documentation/devicetree/bindings/dma/snps-dma.txt
@@ -6,6 +6,26 @@ Required properties:
 - interrupt-parent: Should be the phandle for the interrupt controller
   that services interrupts for this device
 - interrupt: Should contain the DMAC interrupt number
+- nr_channels: Number of channels supported by hardware
+- is_private: The device channels should be marked as private and not for by the
+  general purpose DMA channel allocator. False if not passed.
+- chan_allocation_order: order of allocation of channel, 0 (default): ascending,
+  1: descending
+- chan_priority: priority of channels. 0 (default): increase from chan 0->n, 1:
+  increase from chan n->0
+- block_size: Maximum block size supported by the controller
+- nr_masters: Number of AHB masters supported by the controller
+- data_width: Maximum data width supported by hardware per AHB master
+  (0 - 8bits, 1 - 16bits, ..., 5 - 256bits)
+- slave_info:
+   - bus_id: name of this device channel, not just a device name since
+ devices may have more than one channel e.g. "foo_tx". For using the
+ dw_generic_filter(), slave drivers must pass exactly this string as
+ param to filter function.
+   - cfg_hi: Platform-specific initializer for the CFG_HI register
+   - cfg_lo: Platform-specific initializer for the CFG_LO register
+   - src_master: src master for transfers on allocated channel.
+   - dst_master: dest master for transfers on allocated channel.
 
 Example:
 
@@ -14,4 +34,28 @@ Example:
reg = <0xfc00 0x1000>;
interrupt-parent = <>;
interrupts = <12>;
+
+   nr_channels = <8>;
+   chan_allocation_order = <1>;
+   chan_priority = <1>;
+   block_size = <0xfff>;
+   nr_masters = <2>;
+   data_width = <3 3 0 0>;
+
+   slave_info {
+   uart0-tx {
+   bus_id = "uart0-tx";
+   cfg_hi = <0x4000>;  /* 0x8 << 11 */
+   cfg_lo = <0>;
+   src_master = <0>;
+   dst_master = <1>;
+   };
+   spi0-tx {
+   bus_id = "spi0-tx";
+   cfg_hi = <0x2000>;  /* 0x4 << 11 */
+   cfg_lo = <0>;
+   src_master = <0>;
+   dst_master = <0>;
+   };
+   };
};
diff --git a/drivers/dma/dw_dmac.c b/drivers/dma/dw_dmac.c
index c4b0eb3..9a7d084 100644
--- a/drivers/dma/dw_dmac.c
+++ b/drivers/dma/dw_dmac.c
@@ -1179,6 +1179,58 @@ static void dwc_free_chan_resources(struct dma_chan *chan)
dev_vdbg(chan2dev(chan), "%s: done\n", __func__);
 }
 
+bool dw_generic_filter(struct dma_chan *chan, void *param)
+{
+   struct dw_dma *dw = to_dw_dma(chan->device);
+   static struct dw_dma *last_dw;
+   static char *last_bus_id;
+   int found = 0, i = -1;
+
+   /*
+* dmaengine framework calls this routine for all channels of all dma
+* controller, until true is returned. If 'param' bus_id is not
+* registered with a dma controller (dw), then there is no need of
+* running below function for all channels of dw.
+*
+* This block of code does this by saving the parameters of last
+* failure. If dw and param are same, i.e. trying on same dw with
+* different channel, return false.
+*/
+   if (last_dw) {
+   if ((last_bus_id == param) && (last_dw == dw))
+   return false;
+   }
+
+   /*
+* 

[PATCH 3/3] ARM: SPEAr13xx: Pass DW DMAC platform data from DT

2012-10-11 Thread Viresh Kumar
This patch adds dw_dmac's platform data to DT node. It also creates slave info
node for SPEAr13xx, for the devices which were using dw_dmac.

Signed-off-by: Viresh Kumar 
---
 arch/arm/boot/dts/spear1340.dtsi | 19 ++
 arch/arm/boot/dts/spear13xx.dtsi | 38 
 arch/arm/mach-spear13xx/include/mach/spear.h |  2 --
 arch/arm/mach-spear13xx/spear1310.c  |  4 +--
 arch/arm/mach-spear13xx/spear1340.c  | 27 +++---
 arch/arm/mach-spear13xx/spear13xx.c  | 54 ++--
 6 files changed, 65 insertions(+), 79 deletions(-)

diff --git a/arch/arm/boot/dts/spear1340.dtsi b/arch/arm/boot/dts/spear1340.dtsi
index d71fe2a..8ea3f66 100644
--- a/arch/arm/boot/dts/spear1340.dtsi
+++ b/arch/arm/boot/dts/spear1340.dtsi
@@ -24,6 +24,25 @@
status = "disabled";
};
 
+   dma@ea80 {
+   slave_info {
+   uart1_tx {
+   bus_id = "uart1_tx";
+   cfg_hi = <0x6000>;  /* 0xC << 11 */
+   cfg_lo = <0>;
+   src_master = <0>;
+   dst_master = <1>;
+   };
+   uart1_tx {
+   bus_id = "uart1_tx";
+   cfg_hi = <0x680>;   /* 0xD << 7 */
+   cfg_lo = <0>;
+   src_master = <1>;
+   dst_master = <0>;
+   };
+   };
+   };
+
spi1: spi@5d40 {
compatible = "arm,pl022", "arm,primecell";
reg = <0x5d40 0x1000>;
diff --git a/arch/arm/boot/dts/spear13xx.dtsi b/arch/arm/boot/dts/spear13xx.dtsi
index f7b84ac..f06bb50 100644
--- a/arch/arm/boot/dts/spear13xx.dtsi
+++ b/arch/arm/boot/dts/spear13xx.dtsi
@@ -91,6 +91,37 @@
reg = <0xea80 0x1000>;
interrupts = <0 19 0x4>;
status = "disabled";
+
+   nr_channels = <8>;
+   chan_allocation_order = <1>;
+   chan_priority = <1>;
+   block_size = <0xfff>;
+   nr_masters = <2>;
+   data_width = <3 3 0 0>;
+
+   slave_info {
+   ssp0_tx {
+   bus_id = "ssp0_tx";
+   cfg_hi = <0x2000>;  /* 0x4 << 11 */
+   cfg_lo = <0>;
+   src_master = <0>;
+   dst_master = <0>;
+   };
+   ssp0_rx {
+   bus_id = "ssp0_rx";
+   cfg_hi = <0x280>;   /* 0x5 << 7 */
+   cfg_lo = <0>;
+   src_master = <0>;
+   dst_master = <0>;
+   };
+   cf {
+   bus_id = "cf";
+   cfg_hi = <0>;
+   cfg_lo = <0>;
+   src_master = <0>;
+   dst_master = <0>;
+   };
+   };
};
 
dma@eb00 {
@@ -98,6 +129,13 @@
reg = <0xeb00 0x1000>;
interrupts = <0 59 0x4>;
status = "disabled";
+
+   nr_channels = <8>;
+   chan_allocation_order = <1>;
+   chan_priority = <1>;
+   block_size = <0xfff>;
+   nr_masters = <2>;
+   data_width = <3 3 0 0>;
};
 
fsmc: flash@b000 {
diff --git a/arch/arm/mach-spear13xx/include/mach/spear.h b/arch/arm/mach-spear13xx/include/mach/spear.h
index 07d90ac..71bf5b6 100644
--- a/arch/arm/mach-spear13xx/include/mach/spear.h
+++ b/arch/arm/mach-spear13xx/include/mach/spear.h
@@ -43,8 +43,6 @@
 #define VA_L2CC_BASE   IOMEM(UL(0xFB00))
 
 /* others */
-#define DMAC0_BASE UL(0xEA80)
-#define DMAC1_BASE UL(0xEB00)
 #define MCIF_CF_BASE   UL(0xB280)
 
 /* Devices present in SPEAr1310 */
diff --git a/arch/arm/mach-spear13xx/spear1310.c b/arch/arm/mach-spear13xx/spear1310.c
index 9fbbfc5..0e60195 

[PATCH 1/3] dmaengine: dw_dmac: Update documentation style comments for dw_dma_platform_data

2012-10-11 Thread Viresh Kumar
Documentation style comments were missing for few fields in struct
dw_dma_platform_data. Add these.

Signed-off-by: Viresh Kumar 
---
 include/linux/dw_dmac.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/dw_dmac.h b/include/linux/dw_dmac.h
index e1c8c9e..62a6190 100644
--- a/include/linux/dw_dmac.h
+++ b/include/linux/dw_dmac.h
@@ -19,6 +19,8 @@
  * @nr_channels: Number of channels supported by hardware (max 8)
  * @is_private: The device channels should be marked as private and not for
  * by the general purpose DMA channel allocator.
+ * @chan_allocation_order: Allocate channels starting from 0 or 7
+ * @chan_priority: Set channel priority increasing from 0 to 7 or 7 to 0.
  * @block_size: Maximum block size supported by the controller
  * @nr_masters: Number of AHB masters supported by the controller
  * @data_width: Maximum data width supported by hardware per AHB master
-- 
1.7.12.rc2.18.g61b472e




Re: [git pull] signal.git, pile 2 (was Re: [RFC][CFT][CFReview] execve and kernel_thread unification work)

2012-10-11 Thread Paul Mackerras
On Fri, Oct 12, 2012 at 02:09:58AM +0100, Al Viro wrote:
> 
> How granular are you planning to make that?  I mean, we are talking about
> 3 objects here - init/main.o, kernel/kthread.o and kernel/kmod.o.  Do they
> get TOC separate from that of arch/powerpc/kernel/entry_64.o?

Potentially, yes, it would be up to the linker.

> Anyway, if ppc folks can live with that stuff in its current form for now,

Yes, we can.

Paul.


Re: [RFC PATCH 2/2] sched:Pick the apt busy sched group during load balancing

2012-10-11 Thread preeti
Hi everyone,

The figures SCHED_GRP1:3200 and SCHED_GRP2:1156 shown in the changelog
below are the probable figures as calculated with the
per-entity-load-tracking metric for the runqueue load.

> If a sched group has passed the test for sufficient load in
> update_sg_lb_stats,to qualify for load balancing,then PJT's
> metrics has to be used to qualify the right sched group as the busiest group.
> 
> The scenario which led to this patch is shown below:
> Consider Task1 and Task2 to be a long running task
> and Tasks 3,4,5,6 to be short running tasks
> 
>                 Task3
>                 Task4
> Task1           Task5
> Task2           Task6
> ------          ------
> SCHED_GRP1      SCHED_GRP2
> 
> Normal load calculator would qualify SCHED_GRP2 as
> the candidate for sd->busiest due to the following loads
> that it calculates.
> 
> SCHED_GRP1:2048
> SCHED_GRP2:4096
> 
> Load calculator would probably qualify SCHED_GRP1 as the candidate
> for sd->busiest due to the following loads that it calculates
> 
> SCHED_GRP1:3200
> SCHED_GRP2:1156
> 
Regards
Preeti



Re: [PATCH 1/4] module: add syscall to load module from fd

2012-10-11 Thread Michael Kerrisk (man-pages)
Rusty,

On Fri, Oct 12, 2012 at 12:16 AM, Rusty Russell  wrote:
> "H. Peter Anvin"  writes:
>
>> On 10/10/2012 06:03 AM, Michael Kerrisk (man-pages) wrote:
>>> Good point. A "whole hog" openat()-style interface is worth thinking about 
>>> too.
>>
>> *Although* you could argue that you can always simply open the module
>> file first, and that finit_module() is really what we should have had in
>> the first place.  Then you don't need the flags since those would come
>> from openat().
>
> There's no fundamental reason that modules have to be in a file.  I'm
> thinking of compressed modules, or an initrd which simply includes all
> the modules it wants to load in one linear file.
>
> Also, --force options manipulate the module before loading (as did the
> now-obsolete module rename option).

Sure. But my point that started this subthread was: should we take the
opportunity now to add a 'flags' argument to the new finit_module()
system call, so as to allow flexibility in extending the behavior in
future? There have been so many cases of revised system calls in the
past few years that replaced calls without a 'flags' argument that it
seems worth at least some thought before the API is cast in stone.
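
One common way to keep such a 'flags' argument extensible is to reject any
bit the implementation does not yet understand, so that new flags can be
added later without changing behaviour for old callers. A minimal sketch
in C, assuming made-up flag names and a made-up function load_module_fd();
it is not the proposed finit_module() interface:

```c
#include <errno.h>

/* Hypothetical flag bits; nothing in this thread defines them. */
#define LOADMOD_FLAG_A		0x1u
#define LOADMOD_FLAG_B		0x2u
#define LOADMOD_KNOWN_FLAGS	(LOADMOD_FLAG_A | LOADMOD_FLAG_B)

/*
 * Validate arguments the way an extensible call typically would:
 * unknown flag bits are refused with -EINVAL instead of being ignored.
 * Returns 0 when the request is acceptable.
 */
static int load_module_fd(int fd, const char *args, unsigned int flags)
{
	if (fd < 0)
		return -EBADF;
	if (!args)
		return -EINVAL;
	if (flags & ~LOADMOD_KNOWN_FLAGS)
		return -EINVAL;

	/* Actual loading is out of scope for this sketch. */
	return 0;
}
```

If unknown bits were silently ignored instead, an old kernel would accept a
request it cannot honour, and callers could not probe for support.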

Thanks,

Michael


Re: [RFC PATCH 01/06] input/rmi4: Public header and documentation

2012-10-11 Thread Mark Brown
On Thu, Oct 11, 2012 at 03:56:22AM +, Christopher Heiny wrote:

Fix your mailer to word wrap within paragraphs.

> If this feature is a deal-breaker, we can take it out.  In the absence
> of a generic GPIO implementation for CS, though, I'd much rather leave
> it in.  Once generic GPIO CS arrives, we'll remove it pretty quickly.  

Why not just implement this at an appropriate level in the SPI
subsystem?  One of the great things about Linux is that you can change
the core code...


linux-next: build warnings after merge of the tip tree

2012-10-11 Thread Stephen Rothwell
Hi all,

After merging the tip tree, today's linux-next build (powerpc allnoconfig)
produced these warnings:

kernel/sched/fair.c:801:22: warning: 'task_h_load' declared 'static' but never defined [-Wunused-function]
kernel/sched/fair.c:1013:13: warning: 'account_offnode_enqueue' defined but not used [-Wunused-function]

Introduced by commit 4ae834f767c5 ("sched/numa: Implement NUMA home-node
selection code").

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




linux-next: build warning after merge of the vfs tree

2012-10-11 Thread Stephen Rothwell
Hi Al,

After merging the vfs tree, today's linux-next build (powerpc allnoconfig)
produced this warning:

fs/namespace.c: In function 'do_mount':
fs/namespace.c:2219:8: warning: passing argument 3 of 'security_sb_mount' discards 'const' qualifier from pointer target type [enabled by default]
include/linux/security.h:1967:19: note: expected 'char *' but argument is of type 'const char *'

Introduced by commit 5804bc88667e ("consitify do_mount() arguments").
The prototype of the !CONFIG_SECURITY version was not completely updated.
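
The warning comes from a caller holding a 'const char *' and passing it to
a stub that still declares 'char *'. A reduced sketch of the fixed shape
follows, using a hypothetical stub sb_mount_stub() rather than the real
security_sb_mount() prototype:

```c
#include <stddef.h>

/*
 * Hypothetical no-op stub standing in for a !CONFIG_SECURITY inline.
 * Declaring the string parameters as 'const char *' lets callers that
 * hold const pointers pass them without discarding the qualifier.
 */
static inline int sb_mount_stub(const char *dev_name, const char *type,
				unsigned long flags)
{
	(void)dev_name;
	(void)type;
	(void)flags;
	return 0;
}

/* Caller that only has const strings, as a constified do_mount() would. */
static int mount_check(const char *dev_name, const char *type_page)
{
	return sb_mount_stub(dev_name, type_page, 0UL);
}
```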

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




Re: [PATCH v6 1/6] tracing,x86: Add a TSC trace_clock

2012-10-11 Thread Geert Uytterhoeven
On Fri, Oct 12, 2012 at 1:27 AM, David Sharp  wrote:
> +#include 

Please use the Kbuild infrastructure ("generic-y += ..." in
arch/*/include/asm/Kbuild)
instead of adding wrappers around the asm-generic version.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] efivarfs: Implement exclusive access for {get,set}_variable

2012-10-11 Thread Jeremy Kerr

Hi Greg,


> Should this be backported to the stable kernels?


No, the efivarfs code that this touches was only recently committed; it 
won't be in any of the stable series.


Cheers,


Jeremy

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 07/10] thp: implement splitting pmd for huge zero page

2012-10-11 Thread Ni zhan Chen

On 10/12/2012 12:13 PM, Kirill A. Shutemov wrote:
> On Fri, Oct 12, 2012 at 11:23:37AM +0800, Ni zhan Chen wrote:
>> On 10/02/2012 11:19 PM, Kirill A. Shutemov wrote:
>>> From: "Kirill A. Shutemov"
>>>
>>> We can't split huge zero page itself, but we can split the pmd which
>>> points to it.
>>>
>>> On splitting the pmd we create a table with all ptes set to normal zero
>>> page.
>>>
>>> Signed-off-by: Kirill A. Shutemov
>>> Reviewed-by: Andrea Arcangeli
>>> ---
>>>   mm/huge_memory.c |   32
>>>   1 files changed, 32 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 95032d3..3f1c59c 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -1600,6 +1600,7 @@ int split_huge_page(struct page *page)
>>> struct anon_vma *anon_vma;
>>> int ret = 1;
>>> +   BUG_ON(is_huge_zero_pfn(page_to_pfn(page)));
>>> BUG_ON(!PageAnon(page));
>>> anon_vma = page_lock_anon_vma(page);
>>> if (!anon_vma)
>>> @@ -2503,6 +2504,32 @@ static int khugepaged(void *none)
>>> return 0;
>>>   }
>>> +static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
>>> +   unsigned long haddr, pmd_t *pmd)
>>> +{
>>> +   pgtable_t pgtable;
>>> +   pmd_t _pmd;
>>> +   int i;
>>> +
>>> +   pmdp_clear_flush_notify(vma, haddr, pmd);
>>
>> why I can't find function pmdp_clear_flush_notify in kernel source
>> code? Do you mean pmdp_clear_flush_young_notify or something like
>> that?
>
> It was changed recently. See commit
> 2ec74c3 mm: move all mmu notifier invocations to be done outside the PT lock

Oh, thanks!




[RFC PATCH 2/2] sched:Pick the apt busy sched group during load balancing

2012-10-11 Thread Preeti U Murthy
If a sched group has passed the test for sufficient load in
update_sg_lb_stats to qualify for load balancing, then PJT's
metrics have to be used to qualify the right sched group as the busiest group.

The scenario which led to this patch is shown below:
Consider Task1 and Task2 to be long running tasks
and Tasks 3,4,5,6 to be short running tasks

                Task3
                Task4
Task1           Task5
Task2           Task6
------          ------
SCHED_GRP1      SCHED_GRP2

Normal load calculator would qualify SCHED_GRP2 as
the candidate for sd->busiest due to the following loads
that it calculates.

SCHED_GRP1:2048
SCHED_GRP2:4096

A load calculator based on PJT's metric would probably qualify SCHED_GRP1
as the candidate for sd->busiest due to the following loads that it calculates

SCHED_GRP1:3200
SCHED_GRP2:1156

This patch aims to strike a balance between the loads of the
group and the number of tasks running on the group to decide the
busiest group in the sched_domain.

This means we will need to use the PJT's metrics but with an
additional constraint.
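
The check this patch adds to update_sd_pick_busiest() can be read as the
standalone predicate below. It is a user-space sketch with the field values
passed as plain arguments; passes_pjt_check() is a hypothetical name, and
the thresholds are copied from the patch. One reading of the code: with
LOAD_THRESHOLD at 1, the inner comparison cannot be true once the outer one
is, so as posted a group whose load does not exceed the current maximum
appears to be always rejected.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_THRESHOLD	2
#define LOAD_THRESHOLD	1

/*
 * Mirrors the first test in the patched update_sd_pick_busiest():
 * returns false when the group should be dismissed outright, true when
 * the remaining checks of the function would go on to run.
 */
static bool passes_pjt_check(uint64_t avg_cfs_runnable_load,
			     uint64_t max_sg_load,
			     unsigned long sum_nr_running,
			     unsigned long busiest_nr_running)
{
	if (avg_cfs_runnable_load <= max_sg_load) {
		if (avg_cfs_runnable_load > LOAD_THRESHOLD * max_sg_load) {
			/* Close to the max: keep it only if it queues more tasks. */
			if (sum_nr_running <= NR_THRESHOLD + busiest_nr_running)
				return false;
		} else {
			return false;
		}
	}
	return true;
}
```

With the example loads from the changelog, a group at 1156 is dismissed
once a group at 3200 has been recorded as the maximum, regardless of how
many tasks it queues.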

Signed-off-by: Preeti U Murthy 
---
 kernel/sched/fair.c |   22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dd0fb28..d45b7b4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -165,7 +165,8 @@ void sched_init_granularity(void)
 #else
 # define WMULT_CONST   (1UL << 32)
 #endif
-
+#define NR_THRESHOLD 2
+#define LOAD_THRESHOLD 1
 #define WMULT_SHIFT32
 
 /*
@@ -4169,6 +4170,7 @@ struct sd_lb_stats {
/* Statistics of the busiest group */
unsigned int  busiest_idle_cpus;
unsigned long max_load;
+   u64 max_sg_load; /* Equivalent of max_load but calculated using pjt's metric*/
unsigned long busiest_load_per_task;
unsigned long busiest_nr_running;
unsigned long busiest_group_capacity;
@@ -4628,8 +4630,21 @@ static bool update_sd_pick_busiest(struct lb_env *env,
   struct sched_group *sg,
   struct sg_lb_stats *sgs)
 {
-   if (sgs->avg_load <= sds->max_load)
-   return false;
+   /* Use PJT's metrics to qualify a sched_group as busy
+* But a low load sched group may be queueing up many tasks
+*
+* So before dismissing a sched group with lesser load,ensure
+* that the number of processes on it is checked if it is
+* not too less loaded than the max load so far
+*/
+   if (sgs->avg_cfs_runnable_load <= sds->max_sg_load) {
+   if (sgs->avg_cfs_runnable_load > LOAD_THRESHOLD * sds->max_sg_load) {
+   if (sgs->sum_nr_running <= (NR_THRESHOLD + sds->busiest_nr_running))
+   return false;
+   } else {
+   return false;
+   }
+   }
 
if (sgs->sum_nr_running > sgs->group_capacity)
return true;
@@ -4708,6 +4723,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
sds->this_idle_cpus = sgs.idle_cpus;
} else if (update_sd_pick_busiest(env, sds, sg, &sgs)) {
sds->max_load = sgs.avg_load;
+   sds->max_sg_load = sgs.avg_cfs_runnable_load;
sds->busiest = sg;
sds->busiest_nr_running = sgs.sum_nr_running;
sds->busiest_idle_cpus = sgs.idle_cpus;



[RFC PATCH 1/2] sched:Prevent movement of short running tasks during load balancing

2012-10-11 Thread Preeti U Murthy
Prevent sched groups with low load, as tracked by PJT's metrics,
from being candidates of the load balance routine. The threshold is chosen
to be 1024+15%*1024. But using PJT's metrics it has been observed that even
when three 10% tasks are running, the load sometimes does not exceed this
threshold. The call should be taken if the tasks can afford to be throttled.

This is why an additional metric has been included, which can determine how
long we can tolerate tasks not being moved even if the load is low.
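
The arithmetic behind the threshold, and the rule the patch applies with
it, can be sketched as follows. The constant 1178 and the task limit 2 are
the values hard-coded in the diff; the helper names are hypothetical, and
rounding 1177.6 up to 1178 is an assumption about how the constant was
derived.

```c
#include <stdint.h>

#define NICE_0_LOAD_SKETCH	1024u	/* load of one nice-0 task */

/* 1024 + 15% of 1024 = 1177.6, rounded up to the next integer. */
static unsigned int low_load_threshold(void)
{
	return NICE_0_LOAD_SKETCH + (NICE_0_LOAD_SKETCH * 15u + 99u) / 100u;
}

/*
 * Apply the rule from the diff: a group at or below the threshold that
 * runs no more than two tasks has its load forced to zero, which keeps
 * it from being chosen for load balancing.
 */
static uint64_t effective_group_load(uint64_t avg_cfs_runnable_load,
				     unsigned long sum_nr_running)
{
	if (avg_cfs_runnable_load <= low_load_threshold() &&
	    sum_nr_running <= 2)
		return 0;
	return avg_cfs_runnable_load;
}
```

A lightly loaded group with three or more tasks keeps its load, which is
the "queueing too many tasks" escape described in the code comment.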

Signed-off-by:  Preeti U Murthy 
---
 kernel/sched/fair.c |   16 
 1 file changed, 16 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dbddcf6..dd0fb28 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4188,6 +4188,7 @@ struct sd_lb_stats {
  */
 struct sg_lb_stats {
unsigned long avg_load; /*Avg load across the CPUs of the group */
+   u64 avg_cfs_runnable_load; /* Equivalent of avg_load but calculated using pjt's metric */
unsigned long group_load; /* Total load over the CPUs of the group */
unsigned long sum_nr_running; /* Nr tasks running in the group */
unsigned long sum_weighted_load; /* Weighted load of group's tasks */
@@ -4504,6 +4505,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
unsigned long load, max_cpu_load, min_cpu_load;
unsigned int balance_cpu = -1, first_idle_cpu = 0;
unsigned long avg_load_per_task = 0;
+   u64 group_load = 0; /* computed using PJT's metric */
int i;
 
if (local_group)
@@ -4548,6 +4550,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
if (idle_cpu(i))
sgs->idle_cpus++;
 
+   group_load += cpu_rq(i)->cfs.runnable_load_avg;
update_sg_numa_stats(sgs, rq);
}
 
@@ -4572,6 +4575,19 @@ static inline void update_sg_lb_stats(struct lb_env *env,
sgs->avg_load = (sgs->group_load*SCHED_POWER_SCALE) / group->sgp->power;
 
/*
+* Check if the sched group has not crossed the threshold.
+*
+* Also check if the sched_group although being within the threshold,is not
+* queueing too many tasks.If yes to both,then make it an
+* invalid candidate for load balancing
+*
+* The below condition is included as a tunable to meet performance and power needs
+*/
+   sgs->avg_cfs_runnable_load = (group_load * SCHED_POWER_SCALE) / group->sgp->power;
+   if (sgs->avg_cfs_runnable_load <= 1178 && sgs->sum_nr_running <= 2)
+   sgs->avg_cfs_runnable_load = 0;
+
+   /*
 * Consider the group unbalanced when the imbalance is larger
 * than the average weight of a task.
 *



[RFC PATCH 0/2] sched: Load Balancing using Per-entity-Load-tracking

2012-10-11 Thread Preeti U Murthy
Hi everyone,

This patchset uses the per-entity-load-tracking patchset which will soon be
available in the kernel. It is based on the tip/master tree, and the first 8
latest patches of sched:per-entity-load-tracking alone have been imported to
the tree to avoid the complexities of task groups and to hold back the
optimizations of this patch for now.

This patchset is an attempt to begin the integration of the per-entity-load
metric for the cfs_rq, henceforth referred to as PJT's metric, with the load
balancer in a stepwise fashion, and progress based on the consequences.

The following issues have been considered towards this:
[NOTE: an x% task referred to in the logs and below is calculated over a
duty cycle of 10ms.]

1. Consider a scenario where there are two 10% tasks running on a cpu. The
  present code will consider the load on this queue to be 2048, while
  using PJT's metric the load is calculated to be <1000, rarely exceeding this
  limit. Although the tasks are not contributing much to the cpu load, the
  scheduler decides to move them.

  But one could argue that 'not moving one of these tasks could throttle
  them. If there was an idle cpu, perhaps we could have moved them'. While the
  power save mode would have been fine with not moving the task, the
  performance mode would prefer not to throttle the tasks. We could strive
  to strike a balance by making this decision tunable with certain parameters.
  This patchset includes such tunables. This issue is addressed in Patch[1/2].

2. We need to be able to do this cautiously, as the scheduler code is too
  complex. This patchset is an attempt to begin the integration of PJT's
  metric with the load balancer in a stepwise fashion, and progress based on
  the consequences.
  I don't intend to vary the parameters used by the load balancer. Some
  parameters are however newly included to make decisions about including a
  sched group as a candidate for load balancing.

  This patchset therefore has two primary aims.
     Patch[1/2]: This patch aims at detecting short running tasks and
     preventing their movement. In update_sg_lb_stats, dismiss a sched group
     as a candidate for load balancing if the load calculated by PJT's metric
     says that the average load on the sched_group <= 1024+(.15*1024).
     This is a tunable, which can be varied after sufficient experiments.

     Patch[2/2]: In the current scheduler, greater load would be analogous
     to a larger number of tasks. Therefore when the busiest group is picked
     from the sched domain in update_sd_lb_stats, only the loads of the
     groups are compared between them. If we were to use PJT's metric, a
     higher load does not necessarily mean a larger number of tasks. This
     patch addresses this issue.

3. The next step towards integration should be in using PJT's metric for
  comparison between the loads of the busy sched group and the sched
  group which has to pull the tasks, which happens in find_busiest_group.
---

Preeti U Murthy (2):
  sched:Prevent movement of short running tasks during load balancing
  sched:Pick the apt busy sched group during load balancing


 kernel/sched/fair.c |   38 +++---
 1 file changed, 35 insertions(+), 3 deletions(-)

--
Regards,
Preeti U Murthy



Re: [PATCH] uprobe: fix misleading log entry

2012-10-11 Thread Srikar Dronamraju
> On Wed, Jul 18, 2012 at 5:22 PM, Srikar Dronamraju
>  wrote:
> > * Jovi Zhang  [2012-07-18 11:08:42]:
> >
> >> From 68232ef2decae95b807f2f3763e8ea99c1a3b2ae Mon Sep 17 00:00:00 2001
> >> From: Jovi Zhang 
> >> Date: Wed, 18 Jul 2012 17:51:26 +0800
> >> Subject: [PATCH] uprobe: fix misleading log entry
> >>
> >> There don't have any 'r' prefix in uprobe event naming, remove it.
> >>
> >> Signed-off-by: Jovi Zhang 
> >> ---
> >>  kernel/trace/trace_uprobe.c |2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> >> index cf382de..852a584 100644
> >> --- a/kernel/trace/trace_uprobe.c
> >> +++ b/kernel/trace/trace_uprobe.c
> >> @@ -191,7 +191,7 @@ static int create_trace_uprobe(int argc, char **argv)
> >>   if (argv[0][0] == '-')
> >>   is_delete = true;
> >>   else if (argv[0][0] != 'p') {
> >> - pr_info("Probe definition must be started with 'p', 'r' or" 
> >> " '-'.\n");
> >> + pr_info("Probe definition must be started with 'p' or 
> >> '-'.\n");
> >>   return -EINVAL;
> >>   }
> >>
> >
> > Yes, uprobes doesnt support return probes. So we should not have
> > mentioned about r.
> >
> > Acked-by: Srikar Dronamraju 
> >
> 

Ingo/Andrew

Can you please pick this up?


-- 
Thanks and Regards
Srikar



Re: [PATCH] efivarfs: Implement exclusive access for {get,set}_variable

2012-10-11 Thread Greg KH
On Thu, Oct 11, 2012 at 02:42:36PM +0100, Matt Fleming wrote:
> On Thu, 2012-10-11 at 21:19 +0800, Jeremy Kerr wrote:
> > Currently, efivarfs does not enforce exclusion over the get_variable and
> > set_variable operations. Section 7.1 of UEFI requires us to only allow a
> > single processor to enter {get,set}_variable services at once.
> > 
> > This change acquires the efivars->lock over calls to these operations
> > from the efivarfs paths.
> > 
> > Signed-off-by: Jeremy Kerr 
> > 
> > ---
> >  drivers/firmware/efivars.c |   68 +++--
> >  1 file changed, 43 insertions(+), 25 deletions(-)
> 
> Thanks, applied to 'next'.

Should this be backported to the stable kernels?

thanks,

greg k-h


Re: [PATCH] backlight: lm3639: Return proper error in lm3639_bled_mode_store error paths

2012-10-11 Thread gshark

On 2012-10-11 14:11, Axel Lin wrote:

Signed-off-by: Axel Lin 
---
  drivers/video/backlight/lm3639_bl.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/video/backlight/lm3639_bl.c 
b/drivers/video/backlight/lm3639_bl.c
index c6915c6..585949b 100644
--- a/drivers/video/backlight/lm3639_bl.c
+++ b/drivers/video/backlight/lm3639_bl.c
@@ -206,11 +206,11 @@ static ssize_t lm3639_bled_mode_store(struct device *dev,
  
  out:

dev_err(pchip->dev, "%s:i2c access fail to register\n", __func__);
-   return size;
+   return ret;
  
  out_input:

dev_err(pchip->dev, "%s:input conversion fail\n", __func__);
-   return size;
+   return ret;
  
  }
  


Thank you, Axel, for fixing the code.
Acked-by: G.Shark Jeong
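
The point of the fix is that a store callback reports an error code on its
error paths rather than the byte count, which would look like success to the
writer. A minimal userspace sketch of that convention follows; it is not the
lm3639 driver code, and mode_store_sketch() and its 0/1 value range are
assumptions made for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for a store callback: parse a decimal mode
 * value from buf and record it in *mode.
 */
static ssize_t mode_store_sketch(const char *buf, size_t size,
				 unsigned int *mode)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno || end == buf || val > 1)
		return -EINVAL;		/* error path: return the error, not size */

	*mode = (unsigned int)val;
	return (ssize_t)size;		/* success: all bytes were consumed */
}
```

Returning size from the error paths, as the old code did, would hide the
failure from whoever wrote to the attribute.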


Re: [PATCH v2 0/4] DMA: PL330: Fix mem leaks and balance probe/remove

2012-10-11 Thread Inderpal Singh
Hello,

On 5 October 2012 06:17, Inderpal Singh  wrote:
> The first 2 patches of this series fix memory leaks because the memory
> allocated for peripheral channels and DMA descriptors were not getting
> freed.
>
> The last 2 patches balance the module's remove function.
>
> This series depends on "61c6e7531d3b66b3 DMA: PL330: Check the
> pointer returned by kzalloc" which is on slave-dma's "fixes" branch.
> Hence slave-dma tree's "next" branch was merged with "fixes" and
> applied patch at [1] to fix the build error.
>
> [1] http://permalink.gmane.org/gmane.linux.kernel.next/24274
>
> Changes since v1:
>  - Protect only list_add_tail with spin_locks
>  - Return EBUSY from remove if channel is in use
>  - unregister dma_device in remove
>
> Inderpal Singh (4):
>   DMA: PL330: Free memory allocated for peripheral channels
>   DMA: PL330: Change allocation method to properly free  DMA
> descriptors
>   DMA: PL330: Balance module remove function with probe
>   DMA: PL330: unregister dma_device in module's remove function
>
>  drivers/dma/pl330.c |   53 
> ---
>  1 file changed, 38 insertions(+), 15 deletions(-)
>

Any comments on this v2 version of the patchset?

Thanks,
Inder

> --
> 1.7.9.5
>


Re: [PATCH v2] acpi : acpi_bus_trim() stops removing devices when failing to remove the device

2012-10-11 Thread Yasuaki Ishimatsu

Hi Toshi,

2012/10/11 22:58, Toshi Kani wrote:

On Thu, 2012-10-11 at 19:12 +0900, Yasuaki Ishimatsu wrote:

acpi_bus_trim() stops removing devices when acpi_bus_remove() returns an error
number. But acpi_bus_remove() cannot return an error number correctly:
it only returns -EINVAL when the dev argument is NULL. Thus, even if a
device cannot be removed correctly, acpi_bus_trim() ignores this and continues to
remove devices. acpi_bus_hot_remove_device() uses acpi_bus_trim() for removing
devices. Therefore acpi_bus_hot_remove_device() can send "_EJ0" to firmware
even if the device is still running on the system. In this case, the system cannot
work well.

Vasilis hit the bug at memory hotplug and reported it as follows:
https://lkml.org/lkml/2012/9/26/318

So acpi_bus_trim() should check whether the device was removed correctly.
The patch adds error checks to some of the functions used to remove the device.

With the patch applied, acpi_bus_trim() stops removing devices when it fails
to remove a device. But I think there is no impact except on
the CPU and memory hotplug paths, because for other devices the removal fails
only in irregular cases, such as the device being NULL.

v1->v2
- add a rollback for reinstalling a notify handler.
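
The rollback idea can be pictured with a small toy model in plain C. This is
only a sketch under assumptions, not the kernel code: struct toy_device and
the toy_* functions are invented here, and the notify handler is reduced to a
boolean flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a device with a notify handler and a remove callback. */
struct toy_device {
	bool notify_installed;
	bool bound;
	int (*remove)(struct toy_device *dev);
};

/*
 * Remove the notify handler first, then call the driver's remove
 * callback. If the callback fails, reinstall the handler (the
 * rollback) and pass the error up so the caller can stop.
 */
static int toy_device_remove(struct toy_device *dev)
{
	bool had_notify = dev->notify_installed;
	int ret;

	dev->notify_installed = false;
	if (dev->remove) {
		ret = dev->remove(dev);
		if (ret) {
			dev->notify_installed = had_notify;
			return ret;
		}
	}
	dev->bound = false;
	return 0;
}

/* Example callbacks: one that refuses removal, one that succeeds. */
static int toy_remove_busy(struct toy_device *dev)
{
	(void)dev;
	return -EBUSY;
}

static int toy_remove_ok(struct toy_device *dev)
{
	(void)dev;
	return 0;
}
```

A caller that walks several devices would stop at the first non-zero return
instead of carrying on, which is the behaviour the description asks of
acpi_bus_trim().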

Signed-off-by: Yasuaki Ishimatsu 


Thanks for the update. Looks good.

Reviewed-by: Toshi Kani 


Thank you for reviewing.

Thanks,
Yasuaki Ishimatsu


-Toshi



---
  drivers/acpi/scan.c|   21 ++---
  drivers/base/dd.c  |   22 +-
  include/linux/device.h |2 +-
  3 files changed, 36 insertions(+), 9 deletions(-)

Index: linux-3.6/drivers/acpi/scan.c
===
--- linux-3.6.orig/drivers/acpi/scan.c  2012-10-11 18:31:40.189019503 +0900
+++ linux-3.6/drivers/acpi/scan.c   2012-10-11 18:42:35.669041641 +0900
@@ -445,18 +445,29 @@ static int acpi_device_remove(struct dev
  {
struct acpi_device *acpi_dev = to_acpi_device(dev);
struct acpi_driver *acpi_drv = acpi_dev->driver;
+   int ret;

if (acpi_drv) {
if (acpi_drv->ops.notify)
acpi_device_remove_notify_handler(acpi_dev);
-   if (acpi_drv->ops.remove)
-   acpi_drv->ops.remove(acpi_dev, acpi_dev->removal_type);
+   if (acpi_drv->ops.remove) {
+   ret = acpi_drv->ops.remove(acpi_dev,
+  acpi_dev->removal_type);
+   if (ret)
+   goto rollback;
+   }
}
acpi_dev->driver = NULL;
acpi_dev->driver_data = NULL;

put_device(dev);
return 0;
+
+rollback:
+   if (acpi_drv->ops.notify)
+   acpi_device_install_notify_handler(acpi_dev);
+
+   return ret;
  }

  struct bus_type acpi_bus_type = {
@@ -1226,11 +1237,15 @@ static int acpi_device_set_context(struc

  static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
  {
+   int ret;
+
if (!dev)
return -EINVAL;

dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
-   device_release_driver(>dev);
+   ret = device_release_driver(>dev);
+   if (ret)
+   return ret;

if (!rmdevice)
return 0;
Index: linux-3.6/drivers/base/dd.c
===
--- linux-3.6.orig/drivers/base/dd.c2012-10-11 18:31:40.191019505 +0900
+++ linux-3.6/drivers/base/dd.c 2012-10-11 18:31:46.873020548 +0900
@@ -475,9 +475,10 @@ EXPORT_SYMBOL_GPL(driver_attach);
   * __device_release_driver() must be called with @dev lock held.
   * When called for a USB interface, @dev->parent lock must be held as well.
   */
-static void __device_release_driver(struct device *dev)
+static int __device_release_driver(struct device *dev)
  {
struct device_driver *drv;
+   int ret = 0;

drv = dev->driver;
if (drv) {
@@ -493,9 +494,11 @@ static void __device_release_driver(stru
pm_runtime_put_sync(dev);

if (dev->bus && dev->bus->remove)
-   dev->bus->remove(dev);
+   ret = dev->bus->remove(dev);
else if (drv->remove)
-   drv->remove(dev);
+   ret = drv->remove(dev);
+   if (ret)
+   goto rollback;
devres_release_all(dev);
dev->driver = NULL;
dev_set_drvdata(dev, NULL);
@@ -506,6 +509,12 @@ static void __device_release_driver(stru
 dev);

}
+
+   return ret;
+
+rollback:
+   driver_sysfs_add(dev);
+   return ret;
  }

  /**
@@ -515,16 +524,19 @@ static void __device_release_driver(stru
   * Manually detach device from driver.
   * When called for a USB interface, @dev->parent lock must be held.
   */
-void 

Re: [PATCH v3 07/10] thp: implement splitting pmd for huge zero page

2012-10-11 Thread Kirill A. Shutemov
On Fri, Oct 12, 2012 at 11:23:37AM +0800, Ni zhan Chen wrote:
> On 10/02/2012 11:19 PM, Kirill A. Shutemov wrote:
> >From: "Kirill A. Shutemov" 
> >
> >We can't split huge zero page itself, but we can split the pmd which
> >points to it.
> >
> >On splitting the pmd we create a table with all ptes set to normal zero
> >page.
> >
> >Signed-off-by: Kirill A. Shutemov 
> >Reviewed-by: Andrea Arcangeli 
> >---
> >  mm/huge_memory.c |   32 
> >  1 files changed, 32 insertions(+), 0 deletions(-)
> >
> >diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >index 95032d3..3f1c59c 100644
> >--- a/mm/huge_memory.c
> >+++ b/mm/huge_memory.c
> >@@ -1600,6 +1600,7 @@ int split_huge_page(struct page *page)
> > struct anon_vma *anon_vma;
> > int ret = 1;
> >+BUG_ON(is_huge_zero_pfn(page_to_pfn(page)));
> > BUG_ON(!PageAnon(page));
> > anon_vma = page_lock_anon_vma(page);
> > if (!anon_vma)
> >@@ -2503,6 +2504,32 @@ static int khugepaged(void *none)
> > return 0;
> >  }
> >+static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
> >+unsigned long haddr, pmd_t *pmd)
> >+{
> >+pgtable_t pgtable;
> >+pmd_t _pmd;
> >+int i;
> >+
> >+pmdp_clear_flush_notify(vma, haddr, pmd);
> 
> why I can't find function pmdp_clear_flush_notify in kernel source
> code? Do you mean pmdp_clear_flush_young_notify or something like
> that?

It was changed recently. See commit
2ec74c3 mm: move all mmu notifier invocations to be done outside the PT lock

-- 
 Kirill A. Shutemov


linux-next: manual merge of the signal tree with the vfs tree

2012-10-11 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the signal tree got a conflict in
arch/powerpc/kernel/sys_ppc32.c between commit 2a5e5beb88c5 ("vfs: define
struct filename and have getname() return it") from the vfs tree and
commit be6abfa769fa ("powerpc: switch to generic sys_execve()/
kernel_execve()") from the signal tree.

The latter removed the function that was modified by the former,
so I did that and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




linux-next: manual merge of the signal tree with the vfs tree

2012-10-11 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the signal tree got a conflict in
arch/powerpc/kernel/process.c between commit 2a5e5beb88c5 ("vfs: define
struct filename and have getname() return it") from the vfs tree and
commit be6abfa769fa ("powerpc: switch to generic
sys_execve()/kernel_execve()") from the signal tree.

The latter rewrote the function modified by the former, so I used the
latter and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




linux-next: manual merge of the signal tree with the vfs tree

2012-10-11 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the signal tree got a conflict in
arch/mn10300/kernel/process.c between commit 2a5e5beb88c5 ("vfs: define
struct filename and have getname() return it") from the vfs tree and
commit 8f1597e959a3 ("mn10300: switch to generic sys_execve()") from the
signal tree.

The latter removed the function modified by the former, so I did that and
can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




linux-next: manual merge of the signal tree with the vfs tree

2012-10-11 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the signal tree got a conflict in
arch/m68k/kernel/process.c between commit 2a5e5beb88c5 ("vfs: define
struct filename and have getname() return it") from the vfs tree and
commit d878d6dacee2 ("m68k: switch to generic sys_execve()/kernel_execve()")
from the signal tree.

The latter removed the function that the former modified, so I did that
and can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




linux-next: manual merge of the signal tree with the vfs tree

2012-10-11 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the signal tree got a conflict in
arch/frv/kernel/process.c between commit 2a5e5beb88c5 ("vfs: define
struct filename and have getname() return it") from the vfs tree and
commit 460dabab73f2 ("frv: switch to generic sys_execve()") from the
signal tree.

The latter removed the function updated by the former, so I just did that
and can carry the fix as necessary (no action is required).


-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/




linux-next: manual merge of the signal tree with the vfs tree

2012-10-11 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the signal tree got a conflict in
arch/c6x/kernel/process.c between commit 2a5e5beb88c5 ("vfs: define
struct filename and have getname() return it") from the vfs tree and
commit 680a14535c33 ("c6x: switch to generic sys_execve") from the signal
tree.

The latter removes the function modified by the former so I did that and
can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: linux-next: build failure after merge of the l2-mtd tree

2012-10-11 Thread Dinh Nguyen
Hi Stephen,

On Fri, 2012-10-12 at 11:14 +1100, Stephen Rothwell wrote:
> Hi Artem,
> 
> After merging the l2-mtd tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> ERROR: "denali_init" [drivers/mtd/nand/denali_pci.ko] undefined!
> ERROR: "denali_remove" [drivers/mtd/nand/denali_pci.ko] undefined!
> 
> Probably caused by commit 305b1ee29c8e ("mtd: denali: split the generic
> driver and PCI layer").
> 
> I have used the l2-mtd tree from next-20121011 for today.

Sorry about that. I just sent a patch to fix this. Please disregard the
first email, I forgot to include Artem in the email.

Thanks,
Dinh





RE: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register

2012-10-11 Thread Ma, Ling
> > Load and write operation occupy about 35% and 10% respectively for
> > most industry benchmarks. Fetched 16-aligned bytes code include about
> > 4 instructions, implying 1.34(0.35 * 4) load, 0.4 write.
> > Modern CPU support 2 load and 1 write per cycle, so throughput from
> > write is bottleneck for memcpy or copy_page, and some slight CPU only
> > support one mem operation per cycle. So it is enough to issue one
> read
> > and write instruction per cycle, and we can save registers.
> 
> So is that also true for AMD CPUs?
Although Bulldozer puts 32 bytes of instructions into decoupled 16-byte entry buffers,
it still decodes 4 instructions per cycle, so 4 instructions will be fed into the
execution units, and 2 loads and 1 write will be issued per cycle.

Thanks
Ling


[PATCH] genirq: for edge interrupt IRQS_ONESHOT support with irq thread

2012-10-11 Thread Chuansheng Liu

In our system there is one edge interrupt that we want to handle in an
irq thread with IRQS_ONESHOT. We found that in handle_edge_irq(),
even with IRQS_ONESHOT set, the irq is still left unmasked, without regard to
the IRQS_ONESHOT flag.

As a result, IRQS_ONESHOT does not work well for edge interrupts. In addition,
after the irq thread finishes with the IRQS_ONESHOT flag set, the irq may
be unmasked again, which messes up the mask/unmask logic.
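
The intended flow (mask and ack, run the handler, unmask only once the
thread is done) can be shown with a toy model in plain C. This is a sketch
under assumptions and not the genirq code: struct toy_irq and the toy_*
functions are invented for illustration, and the primary handler is assumed
to always wake the irq thread.

```c
#include <stdbool.h>

/* Toy model of one interrupt line. */
struct toy_irq {
	bool oneshot;		/* models the IRQS_ONESHOT flag */
	bool masked;		/* line is currently masked */
	bool thread_pending;	/* irq thread woken but not yet finished */
	int acks;		/* number of acks sent to the chip */
};

/* Hypothetical primary handler: it only wakes the irq thread. */
static void toy_primary_handler(struct toy_irq *irq)
{
	irq->thread_pending = true;
}

/* Edge flow: in the oneshot case keep the line masked while the thread runs. */
static void toy_handle_edge(struct toy_irq *irq)
{
	if (irq->oneshot) {
		irq->masked = true;		/* mask ... */
		irq->acks++;			/* ... and ack */
		toy_primary_handler(irq);
		if (!irq->thread_pending)	/* conditional unmask */
			irq->masked = false;
		return;
	}
	irq->acks++;				/* plain edge irq: ack only */
	toy_primary_handler(irq);
}

/* Called when the irq thread has finished its work. */
static void toy_thread_done(struct toy_irq *irq)
{
	irq->thread_pending = false;
	if (irq->oneshot)
		irq->masked = false;
}
```

In this model a oneshot edge interrupt stays masked from the ack until
toy_thread_done() runs, while a non-oneshot one is never masked.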

Signed-off-by: liu chuansheng 
---
 kernel/irq/chip.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 57d86d0..f23f524 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -497,7 +497,13 @@ handle_edge_irq(unsigned int irq, struct irq_desc *desc)
kstat_incr_irqs_this_cpu(irq, desc);
 
/* Start handling the irq */
-   desc->irq_data.chip->irq_ack(>irq_data);
+   if (desc->istate & IRQS_ONESHOT) {
+   mask_ack_irq(desc);
+   handle_irq_event(desc);
+   cond_unmask_irq(desc);
+   goto out_unlock;
+   } else
+   desc->irq_data.chip->irq_ack(>irq_data);
 
do {
if (unlikely(!desc->action)) {
-- 
1.7.0.4





linux-next: manual merge of the pinctrl tree with Linus' tree

2012-10-11 Thread Stephen Rothwell
Hi Linus,

Today's linux-next merge of the pinctrl tree got a conflict in
drivers/mtd/nand/atmel_nand.c between commit 28446acb1f82 ("mtd: atmel
nand: fix gpio missing request") from Linus' tree and commit 08695153170c
("MTD: atmel nand: fix gpio missing request") from the pinctrl tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc drivers/mtd/nand/atmel_nand.c
index 9144557,f48587b..000
--- a/drivers/mtd/nand/atmel_nand.c
+++ b/drivers/mtd/nand/atmel_nand.c
@@@ -1414,12 -585,19 +1416,20 @@@ static int __init atmel_nand_probe(stru
nand_chip->IO_ADDR_W = host->io_base;
nand_chip->cmd_ctrl = atmel_nand_cmd_ctrl;
  
+   pinctrl = devm_pinctrl_get_select_default(>dev);
+   if (IS_ERR(pinctrl)) {
+   dev_err(host->dev, "Failed to request pinctrl\n");
+   res = PTR_ERR(pinctrl);
+   goto err_ecc_ioremap;
+   }
+ 
if (gpio_is_valid(host->board.rdy_pin)) {
-   res = gpio_request(host->board.rdy_pin, "nand_rdy");
+   res = devm_gpio_request(>dev,
+   host->board.rdy_pin, "nand_rdy");
if (res < 0) {
dev_err(>dev,
 -  "can't request rdy gpio %d\n", 
host->board.rdy_pin);
 +  "can't request rdy gpio %d\n",
 +  host->board.rdy_pin);
goto err_ecc_ioremap;
}
  
@@@ -1435,11 -612,11 +1444,12 @@@
}
  
if (gpio_is_valid(host->board.enable_pin)) {
-   res = gpio_request(host->board.enable_pin, "nand_enable");
+   res = devm_gpio_request(>dev,
+   host->board.enable_pin, "nand_enable");
if (res < 0) {
dev_err(>dev,
 -  "can't request enable gpio %d\n", 
host->board.enable_pin);
 +  "can't request enable gpio %d\n",
 +  host->board.enable_pin);
goto err_ecc_ioremap;
}
  
@@@ -1465,11 -664,11 +1475,12 @@@
atmel_nand_enable(host);
  
if (gpio_is_valid(host->board.det_pin)) {
-   res = gpio_request(host->board.det_pin, "nand_det");
+   res = devm_gpio_request(>dev,
+   host->board.det_pin, "nand_det");
if (res < 0) {
dev_err(>dev,
 -  "can't request det gpio %d\n", 
host->board.det_pin);
 +  "can't request det gpio %d\n",
 +  host->board.det_pin);
goto err_no_card;
}
  




Re: [PATCH v3 07/10] thp: implement splitting pmd for huge zero page

2012-10-11 Thread Ni zhan Chen

On 10/02/2012 11:19 PM, Kirill A. Shutemov wrote:

From: "Kirill A. Shutemov" 

We can't split huge zero page itself, but we can split the pmd which
points to it.

On splitting the pmd we create a table with all ptes set to normal zero
page.

Signed-off-by: Kirill A. Shutemov 
Reviewed-by: Andrea Arcangeli 
---
  mm/huge_memory.c |   32 
  1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 95032d3..3f1c59c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1600,6 +1600,7 @@ int split_huge_page(struct page *page)
struct anon_vma *anon_vma;
int ret = 1;
  
+	BUG_ON(is_huge_zero_pfn(page_to_pfn(page)));

BUG_ON(!PageAnon(page));
anon_vma = page_lock_anon_vma(page);
if (!anon_vma)
@@ -2503,6 +2504,32 @@ static int khugepaged(void *none)
return 0;
  }
  
+static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,

+   unsigned long haddr, pmd_t *pmd)
+{
+   pgtable_t pgtable;
+   pmd_t _pmd;
+   int i;
+
+   pmdp_clear_flush_notify(vma, haddr, pmd);


Why can't I find the function pmdp_clear_flush_notify in the kernel source code?
Do you mean pmdp_clear_flush_young_notify or something like that?



+   /* leave pmd empty until pte is filled */
+
+   pgtable = get_pmd_huge_pte(vma->vm_mm);
+   pmd_populate(vma->vm_mm, &_pmd, pgtable);
+
+   for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+   pte_t *pte, entry;
+   entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
+   entry = pte_mkspecial(entry);
+   pte = pte_offset_map(&_pmd, haddr);
+   VM_BUG_ON(!pte_none(*pte));
+   set_pte_at(vma->vm_mm, haddr, pte, entry);
+   pte_unmap(pte);
+   }
+   smp_wmb(); /* make pte visible before pmd */
+   pmd_populate(vma->vm_mm, pmd, pgtable);
+}
+
  void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd)
  {
@@ -2516,6 +2543,11 @@ void __split_huge_page_pmd(struct vm_area_struct *vma, 
unsigned long address,
spin_unlock(>vm_mm->page_table_lock);
return;
}
+   if (is_huge_zero_pmd(*pmd)) {
+   __split_huge_zero_page_pmd(vma, haddr, pmd);
+   spin_unlock(>vm_mm->page_table_lock);
+   return;
+   }
page = pmd_page(*pmd);
VM_BUG_ON(!page_count(page));
get_page(page);




RE: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging instruction sequence and saving register

2012-10-11 Thread Ma, Ling
> > Load and write operation occupy about 35% and 10% respectively for
> > most industry benchmarks. Fetched 16-aligned bytes code include about
> > 4 instructions, implying 1.34(0.35 * 4) load, 0.4 write.
> > Modern CPU support 2 load and 1 write per cycle, so throughput from
> > write is bottleneck for memcpy or copy_page, and some slight CPU only
> > support one mem operation per cycle. So it is enough to issue one
> read
> > and write instruction per cycle, and we can save registers.
> 
> I don't think "saving registers" is a useful goal here.

Ling: issuing one read and one write op per cycle is enough for copy_page or
memcpy performance,
so we can avoid the register save and restore operations.

> >
> > In this patch we also re-arrange instruction sequence to improve
> > performance The performance on atom is improved about 11%, 9% on
> > hot/cold-cache case respectively.
> 
> That's great, but the question is what happened to the older CPUs that
> also this sequence. It may be safer to add a new variant for Atom,
> unless you can benchmark those too.

Ling:
I tested the new and original versions on Core 2; the patch improved performance
by about 9%.
Although Core 2 has an out-of-order pipeline, which weakens the instruction
ordering requirement, the ROB size is limited, so the new patch issues write
operations earlier, gets more opportunity for parallelism between paired write
and load ops, and gives a better result.
Attached is core2-cpu-info (I have no older machine).


Thanks
Ling

 


core2-cpu-info
Description: core2-cpu-info


linux-next: manual merge of the pinctrl tree with Linus' tree

2012-10-11 Thread Stephen Rothwell
Hi Linus,

Today's linux-next merge of the pinctrl tree got a conflict in 
arch/arm/mach-at91/at91sam9x5.c between commits af2a5f09fb6d ("Replace 
clk_lookup.con_id with clk_lookup.dev_id entries for twi clk") and f7d19b906556 
("ARM: at91: add clocks for I2C DT entries") from Linus' tree and commit 
5c70cd3c7c69 ("arm: at91: dt: at91sam9 add pinctrl support") from the pinctrl 
tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc arch/arm/mach-at91/at91sam9x5.c
index e503538,79b5c52..000
--- a/arch/arm/mach-at91/at91sam9x5.c
+++ b/arch/arm/mach-at91/at91sam9x5.c
@@@ -231,13 -231,10 +231,13 @@@ static struct clk_lookup periph_clocks_
CLKDEV_CON_DEV_ID("t0_clk", "f800c000.timer", _clk),
CLKDEV_CON_DEV_ID("dma_clk", "ec00.dma-controller", _clk),
CLKDEV_CON_DEV_ID("dma_clk", "ee00.dma-controller", _clk),
 +  CLKDEV_CON_DEV_ID(NULL, "f801.i2c", _clk),
 +  CLKDEV_CON_DEV_ID(NULL, "f8014000.i2c", _clk),
 +  CLKDEV_CON_DEV_ID(NULL, "f8018000.i2c", _clk),
-   CLKDEV_CON_ID("pioA", _clk),
-   CLKDEV_CON_ID("pioB", _clk),
-   CLKDEV_CON_ID("pioC", _clk),
-   CLKDEV_CON_ID("pioD", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "f400.gpio", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "f600.gpio", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "f800.gpio", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "fa00.gpio", _clk),
/* additional fake clock for macb_hclk */
CLKDEV_CON_DEV_ID("hclk", "f802c000.ethernet", _clk),
CLKDEV_CON_DEV_ID("hclk", "f803.ethernet", _clk),




linux-next: manual merge of the pinctrl tree with the tree

2012-10-11 Thread Stephen Rothwell
Hi Linus,

Today's linux-next merge of the pinctrl tree got a conflict in
arch/arm/mach-at91/at91sam9n12.c between commit f7d19b906556 ("ARM: at91:
add clocks for I2C DT entries") from the  tree and commit 5c70cd3c7c69
("arm: at91: dt: at91sam9 add pinctrl support") from the pinctrl tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc arch/arm/mach-at91/at91sam9n12.c
index 732d3d3,a4676a5..000
--- a/arch/arm/mach-at91/at91sam9n12.c
+++ b/arch/arm/mach-at91/at91sam9n12.c
@@@ -169,12 -169,10 +169,12 @@@ static struct clk_lookup periph_clocks_
CLKDEV_CON_DEV_ID("t0_clk", "f8008000.timer", _clk),
CLKDEV_CON_DEV_ID("t0_clk", "f800c000.timer", _clk),
CLKDEV_CON_DEV_ID("dma_clk", "ec00.dma-controller", _clk),
 +  CLKDEV_CON_DEV_ID(NULL, "f801.i2c", _clk),
 +  CLKDEV_CON_DEV_ID(NULL, "f8014000.i2c", _clk),
-   CLKDEV_CON_ID("pioA", _clk),
-   CLKDEV_CON_ID("pioB", _clk),
-   CLKDEV_CON_ID("pioC", _clk),
-   CLKDEV_CON_ID("pioD", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "f400.gpio", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "f600.gpio", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "f800.gpio", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "fa00.gpio", _clk),
/* additional fake clock for macb_hclk */
CLKDEV_CON_DEV_ID("hclk", "50.ohci", _clk),
CLKDEV_CON_DEV_ID("ohci_clk", "50.ohci", _clk),




linux-next: manual merge of the pinctrl tree with Linus' tree

2012-10-11 Thread Stephen Rothwell
Hi Linus,

Today's linux-next merge of the pinctrl tree got a conflict in
arch/arm/mach-at91/at91sam9263.c between commit f7d19b906556 ("ARM: at91:
add clocks for I2C DT entries") from Linus' tree and commit 5c70cd3c7c69
("arm: at91: dt: at91sam9 add pinctrl support") from the pinctrl tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc arch/arm/mach-at91/at91sam9263.c
index 6a01d03,2edf813..000
--- a/arch/arm/mach-at91/at91sam9263.c
+++ b/arch/arm/mach-at91/at91sam9263.c
@@@ -211,7 -210,11 +211,12 @@@ static struct clk_lookup periph_clocks_
CLKDEV_CON_DEV_ID("hclk", "a0.ohci", _clk),
CLKDEV_CON_DEV_ID("spi_clk", "fffa4000.spi", _clk),
CLKDEV_CON_DEV_ID("spi_clk", "fffa8000.spi", _clk),
 +  CLKDEV_CON_DEV_ID(NULL, "fff88000.i2c", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "f200.gpio", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "f400.gpio", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "f600.gpio", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "f800.gpio", _clk),
+   CLKDEV_CON_DEV_ID(NULL, "fa00.gpio", _clk),
  };
  
  static struct clk_lookup usart_clocks_lookups[] = {




linux-next: manual merge of the pinctrl tree with Linus' tree

2012-10-11 Thread Stephen Rothwell
Hi Linus,

Today's linux-next merge of the pinctrl tree got a conflict in
arch/arm/boot/dts/at91sam9g25ek.dts between commit fbc1871511ed ("ARM:
dts: add twi nodes for atmel boards") from Linus' tree and commit
77ccddbdc0c9 ("arm: at91: dt: at91sam9 add serial pinctrl support") from
the pinctrl tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc arch/arm/boot/dts/at91sam9g25ek.dts
index 877c08f,c5ab16f..000
--- a/arch/arm/boot/dts/at91sam9g25ek.dts
+++ b/arch/arm/boot/dts/at91sam9g25ek.dts
@@@ -13,49 -13,4 +13,20 @@@
  / {
model = "Atmel AT91SAM9G25-EK";
compatible = "atmel,at91sam9g25ek", "atmel,at91sam9x5ek", 
"atmel,at91sam9x5", "atmel,at91sam9";
 +
-   chosen {
-   bootargs = "console=ttyS0,115200 root=/dev/mtdblock1 rw 
rootfstype=ubifs ubi.mtd=1 root=ubi0:rootfs";
-   };
- 
 +  ahb {
 +  apb {
-   dbgu: serial@f200 {
-   status = "okay";
-   };
- 
-   usart0: serial@f801c000 {
-   status = "okay";
-   };
- 
-   macb0: ethernet@f802c000 {
-   phy-mode = "rmii";
-   status = "okay";
-   };
- 
 +  i2c0: i2c@f801 {
 +  status = "okay";
 +  };
 +
 +  i2c1: i2c@f8014000 {
 +  status = "okay";
 +  };
 +
 +  i2c2: i2c@f8018000 {
 +  status = "okay";
 +  };
 +  };
- 
-   usb0: ohci@0060 {
-   status = "okay";
-   num-ports = <2>;
-   atmel,vbus-gpio = < 19 1
-   20 1
- >;
-   };
- 
-   usb1: ehci@0070 {
-   status = "okay";
-   };
 +  };
  };




Re: [PATCH 1/2] tracing: trivial cleanup

2012-10-11 Thread Steven Rostedt
On Thu, 2012-10-11 at 19:31 -0700, David Sharp wrote:
> On Thu, Oct 11, 2012 at 6:56 PM, Steven Rostedt  wrote:
> > Sorry, I know this is late, but it was pushed down in my todo list
> > (never off, but something I probably wouldn't have seen for a few more
> > months).
> >
> > On Thu, 2012-06-07 at 16:46 -0700, Vaibhav Nagarnaik wrote:
> >> From: David Sharp 
> >
> > If this is from David it needs his SOB.
> 
> Is that true even though we are working for the same company?
> 

Yes.

I would never push a patch from Ingo without his Signed-off-by even
though he and I work for the same company ;-)

Although, the GPL would let you. But it's best not to do it.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 6/6] tracing: Fix maybe-uninitialized warning in ftrace_function_set_regexp

2012-10-11 Thread David Sharp
On Thu, Oct 11, 2012 at 6:36 PM, Steven Rostedt  wrote:
> On Thu, 2012-10-11 at 16:27 -0700, David Sharp wrote:
>> Compiler warning:
>>
>> kernel/trace/trace_events_filter.c: In function 
>> 'ftrace_function_set_filter_cb':
>> kernel/trace/trace_events_filter.c:2074:8: error: 'ret' may be used 
>> uninitialized in this function [-Werror=maybe-uninitialized]
>>
>> Signed-off-by: David Sharp 
>> Cc: Steven Rostedt 
>> ---
>>  kernel/trace/trace_events_filter.c |3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/trace/trace_events_filter.c 
>> b/kernel/trace/trace_events_filter.c
>> index 431dba8..ef36953 100644
>> --- a/kernel/trace/trace_events_filter.c
>> +++ b/kernel/trace/trace_events_filter.c
>> @@ -2002,9 +2002,10 @@ static int ftrace_function_set_regexp(struct 
>> ftrace_ops *ops, int filter,
>>  static int __ftrace_function_set_filter(int filter, char *buf, int len,
>>   struct function_filter_data *data)
>>  {
>> - int i, re_cnt, ret;
>> + int i, re_cnt;
>>   int *reset;
>>   char **re;
>> + int ret = 0;
>>
>>   reset = filter ? &data->first_filter : &data->first_notrace;
>>
>
> It has already been fixed in mainline:
>
> 92d8d4a8b0f4c6eba70f6e62b48e38bd005a56e6
>
> http://marc.info/?l=linux-kernel&m=134012157512078

Okay, drop this one then. I haven't been rebasing.


[PATCH] binfmt_script: do not leave interp on stack

2012-10-11 Thread Kees Cook
When binfmt_script's load_script ran, it would manipulate bprm->buf and
leave bprm->interp pointing to the local stack. If a series of scripts
are executed, the final one will have leaked kernel stack contents into
the cmdline. For a proof of concept, see DoTest.sh from:
http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

Largely based on a patch by halfdog. Cleaned up various style issues,
including those reported by Randy Dunlap and scripts/checkpatch.pl.

Cc: halfdog 
Cc: sta...@vger.kernel.org
Signed-off-by: Kees Cook 
---
For more background, see the earlier thread:
https://lkml.org/lkml/2012/8/18/75
---
 fs/binfmt_script.c |  126 +---
 1 file changed, 101 insertions(+), 25 deletions(-)

diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index d3b8c1f..15fe9e8 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -14,12 +14,22 @@
 #include 
 #include 
 
+/*
+ * Check if this handler is suitable to load the interpreter identified
+ * by first BINPRM_BUF_SIZE bytes in bprm->buf following "#!".
+ *
+ * Returns:
+ * 0: success; the new executable is ready in bprm->mm.
+ *   -ve: interpreter not found, or other binfmts failed to find a
+ *suitable binary.
+ */
 static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
 {
const char *i_arg, *i_name;
char *cp;
struct file *file;
-   char interp[BINPRM_BUF_SIZE];
+   char bprm_buf_copy[BINPRM_BUF_SIZE];
+   const char *bprm_old_interp_name;
int retval;
 
if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') ||
@@ -30,34 +40,57 @@ static int load_script(struct linux_binprm *bprm,struct 
pt_regs *regs)
 * Sorta complicated, but hopefully it will work.  -TYT
 */
 
-   bprm->recursion_depth++;
-   allow_write_access(bprm->file);
-   fput(bprm->file);
-   bprm->file = NULL;
+   /*
+* Keep bprm unchanged until we known that this is a script
+* to be handled by this loader. Copy bprm->buf for sure,
+* otherwise returning -ENOEXEC will make other handlers see
+* modified data.
+*/
+   memcpy(bprm_buf_copy, bprm->buf, BINPRM_BUF_SIZE);
 
-   bprm->buf[BINPRM_BUF_SIZE - 1] = '\0';
-   if ((cp = strchr(bprm->buf, '\n')) == NULL)
-   cp = bprm->buf+BINPRM_BUF_SIZE-1;
+   /* Locate and truncate end of string. */
+   bprm_buf_copy[BINPRM_BUF_SIZE - 1] = '\0';
+   cp = strchr(bprm_buf_copy, '\n');
+   if (cp == NULL)
+   cp = bprm_buf_copy + BINPRM_BUF_SIZE - 1;
*cp = '\0';
-   while (cp > bprm->buf) {
+   /* Truncate trailing white-space. */
+   while (cp > bprm_buf_copy) {
cp--;
if ((*cp == ' ') || (*cp == '\t'))
*cp = '\0';
else
break;
}
-   for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
+   /* Skip leading white-space. */
+   for (cp = bprm_buf_copy + 2; (*cp == ' ') || (*cp == '\t'); cp++)
+   /* nothing */ ;
+
+   /*
+* No interpreter name found? No problem to let other handlers
+* retry, we did not change anything.
+*/
if (*cp == '\0') 
-   return -ENOEXEC; /* No interpreter name found */
+   return -ENOEXEC;
+
i_name = cp;
i_arg = NULL;
+   /* Find start of first argument. */
for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++)
/* nothing */ ;
+   /* Truncate and skip leading white-space. */
while ((*cp == ' ') || (*cp == '\t'))
*cp++ = '\0';
if (*cp)
i_arg = cp;
-   strcpy (interp, i_name);
+
+   /*
+* So this is our point-of-no-return: modification of bprm
+* will be irreversible, so if we fail to setup execution
+* using the new interpreter name (i_name), we have to make
+* sure that no other handler tries again.
+*/
+
/*
 * OK, we've parsed out the interpreter name and
 * (optional) argument.
@@ -68,34 +101,77 @@ static int load_script(struct linux_binprm *bprm,struct 
pt_regs *regs)
 * This is done in reverse order, because of how the
 * user environment and arguments are stored.
 */
+
+   /*
+* Ugly: we store pointer to local stack frame in bprm,
+* so make sure to clean this up before returning.
+*/
+   bprm_old_interp_name = bprm->interp;
+   bprm->interp = i_name;
+
retval = remove_arg_zero(bprm);
if (retval)
-   return retval;
-   retval = copy_strings_kernel(1, &bprm->interp, bprm);
-   if (retval < 0) return retval; 
+   goto out;
+
+   /*
+* copy_strings_kernel is ok here, even when racy: since no
+* user can be attached to new mm, there is nobody to race
+* 
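
Editorial sketch (not part of the posted patch): the parsing steps that the
patch moves onto a private copy of bprm->buf can be tried out in user space.
The names BUF_SIZE, struct shebang and parse_shebang() are hypothetical, and
the buffer size is an assumption standing in for BINPRM_BUF_SIZE. The point
it illustrates is that the caller's original buffer stays untouched, and that
i_name and i_arg point into the copy, so they are only valid while the copy
is alive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 128 /* stand-in for BINPRM_BUF_SIZE; the value is an assumption */

/* Hypothetical result type: both pointers point into the caller's copy. */
struct shebang {
    const char *i_name; /* interpreter path, NULL if none was found */
    const char *i_arg;  /* optional single argument, or NULL */
};

/*
 * Parse a "#!" line out of `copy`, a private, writable copy of the first
 * BUF_SIZE bytes of the file. Returns 0 on success, -1 if the buffer does
 * not start with "#!" or no interpreter name is present.
 */
static int parse_shebang(char copy[BUF_SIZE], struct shebang *out)
{
    char *cp;

    assert(copy != NULL && out != NULL);
    out->i_name = NULL;
    out->i_arg = NULL;

    if (copy[0] != '#' || copy[1] != '!')
        return -1;

    /* Locate and truncate end of line. */
    copy[BUF_SIZE - 1] = '\0';
    cp = strchr(copy, '\n');
    if (cp == NULL)
        cp = copy + BUF_SIZE - 1;
    *cp = '\0';

    /* Truncate trailing white-space. */
    while (cp > copy) {
        cp--;
        if (*cp == ' ' || *cp == '\t')
            *cp = '\0';
        else
            break;
    }

    /* Skip leading white-space after "#!". */
    for (cp = copy + 2; *cp == ' ' || *cp == '\t'; cp++)
        ;
    if (*cp == '\0')
        return -1;
    out->i_name = cp;

    /* Find the end of the interpreter name. */
    for (; *cp && *cp != ' ' && *cp != '\t'; cp++)
        ;
    /* Terminate the name and skip white-space before the argument. */
    while (*cp == ' ' || *cp == '\t')
        *cp++ = '\0';
    if (*cp)
        out->i_arg = cp;
    return 0;
}
```

A caller would memcpy() the original buffer into a local array, call
parse_shebang() on that array, and make sure nothing keeps i_name or i_arg
once the array goes out of scope, which is the class of bug the patch
description is about.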

Re: [PATCH 1/2] tracing: trivial cleanup

2012-10-11 Thread David Sharp
On Thu, Oct 11, 2012 at 6:56 PM, Steven Rostedt  wrote:
> Sorry, I know this is late, but it was pushed down in my todo list
> (never off, but something I probably wouldn't have seen for a few more
> months).
>
> On Thu, 2012-06-07 at 16:46 -0700, Vaibhav Nagarnaik wrote:
>> From: David Sharp 
>
> If this is from David it needs his SOB.

Is that true even though we are working for the same company?

> -- Steve
>
>>
>> Remove ftrace_format_syscall() declaration; it is neither defined nor
>> used. Also update a comment and formatting.
>>
>> Signed-off-by: Vaibhav Nagarnaik 

Signed-off-by: David Sharp 

>> ---
>>  include/trace/syscall.h|2 --
>>  kernel/trace/ring_buffer.c |6 +++---
>>  2 files changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/trace/syscall.h b/include/trace/syscall.h
>> index 31966a4..0c95796 100644
>> --- a/include/trace/syscall.h
>> +++ b/include/trace/syscall.h
>> @@ -39,8 +39,6 @@ extern int reg_event_syscall_enter(struct 
>> ftrace_event_call *call);
>>  extern void unreg_event_syscall_enter(struct ftrace_event_call *call);
>>  extern int reg_event_syscall_exit(struct ftrace_event_call *call);
>>  extern void unreg_event_syscall_exit(struct ftrace_event_call *call);
>> -extern int
>> -ftrace_format_syscall(struct ftrace_event_call *call, struct trace_seq *s);
>>  enum print_line_t print_syscall_enter(struct trace_iterator *iter, int 
>> flags,
>> struct trace_event *event);
>>  enum print_line_t print_syscall_exit(struct trace_iterator *iter, int flags,
>> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
>> index 1d0f6a8..96c2dd1 100644
>> --- a/kernel/trace/ring_buffer.c
>> +++ b/kernel/trace/ring_buffer.c
>> @@ -1816,7 +1816,7 @@ rb_add_time_stamp(struct ring_buffer_event *event, u64 
>> delta)
>>  }
>>
>>  /**
>> - * ring_buffer_update_event - update event type and data
>> + * rb_update_event - update event type and data
>>   * @event: the even to update
>>   * @type: the type of event
>>   * @length: the size of the event field in the ring buffer
>> @@ -2716,8 +2716,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_discard_commit);
>>   * and not the length of the event which would hold the header.
>>   */
>>  int ring_buffer_write(struct ring_buffer *buffer,
>> - unsigned long length,
>> - void *data)
>> +   unsigned long length,
>> +   void *data)
>>  {
>>   struct ring_buffer_per_cpu *cpu_buffer;
>>   struct ring_buffer_event *event;
>
>


Re: [Request for review] Revised delete_module(2) manual page

2012-10-11 Thread Rusty Russell
Lucas De Marchi  writes:
> What do you think? Mark as deprecated now and remove when kernel
> removes it? Or remove now?

Complain now, and I'll queue the removal in two merge windows.

That gives us a chance just in case someone actually uses this; if so I
want to talk to them about what it is they want!

Thanks,
Rusty.


Re: [PATCH v2 8/8] x86/lguest: Use __pa_symbol instead of __pa on C visible symbols

2012-10-11 Thread Rusty Russell
Alexander Duyck  writes:

> The function lguest_write_cr3 is using __pa to convert swapper_pg_dir and
> initial_page_table from virtual addresses to physical.  The correct function
> to use for these values is __pa_symbol since they are C visible symbols.
>
> Cc: Rusty Russell 
> Signed-off-by: Alexander Duyck 

Acked-by: Rusty Russell 

Thanks,
Rusty.


Re: [PATCH 0/3] virtio-net: inline header support

2012-10-11 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Thu, Oct 11, 2012 at 10:33:31AM +1030, Rusty Russell wrote:
>> OK.  Well, Anthony wants qemu to be robust in this regard, so I am
>> tempted to rework all the qemu drivers to handle arbitrary layouts.
>> They could use a good audit anyway.
>
> I agree here. Still trying to understand whether we can agree to use
> a feature bit for this, or not.

I'd *like* to imply it by the new PCI layout, but if it doesn't work
we'll add a new feature bit.

I'm resisting a feature bit, since it constrains future implementations
which could otherwise assume it.

>> This would become a glaring exception, but I'm tempted to fix it to 32
>> bytes at the same time as we get the new pci layout (ie. for the virtio
>> 1.0 spec).
>
> But this isn't a virtio-pci only issue, is it?
> qemu has an s390 bus with the same limitation.
> How can we tie it to pci layout?

They can use a transport feature if they need to, of course.  But
perhaps the timing with ccw will coincide with the fix, in which case they
don't need to, but it might be a bit late.

Cornelia?

Cheers,
Rusty.


Re: [PATCH] [SCSI] virtio-scsi: Add real 2-clause BSD license to header

2012-10-11 Thread Rusty Russell
Paolo Bonzini  writes:

> Il 11/10/2012 08:41, Bryan Venteicher ha scritto:
>> This is analogous to commit a1b383870a made by Rusty Russell to all
>> the VirtIO headers at the time. This eases the use of the header as
>> is by other OSes.
>> 
>> Signed-off-by: Bryan Venteicher 
>
> Acked-by: Paolo Bonzini 

Applied!

Thanks,
Rusty.


Re: [PATCH 1/4] module: add syscall to load module from fd

2012-10-11 Thread Rusty Russell
"H. Peter Anvin"  writes:

> On 10/10/2012 06:03 AM, Michael Kerrisk (man-pages) wrote:
>> Good point. A "whole hog" openat()-style interface is worth thinking about 
>> too.
>
> *Although* you could argue that you can always simply open the module
> file first, and that finit_module() is really what we should have had in
> the first place.  Then you don't need the flags since those would come
> from openat().

There's no fundamental reason that modules have to be in a file.  I'm
thinking of compressed modules, or an initrd which simply includes all
the modules it wants to load in one linear file.

Also, --force options manipulate the module before loading (as did the
now-obsolete module rename option).

Cheers,
Rusty.


linux-next: manual merge of the kvm-ppc tree with the kvm tree

2012-10-11 Thread Stephen Rothwell
Hi Alexander,

Today's linux-next merge of the kvm-ppc tree got a conflict in
arch/powerpc/kvm/44x_emulate.c between commits from the kvm tree and
the same patches plus another commit from the kvm-ppc tree.

I just used the kvm-ppc tree version.

This happened because
1) you rebased/rewrote your tree before it was merged into the kvm tree
2) left the old version in your kvm-ppc-next branch
3) added some more commits to your kvm-ppc-next branch

Don't do that!

Unfortunately, the best thing you can do now is to rebase your
kvm-ppc-next branch on top of the kvm tree.  :-(

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




Re: [PATCH 1/2] tracing: trivial cleanup

2012-10-11 Thread Steven Rostedt
Sorry, I know this is late, but it was pushed down in my todo list
(never off, but something I probably wouldn't have seen for a few more
months).

On Thu, 2012-06-07 at 16:46 -0700, Vaibhav Nagarnaik wrote:
> From: David Sharp 

If this is from David it needs his SOB.

-- Steve

> 
> Remove ftrace_format_syscall() declaration; it is neither defined nor
> used. Also update a comment and formatting.
> 
> Signed-off-by: Vaibhav Nagarnaik 
> ---
>  include/trace/syscall.h|2 --
>  kernel/trace/ring_buffer.c |6 +++---
>  2 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/include/trace/syscall.h b/include/trace/syscall.h
> index 31966a4..0c95796 100644
> --- a/include/trace/syscall.h
> +++ b/include/trace/syscall.h
> @@ -39,8 +39,6 @@ extern int reg_event_syscall_enter(struct ftrace_event_call 
> *call);
>  extern void unreg_event_syscall_enter(struct ftrace_event_call *call);
>  extern int reg_event_syscall_exit(struct ftrace_event_call *call);
>  extern void unreg_event_syscall_exit(struct ftrace_event_call *call);
> -extern int
> -ftrace_format_syscall(struct ftrace_event_call *call, struct trace_seq *s);
>  enum print_line_t print_syscall_enter(struct trace_iterator *iter, int flags,
> struct trace_event *event);
>  enum print_line_t print_syscall_exit(struct trace_iterator *iter, int flags,
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 1d0f6a8..96c2dd1 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -1816,7 +1816,7 @@ rb_add_time_stamp(struct ring_buffer_event *event, u64 
> delta)
>  }
>  
>  /**
> - * ring_buffer_update_event - update event type and data
> + * rb_update_event - update event type and data
>   * @event: the even to update
>   * @type: the type of event
>   * @length: the size of the event field in the ring buffer
> @@ -2716,8 +2716,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_discard_commit);
>   * and not the length of the event which would hold the header.
>   */
>  int ring_buffer_write(struct ring_buffer *buffer,
> - unsigned long length,
> - void *data)
> +   unsigned long length,
> +   void *data)
>  {
>   struct ring_buffer_per_cpu *cpu_buffer;
>   struct ring_buffer_event *event;




Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-11 Thread Ming Lei
On Fri, Oct 12, 2012 at 2:33 AM, Shea Levy  wrote:

>
> FWIW (and probably that's not much), the NixOS[0] distro doesn't currently
> use /lib/firmware. There is no /lib directory by default on NixOS, instead
> we create a new symlink tree representing the current system on each system
> change and symlink /run/current-system to that tree. We currently build
> udev/systemd with the --with-firmware-path=/run/current-system/firmware
> configuration-time option, but we also patch module-init-tools and kmod to
> respect the $MODULE_DIR env var and may do the same for firmware in the
> future. The way we do things has significant advantages (or at least we like
> to think so), but we already have exceptions for /bin/sh and /usr/bin/env,
> so I suspect we'll probably add in /lib/firmware if this functionality moves
> into the kernel.

A kernel parameter for customizing the firmware search path will be added, so
you can pass your search path on the kernel command line too.

Thanks,
-- 
Ming Lei


Re: [PATCH 00/33] AutoNUMA27

2012-10-11 Thread Andrea Arcangeli
Hi Mel,

On Thu, Oct 11, 2012 at 10:34:32PM +0100, Mel Gorman wrote:
> So after getting through the full review of it, there wasn't anything
> I could not stand. I think it's *very* heavy on some of the paths like
> the idle balancer which I was not keen on and the fault paths are also
> quite heavy.  I think the weight on some of these paths can be reduced
> but not to 0 if the objectives to autonuma are to be met.
> 
> I'm not fully convinced that the task exchange is actually necessary or
> beneficial because it somewhat assumes that there is a symmetry between CPU
> and memory balancing that may not be true. The fact that it only considers

The problem is that without an active task exchange and no explicit
call to stop_one_cpu*, there's no way to migrate a currently running
task and clearly we need that. We can indefinitely wait hoping the
task goes to sleep and leaves the CPU idle, or that a couple of other
tasks start and trigger load balance events.

We must move tasks even if all cpus are in a steady rq->nr_running ==
1 state and there's no other scheduler balance event that could
possibly attempt to move tasks around in such a steady state.

Of course one could hack the active idle balancing so that it does the
active NUMA balancing action, but that would be a purely artificial
complication: it would add unnecessary delay and it would provide no
benefit whatsoever.

Why don't we dump the active idle balancing too, and we hack the load
balancing to do the active idle balancing as well? Of course then the
two will be more integrated. But it'll be a mess and slower and
there's a good reason why they exist as totally separated pieces of
code working in parallel.

We can integrate it more, but in my view the result would be worse and
more complicated. Last but not least, messing with the idle balancing
code to do an active NUMA balancing action (somehow invoking
stop_one_cpu* in the steady state described above) would force even
cellphones and UP kernels to deal with NUMA code somehow.

> tasks that are currently running feels a bit random but examining all tasks
> that recently ran on the node would be far too expensive so there is no

So far this seems a good tradeoff. Nothing will prevent us from scanning
deeper into the runqueues later if we find a way to do that efficiently.

> good answer. You are caught between a rock and a hard place and either
> direction you go is wrong for different reasons. You need something more

I think you described the problem perfectly ;).

> frequent than scans (because it'll converge too slowly) but doing it from
> the balancer misses some tasks and may run too frequently and it's unclear
> how it affects the current load balancer decisions. I don't have a good
> alternative solution for this but ideally it would be better integrated with
> the existing scheduler when there is more data on what those scheduling
> decisions should be. That will only come from a wide range of testing and
> the inevitable bug reports.
> 
> That said, this is concentrating on the problems without considering the
> situations where it would work very well.  I think it'll come down to HPC
> and anything jitter-sensitive will hate this while workloads like JVM,
> virtualisation or anything that uses a lot of memory without caring about
> placement will love it. It's not perfect but it's better than incurring
> the cost of remote access unconditionally.

Full agreement.

Your detailed full review is much appreciated, thanks!

Andrea


Re: [Patch 3/7] smpboot: Provide infrastructure for percpu hotplug threads

2012-10-11 Thread Sasha Levin
On Wed, Sep 19, 2012 at 5:47 PM, Sasha Levin  wrote:
> Hi Thomas,
>
> On 07/16/2012 12:42 PM, Thomas Gleixner wrote:
>> Provide a generic interface for setting up and tearing down percpu
>> threads.
>>
>> On registration the threads for already online cpus are created and
>> started. On deregistration (modules) the threads are stopped.
>>
>> During hotplug operations the threads are created, started, parked and
>> unparked. The datastructure for registration provides a pointer to
>> percpu storage space and optional setup, cleanup, park, unpark
>> functions. These functions are called when the thread state changes.
>>
>> Each implementation has to provide a function which is queried and
>> returns whether the thread should run and the thread function itself.
>>
>> The core code handles all state transitions and avoids duplicated code
>> in the call sites.
>>
>> Signed-off-by: Thomas Gleixner 
>> ---
>
> This patch seems to cause the following BUG() on KVM guests with a large
> number of VCPUs:
>
> [0.511760] [ cut here ]
> [0.511761] kernel BUG at kernel/smpboot.c:134!
> [0.511764] invalid opcode:  [#3] PREEMPT SMP DEBUG_PAGEALLOC
> [0.511779] CPU 0
> [0.511780] Pid: 70, comm: watchdog/10 Tainted: G  D W
> 3.6.0-rc6-next-20120919-sasha-1-gb54aafe #365
> [0.511783] RIP: 0010:[]  []
> smpboot_thread_fn+0x196/0x2e0
> [0.511785] RSP: 0018:88000cf4bdd0  EFLAGS: 00010206
> [0.511786] RAX:  RBX: 88000cf58000 RCX: 
> 
> [0.511787] RDX:  RSI: 0001 RDI: 
> 0001
> [0.511788] RBP: 88000cf4be30 R08:  R09: 
> 0001
> [0.511789] R10:  R11:  R12: 
> 88000cdb9ff0
> [0.511790] R13: 84c60920 R14: 000a R15: 
> 88000cf58000
> [0.511792] FS:  () GS:88000d20()
> knlGS:
> [0.511794] CS:  0010 DS:  ES:  CR0: 8005003b
> [0.511795] CR2:  CR3: 04c26000 CR4: 
> 000406f0
> [0.511801] DR0:  DR1:  DR2: 
> 
> [0.511805] DR3:  DR6: 0ff0 DR7: 
> 0400
> [0.511807] Process watchdog/10 (pid: 70, threadinfo 88000cf4a000, task
> 88000cf58000)
> [0.511808] Stack:
> [0.511822]  88000cf4bfd8 88000cf4bfd8  
> 
> [0.511833]  88000cf4be00 839eace5 88000cf4be30 
> 88000cdd1c68
> [0.511844]  88000cdb9ff0 811414e0  
> 
> [0.511845] Call Trace:
> [0.511852]  [] ? schedule+0x55/0x60
> [0.511857]  [] ? __smpboot_create_thread+0xf0/0xf0
> [0.511863]  [] kthread+0xe3/0xf0
> [0.511867]  [] ? wait_for_common+0x143/0x180
> [0.511873]  [] kernel_thread_helper+0x4/0x10
> [0.511878]  [] ? retint_restore_args+0x13/0x13
> [0.511883]  [] ? insert_kthread_work+0x90/0x90
> [0.511888]  [] ? gs_change+0x13/0x13
> [0.511916] Code: 24 04 02 00 00 00 0f 1f 80 00 00 00 00 e8 b3 46 ff ff e9 
> b6
> fe ff ff 66 0f 1f 44 00 00 45 8b 34 24 e8 ff 72 8a 00 41 39 c6 74 0a <0f> 0b 
> 0f
> 1f 84 00 00 00 00 00 41 8b 44 24 04 85 c0 74 0f 83 f8
> [0.511919] RIP  [] smpboot_thread_fn+0x196/0x2e0
> [0.511920]  RSP 
> [0.511922] ---[ end trace 127920ef70923ae1 ]---
>
> I'm starting the guest with numa=fake=10, so vcpu 0 ends up on the same (fake)
> node as vcpu 10, and while digging into the bug, it seems that the issue is
> that vcpu10's thread gets scheduled on vcpu0.
>
> Beyond that I don't really understand what's wrong...

Ping? Still seeing that with linux-next.


Re: [PATCH v6 6/6] tracing: Fix maybe-uninitialized warning in ftrace_function_set_regexp

2012-10-11 Thread Steven Rostedt
On Thu, 2012-10-11 at 16:27 -0700, David Sharp wrote:
> Compiler warning:
> 
> kernel/trace/trace_events_filter.c: In function 
> 'ftrace_function_set_filter_cb':
> kernel/trace/trace_events_filter.c:2074:8: error: 'ret' may be used 
> uninitialized in this function [-Werror=maybe-uninitialized]
> 
> Signed-off-by: David Sharp 
> Cc: Steven Rostedt 
> ---
>  kernel/trace/trace_events_filter.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_filter.c 
> b/kernel/trace/trace_events_filter.c
> index 431dba8..ef36953 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -2002,9 +2002,10 @@ static int ftrace_function_set_regexp(struct 
> ftrace_ops *ops, int filter,
>  static int __ftrace_function_set_filter(int filter, char *buf, int len,
>   struct function_filter_data *data)
>  {
> - int i, re_cnt, ret;
> + int i, re_cnt;
>   int *reset;
>   char **re;
> + int ret = 0;
>  
>   reset = filter ? &data->first_filter : &data->first_notrace;
>  

It has already been fixed in mainline:

92d8d4a8b0f4c6eba70f6e62b48e38bd005a56e6

http://marc.info/?l=linux-kernel&m=134012157512078

-- Steve
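
Editorial sketch (not from the thread): the warning quoted above is typical
of a variable that is only assigned inside a loop and then returned. The
sketch below assumes that shape; apply_all(), apply_fn and reject_empty() are
hypothetical names, not kernel functions. When the loop runs zero times the
return value would be read without ever being written, which is the path the
compiler reports, and initialising it to 0 closes that path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-item callback type; returns 0 on success, nonzero on error. */
typedef int (*apply_fn)(const char *item);

/*
 * Apply `fn` to each of `count` items and return the first error, or 0.
 * Without the "= 0" initialiser, `ret` would be read uninitialised when
 * count == 0, which is what -Wmaybe-uninitialized complains about.
 */
static int apply_all(const char *const *items, size_t count, apply_fn fn)
{
    int ret = 0;
    size_t i;

    assert(fn != NULL);
    for (i = 0; i < count; i++) {
        ret = fn(items[i]);
        if (ret)
            break;
    }
    return ret;
}

/* Example callback: rejects empty strings. */
static int reject_empty(const char *item)
{
    return item[0] == '\0' ? -1 : 0;
}
```

Whether 0 is the right result for an empty input is a design choice for the
caller; the initialiser only guarantees that the result is well defined.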



[RFC][PATCH] perf: Add a few generic stalled-cycles events

2012-10-11 Thread Sukadev Bhattiprolu

From 89cb6a25b9f714e55a379467a832ee015014ed11 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Tue, 18 Sep 2012 10:59:01 -0700
Subject: [PATCH] perf: Add a few generic stalled-cycles events

The existing generic event 'stalled-cycles-backend' corresponds to
PM_CMPLU_STALL event in Power7. While this event is useful, detailed
performance analysis often requires us to find more specific reasons
for the stalled cycle. For instance, stalled cycles in Power7 can
occur due to, among others:

- instruction fetch unit (IFU),
- Load-store-unit (LSU),
- Fixed point unit (FXU)
- Branch unit (BRU)

While it is possible to use raw codes to monitor these events, it quickly
becomes cumbersome with performance analysis frequently requiring mapping
the raw event codes in reports to their symbolic names.

This patch is a proposal to try and generalize such perf events. Since
the code changes are quite simple, I bunched all the 4 events together.

I am not familiar with how readily these events would map to other
architectures. Here is some information on the events for Power7:

stalled-cycles-fixed-point (PM_CMPLU_STALL_FXU)

Following a completion stall, the last instruction to finish
before completion resumes was from the Fixed Point Unit.

Completion stall is any period when no groups completed and
the completion table was not empty for that thread.

stalled-cycles-load-store (PM_CMPLU_STALL_LSU)

Following a completion stall, the last instruction to finish
before completion resumes was from the Load-Store Unit.

stalled-cycles-instruction-fetch (PM_CMPLU_STALL_IFU)

Following a completion stall, the last instruction to finish
before completion resumes was from the Instruction Fetch Unit.

stalled-cycles-branch (PM_CMPLU_STALL_BRU)

Following a completion stall, the last instruction to finish
before completion resumes was from the Branch Unit.

Looking for feedback on this approach and if this can be further extended.
Power7 has 530 events[2] out of which a "CPI stack analysis"[1] uses about 26
events.


[1] CPI Stack analysis

https://www.power.org/documentation/commonly-used-metrics-for-performance-analysis

[2] Power7 events:

https://www.power.org/documentation/comprehensive-pmu-event-reference-power7/

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/perf/power7-pmu.c |4 
 include/linux/perf_event.h |4 
 tools/perf/builtin-stat.c  |4 
 tools/perf/util/evsel.c|4 
 tools/perf/util/parse-events.l |4 
 tools/perf/util/python.c   |4 
 6 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 1251e4d..813e7c7 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -304,6 +304,10 @@ static int power7_generic_events[] = {
[PERF_COUNT_HW_CACHE_MISSES] = 0x400f0, /* LD_MISS_L1   */
[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x10068,  /* BRU_FIN  */
[PERF_COUNT_HW_BRANCH_MISSES] = 0x400f6,/* BR_MPRED */
+   [PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT] = 0x20014,/* CMPLU_STALL_FXU 
*/
+   [PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE] = 0x20012,/* CMPLU_STALL_LSU 
*/
+   [PERF_COUNT_HW_STALLED_CYCLES_INSTRUCTION_FETCH] = 0x4004c,/* 
CMPLU_STALL_IFU */
+   [PERF_COUNT_HW_STALLED_CYCLES_BRANCH] = 0x4004e,/* CMPLU_STALL_BRU */
 };
 
 #define C(x)   PERF_COUNT_HW_CACHE_##x
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index bdb4161..ff9f0a6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -55,6 +55,10 @@ enum perf_hw_id {
PERF_COUNT_HW_STALLED_CYCLES_FRONTEND   = 7,
PERF_COUNT_HW_STALLED_CYCLES_BACKEND= 8,
PERF_COUNT_HW_REF_CPU_CYCLES= 9,
+   PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT = 10,
+   PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE = 11,
+   PERF_COUNT_HW_STALLED_CYCLES_INSTRUCTION_FETCH = 12,
+   PERF_COUNT_HW_STALLED_CYCLES_BRANCH = 13,
 
PERF_COUNT_HW_MAX,  /* non-ABI */
 };
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 861f0ae..6275dbb 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -77,6 +77,10 @@ static struct perf_event_attr default_attrs[] = {
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
+  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT },
+  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE },
+  

Re: [PATCH 2.6.32.y, 2.6.34.y] ALSA: hda - ALSA HD Audio patch for Intel Panther Point DeviceIDs

2012-10-11 Thread Ben Hutchings
On Tue, 2012-10-09 at 10:56 -0400, Paul Gortmaker wrote:
> On 12-10-08 09:55 PM, Ben Hutchings wrote:
> > On Mon, 2012-10-08 at 18:07 -0700, Jonathan Nieder wrote:
> >> From: Seth Heasley 
> >> Date: Wed, 20 Apr 2011 10:59:57 -0700
> >>
> >> commit d2edeb7c6f1dada8ca7d5c23e42d604e92ae0c76 upstream.
> >>
> >> This patch adds the HD Audio Controller DeviceIDs for the Intel Panther 
> >> Point PCH.
> >>
> >> [jn: backported for 2.6.32.y by Ana Guerrero]
> >>
> >> Signed-off-by: Seth Heasley 
> >> Signed-off-by: Takashi Iwai 
> >> Tested-by: Ana Guerrero  # EliteBook 8570w
> >> Signed-off-by: Jonathan Nieder 
> >> ---
> >> Hi Willy and Paul,
> >>
> >> Please consider
> >>
> >>   d2edeb7c6f1d ALSA: hda - ALSA HD Audio patch for Intel Panther Point
> >>DeviceIDs
> >>
> >> for application to the 2.6.32.y and 2.6.34.y trees.
> >>
> >> It does what it says on the cover.  The patch was merged in the 3.0
> >> cycle, so newer stable kernels don't need it.  Backported and
> >> tested[1] against Debian's 2.6.32.y-based kernel by Ana (cc-ed) --
> >> thanks!
> >>
> >> Thoughts of all kinds welcome, as always.
> > [...]
> > 
> > I queued up a whole series of device ID updates for Debian stable, as
> > they didn't obviously depend on other changes:
> > 
> > cea310e ALSA: hda_intel: ALSA HD Audio patch for Intel Patsburg DeviceIDs
> > e35d4b1 ALSA: hda: add Vortex86MX PCI ids
> > 0f0714c ALSA: hda - Add support for VMware controller
> > d2edeb7 ALSA: hda - ALSA HD Audio patch for Intel Panther Point DeviceIDs
> 
> I found that d2edeb7c6f depends (only trivially, on context) on b686453543f,
> which makes a blanket class for intel IDs so that "...the driver will work
> with any new control chips in future."   In that respect, I guess it too
> (b68645) is in the same class as the other "add more IDs" patches?

Yes, it seems like this would also be worth adding.  (Always assuming
that any functional changes it depends on are also small enough for
stable.)  It's not a clean cherry-pick as the line above has changed.

Ben.

> Paul.
> --
> 
> > 
> > (They can be cherry-picked cleanly in the above order.)  But if I've
> > missed some post-2.6.32 dependencies then I would like to know.
> > 
> > Ben.
> > 
> 

-- 
Ben Hutchings
Kids!  Bringing about Armageddon can be dangerous.  Do not attempt it in
your own home. - Terry Pratchett and Neil Gaiman, `Good Omens'



[PATCH RT 1/8] random: Make it work on rt

2012-10-11 Thread Steven Rostedt
From: Thomas Gleixner 

Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
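
For readers who want to see the hand-off in isolation, here is a rough
user-space model of the idea, not the kernel code. The struct and all
model_* names are made up for illustration; only the shape of the
add_interrupt_randomness(irq, irq_flags, ip) call and the
"random_ip ^ action" mix are taken from the patch below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct irq_desc; only the new field is kept. */
struct irq_desc_model {
    uint64_t random_ip;     /* IP saved by the hard interrupt handler */
};

static uint32_t last_input[4];  /* what would be fed to the entropy pool */

/* Mirrors the ip handling of add_interrupt_randomness() in the patch. */
static void model_add_interrupt_randomness(int irq, uint64_t ip)
{
    last_input[1] = (uint32_t)irq;
    if (ip) {
        last_input[2] = (uint32_t)ip;           /* low 32 bits */
        last_input[3] = (uint32_t)(ip >> 32);   /* high 32 bits */
    }
}

/* Hard interrupt context on RT: only remember where we were interrupted. */
static void model_hardirq(struct irq_desc_model *desc, uint64_t interrupted_ip)
{
    assert(desc);
    desc->random_ip = interrupted_ip;
}

/* Threaded handler: runs in task context and does the actual mixing. */
static void model_irq_thread(struct irq_desc_model *desc, int irq,
                             const void *action)
{
    model_add_interrupt_randomness(irq,
        desc->random_ip ^ (uint64_t)(uintptr_t)action);
}
```

Calling model_hardirq() and then model_irq_thread() for the same
descriptor leaves the saved IP, xor-ed with the action pointer, split
across last_input[2] and last_input[3].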

Signed-off-by: Thomas Gleixner 
Cc: stable...@vger.kernel.org
Signed-off-by: Steven Rostedt 
---
 drivers/char/random.c   |   10 ++
 include/linux/irqdesc.h |1 +
 include/linux/random.h  |2 +-
 kernel/irq/handle.c |7 +--
 kernel/irq/manage.c |6 ++
 5 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d38af32..66c8a0f 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -767,18 +767,16 @@ EXPORT_SYMBOL_GPL(add_input_randomness);
 
 static DEFINE_PER_CPU(struct fast_pool, irq_randomness);
 
-void add_interrupt_randomness(int irq, int irq_flags)
+void add_interrupt_randomness(int irq, int irq_flags, __u64 ip)
 {
struct entropy_store*r;
struct fast_pool*fast_pool = &__get_cpu_var(irq_randomness);
-   struct pt_regs  *regs = get_irq_regs();
unsigned long   now = jiffies;
__u32   input[4], cycles = get_cycles();
 
input[0] = cycles ^ jiffies;
input[1] = irq;
-   if (regs) {
-   __u64 ip = instruction_pointer(regs);
+   if (ip) {
input[2] = ip;
input[3] = ip >> 32;
}
@@ -792,7 +790,11 @@ void add_interrupt_randomness(int irq, int irq_flags)
fast_pool->last = now;
 
r = nonblocking_pool.initialized ? &input_pool : &nonblocking_pool;
+#ifndef CONFIG_PREEMPT_RT_FULL
__mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool), NULL);
+#else
+   mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool), NULL);
+#endif
/*
 * If we don't have a valid cycle counter, and we see
 * back-to-back timer interrupts, then skip giving credit for
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index f1e2527..5f4f091 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -53,6 +53,7 @@ struct irq_desc {
unsigned intirq_count;  /* For detecting broken IRQs */
unsigned long   last_unhandled; /* Aging timer for unhandled count */
unsigned intirqs_unhandled;
+   u64 random_ip;
raw_spinlock_t  lock;
struct cpumask  *percpu_enabled;
 #ifdef CONFIG_SMP
diff --git a/include/linux/random.h b/include/linux/random.h
index 29e217a..3995b33 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -53,7 +53,7 @@ extern void rand_initialize_irq(int irq);
 extern void add_device_randomness(const void *, unsigned int);
 extern void add_input_randomness(unsigned int type, unsigned int code,
 unsigned int value);
-extern void add_interrupt_randomness(int irq, int irq_flags);
+extern void add_interrupt_randomness(int irq, int irq_flags, __u64 ip);
 
 extern void get_random_bytes(void *buf, int nbytes);
 extern void get_random_bytes_arch(void *buf, int nbytes);
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index a768885..f6b91bc 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -116,6 +116,8 @@ static void irq_wake_thread(struct irq_desc *desc, struct irqaction *action)
 irqreturn_t
 handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
 {
+   struct pt_regs *regs = get_irq_regs();
+   u64 ip = regs ? instruction_pointer(regs) : 0;
irqreturn_t retval = IRQ_NONE;
unsigned int flags = 0, irq = desc->irq_data.irq;
 
@@ -157,8 +159,9 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
} while (action);
 
 #ifndef CONFIG_PREEMPT_RT_FULL
-   /* FIXME: Can we unbreak that ? */
-   add_interrupt_randomness(irq, flags);
+   add_interrupt_randomness(irq, flags, ip);
+#else
+   desc->random_ip = ip;
 #endif
 
if (!noirqdebug)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 87dc053..2204340 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -816,6 +816,12 @@ static int irq_thread(void *data)
action_ret = handler_fn(desc, action);
if (!noirqdebug)
note_interrupt(action->irq, desc, action_ret);
+#ifdef CONFIG_PREEMPT_RT_FULL
+   migrate_disable();
+   add_interrupt_randomness(action->irq, 0,
+desc->random_ip ^ (u64) action);
+   migrate_enable();
+#endif
}
 
wake = atomic_dec_and_test(&desc->threads_active);
-- 
1.7.10.4



[PATCH RT 6/8] sched: Better debug output for might sleep

2012-10-11 Thread Steven Rostedt
From: Thomas Gleixner 

might sleep can tell us where interrupts have been disabled, but we
have no idea what disabled preemption. Add some debug infrastructure.
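
As a rough illustration of the bookkeeping, here is a user-space model
(not the kernel code). struct task_model and the model_* helpers are
hypothetical; the "count == val" test is the one used in
add_preempt_count() in the patch below, and it is only true for the
outermost disable.

```c
#include <assert.h>

/* Hypothetical stand-in for the two task_struct fields involved. */
struct task_model {
    int preempt_count;
    unsigned long preempt_disable_ip;   /* 0 means nothing recorded yet */
};

static void model_add_preempt_count(struct task_model *t, int val,
                                    unsigned long caller_ip)
{
    assert(val > 0);
    t->preempt_count += val;
    /* Equal to val only when the count was zero before this call. */
    if (t->preempt_count == val)
        t->preempt_disable_ip = caller_ip;
}

static void model_sub_preempt_count(struct task_model *t, int val)
{
    assert(t->preempt_count >= val);
    t->preempt_count -= val;
}
```

With nested disables, the recorded address stays that of the first
caller, which is the one a "Preemption disabled at:" line should name.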

Cc: stable...@vger.kernel.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Steven Rostedt 
---
 include/linux/sched.h |4 
 kernel/sched.c|   23 +--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 03498cc..12317b6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1612,6 +1612,10 @@ struct task_struct {
int kmap_idx;
pte_t kmap_pte[KM_TYPE_NR];
 #endif
+
+#ifdef CONFIG_DEBUG_PREEMPT
+   unsigned long preempt_disable_ip;
+#endif
 };
 
 #ifdef CONFIG_PREEMPT_RT_FULL
diff --git a/kernel/sched.c b/kernel/sched.c
index 87654e6..cdf9484 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4487,8 +4487,13 @@ void __kprobes add_preempt_count(int val)
DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
PREEMPT_MASK - 10);
 #endif
-   if (preempt_count() == val)
-   trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
+   if (preempt_count() == val) {
+   unsigned long ip = get_parent_ip(CALLER_ADDR1);
+#ifdef CONFIG_DEBUG_PREEMPT
+   current->preempt_disable_ip = ip;
+#endif
+   trace_preempt_off(CALLER_ADDR0, ip);
+   }
 }
 EXPORT_SYMBOL(add_preempt_count);
 
@@ -4530,6 +4535,13 @@ static noinline void __schedule_bug(struct task_struct *prev)
print_modules();
if (irqs_disabled())
print_irqtrace_events(prev);
+#ifdef CONFIG_DEBUG_PREEMPT
+   if (in_atomic_preempt_off()) {
+   pr_err("Preemption disabled at:");
+   print_ip_sym(current->preempt_disable_ip);
+   pr_cont("\n");
+   }
+#endif
 
if (regs)
show_regs(regs);
@@ -8912,6 +8924,13 @@ void __might_sleep(const char *file, int line, int preempt_offset)
debug_show_held_locks(current);
if (irqs_disabled())
print_irqtrace_events(current);
+#ifdef CONFIG_DEBUG_PREEMPT
+   if (!preempt_count_equals(preempt_offset)) {
+   pr_err("Preemption disabled at:");
+   print_ip_sym(current->preempt_disable_ip);
+   pr_cont("\n");
+   }
+#endif
dump_stack();
 }
 EXPORT_SYMBOL(__might_sleep);
-- 
1.7.10.4




[PATCH RT 8/8] Linux 3.2.31-rt47-rc1

2012-10-11 Thread Steven Rostedt
From: Steven Rostedt 

---
 localversion-rt |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index 2721581..8a02d38 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt46
+-rt47-rc1
-- 
1.7.10.4




[PATCH RT 4/8] mm: page_alloc: Use local_lock_on() instead of plain spinlock

2012-10-11 Thread Steven Rostedt
From: Thomas Gleixner 

The plain spinlock, while sufficient, does not update the local_lock
internals. Use a proper local_lock function instead to ease debugging.

Signed-off-by: Thomas Gleixner 
Cc: stable...@vger.kernel.org
Signed-off-by: Steven Rostedt 
---
 include/linux/locallock.h |   11 +++
 mm/page_alloc.c   |4 ++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index 0161fbb..f1804a3 100644
--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -137,6 +137,12 @@ static inline int __local_lock_irqsave(struct local_irq_lock *lv)
_flags = __get_cpu_var(lvar).flags; \
} while (0)
 
+#define local_lock_irqsave_on(lvar, _flags, cpu)   \
+   do {\
+   __local_lock_irqsave(&per_cpu(lvar, cpu));  \
+   _flags = per_cpu(lvar, cpu).flags;  \
+   } while (0)
+
 static inline int __local_unlock_irqrestore(struct local_irq_lock *lv,
unsigned long flags)
 {
@@ -156,6 +162,11 @@ static inline int __local_unlock_irqrestore(struct local_irq_lock *lv,
put_local_var(lvar);\
} while (0)
 
+#define local_unlock_irqrestore_on(lvar, flags, cpu)   \
+   do {\
+   __local_unlock_irqrestore(&per_cpu(lvar, cpu), flags);  \
+   } while (0)
+
 #define local_spin_trylock_irq(lvar, lock) \
({  \
int __locked;   \
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 67202bc..8678a7f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -227,9 +227,9 @@ static DEFINE_LOCAL_IRQ_LOCK(pa_lock);
 
 #ifdef CONFIG_PREEMPT_RT_BASE
 # define cpu_lock_irqsave(cpu, flags)  \
-   spin_lock_irqsave(&per_cpu(pa_lock, cpu).lock, flags)
+   local_lock_irqsave_on(pa_lock, flags, cpu)
 # define cpu_unlock_irqrestore(cpu, flags) \
-   spin_unlock_irqrestore(&per_cpu(pa_lock, cpu).lock, flags)
+   local_unlock_irqrestore_on(pa_lock, flags, cpu)
 #else
 # define cpu_lock_irqsave(cpu, flags)  local_irq_save(flags)
 # define cpu_unlock_irqrestore(cpu, flags) local_irq_restore(flags)
-- 
1.7.10.4




[PATCH RT 5/8] rt: rwsem/rwlock: lockdep annotations

2012-10-11 Thread Steven Rostedt
From: Thomas Gleixner 

rwlocks and rwsems on RT do not allow multiple readers. Annotate the
lockdep acquire functions accordingly.
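
A small user-space model may help to see what "accordingly" means here.
This is only a sketch under stated assumptions: struct rt_rwlock_model,
NO_OWNER and the two counters are invented for illustration, and the
counters merely stand in for the lockdep acquire/release annotations.
The point is that a recursive read by the owner bumps read_depth only,
so the annotation fires once per real acquire and once per real release.

```c
#include <assert.h>

#define NO_OWNER (-1)

/* Hypothetical model of an RT rwlock: one owner, recursion via read_depth. */
struct rt_rwlock_model {
    int owner;              /* task id holding the underlying mutex */
    int read_depth;         /* recursion depth of the owning reader */
    int lockdep_acquires;   /* how often "lockdep" saw an acquire */
    int lockdep_releases;   /* how often "lockdep" saw a release */
};

static int model_read_trylock(struct rt_rwlock_model *l, int task)
{
    int ret = 1;

    if (l->owner != task) {
        if (l->owner == NO_OWNER) {
            l->owner = task;
            l->lockdep_acquires++;  /* annotate the real acquire only */
        } else {
            ret = 0;                /* another task holds the lock */
        }
    } else if (!l->read_depth) {
        ret = 0;    /* held by us with depth 0: write locked (not modelled) */
    }

    if (ret)
        l->read_depth++;
    return ret;
}

static void model_read_unlock(struct rt_rwlock_model *l, int task)
{
    assert(l->owner == task && l->read_depth > 0);
    if (--l->read_depth == 0) {
        l->lockdep_releases++;      /* annotate the real release only */
        l->owner = NO_OWNER;
    }
}
```

A second reader task fails the trylock until the first owner has
dropped its last recursion level, which is the "no multiple readers"
behaviour described above.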

Signed-off-by: Thomas Gleixner 
Cc: stable...@vger.kernel.org
Signed-off-by: Steven Rostedt 
---
 kernel/rt.c |   46 +-
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/kernel/rt.c b/kernel/rt.c
index 092d6b3..aa10504 100644
--- a/kernel/rt.c
+++ b/kernel/rt.c
@@ -216,15 +216,17 @@ int __lockfunc rt_read_trylock(rwlock_t *rwlock)
 * write locked.
 */
migrate_disable();
-   if (rt_mutex_owner(lock) != current)
+   if (rt_mutex_owner(lock) != current) {
ret = rt_mutex_trylock(lock);
-   else if (!rwlock->read_depth)
+   if (ret)
+   rwlock_acquire(&rwlock->dep_map, 0, 1, _RET_IP_);
+   } else if (!rwlock->read_depth) {
ret = 0;
+   }
 
-   if (ret) {
+   if (ret)
rwlock->read_depth++;
-   rwlock_acquire_read(&rwlock->dep_map, 0, 1, _RET_IP_);
-   } else
+   else
migrate_enable();
 
return ret;
@@ -242,13 +244,13 @@ void __lockfunc rt_read_lock(rwlock_t *rwlock)
 {
struct rt_mutex *lock = &rwlock->lock;
 
-   rwlock_acquire_read(&rwlock->dep_map, 0, 0, _RET_IP_);
-
/*
 * recursive read locks succeed when current owns the lock
 */
-   if (rt_mutex_owner(lock) != current)
+   if (rt_mutex_owner(lock) != current) {
+   rwlock_acquire(&rwlock->dep_map, 0, 0, _RET_IP_);
__rt_spin_lock(lock);
+   }
rwlock->read_depth++;
 }
 
@@ -264,11 +266,11 @@ EXPORT_SYMBOL(rt_write_unlock);
 
 void __lockfunc rt_read_unlock(rwlock_t *rwlock)
 {
-   rwlock_release(&rwlock->dep_map, 1, _RET_IP_);
-
/* Release the lock only when read_depth is down to 0 */
-   if (--rwlock->read_depth == 0)
+   if (--rwlock->read_depth == 0) {
+   rwlock_release(&rwlock->dep_map, 1, _RET_IP_);
__rt_spin_unlock(&rwlock->lock);
+   }
 }
 EXPORT_SYMBOL(rt_read_unlock);
 
@@ -315,9 +317,10 @@ EXPORT_SYMBOL(rt_up_write);
 
 void  rt_up_read(struct rw_semaphore *rwsem)
 {
-   rwsem_release(&rwsem->dep_map, 1, _RET_IP_);
-   if (--rwsem->read_depth == 0)
+   if (--rwsem->read_depth == 0) {
+   rwsem_release(&rwsem->dep_map, 1, _RET_IP_);
rt_mutex_unlock(&rwsem->lock);
+   }
 }
 EXPORT_SYMBOL(rt_up_read);
 
@@ -366,15 +369,16 @@ int  rt_down_read_trylock(struct rw_semaphore *rwsem)
 * but not when read_depth == 0 which means that the rwsem is
 * write locked.
 */
-   if (rt_mutex_owner(lock) != current)
+   if (rt_mutex_owner(lock) != current) {
ret = rt_mutex_trylock(&rwsem->lock);
-   else if (!rwsem->read_depth)
+   if (ret)
+   rwsem_acquire(&rwsem->dep_map, 0, 1, _RET_IP_);
+   } else if (!rwsem->read_depth) {
ret = 0;
+   }
 
-   if (ret) {
+   if (ret)
rwsem->read_depth++;
-   rwsem_acquire(&rwsem->dep_map, 0, 1, _RET_IP_);
-   }
return ret;
 }
 EXPORT_SYMBOL(rt_down_read_trylock);
@@ -383,10 +387,10 @@ static void __rt_down_read(struct rw_semaphore *rwsem, int subclass)
 {
struct rt_mutex *lock = &rwsem->lock;
 
-   rwsem_acquire_read(&rwsem->dep_map, subclass, 0, _RET_IP_);
-
-   if (rt_mutex_owner(lock) != current)
+   if (rt_mutex_owner(lock) != current) {
+   rwsem_acquire(&rwsem->dep_map, subclass, 0, _RET_IP_);
rt_mutex_lock(&rwsem->lock);
+   }
rwsem->read_depth++;
 }
 
-- 
1.7.10.4




[PATCH RT 2/8] softirq: Init softirq local lock after per cpu section is set up

2012-10-11 Thread Steven Rostedt
From: Steven Rostedt 

I discovered this bug when booting 3.4-rt on my powerpc box. It crashed
with the following report:

------------[ cut here ]------------
kernel BUG at /work/rt/stable-rt.git/kernel/rtmutex_common.h:75!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
Modules linked in:
NIP: c04aa03c LR: c04aa01c CTR: c009b2ac
REGS: c0003e8d7950 TRAP: 0700   Not tainted  (3.4.11-test-rt19)
MSR: 90029032   CR: 2482  XER: 2000
SOFTE: 0
TASK = c0003e8fdcd0[11] 'ksoftirqd/1' THREAD: c0003e8d4000 CPU: 1
GPR00: 0001 c0003e8d7bd0 c0d6cbb0 
GPR04: c0003e8fdcd0  24004082 c0011454
GPR08:  8001 c0003e8fdcd1 
GPR12: 2484 cfff0280  3ad8
GPR16:  0072c798 0060 
GPR20: 00642741 0072c858 3af0 0417
GPR24: 0072dcd0 c0003e7ff990  0001
GPR28:  c0792340 c0ccec78 c1182338
NIP [c04aa03c] .wakeup_next_waiter+0x44/0xb8
LR [c04aa01c] .wakeup_next_waiter+0x24/0xb8
Call Trace:
[c0003e8d7bd0] [c04aa01c] .wakeup_next_waiter+0x24/0xb8 (unreliable)
[c0003e8d7c60] [c04a0320] .rt_spin_lock_slowunlock+0x8c/0xe4
[c0003e8d7ce0] [c04a07cc] .rt_spin_unlock+0x54/0x64
[c0003e8d7d60] [c00636bc] .__thread_do_softirq+0x130/0x174
[c0003e8d7df0] [c006379c] .run_ksoftirqd+0x9c/0x1a4
[c0003e8d7ea0] [c0080b68] .kthread+0xa8/0xb4
[c0003e8d7f90] [c001c2f8] .kernel_thread+0x54/0x70
Instruction dump:
6000 e86d01c8 38630730 4bff7061 6000 ebbf0008 7c7c1b78 e81d0040
7fe00278 7c74 7800d182 6801 <0b00> e88d01c8 387d0010 38840738

The rtmutex_common.h:75 is:

rt_mutex_top_waiter(struct rt_mutex *lock)
{
struct rt_mutex_waiter *w;

w = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter,
   list_entry);
BUG_ON(w->lock != lock);

return w;
}

Where the waiter->lock is corrupted. I saw various other random bugs
that all had to do with the softirq lock and plist. As a plist needs to be
initialized before it is used, I investigated how this lock is
initialized. It's initialized with:

void __init softirq_early_init(void)
{
local_irq_lock_init(local_softirq_lock);
}

Where:

#define local_irq_lock_init(lvar)   \
do {\
int __cpu;  \
for_each_possible_cpu(__cpu)\
spin_lock_init(&per_cpu(lvar, __cpu).lock); \
} while (0)

As the softirq lock is a local_irq_lock, which is a per_cpu lock, the
initialization is done to all per_cpu versions of the lock. But let's
look at where softirq_early_init() is called from.

In init/main.c: start_kernel()

/*
 * Interrupts are still disabled. Do necessary setups, then
 * enable them
 */
softirq_early_init();
tick_init();
boot_cpu_init();
page_address_init();
printk(KERN_NOTICE "%s", linux_banner);
setup_arch(&command_line);
mm_init_owner(&init_mm, &init_task);
mm_init_cpumask(&init_mm);
setup_command_line(command_line);
setup_nr_cpu_ids();
setup_per_cpu_areas();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */

One of the first things that is called is the initialization of the
softirq lock. But if you look further down, you see that the per_cpu areas
have not been set up yet. Thus initializing a local_irq_lock before
the per_cpu section is set up may not work, as it initializes the
per-CPU locks before the per-CPU areas exist.

By moving the softirq_early_init() right after setup_per_cpu_areas(),
the kernel boots fine.
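
To make the ordering problem concrete, here is a user-space sketch of
one way such an early init can go wrong. It rests on an assumption that
is not stated above: that the per-CPU areas are created by copying a
static template. Everything named fake_* is hypothetical, and the list
head only stands in for the plist inside the lock. A list head that was
initialized in the template still points at the template after the copy,
so the copy is not a valid empty list.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS 4

/* Minimal circular list head, standing in for the plist in an rt_mutex. */
struct list_head {
    struct list_head *next, *prev;
};

struct fake_lock {
    struct list_head wait_list;
};

static struct fake_lock template_lock;      /* static per-CPU template */
static struct fake_lock cpu_area[NR_CPUS];  /* the real per-CPU copies */

static void fake_lock_init(struct fake_lock *l)
{
    assert(l);
    l->wait_list.next = &l->wait_list;      /* empty list points at itself */
    l->wait_list.prev = &l->wait_list;
}

/* Hypothetical stand-in for setup_per_cpu_areas(): copy the template. */
static void fake_setup_per_cpu_areas(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        memcpy(&cpu_area[cpu], &template_lock, sizeof(template_lock));
}

static int list_is_valid_empty(const struct fake_lock *l)
{
    return l->wait_list.next == &l->wait_list &&
           l->wait_list.prev == &l->wait_list;
}

/* Returns how many CPUs end up with a correctly initialized lock. */
static int fake_boot(int init_after_setup)
{
    int ok = 0;

    memset(&template_lock, 0, sizeof(template_lock));
    if (!init_after_setup)
        fake_lock_init(&template_lock);     /* too early: no areas yet */

    fake_setup_per_cpu_areas();

    if (init_after_setup)
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            fake_lock_init(&cpu_area[cpu]);

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        ok += list_is_valid_empty(&cpu_area[cpu]);
    return ok;
}
```

In this model fake_boot(0) leaves no CPU with a valid list, while
fake_boot(1), which matches the reordering in this patch, initializes
all of them.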

Signed-off-by: Steven Rostedt 
Cc: Clark Williams 
Cc: John Kacur 
Cc: Carsten Emde 
Cc: voml...@texas.net
Cc: stable...@vger.kernel.org
Link: http://lkml.kernel.org/r/1349362924.6755.18.ca...@gandalf.local.home
Signed-off-by: Thomas Gleixner 
---
 init/main.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index d432bea..6f96224 100644
--- a/init/main.c
+++ b/init/main.c
@@ -490,7 +490,6 @@ asmlinkage void __init start_kernel(void)
  * Interrupts are still disabled. Do necessary setups, then
  * enable them
  */
-   softirq_early_init();
tick_init();
boot_cpu_init();
page_address_init();
@@ -501,6 +500,7 @@ asmlinkage void __init start_kernel(void)
setup_command_line(command_line);
setup_nr_cpu_ids();
setup_per_cpu_areas();
+   softirq_early_init();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */

[PATCH RT 7/8] stomp_machine: Use mutex_trylock when called from inactive cpu

2012-10-11 Thread Steven Rostedt
From: Thomas Gleixner 

If the stop machinery is called from an inactive CPU we cannot use
mutex_lock, because some other stomp machine invocation might be in
progress and the mutex can be contended. We cannot schedule from this
context, so trylock and loop.
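
The "trylock and loop" pattern can be sketched in user space as
follows. This is not the kernel mutex: struct tiny_mutex and its helpers
are invented for illustration and built on a C11 atomic_flag, and the
busy loop has no cpu_relax() equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal lock; the kernel uses a real struct mutex. */
struct tiny_mutex {
    atomic_flag held;
};

static bool tiny_trylock(struct tiny_mutex *m)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(&m->held, memory_order_acquire);
}

static void tiny_unlock(struct tiny_mutex *m)
{
    atomic_flag_clear_explicit(&m->held, memory_order_release);
}

/*
 * For callers that must not sleep: retry until the lock is free.
 * Returns the number of failed attempts.
 */
static unsigned long lock_without_sleeping(struct tiny_mutex *m)
{
    unsigned long failed = 0;

    assert(m);
    while (!tiny_trylock(m))
        failed++;   /* the kernel patch calls cpu_relax() here */
    return failed;
}
```

The trade-off is the usual one for spinning: the caller burns CPU time
while the lock is contended, which is acceptable here only because
sleeping is not an option in that context.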

Signed-off-by: Thomas Gleixner 
Cc: stable...@vger.kernel.org
Signed-off-by: Steven Rostedt 
---
 kernel/stop_machine.c |   13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 561ba3a..e98c70b 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -158,7 +158,7 @@ static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work);
 
 static void queue_stop_cpus_work(const struct cpumask *cpumask,
 cpu_stop_fn_t fn, void *arg,
-struct cpu_stop_done *done)
+struct cpu_stop_done *done, bool inactive)
 {
struct cpu_stop_work *work;
unsigned int cpu;
@@ -175,7 +175,12 @@ static void queue_stop_cpus_work(const struct cpumask *cpumask,
 * Make sure that all work is queued on all cpus before we
 * any of the cpus can execute it.
 */
-   mutex_lock(_lock);
+   if (!inactive) {
+   mutex_lock(_lock);
+   } else {
+   while (!mutex_trylock(_lock))
+   cpu_relax();
+   }
for_each_cpu(cpu, cpumask)
cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu),
&per_cpu(stop_cpus_work, cpu));
@@ -188,7 +193,7 @@ static int __stop_cpus(const struct cpumask *cpumask,
struct cpu_stop_done done;
 
cpu_stop_init_done(&done, cpumask_weight(cpumask));
-   queue_stop_cpus_work(cpumask, fn, arg, &done);
+   queue_stop_cpus_work(cpumask, fn, arg, &done, false);
wait_for_stop_done(&done);
return done.executed ? done.ret : -ENOENT;
 }
@@ -601,7 +606,7 @@ int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data,
set_state(, STOPMACHINE_PREPARE);
cpu_stop_init_done(, num_active_cpus());
queue_stop_cpus_work(cpu_active_mask, stop_machine_cpu_stop, ,
-);
+, true);
ret = stop_machine_cpu_stop();
 
/* Busy wait for completion. */
-- 
1.7.10.4




[PATCH RT 3/8] mm: slab: Fix potential deadlock

2012-10-11 Thread Steven Rostedt
From: Thomas Gleixner 

 =
[ INFO: possible recursive locking detected ]
 3.6.0-rt1+ #49 Not tainted
 -
 swapper/0/1 is trying to acquire lock:
 lock_slab_on+0x72/0x77

 but task is already holding lock:
 __local_lock_irq+0x24/0x77

 other info that might help us debug this:
  Possible unsafe locking scenario:

CPU0

   lock(&per_cpu(slab_lock, __cpu).lock);
   lock(&per_cpu(slab_lock, __cpu).lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 2 locks held by swapper/0/1:
 kmem_cache_create+0x33/0x89
 __local_lock_irq+0x24/0x77

 stack backtrace:
 Pid: 1, comm: swapper/0 Not tainted 3.6.0-rt1+ #49
 Call Trace:
 __lock_acquire+0x9a4/0xdc4
 ? __local_lock_irq+0x24/0x77
 ? lock_slab_on+0x72/0x77
 lock_acquire+0xc4/0x108
 ? lock_slab_on+0x72/0x77
 ? unlock_slab_on+0x5b/0x5b
 rt_spin_lock+0x36/0x3d
 ? lock_slab_on+0x72/0x77
 ? migrate_disable+0x85/0x93
 lock_slab_on+0x72/0x77
 do_ccupdate_local+0x19/0x44
 slab_on_each_cpu+0x36/0x5a
 do_tune_cpucache+0xc1/0x305
 enable_cpucache+0x8c/0xb5
 setup_cpu_cache+0x28/0x182
 __kmem_cache_create+0x34b/0x380
 ? shmem_mount+0x1a/0x1a
 kmem_cache_create+0x4a/0x89
 ? shmem_mount+0x1a/0x1a
 shmem_init+0x3e/0xd4
 kernel_init+0x11c/0x214
 kernel_thread_helper+0x4/0x10
 ? retint_restore_args+0x13/0x13
 ? start_kernel+0x3bc/0x3bc
 ? gs_change+0x13/0x13

It's not a missing annotation. It's simply wrong code and needs to be
fixed. Instead of nesting the local and the remote cpu lock, simply
acquire only the remote cpu lock, which is sufficient protection for
this procedure.

Signed-off-by: Thomas Gleixner 
Cc: stable...@vger.kernel.org
Signed-off-by: Steven Rostedt 
---
 include/linux/locallock.h |8 
 mm/slab.c |   10 ++
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index 8fbc393..0161fbb 100644
--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -96,6 +96,9 @@ static inline void __local_lock_irq(struct local_irq_lock *lv)
 #define local_lock_irq(lvar)   \
do { __local_lock_irq(&get_local_var(lvar)); } while (0)
 
+#define local_lock_irq_on(lvar, cpu)   \
+   do { __local_lock_irq(&per_cpu(lvar, cpu)); } while (0)
+
 static inline void __local_unlock_irq(struct local_irq_lock *lv)
 {
LL_WARN(!lv->nestcnt);
@@ -111,6 +114,11 @@ static inline void __local_unlock_irq(struct local_irq_lock *lv)
put_local_var(lvar);\
} while (0)
 
+#define local_unlock_irq_on(lvar, cpu) \
+   do {\
+   __local_unlock_irq(&per_cpu(lvar, cpu));\
+   } while (0)
+
 static inline int __local_lock_irqsave(struct local_irq_lock *lv)
 {
if (lv->owner != current) {
diff --git a/mm/slab.c b/mm/slab.c
index 411c545..6a8fd1b 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -747,18 +747,12 @@ slab_on_each_cpu(void (*func)(void *arg, int this_cpu), void *arg)
 
 static void lock_slab_on(unsigned int cpu)
 {
-   if (cpu == smp_processor_id())
-   local_lock_irq(slab_lock);
-   else
-   local_spin_lock_irq(slab_lock, &per_cpu(slab_lock, cpu).lock);
+   local_lock_irq_on(slab_lock, cpu);
 }
 
 static void unlock_slab_on(unsigned int cpu)
 {
-   if (cpu == smp_processor_id())
-   local_unlock_irq(slab_lock);
-   else
-   local_spin_unlock_irq(slab_lock, &per_cpu(slab_lock, cpu).lock);
+   local_unlock_irq_on(slab_lock, cpu);
 }
 #endif
 
-- 
1.7.10.4




[PATCH RT 0/8] [ANNOUNCE] 3.2.31-rt47-rc1 stable review

2012-10-11 Thread Steven Rostedt

Dear RT Folks,

This is the RT stable review cycle of patch 3.2.31-rt47-rc1.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 10/16/2012.

Enjoy,

-- Steve


To build 3.2.31-rt47-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.2.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.2.31.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/3.2/patch-3.2.31-rt47-rc1.patch.xz

You can also build from 3.2.31-rt46 by applying the incremental patch:

http://www.kernel.org/pub/linux/kernel/projects/rt/3.2/incr/patch-3.2.31-rt46-rt47-rc1.patch.xz


Changes from 3.2.31-rt46:

---


Steven Rostedt (2):
  softirq: Init softirq local lock after per cpu section is set up
  Linux 3.2.31-rt47-rc1

Thomas Gleixner (6):
  random: Make it work on rt
  mm: slab: Fix potential deadlock
  mm: page_alloc: Use local_lock_on() instead of plain spinlock
  rt: rwsem/rwlock: lockdep annotations
  sched: Better debug output for might sleep
  stomp_machine: Use mutex_trylock when called from inactive cpu


 drivers/char/random.c |   10 ++
 include/linux/irqdesc.h   |1 +
 include/linux/locallock.h |   19 +++
 include/linux/random.h|2 +-
 include/linux/sched.h |4 
 init/main.c   |2 +-
 kernel/irq/handle.c   |7 +--
 kernel/irq/manage.c   |6 ++
 kernel/rt.c   |   46 -
 kernel/sched.c|   23 +--
 kernel/stop_machine.c |   13 +
 localversion-rt   |2 +-
 mm/page_alloc.c   |4 ++--
 mm/slab.c |   10 ++
 14 files changed, 103 insertions(+), 46 deletions(-)


[git pull] vfs.git, pile 2

2012-10-11 Thread Al Viro
Stuff in this one - assorted fixes, lglock tidy-up, death to
lock_super().  There'll be a VFS pile tomorrow (with patches from Jeff Layton,
sanitizing getname() and related parts of audit and preparing for ESTALE
fixes), but I'd rather push the stuff in this one ASAP - some of the bugs
closed here are quite unpleasant.  Please, pull from the usual place -
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

Shortlog:
Al Viro (2):
  MAX_LFS_FILESIZE definition for 64bit needs LL...
  consitify do_mount() arguments

Arnd Bergmann (1):
  vfs: bogus warnings in fs/namei.c

Hugh Dickins (1):
  tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking

Lai Jiangshan (3):
  lglock: remove unused DEFINE_LGLOCK_LOCKDEP()
  lglock: make the per_cpu locks static
  lglock: add DEFINE_STATIC_LGLOCK()

Marco Stornelli (7):
  exofs: drop lock/unlock super
  ext3: drop lock/unlock super
  fat: drop lock/unlock super
  hpfs: drop lock/unlock super
  sysv: drop lock/unlock super
  ufs: drop lock/unlock super
  vfs: drop lock/unlock super

Richard W.M. Jones (1):
  dup3: Return an error when oldfd == newfd.

Sasha Levin (2):
  fs: prevent use after free in auditing when symlink following was denied
  fs: handle failed audit_log_start properly

Diffstat:
 fs/ceph/export.c   |   18 ++
 fs/exofs/super.c   |4 
 fs/ext3/super.c|6 --
 fs/fat/dir.c   |4 ++--
 fs/fat/fat.h   |5 +++--
 fs/fat/inode.c |5 +++--
 fs/fat/namei_msdos.c   |   26 +-
 fs/fat/namei_vfat.c|   30 +++---
 fs/file.c  |3 +++
 fs/file_table.c|2 +-
 fs/gfs2/export.c   |4 
 fs/hpfs/super.c|3 ---
 fs/isofs/export.c  |2 +-
 fs/namei.c |3 ++-
 fs/namespace.c |   12 ++--
 fs/reiserfs/inode.c|6 +-
 fs/super.c |   23 ---
 fs/sysv/balloc.c   |   18 +-
 fs/sysv/ialloc.c   |   14 +++---
 fs/sysv/inode.c|4 ++--
 fs/sysv/super.c|1 +
 fs/sysv/sysv.h |1 +
 fs/ufs/balloc.c|   30 +++---
 fs/ufs/ialloc.c|   16 
 fs/ufs/super.c |   21 +++--
 fs/ufs/ufs.h   |1 +
 fs/xfs/xfs_export.c|3 +++
 include/linux/fs.h |5 ++---
 include/linux/lglock.h |   19 ---
 include/linux/security.h   |   12 ++--
 kernel/audit.c |2 ++
 mm/shmem.c |6 --
 security/capability.c  |4 ++--
 security/security.c|4 ++--
 security/selinux/hooks.c   |4 ++--
 security/smack/smack_lsm.c |4 ++--
 security/tomoyo/common.h   |2 +-
 security/tomoyo/mount.c|5 +++--
 security/tomoyo/tomoyo.c   |4 ++--
 39 files changed, 166 insertions(+), 170 deletions(-)


[git pull] signal.git, pile 2 (was Re: [RFC][CFT][CFReview] execve and kernel_thread unification work)

2012-10-11 Thread Al Viro
On Fri, Oct 12, 2012 at 11:16:33AM +1100, Paul Mackerras wrote:
> On Thu, Oct 11, 2012 at 01:53:06PM +0100, Al Viro wrote:
> 
> > Umm...  Maybe, but let's do that as subsequent cleanup.  Again,
> > we almost certainly don't need to mess with TOC at all - the callbacks
> > are in the main kernel, there are very few of them and they really are
> > low-level details of exported mechanisms (i.e. kthread_create/run/etc.
> > in kthread.h and call_usermode... in kmod.h).  Again, we are talking
> > about out-of-tree modules, they had better mechanism for at least
> > 6 years and conversion to it is bloody trivial.  Hell, it was even
> > in late unlamented feature-removal-schedule.txt - since 2006.  If that's
> > not enough to retire an export, what is?
> 
> OK... yes we can fix things up in a subsequent cleanup.
> 
> We will need to fix the TOC handling when we go to using multiple TOCs
> in the main kernel, with the linker managing the transitions between
> TOCs.  Our toolchain guys have been pushing us to do that for years,
> because it should make things run faster, but first we'll have to stop
> using ld -r to combine objects in subdirectories.

How granular are you planning to make that?  I mean, we are talking about
3 objects here - init/main.o, kernel/kthread.o and kernel/kmod.o.  Do they
get TOC separate from that of arch/powerpc/kernel/entry_64.o?

Anyway, if ppc folks can live with that stuff in its current form for now,
here's the second signal.git pull request.  Stuff in there: kernel_thread/
kernel_execve/sys_execve conversions for several more architectures plus
assorted signal fixes and cleanups.  There'll be more (in particular, real
fixes for alpha do_notify_resume() irq mess)...  Linus, could you pull that
queue?  It's in the usual place -
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal for-linus

Shortlog:
Al Viro (38):
  powerpc: split ret_from_fork
  powerpc: switch to generic sys_execve()/kernel_execve()
  m68k: split ret_from_fork(), simplify kernel_thread()
  m68k: switch to generic sys_execve()/kernel_execve()
  frv: split ret_from_fork, simplify kernel_thread() a lot
  frv: switch to generic sys_execve()
  frv: switch to generic kernel_execve
  frv: switch to generic kernel_thread()
  mn10300: split ret_from_fork, simplify kernel_thread()
  mn10300: switch to generic sys_execve()
  mn10300: switch to generic kernel_execve()
  mn10300: convert to generic kernel_thread()
  c6x: switch to generic kernel_thread()
  xtensa: can't get to do_notify_resume() when user_mode(regs) is not true
  mn10300: get rid of calling do_notify_resume() when returning to kernel mode
  score: fix bogus restarts on sigreturn()
  ia64: can't reach do_signal() when returning to kernel mode
  mips: prevent hitting do_notify_resume() with !user_mode(regs)
  mips: unobfuscate _TIF..._MASK
  mips: merge the identical "return from syscall" per-ABI code
  mips: NOTIFY_RESUME is not needed in TIF masks
  unicore32: unobfuscate _TIF_WORK_MASK
  bury _TIF_RESTORE_SIGMASK
  sanitize tsk_is_polling()
  bury the rest of TIF_IRET
  parisc: fix double restarts
  parisc: don't bother looping in do_signal()
  parisc: decide whether to go to slow path (tracesys) based on thread flags
  h8300: trim _TIF_WORK_MASK
  unicore32: remove pointless test
  x86: get rid of duplicate code in case of CONFIG_VM86
  frv: no need to raise SIGTRAP in setup_frame()
  mn10300: don't bother with SIGTRAP in setup_frame()
  microblaze: don't bother with SIGTRAP in setup_rt_frame()
  tile: don't bother with SIGTRAP in setup_frame
  avr32: trim masks
  m32r: trim masks
  alpha: don't open-code trace_report_syscall_{enter,exit}

Greg Ungerer (1):
  m68k: always set stack frame format for ColdFire on thread start

Mark Salter (3):
  c6x: add ret_from_kernel_thread(), simplify kernel_thread()
  c6x: switch to generic kernel_execve
  c6x: switch to generic sys_execve

Richard Weinberger (1):
  Uninclude linux/freezer.h

Diffstat:
 arch/alpha/include/asm/thread_info.h  |3 +-
 arch/alpha/kernel/entry.S |   13 ++--
 arch/alpha/kernel/ptrace.c|   32 -
 arch/arm/include/asm/thread_info.h|2 -
 arch/arm/kernel/signal.c  |1 -
 arch/avr32/include/asm/thread_info.h  |   18 ++---
 arch/avr32/kernel/signal.c|1 -
 arch/blackfin/include/asm/thread_info.h   |4 -
 arch/blackfin/kernel/signal.c |1 -
 arch/c6x/Kconfig  |1 +
 arch/c6x/include/asm/processor.h  |2 -
 arch/c6x/include/asm/syscalls.h   |5 --
 arch/c6x/include/asm/thread_info.h|1 -
 arch/c6x/include/asm/unistd.h |3 +
 arch/c6x/kernel/asm-offsets.c |1 -
 arch/c6x/kernel/entry.S   |   56 

Re: [PATCH 07/33] autonuma: mm_autonuma and task_autonuma data structures

2012-10-11 Thread Andrea Arcangeli
Hi Christoph,

On Fri, Oct 12, 2012 at 12:23:17AM +, Christoph Lameter wrote:
> On Thu, 11 Oct 2012, Rik van Riel wrote:
> 
> > These statistics are updated at page fault time, I
> > believe while holding the page table lock.
> >
> > In other words, they are in code paths where updating
> > the stats should not cause issues.
> 
> The per cpu counters in the VM were introduced because of
> counter contention caused at page fault time. This is the same code path
> where you think that there cannot be contention.

There's no contention at all in autonuma27.

I changed it in autonuma28, to get real time updates in mm_autonuma
from migration events.

There is no lock taken though (the spinlock below is taken once every
pass, very rarely). It's a change of a few lines, shown in detail below.
The only contention point is this:

+   ACCESS_ONCE(mm_numa_fault[access_nid]) += numpages;
+   ACCESS_ONCE(mm_autonuma->mm_numa_fault_tot) += numpages;

autonuma28 is much more experimental than autonuma27 :)
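
To make the scheme easier to picture, here is a minimal single-threaded sketch of "account on every fault, halve once per pass". It is not the kernel code: the struct, the ex_ names and the fixed node count are invented for the example, and the spinlock that serialises the real per-pass decay is left out.

```c
#include <stddef.h>

#define EX_NR_NODES 4	/* arbitrary node count for the example */

/* Hypothetical per-mm statistics, loosely modelled on the field names
 * used in the diff; this is not the kernel structure. */
struct ex_mm_stats {
	unsigned long fault[EX_NR_NODES];	/* per-node fault counts */
	unsigned long fault_tot;		/* sum of fault[] */
	unsigned long last_pass;		/* pass in which decay last ran */
};

/* Hot path: account numpages faults against node nid, no lock taken. */
static void ex_account(struct ex_mm_stats *s, size_t nid,
		       unsigned long numpages)
{
	s->fault[nid] += numpages;
	s->fault_tot += numpages;
}

/* Once per pass: halve every counter so that old samples fade out.
 * Returns 1 if the decay ran, 0 if it already ran in this pass. */
static int ex_decay(struct ex_mm_stats *s, unsigned long pass)
{
	unsigned long tot = 0;
	size_t nid;

	if (pass == s->last_pass)
		return 0;
	s->last_pass = pass;
	for (nid = 0; nid < EX_NR_NODES; nid++) {
		s->fault[nid] >>= 1;
		tot += s->fault[nid];
	}
	s->fault_tot = tot;
	return 1;
}
```

The total is recomputed from the halved counters rather than halved itself, so rounding in the per-node values cannot make the two drift apart.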

I wouldn't focus on >1024 CPU systems for this though. The bigger the
system, the more costly any automatic placement logic will become, no
matter which algorithm is used and what its computational complexity
is, and chances are those systems will use NUMA hard bindings anyway,
considering how expensive they are to set up and maintain.

The diff looks like this, I can consider undoing it. Comments
welcome. (But with real-time stats updates, autonuma28 converges faster.)

--- a/mm/autonuma.c
+++ b/mm/autonuma.c
 
 static struct knuma_scand_data {
struct list_head mm_head; /* entry: mm->mm_autonuma->mm_node */
struct mm_struct *mm;
unsigned long address;
-   unsigned long *mm_numa_fault_tmp;
 } knuma_scand_data = {
.mm_head = LIST_HEAD_INIT(knuma_scand_data.mm_head),
 };






+   unsigned long tot;
+
+   /*
+* Set the task's fault_pass equal to the new
+* mm's fault_pass, so new_pass will be false
+* on the next fault by this thread in this
+* same pass.
+*/
+   p->task_autonuma->task_numa_fault_pass = mm_numa_fault_pass;
+
/* If a new pass started, degrade the stats by a factor of 2 */
for_each_node(nid)
task_numa_fault[nid] >>= 1;
task_autonuma->task_numa_fault_tot >>= 1;
+
+   if (mm_numa_fault_pass ==
+   ACCESS_ONCE(mm_autonuma->mm_numa_fault_last_pass))
+   return;
+
+   spin_lock(&mm_autonuma->mm_numa_fault_lock);
+   if (unlikely(mm_numa_fault_pass ==
+mm_autonuma->mm_numa_fault_last_pass)) {
+   spin_unlock(&mm_autonuma->mm_numa_fault_lock);
+   return;
+   }
+   mm_autonuma->mm_numa_fault_last_pass = mm_numa_fault_pass;
+
+   tot = 0;
+   for_each_node(nid) {
+   unsigned long fault = ACCESS_ONCE(mm_numa_fault[nid]);
+   fault >>= 1;
+   ACCESS_ONCE(mm_numa_fault[nid]) = fault;
+   tot += fault;
+   }
+   mm_autonuma->mm_numa_fault_tot = tot;
+   spin_unlock(&mm_autonuma->mm_numa_fault_lock);
 }






task_numa_fault[access_nid] += numpages;
task_autonuma->task_numa_fault_tot += numpages;
 
+   ACCESS_ONCE(mm_numa_fault[access_nid]) += numpages;
+   ACCESS_ONCE(mm_autonuma->mm_numa_fault_tot) += numpages;
+
local_bh_enable();
 }
 
@@ -310,28 +355,35 @@ static void numa_hinting_fault_cpu_follow_memory(struct task_struct *p,
@@ -593,35 +628,26 @@ static int knuma_scand_pmd(struct mm_struct *mm,
goto out;
 
if (pmd_trans_huge_lock(pmd, vma) == 1) {
-   int page_nid;
-   unsigned long *fault_tmp;
ret = HPAGE_PMD_NR;
 
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
-   if (autonuma_mm_working_set() && pmd_numa(*pmd)) {
+   if (pmd_numa(*pmd)) {
spin_unlock(&mm->page_table_lock);
goto out;
}
-
page = pmd_page(*pmd);
-
/* only check non-shared pages */
if (page_mapcount(page) != 1) {
spin_unlock(&mm->page_table_lock);
goto out;
}
-
-   page_nid = page_to_nid(page);
-   fault_tmp = knuma_scand_data.mm_numa_fault_tmp;
-   fault_tmp[page_nid] += ret;
-
if (pmd_numa(*pmd)) {
spin_unlock(&mm->page_table_lock);
goto out;
}
-
set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
+
/* defer TLB flush to lower the overhead */
spin_unlock(&mm->page_table_lock);
goto out;
@@ -636,10 +662,9 @@ static int knuma_scand_pmd(struct mm_struct *mm,
for (_address = address, _pte = pte; _address < end;
 _pte++, _address += PAGE_SIZE) {
pte_t pteval = *_pte;
-   unsigned 

Re: [PATCH 00/33] AutoNUMA27

2012-10-11 Thread Andrea Arcangeli
On Thu, Oct 11, 2012 at 04:35:03PM +0100, Mel Gorman wrote:
> If System CPU time really does go down as this converges then that
> should be obvious from monitoring vmstat over time for a test. Early on
> - high usage with that dropping as it converges. If that doesn't happen
>   then the tasks are not converging, the phases change constantly or
> something unexpected happened that needs to be identified.

Yes, all measurable kernel cost should be in the memory copies
(migration and khugepaged, the latter is going to be optimized away).

The migrations must stop after the workload converges. Either
migrations are used to reach convergence or they shouldn't happen in
the first place (not in any measurable amount).

> Ok. Are they separate STREAM instances or threads running on the same
> arrays? 

My understanding is separate instances. I think it's a single-threaded
benchmark and you run many copies. It was modified to run for 5 min
(otherwise upstream doesn't have enough time to get it wrong as a result
of background scheduling jitter).

Thanks!


Re: [PATCH 10/33] autonuma: CPU follows memory algorithm

2012-10-11 Thread Andrea Arcangeli
On Thu, Oct 11, 2012 at 03:58:05PM +0100, Mel Gorman wrote:
> On Thu, Oct 04, 2012 at 01:50:52AM +0200, Andrea Arcangeli wrote:
> > This algorithm takes as input the statistical information filled by the
> > knuma_scand (mm->mm_autonuma) and by the NUMA hinting page faults
> > (p->task_autonuma), evaluates it for the current scheduled task, and
> > compares it against every other running process to see if it should
> > move the current task to another NUMA node.
> > 
> 
> That sounds expensive if there are a lot of running processes in the
> system. How often does this happen? Mention it here even though I
> realised much later that it's obvious from the patch itself.

Ok I added:

==
This algorithm will run once every ~100msec, and can be easily slowed
down further. Its computational complexity is O(nr_cpus) and it's
executed by all CPUs. The number of running threads and processes is
not going to alter the cost of this algorithm, only the online number
of CPUs is. However practically this will very rarely hit on all CPUs
runqueues. Most of the time it will only compute on local data in the
task_autonuma struct (for example if convergence has been
reached). Even if no convergence has been reached yet, it'll only scan
the CPUs in the NUMA nodes where the local task_autonuma data is
showing that they are worth migrating to.
==

It's configurable through sysfs; 100 msec is the default.

> > + * there is no affinity set for the task).
> > + */
> > +static bool inline task_autonuma_cpu(struct task_struct *p, int cpu)
> > +{
> 
> nit, but elsewhere you have
> 
> static inline TYPE and here you have
> static TYPE inline

Fixed.

> 
> > +   int task_selected_nid;
> > +   struct task_autonuma *task_autonuma = p->task_autonuma;
> > +
> > +   if (!task_autonuma)
> > +   return true;
> > +
> > +   task_selected_nid = ACCESS_ONCE(task_autonuma->task_selected_nid);
> > +   if (task_selected_nid < 0 || task_selected_nid == cpu_to_node(cpu))
> > +   return true;
> > +   else
> > +   return false;
> > +}
> 
> no need for else.

Removed.

> 
> > +
> > +static inline void sched_autonuma_balance(void)
> > +{
> > +   struct task_autonuma *ta = current->task_autonuma;
> > +
> > +   if (ta && current->mm)
> > +   __sched_autonuma_balance();
> > +}
> > +
> 
> Ok, so this could do with a comment explaining where it is called from.
> It is called during idle balancing at least so potentially this is every
> scheduler tick. It'll be run from softirq context so the cost will not
> be obvious to a process but the overhead will be there. What happens if
> this takes longer than a scheduler tick to run? Is that possible?

Softirqs can run for a huge amount of time, so it won't do any harm.

Nested IRQs could even run on top of the softirq, and they could take
milliseconds too if they're hyper inefficient and we must still run
perfectly rock solid (with horrible latency, but still stable).

I added:

/*
 * This is called in the context of the SCHED_SOFTIRQ from
 * run_rebalance_domains().
 */

> > +/*
> > + * This function __sched_autonuma_balance() is responsible for
> 
> This function is far too short and could do with another few pages :P

:) I tried to split it once already but gave up in the middle.

> > + * "Full convergence" is achieved when all memory accesses by a task
> > + * are 100% local to the CPU it is running on. A task's "best node" is
> 
> I think this is the first time you defined convergence in the series.
> The explanation should be included in the documentation.

Ok. It's not an easy concept to explain in words.  Here's a try:

 *
 * A workload converges when all the memory of a thread or a process
 * has been placed in the NUMA node of the CPU where the process or
 * thread is running on.
 *
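
As a rough illustration of that definition, the degree of convergence of one task could be estimated from per-node fault counts as sketched below. This is only an approximation under stated assumptions: the helper names and the plain array of counters are made up for the example and are not the task_autonuma layout, and sampled hinting faults stand in for "all the memory".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not AutoNUMA code: percentage of a task's
 * recorded NUMA hinting faults that hit the node it is running on. */
static unsigned int local_fault_percent(const unsigned long *fault,
					size_t nr_nodes, size_t running_nid)
{
	unsigned long tot = 0;
	size_t nid;

	assert(running_nid < nr_nodes);
	for (nid = 0; nid < nr_nodes; nid++)
		tot += fault[nid];
	if (!tot)
		return 0;	/* no samples yet, nothing can be said */
	return (unsigned int)(fault[running_nid] * 100 / tot);
}

/* Converged in the sense above: every sampled access was node-local. */
static bool fully_converged(const unsigned long *fault, size_t nr_nodes,
			    size_t running_nid)
{
	return local_fault_percent(fault, nr_nodes, running_nid) == 100;
}
```

With counters {30, 10} and the task running on node 0 the helper reports 75, so that task would still have a quarter of its sampled accesses going remote.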

> > + * other_diff: how much the current task is closer to fully converge
> > + * on the node of the other CPU than the other task that is currently
> > + * running in the other CPU.
> 
> In the changelog you talked about comparing a process with every other
> running process but here it looks like you intend to examine every
> process that is *currently running* on a remote node and compare that.
> What if the best process to swap with is not currently running? Do we
> miss it?

Correct, only currently running processes are being checked. If a task
in R state goes to sleep immediately, it's not relevant where it
runs. We focus on "long running" compute tasks, so tasks that are in R
state most frequently.

> > + * If both checks succeed it guarantees that we found a way to
> > + * multilaterally improve the system wide NUMA
> > + * convergence. Multilateral here means that the same checks will not
> > + * succeed again on those same two tasks, after the task exchange, so
> > + * there is no risk of ping-pong.
> > + *
> 
> At least not in that instance of time. A new CPU binding or change in
> behaviour (such as a computation finishing and a reduce step starting)
> might change that scoring.

Yes.

> > + 

Re: uprobe: checking probe event include directory

2012-10-11 Thread Jovi Zhang
On Wed, Jul 18, 2012 at 7:45 PM, Srikar Dronamraju
 wrote:
> * Jovi Zhang  [2012-07-18 19:38:27]:
>
>> On Wed, Jul 18, 2012 at 7:07 PM, Srikar Dronamraju
>>  wrote:
>> > The patch looks good,
>> >
>> > Can you modify the description a bit. However you are free to ignore
>> > these comments. After knowing your response,  I will ack the patch.
>> >
>> > I would probably put this as:
>> >
>> > The subject could be
>> > tracing: Verify target file before registering a uprobe event
>> >
>> > Description:
>> > Without this patch, we can register a uprobe event for a directory.
>> > Enabling such a uprobe event would anyway fail.
>> >
>> > Example:
>> >
>> > $ echo 'p /bin:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
>> >
>> > However directories cannot be valid targets for uprobe.
>> > Hence verify if the target is a regular file during the probe
>> > registration.
>>
>> Thanks srikar, your description is more clear than mine.
>>
>>
>> From fd5077196038cc271e2116e1fca359a0011e1669 Mon Sep 17 00:00:00 2001
>> From: Jovi Zhang 
>> Date: Wed, 18 Jul 2012 18:16:44 +0800
>> Subject: [PATCH] tracing: Verify target file before registering a uprobe
>>  event
>>
>> Without this patch, we can register a uprobe event for a directory.
>> Enabling such a uprobe event would anyway fail.
>>
>> Example:
>> $ echo 'p /bin:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
>>
>> However directories cannot be valid targets for uprobe.
>> Hence verify if the target is a regular file during the probe
>> registration.
>>
>> Signed-off-by: Jovi Zhang 
>> Reviewed-by: Srikar Dronamraju 
>> ---
>>  kernel/trace/trace_uprobe.c |6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
>> index 85158fa..3b5f646 100644
>> --- a/kernel/trace/trace_uprobe.c
>> +++ b/kernel/trace/trace_uprobe.c
>> @@ -259,6 +259,10 @@ static int create_trace_uprobe(int argc, char **argv)
>>   goto fail_address_parse;
>>
>>   inode = igrab(path.dentry->d_inode);
>> +  if (!S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)) {
>> + ret = -EINVAL;
>> + goto fail_address_parse;
>> + }
>>
>>   argc -= 2;
>>   argv += 2;
>> @@ -358,7 +362,7 @@ fail_address_parse:
>>   if (inode)
>>   iput(inode);
>>
>> - pr_info("Failed to parse address.\n");
>> + pr_info("Failed to parse address or file.\n");
>>
>>   return ret;
>>  }
>
> Looks good.
>
> Acked-by: Srikar Dronamraju 
>

Hi Andrew,
Is this patch ok to go through your tree?

.jovi
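
For anyone who wants to try the same kind of check from userspace, a minimal analogue using stat(2) is sketched below. It is not the kernel code from the patch (which tests inode->i_mode directly); is_regular_file() is a name invented for this example.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Userspace analogue of the S_ISREG() test in the patch: returns true
 * only if path exists and, after following symlinks, is a regular file.
 * Directories, devices, sockets and missing paths all yield false. */
static bool is_regular_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;
	return S_ISREG(st.st_mode);
}
```

A directory such as /bin fails this test in the same way that the 'p /bin:0x4245c0' example is meant to be rejected at registration time.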


Re: [RFC][CFT][CFReview] execve and kernel_thread unification work

2012-10-11 Thread Paul Mackerras
On Thu, Oct 11, 2012 at 01:53:06PM +0100, Al Viro wrote:

>   Umm...  Maybe, but let's do that as subsequent cleanup.  Again,
> we almost certainly don't need to mess with TOC at all - the callbacks
> are in the main kernel, there are very few of them and they really are
> low-level details of exported mechanisms (i.e. kthread_create/run/etc.
> in kthread.h and call_usermode... in kmod.h).  Again, we are talking
> about out-of-tree modules, they had better mechanism for at least
> 6 years and conversion to it is bloody trivial.  Hell, it was even
> in late unlamented feature-removal-schedule.txt - since 2006.  If that's
> not enough to retire an export, what is?

OK... yes we can fix things up in a subsequent cleanup.

We will need to fix the TOC handling when we go to using multiple TOCs
in the main kernel, with the linker managing the transitions between
TOCs.  Our toolchain guys have been pushing us to do that for years,
because it should make things run faster, but first we'll have to stop
using ld -r to combine objects in subdirectories.

Paul.


linux-next: build failure after merge of the l2-mtd tree

2012-10-11 Thread Stephen Rothwell
Hi Artem,

After merging the l2-mtd tree, today's linux-next build (x86_64
allmodconfig) failed like this:

ERROR: "denali_init" [drivers/mtd/nand/denali_pci.ko] undefined!
ERROR: "denali_remove" [drivers/mtd/nand/denali_pci.ko] undefined!

Probably caused by commit 305b1ee29c8e ("mtd: denali: split the generic
driver and PCI layer").

I have used the l2-mtd tree from next-20121011 for today.
-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au




Re: [PATCH 2/2]suppress "Device nodeX does not have a release() function" warning

2012-10-11 Thread Yasuaki Ishimatsu

2012/10/12 5:31, David Rientjes wrote:
> On Thu, 11 Oct 2012, Yasuaki Ishimatsu wrote:
>
>> When calling unregister_node(), the function shows following message at
>> device_release().
>>
>> "Device 'node2' does not have a release() function, it is broken and must
>> be fixed."
>>
>> The reason is node's device struct does not have a release() function.
>>
>> So the patch registers node_device_release() to the device's release()
>> function for suppressing the warning message. Additionally, the patch adds
>> memset() to initialize a node struct into register_node(). Because the node
>> struct is part of node_devices[] array and it cannot be freed by
>> node_device_release(). So if system reuses the node struct, it has a garbage.
>>
>
> Nice catch on reuse of the statically allocated node_devices[] for node
> hotplug.
>
>> CC: David Rientjes
>> CC: Jiang Liu
>> Cc: Minchan Kim
>> CC: Andrew Morton
>> CC: KOSAKI Motohiro
>> Signed-off-by: Yasuaki Ishimatsu
>> Signed-off-by: Wen Congyang
>
> Can register_node() be made static in drivers/base/node.c and its
> declaration removed from linux/node.h?

Yah. I'll fix it.

Thanks,
Yasuaki Ishimatsu

> Acked-by: David Rientjes



Re: [GIT] TPM bugfixes

2012-10-11 Thread Randy Dunlap
On 10/11/2012 04:00 PM, Andrew Morton wrote:

> On Thu, 11 Oct 2012 15:49:36 -0700
> Andrew Morton  wrote:
> 
>> On Fri, 12 Oct 2012 00:45:06 +0200
>> richard -rw- weinberger  wrote:
>>
>>> On Fri, Oct 12, 2012 at 12:19 AM, Andrew Morton
>>>  wrote:
>>>> On Thu, 11 Oct 2012 21:54:18 +1100 (EST)
>>>> James Morris  wrote:
>>>>
>>>>> Please pull these fixes for the TPM code.
>>>>>
>>>>> The following changes since commit 12250d843e8489ee00b5b7726da855e51694e792:
>>>>>
>>>>>   Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux (2012-10-11 10:27:51 +0900)
>>>>>
>>>>> are available in the git repository at:
>>>>>
>>>>>   git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git for-linus
>>>>
>>>> Gargh.  Is it possible to add a human-readable http URL to these things
>>>> so that people can actually look at the patches without hoop-jumping?
>>>
>>> rw@mantary:~> echo
>>> "git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git
>>> for-linus" | sed -e
>>> 's/git\:\/\//http\:\/\//;s/\/pub\/scm\//\?p=/g;s/\.git
>>> /\.git\;a=shortlog;h=refs\/heads\//g'
>>>
>>> http://git.kernel.org?p=linux/kernel/git/jmorris/linux-security.git;a=shortlog;h=refs/heads/for-linus
>>>
>>
>> Geeze.
>>
>> Thanks.  Followed by ^F, copy-n-paste, then hope it's on the first page?
> 
> http://git.kernel.org/?p=linux/kernel/git/jmorris/linux-security.git;a=commit;h=abce9ac292e13da367bbd22c1f7669f988d931ac
> 
> Which can be shortened to something like
> 
> http://git.kernel.org/?p=linux/kernel/git/jmorris/linux-security.git;h=abce9ac2
> 
> Simply including the commit IDs would help a lot.  Including the full
> URL to each commit would be nicer.

absolutely (for full URL).

> I assume Junio owns that script?
> --


-- 
~Randy


Re: [PATCH] uprobe: fix misleading log entry

2012-10-11 Thread Jovi Zhang
On Wed, Jul 18, 2012 at 5:22 PM, Srikar Dronamraju
 wrote:
> * Jovi Zhang  [2012-07-18 11:08:42]:
>
>> From 68232ef2decae95b807f2f3763e8ea99c1a3b2ae Mon Sep 17 00:00:00 2001
>> From: Jovi Zhang 
>> Date: Wed, 18 Jul 2012 17:51:26 +0800
>> Subject: [PATCH] uprobe: fix misleading log entry
>>
>> There is no 'r' prefix in uprobe event naming, so remove it from the message.
>>
>> Signed-off-by: Jovi Zhang 
>> ---
>>  kernel/trace/trace_uprobe.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
>> index cf382de..852a584 100644
>> --- a/kernel/trace/trace_uprobe.c
>> +++ b/kernel/trace/trace_uprobe.c
>> @@ -191,7 +191,7 @@ static int create_trace_uprobe(int argc, char **argv)
>>   if (argv[0][0] == '-')
>>   is_delete = true;
>>   else if (argv[0][0] != 'p') {
>> - pr_info("Probe definition must be started with 'p', 'r' or" " '-'.\n");
>> + pr_info("Probe definition must be started with 'p' or '-'.\n");
>>   return -EINVAL;
>>   }
>>
>
> Yes, uprobes doesn't support return probes, so we should not have
> mentioned 'r'.
>
> Acked-by: Srikar Dronamraju 
>

Hi Andrew,

Is this patch OK to go through your mm tree? Mainline still hasn't merged it.

.jovi


[PATCH] kvm/ftrace: change the displaying format of vector in trace_msi_set_irq

2012-10-11 Thread Sanagi, Koki
This patch changes the format used to display the vector in trace_msi_set_irq
from %x to %u.  Currently it is inconsistent with other ftrace output, such as
kvm_msi_set_irq and kvm_inj_virq, which use %u.

Signed-off-by: Koki Sanagi 
---
 include/trace/events/kvm.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 7ef9e75..0a83632 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -109,7 +109,7 @@ TRACE_EVENT(kvm_msi_set_irq,
__entry->data   = data;
),
 
-   TP_printk("dst %u vec %x (%s|%s|%s%s)",
+   TP_printk("dst %u vec %u (%s|%s|%s%s)",
  (u8)(__entry->address >> 12), (u8)__entry->data,
  __print_symbolic((__entry->data >> 8 & 0x7), kvm_deliver_mode),
  (__entry->address & (1<<2)) ? "logical" : "physical",

[PATCH] gen_init_cpio: avoid stack overflow when expanding

2012-10-11 Thread Kees Cook
Fix possible overflow of the buffer used for expanding environment
variables when building file list.

$ cat usr/crash.list
file foo ${BIG}${BIG}${BIG}${BIG}${BIG}${BIG} 0755 0 0
$ BIG=$(perl -e 'print "A" x 4096;') ./usr/gen_init_cpio usr/crash.list
*** buffer overflow detected ***: ./usr/gen_init_cpio terminated

This also replaces the space-indenting with tabs.

Patch based on existing fix extracted from grsecurity.

Cc: Michal Marek 
Cc: Gene Sally 
Cc: Brad Spengler 
Cc: PaX Team 
Cc: sta...@vger.kernel.org
Signed-off-by: Kees Cook 
---
 usr/gen_init_cpio.c |   43 +++
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/usr/gen_init_cpio.c b/usr/gen_init_cpio.c
index af0f22f..aca6edc 100644
--- a/usr/gen_init_cpio.c
+++ b/usr/gen_init_cpio.c
@@ -303,7 +303,7 @@ static int cpio_mkfile(const char *name, const char *location,
int retval;
int rc = -1;
int namesize;
-   int i;
+   unsigned int i;
 
mode |= S_IFREG;
 
@@ -381,25 +381,28 @@ error:
 
 static char *cpio_replace_env(char *new_location)
 {
-   char expanded[PATH_MAX + 1];
-   char env_var[PATH_MAX + 1];
-   char *start;
-   char *end;
-
-   for (start = NULL; (start = strstr(new_location, "${")); ) {
-   end = strchr(start, '}');
-   if (start < end) {
-   *env_var = *expanded = '\0';
-   strncat(env_var, start + 2, end - start - 2);
-   strncat(expanded, new_location, start - new_location);
-   strncat(expanded, getenv(env_var), PATH_MAX);
-   strncat(expanded, end + 1, PATH_MAX);
-   strncpy(new_location, expanded, PATH_MAX);
-   } else
-   break;
-   }
-
-   return new_location;
+   char expanded[PATH_MAX + 1];
+   char env_var[PATH_MAX + 1];
+   char *start;
+   char *end;
+
+   for (start = NULL; (start = strstr(new_location, "${")); ) {
+   end = strchr(start, '}');
+   if (start < end) {
+   *env_var = *expanded = '\0';
+   strncat(env_var, start + 2, end - start - 2);
+   strncat(expanded, new_location, start - new_location);
+   strncat(expanded, getenv(env_var),
+   PATH_MAX - strlen(expanded));
+   strncat(expanded, end + 1,
+   PATH_MAX - strlen(expanded));
+   strncpy(new_location, expanded, PATH_MAX);
+   new_location[PATH_MAX] = 0;
+   } else
+   break;
+   }
+
+   return new_location;
 }
 
 
-- 
1.7.9.5


-- 
Kees Cook
Chrome OS Security
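
The core of the fix is that strncat()'s size argument is the maximum number of characters to append, not the size of the destination. A standalone sketch of that bounded-append pattern follows; it is not the gen_init_cpio code, and append_bounded() is a helper name invented for this example.

```c
#include <string.h>

/* Append src to the NUL-terminated string already in dst without
 * writing past dst[cap - 1], where cap is the total size of dst in
 * bytes. Returns the resulting string length. */
static size_t append_bounded(char *dst, size_t cap, const char *src)
{
	size_t used = strlen(dst);

	/* strncat() copies at most n characters and then always writes
	 * a terminating NUL, so n must be the space left minus one. */
	if (used + 1 < cap)
		strncat(dst, src, cap - 1 - used);
	return strlen(dst);
}
```

With cap = PATH_MAX + 1 the bound works out to PATH_MAX - strlen(dst), which is the expression the patch passes to strncat().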


[PATCH] MM: Support more pagesizes for MAP_HUGETLB/SHM_HUGETLB v4

2012-10-11 Thread Andi Kleen
From: Andi Kleen 

There was some desire in large applications using MAP_HUGETLB/SHM_HUGETLB
to use 1GB huge pages on some mappings, and stay with 2MB on others. This
is useful together with NUMA policy: use 2MB interleaving on some mappings,
but 1GB on local mappings.

This patch extends the IPC/SHM syscall interfaces slightly to allow specifying
the page size.

It borrows some upper bits in the existing flag arguments and allows encoding
the log of the desired page size in addition to the *_HUGETLB flag.
When 0 is specified the default size is used, this makes the change fully
compatible.

Extending the internal hugetlb code to handle this is straightforward. Instead
of a single mount it just keeps an array of them and selects the right
mount based on the specified page size. When no page size is specified
it uses the mount of the default page size.

The change is not visible in /proc/mounts because internal mounts
don't appear there. It also has very little overhead: the additional
mounts just consume a super block, but not more memory when not used.

I also exported the new flags to the user headers
(they were previously under __KERNEL__). Right now only symbols
for x86 and some other architecture for 1GB and 2MB are defined.
The interface should already work for all other architectures
though.  Only architectures that define multiple hugetlb sizes
actually need it (that is currently x86, tile, powerpc). However
tile and powerpc have user configurable hugetlb sizes, so it's
not easy to add defines. A program on those architectures would
need to query sysfs and use the appropriate log2.

v2: Port to new tree. Fix unmount.
v3: Ported to latest tree.
v4: Ported to latest tree. Minor changes for review feedback. Updated
description.
Acked-by: Rik van Riel 
Acked-by: KAMEZAWA Hiroyuki 
Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/mman.h |3 ++
 fs/hugetlbfs/inode.c|   63 +++
 include/linux/hugetlb.h |   12 ++-
 include/linux/shm.h |   19 
 include/uapi/asm-generic/mman.h |   13 
 ipc/shm.c   |3 +-
 mm/mmap.c   |5 ++-
 7 files changed, 100 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
index 593e51d..513b05f 100644
--- a/arch/x86/include/asm/mman.h
+++ b/arch/x86/include/asm/mman.h
@@ -3,6 +3,9 @@
 
 #define MAP_32BIT  0x40/* only give out 32bit addresses */
 
+#define MAP_HUGE_2MB(21 << MAP_HUGE_SHIFT)
+#define MAP_HUGE_1GB(30 << MAP_HUGE_SHIFT)
+
 #include 
 
 #endif /* _ASM_X86_MMAN_H */
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c5bc355..d34bb56 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -923,7 +923,7 @@ static struct file_system_type hugetlbfs_fs_type = {
.kill_sb= kill_litter_super,
 };
 
-static struct vfsmount *hugetlbfs_vfsmount;
+static struct vfsmount *hugetlbfs_vfsmount[HUGE_MAX_HSTATE];
 
 static int can_do_hugetlb_shm(void)
 {
@@ -932,9 +932,22 @@ static int can_do_hugetlb_shm(void)
return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
 }
 
+static int get_hstate_idx(int page_size_log)
+{
+   struct hstate *h;
+
+   if (!page_size_log)
+   return default_hstate_idx;
+   h = size_to_hstate(1 << page_size_log);
+   if (!h)
+   return -1;
+   return h - hstates;
+}
+
 struct file *hugetlb_file_setup(const char *name, unsigned long addr,
size_t size, vm_flags_t acctflag,
-   struct user_struct **user, int creat_flags)
+   struct user_struct **user,
+   int creat_flags, int page_size_log)
 {
int error = -ENOMEM;
struct file *file;
@@ -944,9 +957,14 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
struct qstr quick_string;
struct hstate *hstate;
unsigned long num_pages;
+   int hstate_idx;
+
+   hstate_idx = get_hstate_idx(page_size_log);
+   if (hstate_idx < 0)
+   return ERR_PTR(-ENODEV);
 
*user = NULL;
-   if (!hugetlbfs_vfsmount)
+   if (!hugetlbfs_vfsmount[hstate_idx])
return ERR_PTR(-ENOENT);
 
if (creat_flags == HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) {
@@ -963,7 +981,7 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
}
}
 
-   root = hugetlbfs_vfsmount->mnt_root;
+   root = hugetlbfs_vfsmount[hstate_idx]->mnt_root;
quick_string.name = name;
quick_string.len = strlen(quick_string.name);
quick_string.hash = 0;
@@ -971,7 +989,7 @@ struct file *hugetlb_file_setup(const char *name, unsigned long addr,
if (!path.dentry)
goto out_shm_unlock;
 
-   path.mnt = mntget(hugetlbfs_vfsmount);
+   path.mnt = 

Re: [PATCH] MM: Support more pagesizes for MAP_HUGETLB/SHM_HUGETLB v3

2012-10-11 Thread Andi Kleen
> Alas, include/asm-generic/mman.h doesn't exist now.

git resolved it automagically

> 
> Does this change touch all the hugetlb-capable architectures?

I took another look at this. Not every hugetlb-capable architecture
needs it, only architectures with multiple hugetlb page sizes.

That is only x86, tile and powerpc.

I looked at tile and powerpc, and they both have configurable
hugetlb page sizes, so it's somewhat awkward to add defines
for them.

One disadvantage is that user programs need to know which page
sizes are configured. That is definitely awkward, but I don't know
of any way around it.

Luckily there's a way to query this in /sys.
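
For illustration, here is a minimal sketch of how a user program might
encode a page size into the mmap flags under this scheme, i.e. log2 of
the page size shifted by MAP_HUGE_SHIFT. The shift value 26, the 6-bit
mask and the ex_/EX_ names are assumptions made for this sketch; the
real values come from the kernel headers. The sizes actually configured
on a system can be listed under /sys/kernel/mm/hugepages/.

```c
/* Assumed values for illustration; the real ones come from the kernel headers. */
#define EX_MAP_HUGE_SHIFT 26
#define EX_MAP_HUGE_MASK  0x3fu

/* Encode log2(page size) into the upper bits of the mmap flags. */
static unsigned int ex_map_huge_flag(unsigned int page_size_log)
{
	return (page_size_log & EX_MAP_HUGE_MASK) << EX_MAP_HUGE_SHIFT;
}

/* Recover log2(page size) from the flags; 0 means "use the default size". */
static unsigned int ex_map_huge_log(unsigned int flags)
{
	return (flags >> EX_MAP_HUGE_SHIFT) & EX_MAP_HUGE_MASK;
}

/* Turn the log2 value back into a byte count; 0 if no size was requested. */
static unsigned long long ex_map_huge_bytes(unsigned int flags)
{
	unsigned int log = ex_map_huge_log(flags);

	return log ? 1ULL << log : 0;
}
```

A caller would OR ex_map_huge_flag(21) into the flags next to
MAP_HUGETLB to ask for 2MB pages, and leave the bits zero to get the
default huge page size.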

-Andi

> 
> z:/usr/src/linux-3.6> grep -rl MAP_HUGETLB arch
> arch/alpha/include/asm/mman.h
> arch/xtensa/include/asm/mman.h
> arch/parisc/include/asm/mman.h
> arch/tile/include/asm/mman.h
> arch/sparc/include/asm/mman.h
> arch/powerpc/include/asm/mman.h
> arch/mips/include/asm/mman.h


Re: amd64, v3.6.0: Kernel panic + BUG at net/netfilter/nf_conntrack_core.c:220!

2012-10-11 Thread Pablo Neira Ayuso
On Thu, Oct 11, 2012 at 11:27:33PM +0200, Borislav Petkov wrote:
> On Thu, Oct 11, 2012 at 12:13:33PM -0700, Ian Applegate wrote:
> > On machines serving mainly http traffic we are seeing the following
> > panic, which is not yet reproducible.
> 
> Must be this BUG_ON:
> 
>   if (!nf_ct_is_confirmed(ct)) {
>           BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
>           hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
>   }

At a quick glance, I think we're hitting memory corruption. Right now I
don't see any sane code path that reaches that bugtrap.
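
To make the failing check concrete, here is a simplified userspace
sketch of what an "unhashed" test on such a list node amounts to: the
node counts as unhashed when its pprev back-pointer is NULL. The ex_
types and helpers are hypothetical stand-ins, not the kernel's
hlist_nulls implementation (no nulls markers, no RCU, no poisoning).
An unconfirmed conntrack is expected to be linked on a list, so seeing
it unhashed at destroy time suggests its linkage was lost or
overwritten.

```c
#include <stddef.h>

struct ex_node {
	struct ex_node *next;
	struct ex_node **pprev;	/* address of the pointer that points at us */
};

struct ex_head {
	struct ex_node *first;
};

/* A node that is on no list has a NULL back-pointer. */
static int ex_unhashed(const struct ex_node *n)
{
	return n->pprev == NULL;
}

/* Link n at the front of the list h. */
static void ex_add_head(struct ex_node *n, struct ex_head *h)
{
	n->next = h->first;
	n->pprev = &h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
}

/* Unlink n and mark it unhashed again. */
static void ex_del(struct ex_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}
```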

More comments below:

> Spamming some more lists and leaving the rest for reference.
> 
> > 
> > 
> > [180926.566743] [ cut here ]
> > [180926.572034] kernel BUG at net/netfilter/nf_conntrack_core.c:220!
> > [180926.578873] invalid opcode:  [#1] SMP
> > [180926.583594] Modules linked in: xfs exportfs ipmi_devintf ipmi_si
> > ipmi_msghandler dm_mod md_mod nf_conntr
> > ack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw ip6_tables
> > nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp x
> > t_conntrack xt_multiport iptable_filter xt_NOTRACK nf_conntrack
> > iptable_raw ip_tables x_tables nfsv4 auth_rp
> > cgss fuse nfsv3 nfs_acl nfs fscache lockd sunrpc sfc mtd i2c_algo_bit
> > i2c_core mdio igb dca uhci_hcd coretem
> > p acpi_cpufreq kvm_intel kvm crc32c_intel aesni_intel ablk_helper
> > cryptd aes_x86_64 aes_generic evdev sd_mod
> >  crc_t10dif mperf snd_pcm ahci snd_timer tpm_tis microcode snd tpm
> > libahci tpm_bios soundcore libata snd_pag
> > e_alloc pcspkr ehci_hcd lpc_ich usbcore mfd_core hpsa scsi_mod
> > usb_common button processor thermal_sys
> > [180926.657762] CPU 12
> > [180926.660008] Pid: 5948, comm: nginx-fl Not tainted 3.6.0-cloudflare
> > #1 HP ProLiant DL180 G6
> > [180926.669820] RIP: 0010:[]  []
> > destroy_conntrack+0x55/0xa9 [nf_conntrack]
> > [180926.680871] RSP: 0018:8805bd73fbb8  EFLAGS: 00010246
> > [180926.686930] RAX:  RBX: 8806b6f56c30 RCX:
> > 8805bd73fc48
> > [180926.695055] RDX:  RSI: 0006 RDI:
> > 8806b6f56c30
> > [180926.703179] RBP: 81651780 R08: 000172e0 R09:
> > 812cef91
> > [180926.711304] R10: dead00200200 R11: dead00100100 R12:
> > 8806b6f56c30
> > [180926.727451] R13:  R14: a02d6030 R15:
> > 
> > [180926.735575] FS:  7f382cdb2710() GS:880627cc()
> > knlGS:
> > [180926.744766] CS:  0010 DS:  ES:  CR0: 80050033
> > [180926.751312] CR2: ff600400 CR3: 0005bd8d3000 CR4:
> > 07e0
> > [180926.759436] DR0:  DR1:  DR2:
> > 
> > [180926.767560] DR3:  DR6: 0ff0 DR7:
> > 0400
> > [180926.775686] Process nginx-fl (pid: 5948, threadinfo
> > 8805bd73e000, task 8805c9755960)
> > [180926.785265] Stack:
> > [180926.787634]   81651780 8802720c2900
> > a02cde78
> > [180926.796087]  81651ec0 a02d6030 bff0efab
> > 8805
> > [180926.804532]  000288050002 00030014 00140003
> > 06ff88060002
> > [180926.812985] Call Trace:
> > [180926.815845]  [] ? nf_conntrack_in+0x4ed/0x5bc
> > [nf_conntrack]

Here below the trace shows the output path to close a tcp socket. But
the line above refers to a conntrack function that is called in the
input path.

If this process is just acting as a plain HTTP server, this backtrace
doesn't look consistent to me.

> > [180926.824069]  [] ? nf_iterate+0x41/0x77
> > [180926.830131]  [] ? ip_options_echo+0x2ed/0x2ed
> > [180926.836873]  [] ? nf_hook_slow+0x68/0xfd
> > [180926.843127]  [] ? ip_options_echo+0x2ed/0x2ed
> > [180926.849866]  [] ? __ip_local_out+0x98/0x9d
> > [180926.856315]  [] ? ip_local_out+0x9/0x19
> > [180926.862465]  [] ? tcp_transmit_skb+0x7ae/0x7f1
> > [180926.869305]  [] ? virt_to_head_page+0x9/0x2c
> > [180926.875949]  [] ? tcp_send_active_reset+0xd5/0x101
> > [180926.883175]  [] ? tcp_close+0x118/0x354
> > [180926.889334]  [] ? inet_release+0x75/0x7b
> > [180926.895591]  [] ? sock_release+0x19/0x73
> > [180926.901845]  [] ? sock_close+0x22/0x27
> > [180926.907906]  [] ? __fput+0xe9/0x1ae
> > [180926.913677]  [] ? task_work_run+0x53/0x67
> > [180926.920031]  [] ? do_notify_resume+0x79/0x8d
> > [180926.926673]  [] ? int_signal+0x12/0x17
> > [180926.932732] Code: 05 48 89 df ff d0 48 c7 c7 30 66 2d a0 e8 11 b0
> > 07 e1 48 89 df e8 72 25 00 00 48 8b 43
> >  78 a8 08 75 2a 48 8b 53 10 48 85 d2 75 04 <0f> 0b eb fe 48 8b 43 08
> > 48 89 02 a8 01 75 04 48 89 50 08 48 be
> > [180926.954788] RIP  [] destroy_conntrack+0x55/0xa9
> > [nf_conntrack]
> > [180926.963217]  RSP 
> > [180926.967700] ---[ end trace 54a660a52afd5820 ]---
> > [180926.973038] Kernel panic - not syncing: Fatal exception in interrupt
> > 
> > ---
> > Ian Applegate
> > 

Re: [PATCH v5 1/3] tracing,x86: Add a TSC trace_clock

2012-10-11 Thread David Sharp
On Mon, Oct 8, 2012 at 7:08 PM, Yoshihiro YUNOMAE
 wrote:
> Hi David,
>
> This is a nice patch set.
>
> I have just found something that should be fixed, which is related to
> your work. I'll send it following this mail.
>
> Would you mind adding these patches to your patch series?

Thanks for noticing the stats issue. Added them to my series.

>
> Thanks,
>
> Yoshihiro YUNOMAE
>
> (2012/10/02 12:31), David Sharp wrote:
>> In order to promote interoperability between userspace tracers and ftrace,
>> add a trace_clock that reports raw TSC values which will then be recorded
>> in the ring buffer. Userspace tracers that also record TSCs are then on
>> exactly the same time base as the kernel and events can be unambiguously
>> interlaced.
>>
>
> --
> Yoshihiro YUNOMAE
> Software Platform Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: yoshihiro.yunomae...@hitachi.com
>
>


[PATCH v6 3/6] tracing: Format non-nanosec times from tsc clock without a decimal point.

2012-10-11 Thread David Sharp
With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.
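
As a rough userspace sketch of the formatting rule described above
(not the code in this patch): a nanosecond clock is printed as seconds
with a six-digit microsecond fraction, anything else as a plain decimal
integer. The function name ex_format_ts is hypothetical, and this
sketch truncates to microseconds rather than rounding.

```c
#include <stdio.h>

/*
 * Format a raw timestamp into buf. When the clock counts nanoseconds,
 * print seconds with a six-digit microsecond fraction; otherwise print
 * the raw value as a decimal integer. Returns what snprintf() returns.
 */
static int ex_format_ts(char *buf, size_t len, unsigned long long ts, int in_ns)
{
	if (in_ns) {
		unsigned long long usecs = ts / 1000;	/* truncates */

		return snprintf(buf, len, "%5llu.%06llu",
				usecs / 1000000, usecs % 1000000);
	}
	return snprintf(buf, len, "%llu", ts);
}
```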

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ cat trace
  <idle>-0 [000]  6330.52: sched_switch: prev_comm=swapper 
prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 
next_prio=120
   sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash 
prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 
next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  <idle>-0   0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 
prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 
prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ echo 0 > options/latency-format
$ cat trace
  <idle>-0 [000] 16490053398357: sched_switch: prev_comm=swapper 
prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 
next_prio=120
   sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash 
prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 
next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  <idle>-0   0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 
prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 
prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...

v2:
Move arch-specific bits out of generic code.
v4:
Fix x86_32 build due to 64-bit division.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp 
Cc: Steven Rostedt 
Cc: Masami Hiramatsu 
---
 arch/x86/include/asm/trace_clock.h |2 +-
 include/linux/ftrace_event.h   |6 +++
 kernel/trace/trace.c   |   15 +-
 kernel/trace/trace.h   |4 --
 kernel/trace/trace_output.c|   80 ---
 5 files changed, 74 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
index 7ee0d8c..45e17f5 100644
--- a/arch/x86/include/asm/trace_clock.h
+++ b/arch/x86/include/asm/trace_clock.h
@@ -9,7 +9,7 @@
 extern u64 notrace trace_clock_x86_tsc(void);
 
 # define ARCH_TRACE_CLOCKS \
-   { trace_clock_x86_tsc,  "x86-tsc" },
+   { trace_clock_x86_tsc,  "x86-tsc",  .in_ns = 0 },
 
 #endif
 
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..c760670 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -86,6 +86,12 @@ struct trace_iterator {
cpumask_var_t   started;
 };
 
+enum trace_iter_flags {
+   TRACE_FILE_LAT_FMT  = 1,
+   TRACE_FILE_ANNOTATE = 2,
+   TRACE_FILE_TIME_IN_NS   = 4,
+};
+
 
 struct trace_event;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4e26df3..cff3427 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -476,10 +476,11 @@ static const char *trace_options[] = {
 static struct {
u64 (*func)(void);
const char *name;
+   int in_ns;  /* is this clock in nanoseconds? */
 } trace_clocks[] = {
-   { trace_clock_local,"local" },
-   { trace_clock_global,   "global" },
-   { trace_clock_counter,  "counter" },
+   { trace_clock_local,"local",1 },
+   { trace_clock_global,   "global",   1 },
+   { trace_clock_counter,  "counter",  0 },
ARCH_TRACE_CLOCKS
 };
 
@@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
if (ring_buffer_overruns(iter->tr->buffer))
iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
+   /* Output in nanoseconds only if we are using a clock in nanoseconds. */
+   if (trace_clocks[trace_clock_id].in_ns)
+   iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
/* stop the trace while dumping */
tracing_stop();
 
@@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
if (trace_flags & TRACE_ITER_LATENCY_FMT)
iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
+   /* Output in nanoseconds only if we are using a clock in nanoseconds. */
+   if (trace_clocks[trace_clock_id].in_ns)
+   iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
iter->cpu_file = cpu_file;
iter->tr = &global_trace;
mutex_init(&iter->mutex);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..84fefed 100644
--- a/kernel/trace/trace.h
+++ 

[PATCH v6 1/6] tracing,x86: Add a TSC trace_clock

2012-10-11 Thread David Sharp
In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.
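
To illustrate what "interlaced" means here: once both sides stamp
events from the same raw counter, combining the two traces is an
ordinary merge of two sorted streams. The sketch below is userspace C
with hypothetical names (ex_event, ex_interlace); it is not part of
this series.

```c
#include <stddef.h>
#include <stdint.h>

struct ex_event {
	uint64_t tsc;	/* raw timestamp, same time base in both streams */
	int source;	/* 0 = kernel trace, 1 = userspace trace */
};

/*
 * Merge two streams, each already sorted by tsc, into out[].
 * out must have room for na + nb events. Returns the number written.
 */
static size_t ex_interlace(const struct ex_event *a, size_t na,
			   const struct ex_event *b, size_t nb,
			   struct ex_event *out)
{
	size_t i = 0, j = 0, k = 0;

	while (i < na && j < nb)
		out[k++] = (a[i].tsc <= b[j].tsc) ? a[i++] : b[j++];
	while (i < na)
		out[k++] = a[i++];
	while (j < nb)
		out[k++] = b[j++];
	return k;
}
```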

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

v2:
Move arch-specific bits out of generic code.
v3:
Rename "x86-tsc", cleanups

Google-Bug-Id: 6980623
Signed-off-by: David Sharp 
Cc: Steven Rostedt 
Cc: Masami Hiramatsu 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: "H. Peter Anvin" 
Acked-by: Ingo Molnar 
---
 arch/alpha/include/asm/trace_clock.h  |6 ++
 arch/arm/include/asm/trace_clock.h|6 ++
 arch/avr32/include/asm/trace_clock.h  |6 ++
 arch/blackfin/include/asm/trace_clock.h   |6 ++
 arch/c6x/include/asm/trace_clock.h|6 ++
 arch/cris/include/asm/trace_clock.h   |6 ++
 arch/frv/include/asm/trace_clock.h|6 ++
 arch/h8300/include/asm/trace_clock.h  |6 ++
 arch/hexagon/include/asm/trace_clock.h|6 ++
 arch/ia64/include/asm/trace_clock.h   |6 ++
 arch/m32r/include/asm/trace_clock.h   |6 ++
 arch/m68k/include/asm/trace_clock.h   |6 ++
 arch/microblaze/include/asm/trace_clock.h |6 ++
 arch/mips/include/asm/trace_clock.h   |6 ++
 arch/mn10300/include/asm/trace_clock.h|6 ++
 arch/openrisc/include/asm/trace_clock.h   |6 ++
 arch/parisc/include/asm/trace_clock.h |6 ++
 arch/powerpc/include/asm/trace_clock.h|6 ++
 arch/s390/include/asm/trace_clock.h   |6 ++
 arch/score/include/asm/trace_clock.h  |6 ++
 arch/sh/include/asm/trace_clock.h |6 ++
 arch/sparc/include/asm/trace_clock.h  |6 ++
 arch/tile/include/asm/trace_clock.h   |6 ++
 arch/um/include/asm/trace_clock.h |6 ++
 arch/unicore32/include/asm/trace_clock.h  |6 ++
 arch/x86/include/asm/trace_clock.h|   16 
 arch/x86/kernel/Makefile  |1 +
 arch/x86/kernel/trace_clock.c |   21 +
 arch/xtensa/include/asm/trace_clock.h |6 ++
 include/asm-generic/trace_clock.h |   16 
 include/linux/trace_clock.h   |2 ++
 kernel/trace/trace.c  |1 +
 32 files changed, 213 insertions(+), 0 deletions(-)
 create mode 100644 arch/alpha/include/asm/trace_clock.h
 create mode 100644 arch/arm/include/asm/trace_clock.h
 create mode 100644 arch/avr32/include/asm/trace_clock.h
 create mode 100644 arch/blackfin/include/asm/trace_clock.h
 create mode 100644 arch/c6x/include/asm/trace_clock.h
 create mode 100644 arch/cris/include/asm/trace_clock.h
 create mode 100644 arch/frv/include/asm/trace_clock.h
 create mode 100644 arch/h8300/include/asm/trace_clock.h
 create mode 100644 arch/hexagon/include/asm/trace_clock.h
 create mode 100644 arch/ia64/include/asm/trace_clock.h
 create mode 100644 arch/m32r/include/asm/trace_clock.h
 create mode 100644 arch/m68k/include/asm/trace_clock.h
 create mode 100644 arch/microblaze/include/asm/trace_clock.h
 create mode 100644 arch/mips/include/asm/trace_clock.h
 create mode 100644 arch/mn10300/include/asm/trace_clock.h
 create mode 100644 arch/openrisc/include/asm/trace_clock.h
 create mode 100644 arch/parisc/include/asm/trace_clock.h
 create mode 100644 arch/powerpc/include/asm/trace_clock.h
 create mode 100644 arch/s390/include/asm/trace_clock.h
 create mode 100644 arch/score/include/asm/trace_clock.h
 create mode 100644 arch/sh/include/asm/trace_clock.h
 create mode 100644 arch/sparc/include/asm/trace_clock.h
 create mode 100644 arch/tile/include/asm/trace_clock.h
 create mode 100644 arch/um/include/asm/trace_clock.h
 create mode 100644 arch/unicore32/include/asm/trace_clock.h
 create mode 100644 arch/x86/include/asm/trace_clock.h
 create mode 100644 arch/x86/kernel/trace_clock.c
 create mode 100644 arch/xtensa/include/asm/trace_clock.h
 create mode 100644 include/asm-generic/trace_clock.h

diff --git a/arch/alpha/include/asm/trace_clock.h b/arch/alpha/include/asm/trace_clock.h
new file mode 100644
index 000..f35fab8
--- /dev/null
+++ b/arch/alpha/include/asm/trace_clock.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_TRACE_CLOCK_H
+#define _ASM_TRACE_CLOCK_H
+
+#include <asm-generic/trace_clock.h>
+
+#endif
diff --git a/arch/arm/include/asm/trace_clock.h b/arch/arm/include/asm/trace_clock.h
new file mode 100644
index 000..f35fab8
--- /dev/null
+++ b/arch/arm/include/asm/trace_clock.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_TRACE_CLOCK_H
+#define _ASM_TRACE_CLOCK_H
+
+#include <asm-generic/trace_clock.h>
+
+#endif
diff --git a/arch/avr32/include/asm/trace_clock.h b/arch/avr32/include/asm/trace_clock.h
new file mode 100644
index 000..f35fab8
--- /dev/null
+++ b/arch/avr32/include/asm/trace_clock.h
@@ -0,0 +1,6 @@
+#ifndef 

[PATCH v6 6/6] tracing: Fix maybe-uninitialized warning in ftrace_function_set_regexp

2012-10-11 Thread David Sharp
Compiler warning:

kernel/trace/trace_events_filter.c: In function 'ftrace_function_set_filter_cb':
kernel/trace/trace_events_filter.c:2074:8: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
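
For readers unfamiliar with this class of warning, here is a small
userspace sketch of the pattern that typically triggers it: a return
value assigned only inside a loop that may run zero times. The names
(ex_apply_all, ex_reject_negative) are hypothetical and the code is
not taken from trace_events_filter.c; it only shows why initializing
the variable is a reasonable fix.

```c
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success, negative on error. */
typedef int (*ex_apply_fn)(int item);

/* Example callback: rejects negative items. */
static int ex_reject_negative(int item)
{
	return item < 0 ? -1 : 0;
}

/*
 * Apply fn to each item and stop at the first failure. If count is 0
 * the loop body never runs, so ret must start out initialized;
 * otherwise the function would return an indeterminate value.
 */
static int ex_apply_all(const int *items, size_t count, ex_apply_fn fn)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		ret = fn(items[i]);
		if (ret)
			break;
	}
	return ret;
}
```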

Signed-off-by: David Sharp 
Cc: Steven Rostedt 
---
 kernel/trace/trace_events_filter.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 431dba8..ef36953 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -2002,9 +2002,10 @@ static int ftrace_function_set_regexp(struct ftrace_ops *ops, int filter,
 static int __ftrace_function_set_filter(int filter, char *buf, int len,
struct function_filter_data *data)
 {
-   int i, re_cnt, ret;
+   int i, re_cnt;
int *reset;
char **re;
+   int ret = 0;
 
reset = filter ? &data->first_filter : &data->first_notrace;
 
-- 
1.7.7.3



[PATCH v6 5/6] ftrace: Show raw time stamp on stats per cpu using counter or tsc mode for trace_clock

2012-10-11 Thread David Sharp
From: Yoshihiro YUNOMAE 

Show raw time stamp values in the per-cpu stats when counter or tsc mode is
selected for trace_clock. The tracing time stamp is in nanoseconds in local
or global mode, but in counter and tsc mode the units are the trace counter
value and TSC cycles, respectively.

Signed-off-by: Yoshihiro YUNOMAE 
Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Signed-off-by: David Sharp 
---
 kernel/trace/trace.c |   23 +--
 1 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index cff3427..8bfa3b7 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4396,13 +4396,24 @@ tracing_stats_read(struct file *filp, char __user *ubuf,
cnt = ring_buffer_bytes_cpu(tr->buffer, cpu);
trace_seq_printf(s, "bytes: %ld\n", cnt);
 
-   t = ns2usecs(ring_buffer_oldest_event_ts(tr->buffer, cpu));
-   usec_rem = do_div(t, USEC_PER_SEC);
-   trace_seq_printf(s, "oldest event ts: %5llu.%06lu\n", t, usec_rem);
+   if (trace_clocks[trace_clock_id].in_ns) {
+   /* local or global for trace_clock */
+   t = ns2usecs(ring_buffer_oldest_event_ts(tr->buffer, cpu));
+   usec_rem = do_div(t, USEC_PER_SEC);
+   trace_seq_printf(s, "oldest event ts: %5llu.%06lu\n",
+   t, usec_rem);
+
+   t = ns2usecs(ring_buffer_time_stamp(tr->buffer, cpu));
+   usec_rem = do_div(t, USEC_PER_SEC);
+   trace_seq_printf(s, "now ts: %5llu.%06lu\n", t, usec_rem);
+   } else {
+   /* counter or tsc mode for trace_clock */
+   trace_seq_printf(s, "oldest event ts: %llu\n",
+   ring_buffer_oldest_event_ts(tr->buffer, cpu));
 
-   t = ns2usecs(ring_buffer_time_stamp(tr->buffer, cpu));
-   usec_rem = do_div(t, USEC_PER_SEC);
-   trace_seq_printf(s, "now ts: %5llu.%06lu\n", t, usec_rem);
+   trace_seq_printf(s, "now ts: %llu\n",
+   ring_buffer_time_stamp(tr->buffer, cpu));
+   }
 
count = simple_read_from_buffer(ubuf, count, ppos, s->buffer, s->len);
 
-- 
1.7.7.3



[PATCH v6 2/6] tracing: Reset ring buffer when changing trace_clocks

2012-10-11 Thread David Sharp
Because the "tsc" clock isn't in nanoseconds, the ring buffer must be
reset when changing clocks so that incomparable timestamps don't end up
in the same trace.
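
A toy userspace model of the idea, under the assumption that a buffer
holds timestamps from exactly one clock at a time: switching the clock
also empties the buffer, so values in different units never sit side
by side. All ex_ names are hypothetical and the fixed-size array is a
stand-in for the real ring buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_BUF_SLOTS 8

typedef uint64_t (*ex_clock_fn)(void);

struct ex_trace_buf {
	ex_clock_fn clock;
	uint64_t ts[EX_BUF_SLOTS];
	size_t count;
};

/* Two toy clocks with unrelated units. */
static uint64_t ex_clock_ns(void)     { return 1000000000ULL; }
static uint64_t ex_clock_cycles(void) { return 42ULL; }

/* Record one event stamped by the current clock; drops when full. */
static void ex_record(struct ex_trace_buf *b)
{
	if (b->count < EX_BUF_SLOTS)
		b->ts[b->count++] = b->clock();
}

/* Switch clocks and discard old events so units never mix. */
static void ex_set_clock(struct ex_trace_buf *b, ex_clock_fn fn)
{
	b->clock = fn;
	b->count = 0;
}
```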

Tested: Confirmed switching clocks resets the trace buffer.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp 
Cc: Steven Rostedt 
Cc: Masami Hiramatsu 
---
 kernel/trace/trace.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 92fb08e..4e26df3 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4012,6 +4012,14 @@ static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf,
if (max_tr.buffer)
ring_buffer_set_clock(max_tr.buffer, trace_clocks[i].func);
 
+   /*
+* New clock may not be consistent with the previous clock.
+* Reset the buffer so that it doesn't have incomparable timestamps.
+*/
+   tracing_reset_online_cpus(&global_trace);
+   if (max_tr.buffer)
+   tracing_reset_online_cpus(&max_tr);
+
mutex_unlock(&trace_types_lock);
 
*fpos += cnt;
-- 
1.7.7.3



[PATCH v6 0/6] TSC trace_clock

2012-10-11 Thread David Sharp
Added Yoshihiro Yunomae's patches to change the per_cpu stats to show raw
timestamps when the clock is not in nanoseconds.

Also added a small patch to fix a warning.


David Sharp (4):
  tracing,x86: Add a TSC trace_clock
  tracing: Reset ring buffer when changing trace_clocks
  tracing: Format non-nanosec times from tsc clock without a decimal
point.
  tracing: Fix maybe-uninitialized warning in
ftrace_function_set_regexp

Yoshihiro YUNOMAE (2):
  ftrace: Change unsigned long type of ring_buffer_oldest_event_ts() to
u64
  ftrace: Show raw time stamp on stats per cpu using counter or tsc
mode for trace_clock

 arch/alpha/include/asm/trace_clock.h  |6 ++
 arch/arm/include/asm/trace_clock.h|6 ++
 arch/avr32/include/asm/trace_clock.h  |6 ++
 arch/blackfin/include/asm/trace_clock.h   |6 ++
 arch/c6x/include/asm/trace_clock.h|6 ++
 arch/cris/include/asm/trace_clock.h   |6 ++
 arch/frv/include/asm/trace_clock.h|6 ++
 arch/h8300/include/asm/trace_clock.h  |6 ++
 arch/hexagon/include/asm/trace_clock.h|6 ++
 arch/ia64/include/asm/trace_clock.h   |6 ++
 arch/m32r/include/asm/trace_clock.h   |6 ++
 arch/m68k/include/asm/trace_clock.h   |6 ++
 arch/microblaze/include/asm/trace_clock.h |6 ++
 arch/mips/include/asm/trace_clock.h   |6 ++
 arch/mn10300/include/asm/trace_clock.h|6 ++
 arch/openrisc/include/asm/trace_clock.h   |6 ++
 arch/parisc/include/asm/trace_clock.h |6 ++
 arch/powerpc/include/asm/trace_clock.h|6 ++
 arch/s390/include/asm/trace_clock.h   |6 ++
 arch/score/include/asm/trace_clock.h  |6 ++
 arch/sh/include/asm/trace_clock.h |6 ++
 arch/sparc/include/asm/trace_clock.h  |6 ++
 arch/tile/include/asm/trace_clock.h   |6 ++
 arch/um/include/asm/trace_clock.h |6 ++
 arch/unicore32/include/asm/trace_clock.h  |6 ++
 arch/x86/include/asm/trace_clock.h|   16 ++
 arch/x86/kernel/Makefile  |1 +
 arch/x86/kernel/trace_clock.c |   21 
 arch/xtensa/include/asm/trace_clock.h |6 ++
 include/asm-generic/trace_clock.h |   16 ++
 include/linux/ftrace_event.h  |6 ++
 include/linux/ring_buffer.h   |2 +-
 include/linux/trace_clock.h   |2 +
 kernel/trace/ring_buffer.c|4 +-
 kernel/trace/trace.c  |   47 ++---
 kernel/trace/trace.h  |4 --
 kernel/trace/trace_events_filter.c|3 +-
 kernel/trace/trace_output.c   |   80 -
 38 files changed, 316 insertions(+), 42 deletions(-)
 create mode 100644 arch/alpha/include/asm/trace_clock.h
 create mode 100644 arch/arm/include/asm/trace_clock.h
 create mode 100644 arch/avr32/include/asm/trace_clock.h
 create mode 100644 arch/blackfin/include/asm/trace_clock.h
 create mode 100644 arch/c6x/include/asm/trace_clock.h
 create mode 100644 arch/cris/include/asm/trace_clock.h
 create mode 100644 arch/frv/include/asm/trace_clock.h
 create mode 100644 arch/h8300/include/asm/trace_clock.h
 create mode 100644 arch/hexagon/include/asm/trace_clock.h
 create mode 100644 arch/ia64/include/asm/trace_clock.h
 create mode 100644 arch/m32r/include/asm/trace_clock.h
 create mode 100644 arch/m68k/include/asm/trace_clock.h
 create mode 100644 arch/microblaze/include/asm/trace_clock.h
 create mode 100644 arch/mips/include/asm/trace_clock.h
 create mode 100644 arch/mn10300/include/asm/trace_clock.h
 create mode 100644 arch/openrisc/include/asm/trace_clock.h
 create mode 100644 arch/parisc/include/asm/trace_clock.h
 create mode 100644 arch/powerpc/include/asm/trace_clock.h
 create mode 100644 arch/s390/include/asm/trace_clock.h
 create mode 100644 arch/score/include/asm/trace_clock.h
 create mode 100644 arch/sh/include/asm/trace_clock.h
 create mode 100644 arch/sparc/include/asm/trace_clock.h
 create mode 100644 arch/tile/include/asm/trace_clock.h
 create mode 100644 arch/um/include/asm/trace_clock.h
 create mode 100644 arch/unicore32/include/asm/trace_clock.h
 create mode 100644 arch/x86/include/asm/trace_clock.h
 create mode 100644 arch/x86/kernel/trace_clock.c
 create mode 100644 arch/xtensa/include/asm/trace_clock.h
 create mode 100644 include/asm-generic/trace_clock.h

-- 
1.7.7.3



[PATCH v6 4/6] ftrace: Change unsigned long type of ring_buffer_oldest_event_ts() to u64

2012-10-11 Thread David Sharp
From: Yoshihiro YUNOMAE 

ring_buffer_oldest_event_ts() should return a value of u64 type, because
ring_buffer_per_cpu->buffer_page->buffer_data_page->time_stamp is u64 type.
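
To show why the narrower return type matters, the sketch below models
a 32-bit build, where unsigned long is 32 bits wide, using uint32_t.
The function names are hypothetical; only the truncation behaviour is
the point.

```c
#include <stdint.h>

/*
 * Model of the old return type on a 32-bit build, where unsigned long
 * is 32 bits wide: the upper half of the timestamp is silently dropped.
 */
static uint32_t ex_oldest_ts_narrow(uint64_t time_stamp)
{
	return (uint32_t)time_stamp;
}

/* With a 64-bit return type the value survives intact. */
static uint64_t ex_oldest_ts_wide(uint64_t time_stamp)
{
	return time_stamp;
}
```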

Signed-off-by: Yoshihiro YUNOMAE 
Cc: Steven Rostedt 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Vaibhav Nagarnaik 
Signed-off-by: David Sharp 
---
 include/linux/ring_buffer.h |2 +-
 kernel/trace/ring_buffer.c  |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 6c8835f..c68a09a 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -159,7 +159,7 @@ int ring_buffer_record_is_on(struct ring_buffer *buffer);
 void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
 void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
 
-unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu);
+u64 ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu);
 unsigned long ring_buffer_bytes_cpu(struct ring_buffer *buffer, int cpu);
 unsigned long ring_buffer_entries(struct ring_buffer *buffer);
 unsigned long ring_buffer_overruns(struct ring_buffer *buffer);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 49491fa..db3806e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2925,12 +2925,12 @@ rb_num_of_entries(struct ring_buffer_per_cpu *cpu_buffer)
  * @buffer: The ring buffer
  * @cpu: The per CPU buffer to read from.
  */
-unsigned long ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu)
+u64 ring_buffer_oldest_event_ts(struct ring_buffer *buffer, int cpu)
 {
unsigned long flags;
struct ring_buffer_per_cpu *cpu_buffer;
struct buffer_page *bpage;
-   unsigned long ret;
+   u64 ret;
 
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return 0;
-- 
1.7.7.3



Re: [PATCH 03/31] sections: Fix section conflicts in arch/frv

2012-10-11 Thread Andi Kleen
> Unfortunately __pminitconst isn't defined at this point:
> arch/frv/kernel/setup.c:187:47: error: expected '=', ',', ';', 'asm'
> or '__attribute__' before '*' token
> arch/frv/kernel/setup.c:386:2: error: 'clock_cmodes' undeclared (first
> use in this function)
> arch/frv/kernel/setup.c:571:6: error: 'clock_cmodes' undeclared (first
> use in this function)
> make[2]: *** [arch/frv/kernel/setup.o] Error 1
> 
> http://kisskb.ellerman.id.au/kisskb/buildresult/7344691/
> 
> It seems the __pminit* variants are frv-specific, and don't cover all possible
> combinations?

Thanks for reporting.

Does this fix it? -Andi

---

frv: Fix const sections change

Add __pminitconst to fix the build again.

Reported-by: Geert Uytterhoeven
Signed-off-by: Andi Kleen 

diff --git a/arch/frv/kernel/setup.c b/arch/frv/kernel/setup.c
index 1f1e5ef..b8993c8 100644
--- a/arch/frv/kernel/setup.c
+++ b/arch/frv/kernel/setup.c
@@ -112,9 +112,11 @@ char __initdata redboot_command_line[COMMAND_LINE_SIZE];
 #ifdef CONFIG_PM
 #define __pminit
 #define __pminitdata
+#define __pminitconst
 #else
 #define __pminit __init
 #define __pminitdata __initdata
+#define __pminitconst __initconst
 #endif
 
 struct clock_cmode {


Re: [PATCH 03/10] x86, mm: get early page table from BRK

2012-10-11 Thread H. Peter Anvin
On 10/12/2012 06:55 AM, Yinghai Lu wrote:
> On Thu, Oct 11, 2012 at 7:41 AM, Konrad Rzeszutek Wilk
>  wrote:
>> On Wed, Oct 10, 2012 at 08:49:05AM -0700, Yinghai Lu wrote:
>>>
>>> yes that is some extreme case:
>>> assume that 2M range is [2T-2M, 2T),
>>
>> What is T in here? Terabyte? Is the '[' vs ')' a significance in your
>> explanation? Should it be '[2T-2M, 2T]' ?
> 
> yes, T is terabyte
> 
> [2T-2M, 2T) is equal to [2T-2M, 2T-1]
> 
> ) mean the boundary is not included.
> 

Right... and just to defend Yinghai here, this is a standard
mathematical notation.
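
In code, the half-open convention comes out as a >= test on the start
and a < test on the end. A minimal sketch, with the helper name and
the EX_MB/EX_TB constants made up for illustration:

```c
#include <stdint.h>

#define EX_MB (1ULL << 20)
#define EX_TB (1ULL << 40)

/* Half-open range [start, end): start is included, end is not. */
static int ex_in_range(uint64_t addr, uint64_t start, uint64_t end)
{
	return addr >= start && addr < end;
}
```

So for [2T-2M, 2T), the last address inside the range is 2T-1, and 2T
itself is outside.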

-hpa




mmotm 2012-10-11-16-14 uploaded

2012-10-11 Thread akpm
The mm-of-the-moment snapshot 2012-10-11-16-14 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (3.x
or 3.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE-yyyy-mm-dd-hh-mm-ss.  Both contain the string yyyy-mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A git tree which contains the memory management portion of this tree is
maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
by Michal Hocko.  It contains the patches which are between the
"#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series
file, http://www.ozlabs.org/~akpm/mmotm/series.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/?p=linux-mmotm.git;a=summary

To develop on top of mmotm git:

  $ git remote add mmotm git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
  $ git remote update mmotm
  $ git checkout -b topic mmotm/master
  
  $ git send-email mmotm/master.. [...]

To rebase a branch with older patches to a new mmotm release:

  $ git remote update mmotm
  $ git rebase --onto mmotm/master <topic base> topic




The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/?p=linux-mmots.git;a=summary

and use of this tree is similar to
http://git.cmpxchg.org/?p=linux-mmotm.git, described above.


This mmotm tree contains the following patches against 3.6:
(patches marked "*" will be included in linux-next)

  origin.patch
  linux-next.patch
  linux-next-git-rejects.patch
  i-need-old-gcc.patch
  arch-alpha-kernel-systblss-remove-debug-check.patch
* linux-coredumph-needs-asm-siginfoh.patch
* memstick-remove-unused-field-from-state-struct.patch
* memstick-ms_block-fix-complile-issue.patch
* memstick-use-after-free-in-msb_disk_release.patch
* memstick-memory-leak-on-error-in-msb_ftl_scan.patch
* kernel-sysc-fix-stack-memory-content-leak-via-uname26.patch
* drivers-video-backlight-lm3639_blc-return-proper-error-in-lm3639_bled_mode_store-error-paths.patch
* pidns-remove-recursion-from-free_pid_ns-v5.patch
* pidns-remove-recursion-from-free_pid_ns-v5-fix.patch
* cris-fix-i-o-macros.patch
* selinux-fix-sel_netnode_insert-suspicious-rcu-dereference.patch
* mm-slab-release-slab_mutex-earlier-in-kmem_cache_destroy.patch
* nohz-fix-idle-ticks-in-cpu-summary-line-of-proc-stat.patch
* vfs-d_obtain_alias-needs-to-use-as-default-name.patch
* cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch
* cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch
* acpi_memhotplugc-fix-memory-leak-when-memory-device-is-unbound-from-the-module-acpi_memhotplug.patch
* acpi_memhotplugc-free-memory-device-if-acpi_memory_enable_device-failed.patch
* acpi_memhotplugc-remove-memory-info-from-list-before-freeing-it.patch
* acpi_memhotplugc-dont-allow-to-eject-the-memory-device-if-it-is-being-used.patch
* acpi_memhotplugc-bind-the-memory-device-when-the-driver-is-being-loaded.patch
* acpi_memhotplugc-auto-bind-the-memory-device-which-is-hotplugged-before-the-driver-is-loaded.patch
* arch-x86-platform-iris-irisc-register-a-platform-device-and-a-platform-driver.patch
* x86-numa-dont-check-if-node-is-numa_no_node.patch
* arch-x86-tools-insn_sanityc-identify-source-of-messages.patch
* uv-fix-incorrect-tlb-flush-all-issue.patch
* olpc-fix-olpc-xo1-scic-build-errors.patch
* fs-debugsfs-remove-unnecessary-inode-i_private-initialization.patch
* pcmcia-move-unbind-rebind-into-dev_pm_opscomplete.patch
* drm-i915-optimize-div_round_closest-call.patch
* drm-fix-radeon-printk-format-warnings.patch
  cyber2000fb-avoid-palette-corruption-at-higher-clocks.patch
* timeconstpl-remove-deprecated-defined-array.patch
* time-dont-inline-export_symbol-functions.patch
* h8300-select-generic-atomic64_t-support.patch
* cciss-cleanup-bitops-usage.patch
* cciss-use-check_signature.patch
* block-store-partition_meta_infouuid-as-a-string.patch
* init-reduce-partuuid-min-length-to-1-from-36.patch
* 

Re: Linux 2.6.32.60

2012-10-11 Thread H. Peter Anvin
On 10/11/2012 07:31 PM, Willy Tarreau wrote:
> On Thu, Oct 11, 2012 at 07:58:04PM +0900, Greg KH wrote:
>> On Thu, Oct 11, 2012 at 08:29:16AM +0200, Willy Tarreau wrote:
>>> If you think these patches constitute a regression, I can revert them.
>>> However I'd like convincing arguments since they're here to help address
>>> a real issue.
>>
>> If I missed these when doing the random number generation backport for
>> 3.0, and I should add them there as well, please let me know.
> 
> At least I think they should not be in 2.6.32 without being in 3.0.
> Peter's opinion will probably help us decide whether they should
> go into 3.0 or whether 2.6.32 should revert them.
> 

I would strongly argue for at least one of the RDRAND-enabling versions
being in all supported kernels; the second (with Ted Ts'o's changes) is
better, but touches a *lot* of subsystems; the plain one is
self-contained but only helps RDRAND-enabled hardware.

Without these patches the random subsystem has a critical security flaw,
which puts it into the scope for stable.

-hpa





Re: [Q] Default SLAB allocator

2012-10-11 Thread Andi Kleen
David Rientjes  writes:

> On Thu, 11 Oct 2012, Andi Kleen wrote:
>
>> > While I've always thought SLUB was the default and recommended allocator,
>> > I'm surprised to find that it's not always the case:
>> 
>> iirc the main performance reasons for slab over slub have mostly
>> disappeared, so in theory slab could be finally deprecated now.
>> 
>
> SLUB is a non-starter for us and incurs a >10% performance degradation in 
> netperf TCP_RR.

When did you last test? Our regressions had disappeared a few kernels
ago.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


[PATCH] compat: VIDEO_SET_SPU_PALETTE missing error check

2012-10-11 Thread Kees Cook
The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check
while converting ioctl arguments. This could lead to leaking kernel
stack contents into userspace.

Patch extracted from existing fix in grsecurity.

Cc: David Miller 
Cc: Brad Spengler 
Cc: PaX Team 
Cc: sta...@vger.kernel.org
Signed-off-by: Kees Cook 
---
 fs/compat_ioctl.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index f505402..4c6285f 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -210,6 +210,8 @@ static int do_video_set_spu_palette(unsigned int fd, unsigned int cmd,
 
 	err  = get_user(palp, &up->palette);
 	err |= get_user(length, &up->length);
+	if (err)
+		return -EFAULT;
 
 	up_native = compat_alloc_user_space(sizeof(struct video_spu_palette));
 	err  = put_user(compat_ptr(palp), &up_native->palette);
-- 
1.7.9.5
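
The pattern the patch restores can be sketched in userspace. Nothing below is kernel code: mock_get_user(), mock_put_user() and convert_args() are hypothetical stand-ins, and the mock is assumed to leave its destination untouched on failure. The point is that the second stage reassigns "err", so without the early return a failed read would go unnoticed and whatever happened to be in the local variables would be forwarded.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for get_user(): copies *src into *dst,
 * or fails with -EFAULT when src is NULL (dst is left untouched). */
static int mock_get_user(int *dst, const int *src)
{
	if (!src)
		return -EFAULT;
	*dst = *src;
	return 0;
}

/* Hypothetical stand-in for put_user(). */
static int mock_put_user(int val, int *dst)
{
	if (!dst)
		return -EFAULT;
	*dst = val;
	return 0;
}

/* Same shape as the fixed function: check the accumulated error
 * before the second stage overwrites "err" and uses the locals. */
static int convert_args(const int *user_palette, const int *user_length,
			int *native_palette, int *native_length)
{
	int palette, length;	/* only valid if both reads succeed */
	int err;

	err  = mock_get_user(&palette, user_palette);
	err |= mock_get_user(&length, user_length);
	if (err)
		return -EFAULT;

	err  = mock_put_user(palette, native_palette);
	err |= mock_put_user(length, native_length);
	return err ? -EFAULT : 0;
}
```

Calling convert_args() with a NULL first argument returns -EFAULT and leaves both output locations unmodified, which is the behaviour the two added lines are meant to guarantee.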


-- 
Kees Cook
Chrome OS Security


Re: [PATCH -v3 0/7] x86: Use BRK to pre mapping page table to make xen happy

2012-10-11 Thread Yinghai Lu
On Wed, Oct 10, 2012 at 11:13 PM, Yinghai Lu  wrote:
> On Wed, Oct 10, 2012 at 9:40 AM, Stefano Stabellini
>  wrote:
>>
>> So you are missing the Xen patches entirely in this iteration of the
>> series?
>
> please check updated for-x86-mm branch.
>
> [PATCH -v4 00/15] x86: Use BRK to pre mapping page table to make xen happy
>
> could be found at:
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-mm

Stefano,

Can you try -v4 to see if xen works with the changes?

Thanks

Yinghai


Re: [GIT] TPM bugfixes

2012-10-11 Thread Andrew Morton
On Thu, 11 Oct 2012 15:49:36 -0700
Andrew Morton  wrote:

> On Fri, 12 Oct 2012 00:45:06 +0200
> richard -rw- weinberger  wrote:
> 
> > On Fri, Oct 12, 2012 at 12:19 AM, Andrew Morton
> >  wrote:
> > > On Thu, 11 Oct 2012 21:54:18 +1100 (EST)
> > > James Morris  wrote:
> > >
> > >> Please pull these fixes for the TPM code.
> > >>
> > >> The following changes since commit 12250d843e8489ee00b5b7726da855e51694e792:
> > >>
> > >>   Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux (2012-10-11 10:27:51 +0900)
> > >>
> > >> are available in the git repository at:
> > >>
> > >>   git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git for-linus
> > >
> > > Gargh.  Is it possible to add a human-readable http URL to these things
> > > so that people can actually look at the patches without hoop-jumping?
> > 
> > rw@mantary:~> echo
> > "git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git
> > for-linus" | sed -e
> > 's/git\:\/\//http\:\/\//;s/\/pub\/scm\//\?p=/g;s/\.git
> > /\.git\;a=shortlog;h=refs\/heads\//g'
> > 
> > http://git.kernel.org?p=linux/kernel/git/jmorris/linux-security.git;a=shortlog;h=refs/heads/for-linus
> > 
> 
> Geeze.
> 
> Thanks.  Followed by ^F, copy-n-paste, then hope it's on the first page?

http://git.kernel.org/?p=linux/kernel/git/jmorris/linux-security.git;a=commit;h=abce9ac292e13da367bbd22c1f7669f988d931ac

Which can be shortened to something like

http://git.kernel.org/?p=linux/kernel/git/jmorris/linux-security.git;h=abce9ac2

Simply including the commit IDs would help a lot.  Including the full
URL to each commit would be nicer.
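
As a rough illustration of the mapping discussed in this thread, the sed substitutions quoted above can be written as a small C helper. This is a sketch only: gitweb_shortlog_url() is a made-up name, it assumes the git.kernel.org layout with a "/pub/scm/" prefix, and unlike the sed output it puts a "/" before "?p=", matching the commit URL shown above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build a gitweb shortlog URL from a git:// pull URL
 * and a branch name. Returns 0 on success, -1 if the input does not have
 * the expected shape or the output buffer is too small.
 */
static int gitweb_shortlog_url(const char *git_url, const char *branch,
			       char *out, size_t outlen)
{
	static const char scheme[] = "git://";
	static const char prefix[] = "/pub/scm/";
	const char *host, *path;
	int n;

	if (strncmp(git_url, scheme, sizeof(scheme) - 1) != 0)
		return -1;
	host = git_url + sizeof(scheme) - 1;

	path = strstr(host, prefix);
	if (!path)
		return -1;

	/* The host name is [host, path); the repository path follows the prefix. */
	n = snprintf(out, outlen, "http://%.*s/?p=%s;a=shortlog;h=refs/heads/%s",
		     (int)(path - host), host,
		     path + sizeof(prefix) - 1, branch);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Given the pull URL and the "for-linus" branch from the request above, the helper produces the shortlog URL for that branch on git.kernel.org.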

I assume Junio owns that script?


Re: [Q] Default SLAB allocator

2012-10-11 Thread David Rientjes
On Thu, 11 Oct 2012, Andi Kleen wrote:

> > While I've always thought SLUB was the default and recommended allocator,
> > I'm surprised to find that it's not always the case:
> 
> iirc the main performance reasons for slab over slub have mostly
> disappeared, so in theory slab could be finally deprecated now.
> 

SLUB is a non-starter for us and incurs a >10% performance degradation in 
netperf TCP_RR.


Re: [PATCH v2 0/8] Improve performance of VM translation on x86_64

2012-10-11 Thread H. Peter Anvin
On 10/12/2012 06:40 AM, Andi Kleen wrote:
> 
> Patch series looks good to me. Thanks for doing this properly.
> Reviewed-by: Andi Kleen 
> 

Agreed.

Acked-by: H. Peter Anvin 

I will pick this up after the merge window closes unless Ingo beats me
to it.  (I'm currently traveling.)

-hpa




Re: [PATCH v5 3/3] tracing: Format non-nanosec times from tsc clock without a decimal point.

2012-10-11 Thread David Sharp
On Thu, Oct 11, 2012 at 1:43 PM, Steven Rostedt  wrote:
> On Mon, 2012-10-01 at 20:31 -0700, David Sharp wrote:
>
>>  static int
>> -lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
>> - unsigned long rel_usecs)
>> +lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
>>  {
>> - return trace_seq_printf(s, " %4lldus%c: ", abs_usecs,
>> - rel_usecs > preempt_mark_thresh ? '!' :
>> -   rel_usecs > 1 ? '+' : ' ');
>> + unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
>> + unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
>> + unsigned long long abs_ts = iter->ts - iter->tr->time_start;
>> + unsigned long long rel_ts = next_ts - iter->ts;
>> + struct trace_seq *s = &iter->seq;
>> + unsigned long mark_thresh;
>> + int ret;
>> +
>> + if (in_ns) {
>> + abs_ts = ns2usecs(abs_ts);
>> + rel_ts = ns2usecs(rel_ts);
>> + }
>> +
>> + if (verbose && in_ns) {
>> + unsigned long abs_msec = abs_ts;
>> + unsigned long abs_usec = do_div(abs_msec, USEC_PER_MSEC);
>> + unsigned long rel_msec = rel_ts;
>> + unsigned long rel_usec = do_div(rel_msec, USEC_PER_MSEC);
>> +
>> + ret = trace_seq_printf(
>> + s, "[%08llx] %ld.%03ldms (+%ld.%03ldms): ",
>> + ns2usecs(iter->ts),
>> + abs_msec, abs_usec,
>> + rel_msec, rel_usec);
>> + } else if (verbose && !in_ns) {
>> + ret = trace_seq_printf(
>> + s, "[%016llx] %lld (+%lld): ",
>> + iter->ts, abs_ts, rel_ts);
>> + } else if (!verbose && in_ns) {
>> + ret = trace_seq_printf(
>> + s, " %4lldus: ",
>
> Missing %c.
>
>> + abs_ts,
>> + rel_ts > preempt_mark_thresh_us ? '!' :
>> +   rel_ts > 1 ? '+' : ' ');
>> + } else { /* !verbose && !in_ns */
>> + ret = trace_seq_printf(s, " %4lld%s%c: ", abs_ts);
>
> Um, "%s%c" with no matching arguments.

Sorry for being so sloppy on these patches... Wonder why I didn't see
the printf warnings.

I'll send out a new patchset with this fixed, and include Yoshihiro
Yunomae's patches.
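
For reference, the non-verbose microsecond case with every conversion matched by an argument might look like the userspace sketch below. It is not the kernel code: snprintf() stands in for trace_seq_printf(), the function names are invented for this example, and MARK_THRESH_US is an assumed value. GCC and Clang can flag this kind of mismatch through -Wformat, but only for functions declared with a printf format attribute.

```c
#include <stdio.h>

/* Assumed threshold in microseconds, for illustration only. */
#define MARK_THRESH_US 100ULL

/* Hypothetical helper: pick the latency marker printed after the timestamp. */
static char latency_mark(unsigned long long rel_us)
{
	return rel_us > MARK_THRESH_US ? '!' : rel_us > 1 ? '+' : ' ';
}

/* Format " <abs>us<mark>: "; %4llu and %c each have a matching argument. */
static int format_lat_timestamp(char *buf, size_t len,
				unsigned long long abs_us,
				unsigned long long rel_us)
{
	return snprintf(buf, len, " %4lluus%c: ", abs_us, latency_mark(rel_us));
}
```

The marker logic follows the ternary in the quoted patch: '!' above the threshold, '+' above one microsecond, and a space otherwise.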

>
> -- Steve
>
>> + }
>> + return ret;
>>  }
>>
>
>

