date:20120911

Re: [PATCH] HID: picolcd_fb: Use flush_delayed_work instead of flush_delayed_work_sync

2012-09-11 Thread Bruno Prémont

Hi Axel,

On Wed, 12 Sep 2012 13:39:09 Axel Lin  wrote:
> Seems this is a left over of commit 4382973
> "workqueue: deprecate flush[_delayed]_work_sync()"

It is, see https://lkml.org/lkml/2012/9/6/297 for a previous patch.
This should fix itself when Linus merges Tejun's workqueue changes and
picolcd changes.

Bruno


> This fixes below build warning:
> 
>   CC [M]  drivers/hid/hid-picolcd_fb.o
> drivers/hid/hid-picolcd_fb.c: In function 'picolcd_exit_framebuffer':
> drivers/hid/hid-picolcd_fb.c:611:2: warning: 'flush_delayed_work_sync' is deprecated (declared at include/linux/workqueue.h:454) [-Wdeprecated-declarations]
> 
> Signed-off-by: Axel Lin 
> Cc: Tejun Heo 
> ---
>  drivers/hid/hid-picolcd_fb.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/hid/hid-picolcd_fb.c b/drivers/hid/hid-picolcd_fb.c
> index 0008a51..eb00357 100644
> --- a/drivers/hid/hid-picolcd_fb.c
> +++ b/drivers/hid/hid-picolcd_fb.c
> @@ -608,7 +608,7 @@ void picolcd_exit_framebuffer(struct picolcd_data *data)
>   /* make sure there is no running update - thus that fbdata->picolcd
>* once obtained under lock is guaranteed not to get free() under
>* the feet of the deferred work */
> - flush_delayed_work_sync(>deferred_work);
> + flush_delayed_work(>deferred_work);
>  
>   data->fb_info = NULL;
>   unregister_framebuffer(info);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: iwl3945: order 5 allocation during ifconfig up; vm problem?

2012-09-11 Thread Marc MERLIN

On Wed, Sep 12, 2012 at 07:16:28AM +0200, Eric Dumazet wrote:
> On Tue, 2012-09-11 at 16:25 -0700, Andrew Morton wrote:
> 
> > Asking for a 256k allocation is pretty crazy - this is an operating
> > system kernel, not a userspace application.
> > 
> > I'm wondering if this is due to a recent change, but I'm having trouble
> > working out where the allocation call site is.
> > --
> 
> (Adding Marc Merlin to CC, since he reported same problem)
> 
> That's the firmware loading in the iwlwifi driver. Not sure if it can use SG.
> 
> drivers/net/wireless/iwlwifi/iwl-drv.c
> 
> iwl_alloc_ucode() -> iwl_alloc_fw_desc() -> dma_alloc_coherent()
> 
> It seems some sections of /lib/firmware/iwlwifi*.ucode files are above
> 128 Kbytes, so dma_alloc_coherent() tries order-5 allocations

Thanks for looping me in, yes, this looks very familiar to me :)

In the other thread, Johannes Berg gave me this patch which is supposed to
help: http://p.sipsolutions.net/11ea33b376a5bac5.txt

Unfortunately due to very long work days, I haven't had the time to try it
out yet, but I will soon.

Would that help in this case too?

And to answer David Rientjes, I also have compaction on:
gandalfthegreat:~# zgrep CONFIG_COMPACTION /proc/config.gz 
CONFIG_COMPACTION=y

Full config:
http://marc.merlins.org/tmp/config-3.5.2-amd64-preempt-noide-20120731

If that helps for comparison, my thread is here:
http://www.spinics.net/lists/linux-wireless/msg96438.html

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

linux-next: manual merge of the staging tree with the thermal tree

2012-09-11 Thread Stephen Rothwell

Hi Greg,

Today's linux-next merge of the staging tree got a conflict in
drivers/staging/omap-thermal/omap-thermal-common.c between commit
76f726fb24bd ("Thermal: Remove tc1/tc2 in generic thermal layer") from
the thermal tree and commit 765a1939a364 ("staging: omap-thermal: fix
polling period settings") from the staging tree.

I fixed it up (I think - see below) and can carry the fix as necessary
(no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc drivers/staging/omap-thermal/omap-thermal-common.c
index b4cd6cc,46ee0a9..000
--- a/drivers/staging/omap-thermal/omap-thermal-common.c
+++ b/drivers/staging/omap-thermal/omap-thermal-common.c
@@@ -248,7 -268,9 +270,8 @@@ int omap_thermal_expose_sensor(struct o
/* Create thermal zone */
data->omap_thermal = thermal_zone_device_register(domain,
OMAP_TRIP_NUMBER, 0, data, _thermal_ops,
-   0, FAST_TEMP_MONITORING_RATE);
 -  1, 2, /*TODO: remove this when FW allows */
+   FAST_TEMP_MONITORING_RATE,
+   FAST_TEMP_MONITORING_RATE);
if (IS_ERR_OR_NULL(data->omap_thermal)) {
dev_err(bg_ptr->dev, "thermal zone device is NULL\n");
return PTR_ERR(data->omap_thermal);



Re: [RFC v8 PATCH 00/20] memory-hotplug: hot-remove physical memory

2012-09-11 Thread Wen Congyang

At 09/10/2012 09:52 PM, Vasilis Liaskovitis Wrote:
> Hi,
> 
> On Mon, Sep 10, 2012 at 10:01:44AM +0800, Wen Congyang wrote:
>> At 09/10/2012 09:46 AM, Yasuaki Ishimatsu Wrote:
>>> Hi Wen,
>>>
>>> 2012/09/01 5:49, Andrew Morton wrote:
>>>> On Tue, 28 Aug 2012 18:00:07 +0800
>>>> we...@cn.fujitsu.com wrote:
>>>>
>>>>> This patch series aims to support physical memory hot-remove.
>>>>
>>>> I doubt if many people have hardware which permits physical memory
>>>> removal?  How would you suggest that people with regular hardware can
>>>> test these changes?
>>>
>>> How do you test the patch? As Andrew says, for hot-removing memory,
>>> we need a particular hardware. I think so too. So many people may want
>>> to know how to test the patch.
>>> If we apply following patch to kvm guest, can we hot-remove memory on
>>> kvm guest?
>>>
>>> http://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01389.html
>>
>> Yes, if we apply this patchset, we can test hot-remove memory on kvm guest.
>> But that patchset doesn't implement _PS3, so there is some restriction.
> 
> the following repos contain the patchset above, plus 2 more patches that add
> PS3 support to the dimm devices in qemu/seabios:
> 
> https://github.com/vliaskov/seabios/commits/memhp-v2
> https://github.com/vliaskov/qemu-kvm/commits/memhp-v2
> 
> I have not posted the PS3 patches yet in the qemu list, but will post them
> soon for v3 of the memory hotplug series. If you have issues testing, let me
> know.

Hmm, seabios doesn't support the ACPI SLIT table. We can specify a node ID
for a dimm device, so I think we should support SLIT in seabios. Otherwise we
may see the following kernel messages:
[  325.016769] init_memory_mapping: [mem 0x4000-0x5fff]
[  325.018060]  [mem 0x4000-0x5fff] page 2M
[  325.019168] [ea000100-ea00011f] potential offnode page_structs
[  325.024172] [ea000120-ea00013f] potential offnode page_structs
[  325.028596]  [ea000140-ea00017f] PMD -> [88003500-8800353f] on node 1
[  325.031775] [ea000160-ea00017f] potential offnode page_structs

Do you plan to add it?

Thanks
Wen Congyang

> 
> thanks,
> 
> - Vasilis
> 


[PATCH] HID: picolcd_fb: Use flush_delayed_work instead of flush_delayed_work_sync

2012-09-11 Thread Axel Lin

Seems this is a left over of commit 4382973
"workqueue: deprecate flush[_delayed]_work_sync()"

This fixes below build warning:

  CC [M]  drivers/hid/hid-picolcd_fb.o
drivers/hid/hid-picolcd_fb.c: In function 'picolcd_exit_framebuffer':
drivers/hid/hid-picolcd_fb.c:611:2: warning: 'flush_delayed_work_sync' is deprecated (declared at include/linux/workqueue.h:454) [-Wdeprecated-declarations]

Signed-off-by: Axel Lin 
Cc: Tejun Heo 
---
 drivers/hid/hid-picolcd_fb.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hid/hid-picolcd_fb.c b/drivers/hid/hid-picolcd_fb.c
index 0008a51..eb00357 100644
--- a/drivers/hid/hid-picolcd_fb.c
+++ b/drivers/hid/hid-picolcd_fb.c
@@ -608,7 +608,7 @@ void picolcd_exit_framebuffer(struct picolcd_data *data)
/* make sure there is no running update - thus that fbdata->picolcd
 * once obtained under lock is guaranteed not to get free() under
 * the feet of the deferred work */
-   flush_delayed_work_sync(>deferred_work);
+   flush_delayed_work(>deferred_work);
 
data->fb_info = NULL;
unregister_framebuffer(info);
-- 
1.7.9.5




Re: [GIT] Digital signature library bugfix

2012-09-11 Thread Linus Torvalds

On Wed, Sep 12, 2012 at 11:34 AM, James Morris  wrote:
>
> -   if (!err && len == hlen)
> -   err = memcmp(out2, h, hlen);
> +   if (err || len != hlen) {
> +   err = -EINVAL;
> +   goto err;
> +   }
> +
> +   err = memcmp(out2, h, hlen);
>
>  err:

Hmm. I'll pull, but this seems to drop the error return from
pkcs_1_v1_5_decode_emsa() and always replace it with -EINVAL.

Now, I didn't look, and maybe that's the only error that the decode
thing can return, but still, it looks bad.

Wouldn't it have been better to do instead

   if (err)
  goto err;
   err = -EINVAL;
   if (len != hlen)
  goto err;

and not overwrite the 'err' return with EINVAL?

 Linus
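
The difference between the two control flows can be sketched in plain C. This is only a userspace illustration, not the code in the digital signature library: verify_overwrite() and verify_preserve() are hypothetical names, and the decode step is reduced to passing in the error code and length it would have produced.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Shape of the pulled patch: every failure is reported as -EINVAL. */
static int verify_overwrite(int decode_err, size_t len, size_t hlen,
                            const unsigned char *out, const unsigned char *h)
{
    int err = decode_err;

    if (err || len != hlen) {
        err = -EINVAL;      /* the decoder's own error code is lost here */
        goto out;
    }

    err = memcmp(out, h, hlen);
out:
    return err;
}

/* Shape of the suggestion: keep the decoder's error, and use -EINVAL
 * only for the length mismatch. */
static int verify_preserve(int decode_err, size_t len, size_t hlen,
                           const unsigned char *out, const unsigned char *h)
{
    int err = decode_err;

    if (err)
        goto out;
    err = -EINVAL;
    if (len != hlen)
        goto out;

    err = memcmp(out, h, hlen);
out:
    return err;
}
```

With a decoder failure such as -ENOMEM, the first variant reports -EINVAL while the second passes -ENOMEM through; both behave the same for a length mismatch or a successful compare.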

linux-next: manual merge of the usb tree with the usb.current tree

2012-09-11 Thread Stephen Rothwell

Hi Greg,

Today's linux-next merge of the usb tree got a conflict in
drivers/usb/dwc3/gadget.c between commit f4a53c55117b ("usb: dwc3:
gadget: fix pending isoc handling") from the usb.current tree and commit
348e026fafe2 ("usb: dwc3: gadget: Fix sparse warnings") from the usb tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc drivers/usb/dwc3/gadget.c
index c2813c2b,ba444e7..000
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@@ -1091,26 -1078,10 +1082,21 @@@ static int __dwc3_gadget_ep_queue(struc
 *
 */
if (dep->flags & DWC3_EP_PENDING_REQUEST) {
-   int ret;
- 
 +  /*
 +   * If xfernotready is already elapsed and it is a case
 +   * of isoc transfer, then issue END TRANSFER, so that
 +   * you can receive xfernotready again and can have
 +   * notion of current microframe.
 +   */
 +  if (usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
 +  dwc3_stop_active_transfer(dwc, dep->number);
 +  return 0;
 +  }
 +
ret = __dwc3_gadget_kick_transfer(dep, 0, true);
-   if (ret && ret != -EBUSY) {
-   struct dwc3 *dwc = dep->dwc;
- 
+   if (ret && ret != -EBUSY)
dev_dbg(dwc->dev, "%s: failed to kick transfers\n",
dep->name);
-   }
}
  
/*



linux-next: manual merge of the usb tree with the usb.current tree

2012-09-11 Thread Stephen Rothwell

Hi Greg,

Today's linux-next merge of the usb tree got a conflict in
drivers/usb/musb/tusb6010.c between commit ff41aaa3b6c1 ("usb: musb:
tusb6010: fix error path in tusb_probe()") from the usb.current tree and
commit 65b3d52d02a5 ("usb: musb: add musb_ida for multi instance
support") from the usb tree.

They both updated the same goto - I used the latter (no action is
required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



hot-added cpu is not assigned to the correct node

2012-09-11 Thread Yasuaki Ishimatsu

When I hot-added CPUs and memories simultaneously using the container driver,
all the hot-added CPUs were mistakenly assigned to node0.

According to my DSDT, the hot-added CPUs and memories have PXM#1. So in my
system, these devices should be assigned to node1 as follows:

--- Expected result
ls /sys/devices/system/node/node1/:
cpu16 cpu17 cpu18 cpu19 cpu20 cpu21 cpu22 cpu23 cpu24 cpu25 cpu26 cpu27
cpu28 cpu29 cpu30 cpu31 cpulist ... memory512 memory513 - 767 meminfo ...

=> hot-added CPUs and memories are assigned to the same node.
---

In practice, however, the CPUs were assigned to node0 and the memories were
assigned to node1 as follows:

--- Actual result
ls /sys/devices/system/node/node0/:
cpu0 cpu1 cpu2 cpu3 cpu4 cpu5 cpu6 cpu7 cpu8 cpu9 cpu10 cpu11 cpu12 cpu13
cpu14 cpu15 cpu16 cpu17 cpu18 cpu19 cpu20 cpu21 cpu22 cpu23 cpu24 cpu25 cpu26
cpu27 cpu28 cpu29 cpu30 cpu31 cpulist ... memory1 memory2 - 255 meminfo ...

ls /sys/devices/system/node/node1/:
cpulist memory512 memory513 - 767 meminfo ...

=> hot-added CPUs are assigned to node0 and hot-added memories are assigned to
   node1. The CPUs and memories have the same PXM#, but the assigned nodes differ.
---

In my investigation, "acpi_map_cpu2node()" causes the problem.

---
# arch/x86/kernel/acpi/boot.c
static void __cpuinit acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
{
#ifdef CONFIG_ACPI_NUMA
	int nid;

	nid = acpi_get_node(handle);
	if (nid == -1 || !node_online(nid))
		return;
	set_apicid_to_node(physid, nid);
	numa_set_node(cpu, nid);
#endif
}
---

In my DSDT, the CPUs are listed ahead of the memories, so the CPUs were
hot-added before the memories. The system therefore has a memory-less node
temporarily. In this case, "node_online()" returns false for that node, the
function returns early, and the CPU stays assigned to node 0.

When I listed the memories ahead of the CPUs in the DSDT, the CPUs were
assigned to the correct node. So in current Linux, whether the CPUs are
assigned to the correct node depends on the order of the hot-added resources
in the DSDT.

The ACPI specification doesn't define the order of hot-added resources. So I
think the kernel should properly handle any DSDT that conforms to the
specification.

I have been thinking about a solution to the problem, but I don't have any
good idea yet. Does anyone have an opinion on how we should handle it?
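
The order dependence described above can be reproduced with a small userspace model. This is a sketch under stated assumptions, not kernel code: node_is_online[], cpu_node[], map_cpu_to_node() and hot_add_memory() are hypothetical stand-ins, and the model assumes that hot-adding memory is what brings the node online.

```c
#include <stdbool.h>

#define MAX_NODES 4
#define MAX_CPUS  64

/* Hypothetical stand-ins for the kernel's node and CPU bookkeeping. */
static bool node_is_online[MAX_NODES] = { true };  /* only node 0 online at start */
static int  cpu_node[MAX_CPUS];                    /* zeroed: every CPU defaults to node 0 */

/* Mirrors the early return quoted above: an offline node is ignored. */
static void map_cpu_to_node(int cpu, int nid)
{
    if (nid < 0 || nid >= MAX_NODES || !node_is_online[nid])
        return;                 /* the CPU keeps its default node, 0 */
    cpu_node[cpu] = nid;
}

/* Assumption of this model: adding memory brings the node online. */
static void hot_add_memory(int nid)
{
    node_is_online[nid] = true;
}

/* Replays one hot-add sequence and returns the node the CPU ends up on. */
static int hot_add_sequence(bool cpu_first, int cpu, int nid)
{
    node_is_online[nid] = false;    /* start from a node that is not online */
    cpu_node[cpu] = 0;

    if (cpu_first) {
        map_cpu_to_node(cpu, nid);
        hot_add_memory(nid);
    } else {
        hot_add_memory(nid);
        map_cpu_to_node(cpu, nid);
    }
    return cpu_node[cpu];
}
```

In this model, hot_add_sequence(true, 16, 1) leaves cpu16 on node 0, while hot_add_sequence(false, 16, 1) puts it on node 1, which matches the two DSDT orderings in the report.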


[RFC PATCH 0/2] Add UIO device supporting dynamic memory allocation

2012-09-11 Thread Damian Hobson-Garcia

Reposting: I realized that this series should have gone out to a broader list.
My apologies to those who will receive a duplicate post.

Hello all,

I've been using this UIO driver for allocation/deallocation
of memory regions through an IOMMU via the dma-mapping API, but
it seems that it would be more generally useful for userspace drivers
to access CMA memory regions. I don't know if it's useful to try to add
this functionality into the core uio driver or not, so for now I've kept
all dynamic memory handling in the specific device driver.

The number and size of the dynamically allocatable regions are defined
statically in the device platform data, and the actual memory is
allocated and deallocated when the device is opened/closed.

Details of the dynamically allocated regions are available from sysfs in
exactly the same way as for static regions. The total number of
dynamic and static regions combined cannot exceed MAX_UIO_MAPS.

Any comments, especially with regard to exposing the dma-mapping API to
userspace in this way, would be greatly appreciated.

Damian Hobson-Garcia (2):
  Add new uio device for dynamic memory allocation
  ARM: shmobile: sh7372: Change VPU UIO to uio_dmem_genirq

 arch/arm/mach-shmobile/setup-sh7372.c |   19 +-
 drivers/uio/Kconfig   |   16 ++
 drivers/uio/Makefile  |1 +
 drivers/uio/uio_dmem_genirq.c |  356 +
 include/linux/platform_data/uio_dmem_genirq.h |   26 ++
 5 files changed, 413 insertions(+), 5 deletions(-)
 create mode 100644 drivers/uio/uio_dmem_genirq.c
 create mode 100644 include/linux/platform_data/uio_dmem_genirq.h

-- 
1.7.5.4
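
The open/close lifetime described in this cover letter can be sketched in userspace C. This is an illustration only, not the driver from patch 1/2: struct dyn_dev, dyn_open() and dyn_release() are hypothetical names, malloc() stands in for dma_alloc_coherent(), MAX_REGIONS stands in for MAX_UIO_MAPS, and locking is left out.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_REGIONS 5   /* stand-in for MAX_UIO_MAPS; the value is illustrative */

struct dyn_region {
    size_t size;        /* fixed at setup time; 0 marks the end of the list */
    void  *mem;         /* NULL while nobody holds the device open */
};

struct dyn_dev {
    struct dyn_region region[MAX_REGIONS];
    unsigned int refcnt;    /* number of current openers */
};

static void free_regions(struct dyn_dev *d)
{
    for (size_t i = 0; i < MAX_REGIONS; i++) {
        free(d->region[i].mem);     /* free(NULL) is a no-op */
        d->region[i].mem = NULL;
    }
}

/* The first opener allocates every region; later openers only bump the count. */
static int dyn_open(struct dyn_dev *d)
{
    if (d->refcnt == 0) {
        for (size_t i = 0; i < MAX_REGIONS && d->region[i].size; i++) {
            d->region[i].mem = malloc(d->region[i].size);
            if (!d->region[i].mem) {
                free_regions(d);    /* undo the partial allocation */
                return -1;
            }
        }
    }
    d->refcnt++;
    return 0;
}

/* The last closer frees every region. */
static void dyn_release(struct dyn_dev *d)
{
    if (d->refcnt == 0)
        return;
    if (--d->refcnt == 0)
        free_regions(d);
}
```

Unlike the loop in the posted patch, this sketch does not count a failed open as a reference; whether the driver should behave that way is a separate review question.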


[RFC PATCH 1/2] Add new uio device for dynamic memory allocation

2012-09-11 Thread Damian Hobson-Garcia

This device extends the uio_pdrv_genirq driver to provide limited
dynamic memory allocation for UIO devices.  This allows UIO devices
to use CMA and IOMMU allocated memory regions. This driver is based
on the uio_pdrv_genirq driver and provides the same generic interrupt
handling capabilities.  Like uio_pdrv_genirq, a fixed number of memory
regions, defined in the platform device's .resources field, are exported
to userspace. This driver adds the ability to export additional regions
whose number and size are known at boot time, but whose memory is not
allocated until the uio device file is opened for the first time.  When
the device file is closed, the allocated memory block is freed.

Physical (DMA) addresses for the dynamic regions are provided to
userspace via /sys/class/uio/uioN/maps/mapM/addr, in the same way as
static addresses, while the uio device file is open. When no processes
are holding the device file open, the address returned to userspace is
DMA_ERROR_CODE.

Signed-off-by: Damian Hobson-Garcia 
---
 drivers/uio/Kconfig   |   16 ++
 drivers/uio/Makefile  |1 +
 drivers/uio/uio_dmem_genirq.c |  356 +
 include/linux/platform_data/uio_dmem_genirq.h |   26 ++
 4 files changed, 399 insertions(+), 0 deletions(-)
 create mode 100644 drivers/uio/uio_dmem_genirq.c
 create mode 100644 include/linux/platform_data/uio_dmem_genirq.h

diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 6f3ea9b..ee4226b 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -44,6 +44,22 @@ config UIO_PDRV_GENIRQ
 
  If you don't know what to do here, say N.
 
+config UIO_DMEM_GENIRQ
+   tristate "Userspace platform driver with generic irq and dynamic memory"
+   help
+ Platform driver for Userspace I/O devices, including generic
+ interrupt handling code. Shared interrupts are not supported.
+
+ Memory regions can be specified with the same platform device
+ resources as the UIO_PDRV drivers, but dynamic regions can also
+ be specified.
+ The number and size of these regions is static,
+ but the memory allocation is not performed until
+ the associated device file is opened. The
+ memory is freed once the uio device is closed.
+
+ If you don't know what to do here, say N.
+
 config UIO_AEC
tristate "AEC video timestamp device"
depends on PCI
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index d4dd9a5..b354c53 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_UIO)   += uio.o
 obj-$(CONFIG_UIO_CIF)  += uio_cif.o
 obj-$(CONFIG_UIO_PDRV) += uio_pdrv.o
 obj-$(CONFIG_UIO_PDRV_GENIRQ)  += uio_pdrv_genirq.o
+obj-$(CONFIG_UIO_DMEM_GENIRQ)  += uio_dmem_genirq.o
 obj-$(CONFIG_UIO_AEC)  += uio_aec.o
 obj-$(CONFIG_UIO_SERCOS3)  += uio_sercos3.o
 obj-$(CONFIG_UIO_PCI_GENERIC)  += uio_pci_generic.o
diff --git a/drivers/uio/uio_dmem_genirq.c b/drivers/uio/uio_dmem_genirq.c
new file mode 100644
index 000..ef3e0fd
--- /dev/null
+++ b/drivers/uio/uio_dmem_genirq.c
@@ -0,0 +1,354 @@
+/*
+ * drivers/uio/uio_dmem_genirq.c
+ *
+ * Userspace I/O platform driver with generic IRQ handling code.
+ *
+ * Copyright (C) 2012 Damian Hobson-Garcia
+ *
+ * Based on uio_pdrv_genirq.c by Magnus Damm
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define DRIVER_NAME "uio_dmem_genirq"
+
+struct uio_dmem_genirq_platdata {
+   struct uio_info *uioinfo;
+   spinlock_t lock;
+   unsigned long flags;
+   struct platform_device *pdev;
+   unsigned int dmem_region_start;
+   unsigned int num_dmem_regions;
+   struct mutex alloc_lock;
+   unsigned int refcnt;
+};
+
+static int uio_dmem_genirq_open(struct uio_info *info, struct inode *inode)
+{
+   struct uio_dmem_genirq_platdata *priv = info->priv;
+   struct uio_mem *uiomem;
+   int ret = 0;
+
+   uiomem = >uioinfo->mem[priv->dmem_region_start];
+
+   mutex_lock(>alloc_lock);
+   while (!priv->refcnt && uiomem < >uioinfo->mem[MAX_UIO_MAPS]) {
+   void *addr;
+   if (!uiomem->size)
+   break;
+
+   addr = dma_alloc_coherent(>pdev->dev, uiomem->size,
+   (dma_addr_t *)>addr, GFP_KERNEL);
+   if (!addr) {
+   ret = -ENOMEM;
+   break;
+   }
+
+   uiomem->internal_addr = addr;
+   ++uiomem;
+   }
+   priv->refcnt++;
+
+   mutex_unlock(>alloc_lock);
+   /* Wait until the Runtime PM code has

[RFC PATCH 2/2] ARM: shmobile: sh7372: Change VPU UIO to uio_dmem_genirq

2012-09-11 Thread Damian Hobson-Garcia

This allows the VPU memory to be allocated dynamically only when it
is needed.

Signed-off-by: Damian Hobson-Garcia 
---
 arch/arm/mach-shmobile/setup-sh7372.c |   19 ++-
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mach-shmobile/setup-sh7372.c 
b/arch/arm/mach-shmobile/setup-sh7372.c
index 1106b4f..fe75701 100644
--- a/arch/arm/mach-shmobile/setup-sh7372.c
+++ b/arch/arm/mach-shmobile/setup-sh7372.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -761,11 +762,19 @@ static struct platform_device usb_dma1_device = {
},
 };
 
+static unsigned int region_sizes[] = {
+   (32 << 20),
+};
+
 /* VPU */
-static struct uio_info vpu_platform_data = {
-   .name = "VPU5HG",
-   .version = "0",
-   .irq = intcs_evt2irq(0x980),
+static struct uio_dmem_genirq_pdata vpu_platform_data = {
+   .uioinfo = {
+   .name = "VPU5HG",
+   .version = "0",
+   .irq = intcs_evt2irq(0x980),
+   },
+   .dynamic_region_sizes   = region_sizes,
+   .num_dynamic_regions= ARRAY_SIZE(region_sizes),
 };
 
 static struct resource vpu_resources[] = {
@@ -778,7 +787,7 @@ static struct resource vpu_resources[] = {
 };
 
 static struct platform_device vpu_device = {
-   .name   = "uio_pdrv_genirq",
+   .name   = "uio_dmem_genirq",
.id = 0,
.dev = {
.platform_data  = _platform_data,
-- 
1.7.5.4


Iptables log-level does not work with kernel 3.6-rc

2012-09-11 Thread auto75914331

This rule

$IPTABLES -A RULE_0_in  -j LOG  --log-level notice --log-prefix "DENY  in: "


result with linux 3.6-rc5

Sep 12 06:37:29 x kernel: <5>DENY  in: IN=eth0 OUT= MAC=...


result with linux 3.5.3 and older:

Sep  9 10:43:01 x kernel: DENY  in: IN=eth0 OUT= MAC..


Thanks


[PATCH] gpio: pcf857x: fixup smatch WARNING

2012-09-11 Thread Kuninori Morimoto

6e20a0a429bd4dc07d6de16d9c247270e04e4aa0
(gpio: pcf857x: enable gpio_to_irq() support)
added new smatch warnings

drivers/gpio/gpio-pcf857x.c:288 pcf857x_probe() error: we previously \
assumed 'pdata' could be null (see line 277)
drivers/gpio/gpio-pcf857x.c:364 pcf857x_probe() warn: variable dereferenced\
 before check 'pdata' (see line 292)
drivers/gpio/gpio-pcf857x.c:421 pcf857x_remove() error: we previously\
 assumed 'pdata' could be null (see line 410)

This patch fixes it

Reported-by: Fengguang Wu 
Signed-off-by: Kuninori Morimoto 
---
 drivers/gpio/gpio-pcf857x.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpio/gpio-pcf857x.c b/drivers/gpio/gpio-pcf857x.c
index 12e3e48..16af35c 100644
--- a/drivers/gpio/gpio-pcf857x.c
+++ b/drivers/gpio/gpio-pcf857x.c
@@ -285,7 +285,7 @@ static int pcf857x_probe(struct i2c_client *client,
gpio->chip.ngpio= id->driver_data;
 
/* enable gpio_to_irq() if platform has settings */
-   if (pdata->irq) {
+   if (pdata && pdata->irq) {
status = pcf857x_irq_domain_init(gpio, pdata, >dev);
if (status < 0) {
dev_err(>dev, "irq_domain init failed\n");
@@ -394,7 +394,7 @@ fail:
dev_dbg(>dev, "probe error %d for '%s'\n",
status, client->name);
 
-   if (pdata->irq)
+   if (pdata && pdata->irq)
pcf857x_irq_domain_cleanup(gpio);
 
kfree(gpio);
@@ -418,7 +418,7 @@ static int pcf857x_remove(struct i2c_client *client)
}
}
 
-   if (pdata->irq)
+   if (pdata && pdata->irq)
pcf857x_irq_domain_cleanup(gpio);
 
status = gpiochip_remove(>chip);
-- 
1.7.9.5
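
The guard added in each hunk relies on the short-circuit evaluation of `&&`. A minimal standalone sketch, assuming a cut-down and hypothetical struct demo_pdata rather than the driver's real platform data:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down platform data; the real struct has more fields. */
struct demo_pdata {
    int irq;
};

/* True only when platform data exists and requests an IRQ. */
static bool wants_irq(const struct demo_pdata *pdata)
{
    /* && short-circuits: pdata->irq is not evaluated when pdata is NULL. */
    return pdata && pdata->irq;
}
```

A board without platform data passes NULL here and simply gets no IRQ setup, instead of a NULL pointer dereference.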


Re: iwl3945: order 5 allocation during ifconfig up; vm problem?

2012-09-11 Thread Eric Dumazet

On Tue, 2012-09-11 at 16:25 -0700, Andrew Morton wrote:

> Asking for a 256k allocation is pretty crazy - this is an operating
> system kernel, not a userspace application.
> 
> I'm wondering if this is due to a recent change, but I'm having trouble
> working out where the allocation call site is.
> --

(Adding Marc Merlin to CC, since he reported same problem)

That's the firmware loading in the iwlwifi driver. Not sure if it can use SG.

drivers/net/wireless/iwlwifi/iwl-drv.c

iwl_alloc_ucode() -> iwl_alloc_fw_desc() -> dma_alloc_coherent()

It seems some sections of /lib/firmware/iwlwifi*.ucode files are above
128 Kbytes, so dma_alloc_coherent() tries order-5 allocations


# ls -l /lib/firmware/iwlwifi*.ucode
-rw-r--r-- 1 root root 335056 2012-01-23 18:20 /lib/firmware/iwlwifi-1000-3.ucode
-rw-r--r-- 1 root root 337520 2012-01-23 18:20 /lib/firmware/iwlwifi-1000-5.ucode
-rw-r--r-- 1 root root 689680 2012-01-24 19:18 /lib/firmware/iwlwifi-105-6.ucode
-rw-r--r-- 1 root root 701228 2012-01-24 19:18 /lib/firmware/iwlwifi-135-6.ucode
-rw-r--r-- 1 root root 695876 2012-01-24 19:19 /lib/firmware/iwlwifi-2000-6.ucode
-rw-r--r-- 1 root root 707392 2012-01-24 19:19 /lib/firmware/iwlwifi-2030-6.ucode
-rw-r--r-- 1 root root 150100 2012-01-23 18:20 /lib/firmware/iwlwifi-3945-2.ucode
-rw-r--r-- 1 root root 187972 2012-01-23 18:20 /lib/firmware/iwlwifi-4965-2.ucode
-rw-r--r-- 1 root root 345008 2012-01-23 18:20 /lib/firmware/iwlwifi-5000-1.ucode
-rw-r--r-- 1 root root 353240 2012-01-23 18:20 /lib/firmware/iwlwifi-5000-2.ucode
-rw-r--r-- 1 root root 340696 2012-01-23 18:21 /lib/firmware/iwlwifi-5000-5.ucode
-rw-r--r-- 1 root root 337400 2012-01-23 18:20 /lib/firmware/iwlwifi-5150-2.ucode
-rw-r--r-- 1 root root 462280 2012-01-24 19:20 /lib/firmware/iwlwifi-6000-4.ucode
-rw-r--r-- 1 root root 444128 2012-01-24 19:20 /lib/firmware/iwlwifi-6000g2a-5.ucode
-rw-r--r-- 1 root root 460912 2012-01-24 19:20 /lib/firmware/iwlwifi-6000g2b-5.ucode
-rw-r--r-- 1 root root 679436 2012-01-24 19:19 /lib/firmware/iwlwifi-6000g2b-6.ucode
-rw-r--r-- 1 root root 463692 2012-01-23 18:20 /lib/firmware/iwlwifi-6050-4.ucode
-rw-r--r-- 1 root root 469780 2012-01-23 18:20 /lib/firmware/iwlwifi-6050-5.ucode
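
For reference, the relation between a request size and the page allocation order can be worked out with a few lines of C. This is a sketch assuming 4 KiB pages; order_for_size() is a hypothetical helper that computes the same quantity the kernel's get_order() reports for a non-zero size, and is not taken from the iwlwifi code.

```c
#include <stddef.h>

/* Smallest order such that (1 << order) pages cover `size` bytes. */
static unsigned int order_for_size(size_t size, size_t page_size)
{
    size_t pages = (size + page_size - 1) / page_size;  /* round up to whole pages */
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Under these assumptions a 100 KiB or 128 KiB request maps to order 5 (32 contiguous pages), and any request from 128 KiB + 1 byte up to 256 KiB maps to order 6.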



Re: runtime PM and special power switches

2012-09-11 Thread Dave Airlie

On Wed, Sep 12, 2012 at 8:58 AM, Rafael J. Wysocki  wrote:
> On Wednesday, September 12, 2012, Dave Airlie wrote:
>> On Wed, Sep 12, 2012 at 7:32 AM, Alan Stern  wrote:
>> > On Tue, 11 Sep 2012, Rafael J. Wysocki wrote:
>> >
>> >> Hi,
>> >>
>> >> On Tuesday, September 11, 2012, Dave Airlie wrote:
>> >> > Hi Rafael,
>> >> >
>> >> > I've been investigating runtime PM support for some use-cases on GPUs.
>> >> >
>> >> > In some laptops we have a secondary GPU (optimus) that can be powered
>> >> > up for certain 3D tasks and then turned off when finished with. Now I
>> >> > did an initial pass on supporting it without using the kernel runtime
>> >> > PM stuff, but Alan said I should take a look so here I am.
>> >>
>> >> Alan Stern or Alan Cox? :-)
>> >>
>> >> > While I've started to get a handle on things, we have a bit of an
>> >> > extra that I'm not sure we cater for.
>> >> >
>> >> > Currently we get called from the PCI layer which after we are finished
>> >> > with our runtime suspend callback, will go put the device into the
>> >> > correct state etc, however on these optimus/powerxpress laptops we
>> >> > have a separate ACPI or platform driver controlled power switch that
>> >> > we need to call once the PCI layer is finished the job. This switch
>> >> > effectively turns the power to the card completely off leaving it
>> >> > drawing no power.
>> >> >
>> >> > No we can't hit the switch from the driver callback as the PCI layer
>> >> > will get lost, so I'm wondering how you'd envisage we could plug this
>> >> > in.
>> >>
>> >> Hmm.  In principle we might modify pci_pm_runtime_suspend() so that it
>> >> doesn't call pci_finish_runtime_suspend() if pci_dev->state_saved is
>> >> set.  That would actually make it work in analogy with
>> >> pci_pm_suspend_noirq(), so perhaps it's not even too dangerous.
>> >
>> > This sounds more like a job for a power domain.  Unless the power
>> > switch is already in the device hierarchy as a parent to the PCI
>> > device.
>>
>> I'll have to investigate power domains then,
>>
>> The switch is hidden in many different places, on some laptops it's in
>> an ACPI _DSM on one GPU, on others it's in an ACPI _DSM on the other
>> one, in some it's in a different ACPI _DSM, then we have it in the ACPI
>> ATPX method on others, and finally Apple have it in a piece of hw that
>> isn't just on the LPC bus or somewhere like that.
>>
>> Currently we just hide it all inside vga_switcheroo and I'd just need
>> an interface to call that once the layers have stopped poking
>> registers in PCI config space, if we could fix PCI runtime suspend so
>> the driver was the last to get called then that would also not suck.
>
> Well, as I said, we may try to change the PCI layer so that it doesn't
> access the device any more in pci_pm_runtime_suspend() if it sees that
> pci_dev->state_saved has been set by the driver's callback.  Then,
> your drivers would only need to set pci_dev->state_saved in their
> .runtime_suspend() callbacks.
>

Actually it appears I'll need this, I'd forgotten things are a bit
messier than I thought.

So there are two variants of the _DSM for nvidia dual-GPU machines.
The older pre-optimus _DSM requires an explicit power off call post-D3.
For the optimus _DSM, however, the D3 transition will flick the power
switch, but the PCI code then goes and seems to turn the device back to
D0 for some reason. So yes, after save state, I'd really appreciate it
if the PCI layer would stop poking my device.

> Alternatively, which may be less hackish but more work, you can set the
> pm_domain pointer in the device structure to a struct dev_pm_domain whose
> ops will just call the corresponding bus type's ops except for
> .runtime_suspend() that will execute the additional ACPI stuff after calling
> the bus type's method.

I've mostly written this, and it seems to work, I've just set a
pm_domain in the vga switcheroo code that copies the dev->bus->pm
into a private structure. I'll need this for the old nvidia and radeon
poweroffs.

Dave.
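
The pm_domain approach discussed above amounts to copying the bus type's PM callbacks and wrapping only the runtime suspend one. Here is a rough userspace sketch of that shape, with no kernel headers: struct demo_pm_ops, struct demo_pm_domain, domain_init() and the stub callbacks are all hypothetical, and the stubs only record the order in which they are called.

```c
#include <stddef.h>

/* Hypothetical, minimal stand-in for struct dev_pm_ops. */
struct demo_pm_ops {
    int (*runtime_suspend)(void *dev);
    int (*runtime_resume)(void *dev);
};

struct demo_pm_domain {
    struct demo_pm_ops ops;             /* copy of the bus ops, with one override */
    const struct demo_pm_ops *bus_ops;  /* original bus ops, used by the wrapper */
    int (*power_off)(void *dev);        /* extra platform step, e.g. an ACPI call */
};

static struct demo_pm_domain domain;    /* single instance keeps the sketch short */

/* Call trace so the ordering can be checked: 'B' bus, 'P' power off, 'R' resume. */
static char trace[8];
static size_t trace_len;

static void note(char c)
{
    if (trace_len < sizeof(trace) - 1)
        trace[trace_len++] = c;
}

static int bus_suspend(void *dev)        { (void)dev; note('B'); return 0; }
static int bus_resume(void *dev)         { (void)dev; note('R'); return 0; }
static int platform_power_off(void *dev) { (void)dev; note('P'); return 0; }

/* Bus suspend first; the power switch is flipped only if that succeeded. */
static int domain_runtime_suspend(void *dev)
{
    int ret = domain.bus_ops->runtime_suspend(dev);

    if (ret)
        return ret;
    return domain.power_off(dev);
}

static void domain_init(const struct demo_pm_ops *bus_ops,
                        int (*power_off)(void *dev))
{
    domain.ops = *bus_ops;                                /* inherit every callback */
    domain.bus_ops = bus_ops;
    domain.power_off = power_off;
    domain.ops.runtime_suspend = domain_runtime_suspend;  /* override just one */
}
```

The point of the copy is that every other callback still goes straight to the bus type, so only the suspend path gains the extra power-off step, and that step runs after the bus layer has finished touching the device.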

Re: [PATCH] fs: prevent double-free on an error path in core dumper

2012-09-11 Thread Cong Wang

On Tue, 2012-09-11 at 17:59 +0200, Oleg Nesterov wrote:
> But, again, the caller does free_note_info(), so why
> elf_note_info_init()
> tries to handle the kmalloc failures? Afaics, we can simplify the code
> and fix the bug.
> 
> What do you think about the patch below? 

Looks reasonable and neat. :)



linux-next: manual merge of the workqueues tree with the omap_dss2 tree

2012-09-11 Thread Stephen Rothwell

Hi Tejun,

Today's linux-next merge of the workqueues tree got a conflict in
drivers/video/omap2/displays/panel-taal.c between commit 5e56ad44b4d2
("OMAPDSS: Taal: use devm_* functions") from the omap_dss2 tree and
commit 203b42f73174 ("workqueue: make deferrable delayed_work initializer
names consistent") from the workqueues tree.

I fixed it up (see below) and can carry the fix as necessary (no action 
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc drivers/video/omap2/displays/panel-taal.c
index 4cf9416,6b5e6e0..000
--- a/drivers/video/omap2/displays/panel-taal.c
+++ b/drivers/video/omap2/displays/panel-taal.c
@@@ -925,50 -900,29 +925,50 @@@ static int taal_probe(struct omap_dss_d
  
atomic_set(>do_update, 0);
  
 -  td->workqueue = create_singlethread_workqueue("taal_esd");
 -  if (td->workqueue == NULL) {
 -  dev_err(>dev, "can't create ESD workqueue\n");
 -  r = -ENOMEM;
 -  goto err_wq;
 +  if (gpio_is_valid(td->reset_gpio)) {
 +  r = devm_gpio_request_one(>dev, td->reset_gpio,
 +  GPIOF_OUT_INIT_LOW, "taal rst");
 +  if (r) {
 +  dev_err(>dev, "failed to request reset gpio\n");
 +  return r;
 +  }
}
 -  INIT_DEFERRABLE_WORK(>esd_work, taal_esd_work);
 -  INIT_DELAYED_WORK(>ulps_work, taal_ulps_work);
  
 -  dev_set_drvdata(>dev, td);
 +  if (gpio_is_valid(td->ext_te_gpio)) {
 +  r = devm_gpio_request_one(>dev, td->ext_te_gpio,
 +  GPIOF_IN, "taal irq");
 +  if (r) {
 +  dev_err(>dev, "GPIO request failed\n");
 +  return r;
 +  }
 +
 +  r = devm_request_irq(>dev, gpio_to_irq(td->ext_te_gpio),
 +  taal_te_isr,
 +  IRQF_TRIGGER_RISING,
 +  "taal vsync", dssdev);
  
 -  if (gpio_is_valid(panel_data->reset_gpio)) {
 -  r = gpio_request_one(panel_data->reset_gpio, GPIOF_OUT_INIT_LOW,
 -  "taal rst");
if (r) {
 -  dev_err(>dev, "failed to request reset gpio\n");
 -  goto err_rst_gpio;
 +  dev_err(>dev, "IRQ request failed\n");
 +  return r;
}
 +
-   INIT_DELAYED_WORK_DEFERRABLE(>te_timeout_work,
++  INIT_DEFERRABLE_WORK(>te_timeout_work,
 +  taal_te_timeout_work_callback);
 +
 +  dev_dbg(>dev, "Using GPIO TE\n");
}
  
 +  td->workqueue = create_singlethread_workqueue("taal_esd");
 +  if (td->workqueue == NULL) {
 +  dev_err(>dev, "can't create ESD workqueue\n");
 +  return -ENOMEM;
 +  }
-   INIT_DELAYED_WORK_DEFERRABLE(>esd_work, taal_esd_work);
++  INIT_DEFERRABLE_WORK(>esd_work, taal_esd_work);
 +  INIT_DELAYED_WORK(>ulps_work, taal_ulps_work);
 +
taal_hw_reset(dssdev);
  
 -  if (panel_data->use_dsi_backlight) {
 +  if (td->use_dsi_backlight) {
memset(, 0, sizeof(struct backlight_properties));
props.max_brightness = 255;
  



[PATCH] kbuild: setlocalversion: ignore private tags while reporting local version

2012-09-11 Thread Tushar Behera

The output of 'git describe' is relative to the immediately preceding tag.
When the tag immediately preceding HEAD is a private tag,
setlocalversion extracts information with respect to the private tag and
wrongly reports it as being relative to a Linux tag.

Fix this by extracting information with respect to Linux tags only.

CC: Michal Marek 
Signed-off-by: Tushar Behera 
---
 scripts/setlocalversion |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/scripts/setlocalversion b/scripts/setlocalversion
index bd6dca8..d2c857c 100755
--- a/scripts/setlocalversion
+++ b/scripts/setlocalversion
@@ -57,7 +57,8 @@ scm_version()
fi
# If we are past a tagged commit (like
# "v2.6.30-rc5-302-g72357d5"), we pretty print it.
-   if atag="`git describe 2>/dev/null`"; then
+   # Also match linux tags pattern to discard private tags
+   if atag="`git describe --match v[2-9].* 2>/dev/null`"; then
echo "$atag" | awk -F- '{printf("-%05d-%s", $(NF-1),$(NF))}'
 
# If we don't have a tag at all we print -g{commitish}.
-- 
1.7.4.1
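
To make the effect of the new `--match` glob concrete, here is a small sketch in C. It assumes that git's glob matching behaves like POSIX `fnmatch()` for this simple pattern; the helper names are invented for illustration. The second helper mimics the awk step that turns the last two `-` separated fields into the `-%05d-%s` suffix.

```c
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

/* Same glob the patch passes to `git describe --match`. */
#define LINUX_TAG_GLOB "v[2-9].*"

/* Returns 1 if a tag name matches the Linux tag pattern, else 0. */
static int is_linux_tag(const char *tag)
{
	return fnmatch(LINUX_TAG_GLOB, tag, 0) == 0;
}

/*
 * Mimic the awk step: take the last two '-' separated fields of the
 * describe output (commit count and abbreviated hash) and print them
 * as "-%05d-%s". Returns 0 on success, -1 if the input does not have
 * two '-' separators or the count field is not a number.
 */
static int format_local_version(const char *describe, char *out, size_t outlen)
{
	const char *hash = strrchr(describe, '-');
	const char *count;
	int n = 0;

	if (!hash || hash == describe)
		return -1;
	/* Walk back to the '-' that precedes the commit count. */
	for (count = hash - 1; count > describe && *count != '-'; count--)
		;
	if (*count != '-')
		return -1;
	if (sscanf(count + 1, "%d", &n) != 1)
		return -1;
	snprintf(out, outlen, "-%05d-%s", n, hash + 1);
	return 0;
}
```

With this pattern a tag such as `v3.6-rc5` is accepted, while a private tag that does not start with `v` followed by a digit from 2 to 9 and a dot is skipped, so the reported commit count is measured from the nearest matching tag.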


[3.5.0 BUG] vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e)

2012-09-11 Thread Fengguang Wu

Hi,

3 of my test boxes running the v3.5 kernel became inaccessible, and I found
that two of them kept emitting this dmesg:

vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e) and exit reason is 0x31

The other one has frozen and the above lines are its last dmesg.
Any ideas?

Thanks,
Fengguang
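
For readers decoding the value in the subject line, a minimal sketch follows. It assumes the IDT-vectoring information layout documented in the Intel SDM (vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11, valid in bit 31); the struct and function names are made up for this example.

```c
#include <stdint.h>

/* Decoded view of a VMX IDT-vectoring information value.
 * Names are illustrative; the bit positions follow the Intel SDM. */
struct vectoring_info {
	unsigned vector;      /* bits 7:0  */
	unsigned type;        /* bits 10:8 */
	int has_error_code;   /* bit 11    */
	int valid;            /* bit 31    */
};

static struct vectoring_info decode_vectoring_info(uint32_t raw)
{
	struct vectoring_info vi;

	vi.vector = raw & 0xff;
	vi.type = (raw >> 8) & 0x7;
	vi.has_error_code = (raw >> 11) & 1;
	vi.valid = (raw >> 31) & 1;
	return vi;
}
```

Applied to 0x80000b0e this gives a valid entry with vector 14 (page fault), type 3 (hardware exception) and an error code present. Exit reason 0x31 is 49 decimal, which the SDM lists as EPT misconfiguration, so the message appears to describe a page fault that was being delivered when that exit occurred.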

Re: [RFC] tty/serial/kgdboc: Add and wire up clear_irqs callback

2012-09-11 Thread Colin Cross

On Tue, Sep 11, 2012 at 9:06 PM, Anton Vorontsov
 wrote:
> On Tue, Sep 11, 2012 at 08:42:46PM -0700, Colin Cross wrote:
> [...]
>> > The "problem" is in the last step. If we exit NMI without making UART
>> > know that we're done with the interrupt, we will reenter the NMI
>> > immediately, even without any new characters from the UART.
>>
>> The UART irq line should go low when you read the character out of the
>
> Probably some controllers may lower the line by themselves, but not
> all, and probably most of them need an explicit clear.

Anything 8250-based will clear the interrupt automatically, assuming
you read the status registers as well as the character register.

>> receive buffer, or the polling rx function should clear the interrupt
>> for you.
>
> Yes, that's an option. But that way we add a new semantic for the
> polling routines, and effectively we just merge the two callbacks.
>
> Of course, if Alan is OK with this, I'm more than OK too. :-)
>
> (But the polling routines would need to clear all interrupts, not
> just rx/tx. For example, if the controller indicated some error, and
> nobody clears it, then we'll start reentering infinitely.)

For exynos5, the only non-8250 based serial port I've come across, we
clear all interrupts in the rx poll function (see
https://android.googlesource.com/kernel/exynos/+/ef427aafffb7153dde59745e440fd7ec41ea969d/arch/arm/mach-exynos/exynos_fiq_debugger.c).

>> If you use a clear_irqs callback, you can drop characters if
>> one arrives between the last character buffer read and calling
>> clear_irqs.
>
> Only if we call clear_irqs() after reading the characters, but we do
> it before. So if new characters are available, we will reenter NMI,
> which is OK.
>
> But if used incorrectly, it truly can cause dropping (or staling) of
> characters, so I'd better add some comments about this.

What does clear_irqs() mean for a status or tx interrupt?  The tx
interrupt will generally re-assert as long as the tx fifo is empty,
which would require disabling it.  On 8250 ports, status interrupts
will re-assert until the corresponding status register is read.

linux-next: manual merge of the kvm tree with Linus' tree

2012-09-11 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the kvm tree got a conflict in
arch/x86/kvm/i8259.c between commit 749c59fd15b2 ("KVM: PIC: fix use of
uninitialised variable") from Linus' tree and commit ec798660cf72 ("KVM:
cleanup pic reset") from the kvm tree.

The latter removed the code fixed by the former, so I just did that (no
action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: Storage related regression in linux-next 20120824

2012-09-11 Thread Zheng Liu

On Sun, Sep 09, 2012 at 04:50:47PM -0400, Jeff Garzik wrote:
> On 09/09/2012 04:36 PM, Hugh Dickins wrote:
> >On Sun, Sep 9, 2012 at 1:28 PM, Jeff Garzik  wrote:
> >>I'll send Linus a patch to disable.
> >
> >Thanks, but no, the change in question hasn't reached Linus yet, it's
> >just a linux-next or mmotm thing - isn't it?
> 
> Yep, libata-dev#upstream.  Sorry, got my own branches confused for a
> second there.

Sorry for the delayed reply.  In our product system, we tested a lot of
different brands of SATA disks, and they are OK after fua is enabled by
default.  Certainly we can't test all disks. :-(

So I think we have to revert my patch for now, and try to fix the
current problem.

Regards,
Zheng

Re: [Resend][PATCH V3] trace,x86: add x86 irq vector tracepoints

2012-09-11 Thread H. Peter Anvin


On 09/11/2012 05:00 PM, Seiji Aguchi wrote:
> Thomas,
>
> Please review my patch as we talked in Plumbers.

Is there any measurable latency added here?  These are some of the most 
performance- (or at least latency-)critical paths in the kernel.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Re: [PATCH 1/2] module: add syscall to load module from fd

2012-09-11 Thread H. Peter Anvin


On 09/06/2012 11:13 AM, Kees Cook wrote:
> Instead of (or in addition to) kernel module signing, being able to reason
> about the origin of a kernel module would be valuable in situations
> where an OS already trusts a specific file system, file, etc, due to
> things like security labels or an existing root of trust to a partition
> through things like dm-verity.
>
> This introduces a new syscall (currently only on x86), similar to
> init_module, that has only two arguments. The first argument is used as
> a file descriptor to the module and the second argument is a pointer to
> the NULL terminated string of module arguments.

Please use the standard naming convention, which is an f- prefix (i.e. 
finit_module()).


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Re: [RFC] tty/serial/kgdboc: Add and wire up clear_irqs callback

2012-09-11 Thread Anton Vorontsov

On Tue, Sep 11, 2012 at 08:42:46PM -0700, Colin Cross wrote:
[...]
> > The "problem" is in the last step. If we exit NMI without making UART
> > know that we're done with the interrupt, we will reenter the NMI
> > immediately, even without any new characters from the UART.
> 
> The UART irq line should go low when you read the character out of the

Probably some controllers may lower the line by themselves, but not
all, and probably most of them need an explicit clear.

> receive buffer, or the polling rx function should clear the interrupt
> for you.

Yes, that's an option. But that way we add a new semantic for the
polling routines, and effectively we just merge the two callbacks.

Of course, if Alan is OK with this, I'm more than OK too. :-)

(But the polling routines would need to clear all interrupts, not
just rx/tx. For example, if the controller indicated some error, and
nobody clears it, then we'll start reentering infinitely.)

> If you use a clear_irqs callback, you can drop characters if
> one arrives between the last character buffer read and calling
> clear_irqs.

Only if we call clear_irqs() after reading the characters, but we do
it before. So if new characters are available, we will reenter NMI,
which is OK.

But if used incorrectly, it truly can cause dropping (or staling) of
characters, so I'd better add some comments about this.
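
The ordering argument above can be checked with a toy model. This sketch assumes a UART whose interrupt is latched and is only dropped by an explicit clear (reads do not clear it); it does not model any specific controller, and every name in it is invented for illustration.

```c
#include <stdbool.h>

/* Toy UART with a latched interrupt that needs an explicit clear. */
struct toy_uart {
	int fifo_level;    /* characters waiting in the RX FIFO */
	bool irq_pending;  /* latched interrupt, routed to NMI */
};

/* A character arrives: it lands in the FIFO and latches the IRQ. */
static void uart_receive(struct toy_uart *u)
{
	u->fifo_level++;
	u->irq_pending = true;
}

static void uart_clear_irqs(struct toy_uart *u)
{
	u->irq_pending = false;
}

/* Drain the FIFO; returns the number of characters read. */
static int uart_drain(struct toy_uart *u)
{
	int n = u->fifo_level;

	u->fifo_level = 0;
	return n;
}

/* A character is stranded if it sits in the FIFO with no IRQ pending. */
static bool uart_has_stranded_char(const struct toy_uart *u)
{
	return u->fifo_level > 0 && !u->irq_pending;
}

/* Clear first, then read. `late_char` injects a character at the racy
 * point between the two steps. */
static void nmi_clear_then_read(struct toy_uart *u, bool late_char)
{
	uart_clear_irqs(u);
	if (late_char)
		uart_receive(u);
	uart_drain(u);
}

/* Opposite order: read first, then clear. */
static void nmi_read_then_clear(struct toy_uart *u, bool late_char)
{
	uart_drain(u);
	if (late_char)
		uart_receive(u);
	uart_clear_irqs(u);
}
```

In the clear-then-read order a late character at worst causes one extra NMI that finds an empty FIFO. In the read-then-clear order the same late character is left in the FIFO with its interrupt already cleared, which is the stale-character case.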

Thanks!

Anton.

Re: [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change

2012-09-11 Thread Steven Rostedt

On Tue, 2012-09-11 at 19:41 -0700, David Sharp wrote:
> In order to promote interoperability between userspace tracers and ftrace,
> add a trace_clock that reports raw TSC values which will then be recorded
> in the ring buffer. Userspace tracers that also record TSCs are then on
> exactly the same time base as the kernel and events can be unambiguously
> interlaced.
> 
> Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
> timestamp values.
> 
> Google-Bug-Id: 6980623
> Signed-off-by: David Sharp 
> ---
>  include/linux/trace_clock.h |3 +++
>  kernel/trace/trace.c|3 +++
>  kernel/trace/trace_clock.c  |   16 
>  3 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
> index 4eb4902..b86c7363 100644
> --- a/include/linux/trace_clock.h
> +++ b/include/linux/trace_clock.h
> @@ -16,5 +16,8 @@ extern u64 notrace trace_clock_local(void);
>  extern u64 notrace trace_clock(void);
>  extern u64 notrace trace_clock_global(void);
>  extern u64 notrace trace_clock_counter(void);
> +#ifdef CONFIG_X86_TSC
> +extern u64 notrace trace_clock_tsc(void);
> +#endif
>  
>  #endif /* _LINUX_TRACE_CLOCK_H */
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 5c38c81..dc1f1fa 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -480,6 +480,9 @@ static struct {
>   { trace_clock_local,"local" },
>   { trace_clock_global,   "global" },
>   { trace_clock_counter,  "counter" },
> +#ifdef CONFIG_X86_TSC
> + { trace_clock_tsc,  "tsc" },
> +#endif
>  };

I really hate adding arch defs in generic code. Perhaps what we could do
is add a define here. Something like:

#ifndef ARCH_TRACE_CLOCKS
# define ARCH_TRACE_CLOCKS
#endif

[...]
{ trace_clock_counter, "counter" },
ARCH_TRACE_CLOCKS
};

and have ARCH_TRACE_CLOCKS defined somewhere in an arch specific header.
Not sure what header we could use though :-/

That is, in a header have:

#define ARCH_TRACE_CLOCKS \
{ trace_clock_x86_tsc,  "tsc" },

and also define trace_clock_x86_tsc in arch/x86/kernel...
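
Put together as a compilable userspace sketch, the idea looks roughly like this. The clock functions are dummies with arbitrary return values, and where the arch header would live is left open, as noted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t (*trace_clock_fn)(void);

/* Stand-ins for the real clocks; the return values are arbitrary. */
static uint64_t trace_clock_local(void)   { return 1; }
static uint64_t trace_clock_global(void)  { return 2; }
static uint64_t trace_clock_counter(void) { return 3; }
static uint64_t trace_clock_x86_tsc(void) { return 4; }

/* What an arch-specific header would provide. */
#define ARCH_TRACE_CLOCKS \
	{ trace_clock_x86_tsc, "tsc" },

/* Generic code: expand to nothing when the arch defines no extras. */
#ifndef ARCH_TRACE_CLOCKS
# define ARCH_TRACE_CLOCKS
#endif

static const struct {
	trace_clock_fn func;
	const char *name;
} trace_clocks[] = {
	{ trace_clock_local,   "local" },
	{ trace_clock_global,  "global" },
	{ trace_clock_counter, "counter" },
	ARCH_TRACE_CLOCKS
};

#define NR_TRACE_CLOCKS (sizeof(trace_clocks) / sizeof(trace_clocks[0]))
```

The generic table then carries no arch #ifdef at all; an arch without extra clocks simply gets the empty fallback definition.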

-- Steve


>  
>  int trace_clock_id;
> diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
> index 3947835..1770737 100644
> --- a/kernel/trace/trace_clock.c
> +++ b/kernel/trace/trace_clock.c
> @@ -125,3 +125,19 @@ u64 notrace trace_clock_counter(void)
>  {
>   return atomic64_add_return(1, _counter);
>  }
> +
> +#ifdef CONFIG_X86_TSC
> +/*
> + * trace_clock_tsc(): A clock that is just the cycle counter.
> + *
> + * Unlike the other clocks, this is not in nanoseconds.
> + */
> +u64 notrace trace_clock_tsc(void)
> +{
> + u64 ret;
> + rdtsc_barrier();
> + rdtscll(ret);
> +
> + return ret;
> +}
> +#endif



linux-next: manual merge of the trivial tree with the mfd tree

2012-09-11 Thread Stephen Rothwell

Hi Jiri,

Today's linux-next merge of the trivial tree got a conflict in
drivers/video/backlight/88pm860x_bl.c between commit a6ccdcd98c39 ("mfd:
88pm860x: Use REG resource for backlight") from the mfd tree and commit
e1c9ac420ef1 ("Revert "backlight: fix memory leak on obscure error
path"") from the trivial tree.

I just used the version from the mfd tree and can carry the fix as
necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: [RFC] tty/serial/kgdboc: Add and wire up clear_irqs callback

2012-09-11 Thread Colin Cross

On Tue, Sep 11, 2012 at 8:32 PM, Anton Vorontsov
 wrote:
> On Tue, Sep 11, 2012 at 03:15:40PM +0100, Alan Cox wrote:
>> Anton Vorontsov  wrote:
>> > This patch implements a new callback: clear_irqs. It is used for the
>>
>> This bit I still really don't like. I would like to know what the generic
>> IRQ folks think about it and if Thomas Gleixner has any brilliant ideas
>> here. I don't think it's a show stopper, it would just be nice if there was
>> a better solution first.
>
> Yup, good idea, Cc'ing.
>
> Hello Thomas,
>
> We're discussing a patch that adds a clear_irqs callback into UART
> drivers. For convenience, the particular patch is inlined at the end of
> this email. The rationale and the background for the whole thing can be
> found here: http://lkml.org/lkml/2012/9/10/2
>
> So, just for visual clearness, and for the fun of it, here is some
> glorious ascii art of what we have:
>
>                           ,---NMI-|`````|
> UART_IRQ---INT_controller         | CPU |
>                           `---IRQ-|,,,,,|
>
> Pretty much standard scheme. That is, on the interrupt controller level
> we can reroute any IRQ to NMI, and back in 2008 folks at Google found
> that rerouting the UART IRQ to NMI brings some really cool features: we
> can have a very reliable and powerful debugger pretty much on every
> embedded machine, and without losing the UART/console port itself.
>
> So, it works like this:
>
> - At boot time, Linux arch/ code reroutes UART IRQ to NMI, we connect
>   the port to the KGDBoC, and so forth...;
> - User starts typing on the serial port;
> - UART raises its IRQ line;
> - Through the controller, one of CPUs gets an NMI;
> - In NMI context, CPU reads a character from UART;
> - Then it checks if the character resulted into the special 'enter
>   KGDB' magic sequence:
> - If yes, then the CPU invites other CPUs into the KGDB, and thus
>   kernel enters the debugger;
> - If the character wasn't part of the magic command (or the sequence is
>   yet incomplete), then CPU exits NMI and continues as normal.
>
> The "problem" is in the last step. If we exit NMI without making UART
> know that we're done with the interrupt, we will reenter the NMI
> immediately, even without any new characters from the UART.

The UART irq line should go low when you read the character out of the
receive buffer, or the polling rx function should clear the interrupt
for you.  If you use a clear_irqs callback, you can drop characters if
one arrives between the last character buffer read and calling
clear_irqs.

> The obvious solution would be to "mask/reroute NMI at INT_controller
> level or queue serial port's IRQ routine from somewhere, e.g. a tasklet
> or software raised IRQ". Unfortunately, this means that we have to keep
> NMI disabled for this long:
>
> 1. We exit the NMI context with NMI source disabled/rerouted;
> 2. CPU continues to execute the kernel;
> 3. Kernel receives a timer interrupt, or software-raised interrupt, or
>    UART IRQ, which was temporarily rerouted back to normal interrupts;
> 4. It executes normal IRQ-entry, tracing, lots of other stuff,
>    interrupt handlers, softirq handlers, and thus we clear the UART
>    interrupt;
> 5. Once the UART is cleared, we reenable NMI (in the arch-code, we can
>    do that in our do_IRQ());
>
> As you can see, with this solution the NMI debugger will be effectively
> disabled from 1. to 5., so should a hang happen in this sensitive
> code, we would no longer be able to debug it.
>
> And this defeats the main purpose of the NMI debugger: we must keep NMI
> enabled all the time when we're not in the debugger, so that the NMI
> debugger is always available (unless the debugger crashed :-)
>
> That's why I came up with the clear_irqs callback in the serial drivers
> that we call from the NMI context; it's much simpler and keeps the
> debugger robust. So, personally I too can't think of any other
> plausible solution that would keep all the features intact.
>
>
> Thanks,
>
> Anton.
>
>
> - - - -
> [PATCH] tty/serial/kgdboc: Add and wire up clear_irqs callback
>
> This patch implements a new callback: clear_irqs. It is used for the
> cases when KDB-entry (e.g. NMI) and KDB IO (e.g. serial port) share the
> same interrupt. To get the idea, let's take a real example (ARM
> machine): we have a serial port whose interrupt is routed to an NMI, and
> the interrupt is used to enter KDB. Once there is some activity on the
> serial port, the CPU receives an NMI exception, and we fall into the KDB shell.
> So, it is our "debug console", and it is able to interrupt (and thus
> debug) even IRQ handlers themselves.
>
> When used that way, the interrupt never reaches the serial driver's IRQ
> handler routine, which means that the serial driver will not silence the
> interrupt. NMI behaviour is quite arch-specific, and we can't assume
> that we can use NMIs as ordinary IRQs, e.g. on some arches (like ARM) we
> can't handle data aborts, the behaviour is undefined then. So we can't
> just handle

[GIT] Digital signature library bugfix

2012-09-11 Thread James Morris

Please apply for v3.6.

The following changes since commit 0bd1189e239c76eb3a50e458548fbe7e4a5dfff1:
  Linus Torvalds (1):
Merge branch 'for-3.6-fixes' of git://git.kernel.org/.../tj/wq

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git for-linus

Dmitry Kasatkin (1):
  digsig: add hash size comparision on signature verification

 lib/digsig.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

commit 83e7c8fb4347186f6723f4c7d176999becbb3830
Author: Dmitry Kasatkin 
Date:   Thu Sep 6 01:06:49 2012 +0300

digsig: add hash size comparision on signature verification

Commit b35e286a640f31d619a637332972498b51f3fd90 introduced the bug.

When pkcs_1_v1_5_decode_emsa() returns without error and hash sizes do not
match, hash comparision is not done and digsig_verify_rsa() returns no error.
This is a bug and this patch fixes it.

Cc: sta...@vger.kernel.org
Signed-off-by: Dmitry Kasatkin 
Signed-off-by: James Morris 

diff --git a/lib/digsig.c b/lib/digsig.c
index 286d558..77b1848 100644
--- a/lib/digsig.c
+++ b/lib/digsig.c
@@ -164,8 +164,12 @@ static int digsig_verify_rsa(struct key *key,
 
err = pkcs_1_v1_5_decode_emsa(out1, len, mblen, out2, );
 
-   if (!err && len == hlen)
-   err = memcmp(out2, h, hlen);
+   if (err || len != hlen) {
+   err = -EINVAL;
+   goto err;
+   }
+
+   err = memcmp(out2, h, hlen);
 
 err:
mpi_free(in);
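
The before and after behaviour of the check can be shown in isolation. This is a userspace sketch of just the comparison logic from the hunk above; the function names and the `decode_err` parameter are invented for the example, and the decode step itself is not modelled.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Pre-fix logic: a length mismatch skips memcmp() and leaves err at 0,
 * so the caller sees success. */
static int verify_old(int decode_err, const unsigned char *out2, size_t len,
		      const unsigned char *h, size_t hlen)
{
	int err = decode_err;

	if (!err && len == hlen)
		err = memcmp(out2, h, hlen);
	return err;
}

/* Post-fix logic: any decode error or length mismatch is -EINVAL,
 * and the hashes are only compared when the lengths agree. */
static int verify_new(int decode_err, const unsigned char *out2, size_t len,
		      const unsigned char *h, size_t hlen)
{
	if (decode_err || len != hlen)
		return -EINVAL;
	return memcmp(out2, h, hlen);
}
```

With a decoded digest of one length and an expected hash of another, the old logic returns 0 and the signature is accepted, while the new logic rejects it.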

Re: [PATCH v3] trace: Move trace event enable from fs_initcall to core_initcall

2012-09-11 Thread Ezequiel Garcia

On Tue, Sep 11, 2012 at 9:34 PM, Steven Rostedt  wrote:
> On Sat, 2012-09-08 at 17:01 -0300, Ezequiel Garcia wrote:
>> This patch splits trace event initialization in two stages:
>>  * ftrace enable
>>  * sysfs event entry creation
>>
>> This allows to capture trace events from an earlier point
>> by using 'trace_event' kernel parameter and is important
>> to trace boot-up allocations.
>>
>> Note that, in order to enable events at core_initcall,
>> it's necessary to move init_ftrace_syscalls() from
>> core_initcall to early_initcall.
>
> Found another issue...
>
>>
>> Cc: Steven Rostedt 
>> Signed-off-by: Ezequiel Garcia 
>> ---
>> Changes from v1:
>>   * Rework code as requested by Steven.
>>
>> Changes from v2:
>>   * Move init_ftrace_syscalls() to early_initcall,
>> so syscalls self-test pass.
>>
>>  kernel/trace/trace_events.c   |  104 
>> +++-
>>  kernel/trace/trace_syscalls.c |2 +-
>>  2 files changed, 71 insertions(+), 35 deletions(-)
>>
>> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
>> index 29111da..4eaf86e 100644
>> --- a/kernel/trace/trace_events.c
>> +++ b/kernel/trace/trace_events.c
>> @@ -1199,6 +1199,31 @@ event_create_dir(struct ftrace_event_call *call, 
>> struct dentry *d_events,
>>   return 0;
>>  }
>>
>> +static void event_remove(struct ftrace_event_call *call)
>> +{
>> + ftrace_event_enable_disable(call, 0);
>> + if (call->event.funcs)
>> + __unregister_ftrace_event(>event);
>> + list_del(>list);
>> +}
>> +
>> +static int event_init(struct ftrace_event_call *call)
>> +{
>> + int ret = 0;
>> +
>> + if (WARN_ON(!call->name))
>> + return -EINVAL;
>> +
>> + if (call->class->raw_init) {
>> + ret = call->class->raw_init(call);
>
> If raw_init() returns a failure, we skip this event.
>
>> + if (ret < 0 && ret != -ENOSYS)
>> + pr_warn("Could not initialize trace events/%s\n",
>> + call->name);
>> + }
>> +
>> + return ret;
>> +}
>> +
>>  static int
>>  __trace_add_event_call(struct ftrace_event_call *call, struct module *mod,
>>  const struct file_operations *id,
>> @@ -1209,19 +1234,9 @@ __trace_add_event_call(struct ftrace_event_call 
>> *call, struct module *mod,
>>   struct dentry *d_events;
>>   int ret;
>>
>> - /* The linker may leave blanks */
>> - if (!call->name)
>> - return -EINVAL;
>> -
>> - if (call->class->raw_init) {
>> - ret = call->class->raw_init(call);
>> - if (ret < 0) {
>> - if (ret != -ENOSYS)
>> - pr_warning("Could not initialize trace 
>> events/%s\n",
>> -call->name);
>> - return ret;
>> - }
>> - }
>> + ret = event_init(call);
>> + if (ret < 0)
>> + return ret;
>>
>>   d_events = event_trace_events_dir();
>>   if (!d_events)
>> @@ -1272,13 +1287,10 @@ static void remove_subsystem_dir(const char *name)
>>   */
>>  static void __trace_remove_event_call(struct ftrace_event_call *call)
>>  {
>> - ftrace_event_enable_disable(call, 0);
>> - if (call->event.funcs)
>> - __unregister_ftrace_event(>event);
>> - debugfs_remove_recursive(call->dir);
>> - list_del(>list);
>> + event_remove(call);
>>   trace_destroy_fields(call);
>>   destroy_preds(call);
>> + debugfs_remove_recursive(call->dir);
>>   remove_subsystem_dir(call->class->system);
>>  }
>>
>> @@ -1450,6 +1462,36 @@ static __init int setup_trace_event(char *str)
>>  }
>>  __setup("trace_event=", setup_trace_event);
>>
>> +static __init int event_trace_enable(void)
>> +{
>> + struct ftrace_event_call **iter, *call;
>> + char *buf = bootup_event_buf;
>> + char *token;
>> + int ret;
>> +
>> + for_each_event(iter, __start_ftrace_events, __stop_ftrace_events) {
>> +
>> + call = *iter;
>> + ret = event_init(call);
>> + if (!ret)
>> + list_add(>list, _events);
>> + }
>> +
>> + while (true) {
>> + token = strsep(, ",");
>> +
>> + if (!token)
>> + break;
>> + if (!*token)
>> + continue;
>> +
>> + ret = ftrace_set_clr_event(token, 1);
>> + if (ret)
>> + pr_warn("Failed to enable trace event: %s\n", token);
>> + }
>> + return 0;
>> +}
>> +
>>  static __init int event_trace_init(void)
>>  {
>>   struct ftrace_event_call **call;
>> @@ -1457,8 +1499,6 @@ static __init int event_trace_init(void)
>>   struct dentry *entry;
>>   struct dentry *d_events;
>>   int ret;
>> - char *buf = bootup_event_buf;
>> - char *token;
>>
>>   d_tracer = tracing_init_dentry();
>>   if (!d_tracer)
>> @@ -1497,24 +1537,19 @@ static __init int

Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page

2012-09-11 Thread Xiao Guangrong

On 09/12/2012 10:03 AM, Hugh Dickins wrote:

> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
> forcing out to swap), while I happened to have CONFIG_NUMA=y.
> 
> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().

> 
> So maybe 9/12 is just obscuring what was already a BUG, either earlier
> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
> earlier releases, nor without CONFIG_NUMA).  I've not spent any time
> looking for it, maybe it's obvious - can you spot and fix it?

Hugh,

I think I have already found the reason; if I am correct, the bug existed
before my patch.

Could you please try the patch below? And could you please allow me to fix
the bug first, then post another patch to improve the things you dislike?


Subject: [PATCH] thp: fix forgetting to reset the page alloc indicator

If NUMA is enabled, the indicator is not reset when the previous page
request has failed, which then triggers the BUG_ON in khugepaged_alloc_page.

Signed-off-by: Xiao Guangrong 
---
 mm/huge_memory.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e366ca5..66d2bc6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1825,6 +1825,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
return false;

*wait = false;
+   *hpage = NULL;
khugepaged_alloc_sleep();
} else if (*hpage) {
put_page(*hpage);
-- 
1.7.7.6
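
A simplified userspace model of the state involved may help when reviewing the one-line fix. It assumes the failed allocation leaves an error-encoded pointer in `*hpage` and that the allocation step expects the slot to be NULL on entry, as the VM_BUG_ON mentioned above suggests. The helper names are invented, and sleeping and page release are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct page { int unused; };

/* Minimal stand-ins for error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *err_ptr(long error) { return (void *)error; }
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Model of the prealloc step. The fix is applied only when
 * `reset_indicator` is true, so both behaviours can be compared. */
static bool prealloc_page(struct page **hpage, bool *wait, bool reset_indicator)
{
	if (is_err(*hpage)) {
		if (!*wait)
			return false;
		*wait = false;
		if (reset_indicator)
			*hpage = NULL;  /* the line added by the patch */
	} else if (*hpage) {
		*hpage = NULL;          /* page release omitted in this model */
	}
	return true;
}

/* The allocation step expects a clean slot on entry. */
static bool alloc_precondition_ok(struct page **hpage)
{
	return *hpage == NULL;
}
```

Without the reset, the stale error value is still in the slot when the allocation step runs, so its entry check fails; with the reset the slot is clean again.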


[RFC] tty/serial/kgdboc: Add and wire up clear_irqs callback

2012-09-11 Thread Anton Vorontsov

On Tue, Sep 11, 2012 at 03:15:40PM +0100, Alan Cox wrote:
> Anton Vorontsov  wrote:
> > This patch implements a new callback: clear_irqs. It is used for the
> 
> This bit I still really don't like. I would like to know what the generic
> IRQ folks think about it and if Thomas Gleixner has any brilliant ideas
> here. I don't think it's a show stopper, it would just be nice if there was
> a better solution first.

Yup, good idea, Cc'ing.

Hello Thomas,

We're discussing a patch that adds a clear_irqs callback into UART
drivers. For convenience, the particular patch is inlined at the end of
this email. The rationale and the background for the whole thing can be
found here: http://lkml.org/lkml/2012/9/10/2

So, just for visual clearness, and for the fun of it, here is some
glorious ascii art of what we have:

                          ,---NMI-|`````|
UART_IRQ---INT_controller         | CPU |
                          `---IRQ-|,,,,,|

Pretty much standard scheme. That is, on the interrupt controller level
we can reroute any IRQ to NMI, and back in 2008 folks at Google found
that rerouting the UART IRQ to NMI brings some really cool features: we
can have a very reliable and powerful debugger pretty much on every
embedded machine, and without losing the UART/console port itself.

So, it works like this:

- At boot time, Linux arch/ code reroutes UART IRQ to NMI, we connect
  the port to the KGDBoC, and so forth...;
- User starts typing on the serial port;
- UART raises its IRQ line;
- Through the controller, one of CPUs gets an NMI;
- In NMI context, CPU reads a character from UART;
- Then it checks if the character resulted into the special 'enter
  KGDB' magic sequence:
- If yes, then the CPU invites other CPUs into the KGDB, and thus
  kernel enters the debugger;
- If the character wasn't part of the magic command (or the sequence is
  yet incomplete), then CPU exits NMI and continues as normal.
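
The per-character check in the steps above can be sketched as a tiny incremental matcher. The magic string used here is a placeholder chosen for the example, not necessarily the sequence the real code looks for, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder magic string, assumed for illustration only. */
static const char magic[] = "$3#33";

struct magic_matcher {
	size_t pos;  /* number of magic characters matched so far */
};

/*
 * Feed one character; returns true when the whole sequence has been
 * seen. The simple restart rule is sufficient here because the first
 * character of the placeholder sequence does not occur again inside it.
 */
static bool magic_feed(struct magic_matcher *m, char c)
{
	if (c == magic[m->pos]) {
		m->pos++;
	} else {
		/* Restart, letting this character begin a new attempt. */
		m->pos = (c == magic[0]) ? 1 : 0;
	}
	if (magic[m->pos] == '\0') {
		m->pos = 0;
		return true;
	}
	return false;
}
```

Each NMI would feed the characters it read into such a matcher and only call into the debugger when it returns true; in every other case the handler returns and normal execution continues.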

The "problem" is in the last step. If we exit NMI without making UART
know that we're done with the interrupt, we will reenter the NMI
immediately, even without any new characters from the UART.

The obvious solution would be to "mask/reroute NMI at INT_controller
level or queue serial port's IRQ routine from somewhere, e.g. a tasklet
or software raised IRQ". Unfortunately, this means that we have to keep
NMI disabled for this long:

1. We exit the NMI context with NMI source disabled/rerouted;
2. CPU continues to execute the kernel;
3. Kernel receives a timer interrupt, or software-raised interrupt, or
   UART IRQ, which was temporarily rerouted back to normal interrupts;
4. It executes normal IRQ-entry, tracing, lots of other stuff,
   interrupt handlers, softirq handlers, and thus we clear the UART
   interrupt;
5. Once the UART is cleared, we reenable NMI (in the arch-code, we can
   do that in our do_IRQ());

As you can see, with this solution the NMI debugger will be effectively
disabled from 1. to 5., so should a hang happen in this sensitive
code, we would no longer be able to debug it.

And this defeats the main purpose of the NMI debugger: we must keep NMI
enabled all the time when we're not in the debugger, so that the NMI
debugger is always available (unless the debugger itself crashed :-).

That's why I came up with the clear_irqs callback in the serial drivers
that we call from the NMI context, it's much simpler and keeps the
debugger robust. So, personally I too can't think of any other
plausible solution that would keep all the features intact.
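
To make the re-entry problem concrete, here is a toy userspace model of a
level-triggered UART interrupt. It is a sketch only: toy_uart,
toy_clear_irqs and nmi_entries are hypothetical names, and the model
ignores everything about real interrupt controllers except that the line
stays asserted until the device is acknowledged.

```c
#include <stdbool.h>

/* Toy model of a UART with a level-triggered RX interrupt. */
struct toy_uart {
	bool rx_pending;	/* device still asserts its interrupt line */
	/* Hypothetical callback modeled on the proposed clear_irqs. */
	void (*clear_irqs)(struct toy_uart *u);
};

static void toy_clear_irqs(struct toy_uart *u)
{
	u->rx_pending = false;	/* acknowledge the interrupt at the device */
}

/*
 * Count how many times the NMI handler runs for one received
 * character, giving up after max_entries to bound the loop.
 */
static int nmi_entries(struct toy_uart *u, bool use_clear, int max_entries)
{
	int entries = 0;

	u->rx_pending = true;	/* one character arrives */
	while (u->rx_pending && entries < max_entries) {
		entries++;	/* NMI handler body would run here */
		if (use_clear && u->clear_irqs)
			u->clear_irqs(u);
		/* Without the callback the line stays asserted. */
	}
	return entries;
}
```

With the callback the handler runs once per character; without it the
model only stops because of the artificial max_entries bound, which is
the immediate NMI re-entry described above.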

Thanks,

Anton.

- - - -
[PATCH] tty/serial/kgdboc: Add and wire up clear_irqs callback

This patch implements a new callback: clear_irqs. It is used for the
cases when KDB entry (e.g. NMI) and KDB IO (e.g. serial port) share the
same interrupt. To get the idea, let's take a real example (ARM
machine): we have a serial port whose interrupt is routed to an NMI, and
the interrupt is used to enter KDB. Once there is some activity on the
serial port, the CPU receives an NMI exception, and we fall into the KDB
shell. So, it is our "debug console", and it is able to interrupt (and
thus debug) even IRQ handlers themselves.

When used that way, the interrupt never reaches the serial driver's IRQ
handler routine, which means that the serial driver will not silence the
interrupt. NMI behaviour is quite arch-specific, and we can't assume
that we can use NMIs as ordinary IRQs, e.g. on some arches (like ARM) we
can't handle data aborts; the behaviour is undefined then. So we can't
just hand execution to the serial driver's IRQ handler from the NMI
context once we're done with KDB (plus this would defeat the debugger's
purpose: we want the NMI handler to be as simple as possible, so it will
have fewer chances to hang).

So, given that we have to deal with it somehow, we have two options:

1. Implement something that clears the interrupt;
2. Implement a whole new concept of grabbing tty for exclusive KDB use,
   plus implement mask/unmask callbacks, i.e.:
   - Since consoles might use ttys w/o opening them, we would

Re: [PATCH V3 1/3] drivers/char/tpm: Add new device driver to support IBM vTPM

2012-09-11 Thread James Morris

On Fri, 7 Sep 2012, Kent Yoder wrote:

> > >   James did accept my pull request, so these are already in
> > > security-next...
> > 
> > > For the driver itself, it's not a big issue (though I did find issues
> > > while reviewing it so it will need another round of updates). For the
> > code that changes arch/powerpc, especially prom_init.c, that stuff must
> > at the very least be acked by me (or the acting powerpc person if I'm
> > away) if it's going to go via a different tree.
> 
>   Sorry about that.  Hopefully there won't be any changes there and we
> can amend with your ack.
> 
>   As for the driver updates, I'd hate to see everyone else's code in the
> pull request get delayed yet again.  James, will it be ok to apply the
> update on top of security-next?

I guess?

-- 
James Morris


Re: [PATCH 1/3] Add ratelimited printk for different alert levels

2012-09-11 Thread Joe Perches

On Wed, 2012-09-12 at 03:43 +0530, raghu.prabh...@gmail.com wrote:
> Ratelimited printk will be useful in printing xfs messages which are otherwise
> not required to be printed always due to their high rate (to prevent kernel
> ring buffer from overflowing), while at the same time required to be printed.
[]
> diff --git a/fs/xfs/xfs_message.h b/fs/xfs/xfs_message.h
[]
> @@ -30,6 +32,32 @@ void xfs_debug(const struct xfs_mount *mp, const char 
> *fmt, ...)
>  }
>  #endif
>  
> +#define xfs_printk_ratelimited(xfs_printk, dev, fmt, ...)\
> +do { \
> + static DEFINE_RATELIMIT_STATE(_rs,  \
> +   DEFAULT_RATELIMIT_INTERVAL,   \
> +   DEFAULT_RATELIMIT_BURST); \
> + if (__ratelimit(&_rs))  \
> + xfs_printk(dev, fmt, ##__VA_ARGS__);\
> +} while (0)

It might be better to use an xfs singleton RATELIMIT_STATE

DEFINE_RATELIMIT_STATE(xfs_rs);
...
#define xfs_printk_ratelimited(xfs_printk, dev, fmt, ...)	\
do {								\
	if (__ratelimit(&xfs_rs))				\
		xfs_printk(dev, fmt, ##__VA_ARGS__);		\
} while (0)
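
For readers unfamiliar with the interval/burst idea, here is a userspace
toy of a single shared ratelimit state. It is a sketch under stated
assumptions: toy_ratelimit and toy_ratelimit_ok are hypothetical names,
time is passed in explicitly as abstract ticks, and the kernel's real
implementation differs in detail (locking, missed-message accounting).

```c
#include <stdbool.h>

/* Userspace toy, loosely modeled on the idea of a ratelimit state. */
struct toy_ratelimit {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
};

/* Returns true if a message may be emitted at time now. */
static bool toy_ratelimit_ok(struct toy_ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;	/* open a new window */
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;	/* budget for this window is used up */
}
```

The design trade-off between the two macros is who shares the budget:
a static state inside the macro gives every call site its own window,
while a singleton makes all xfs call sites draw from one budget.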



Re: [PATCH] spi: omap2-mcspi: Cleanup the omap2_mcspi_txrx_dma function

2012-09-11 Thread Mark Brown

On Tue, Sep 11, 2012 at 12:13:20PM +0530, Shubhrajyoti D wrote:
> Currently in omap2_mcspi_txrx_dma the tx and the rx support is
> interleaved. Make the rx related code in omap2_mcspi_rx_dma
> and the tx related code omap2_mcspi_tx_dma and call the functions.

I'd ideally like some testing from the OMAP side before applying this -
is there someone who can give a Tested-by?

Re: [PATCH] tpm: fix tpm_acpi sparse warning on different address spaces

2012-09-11 Thread James Morris

On Tue, 4 Sep 2012, Kent Yoder wrote:

> acpi_os_map_memory expects its return value to be in the __iomem address
> space. Tag the variable we're using as such and use memcpy_fromio to
> avoid further sparse warnings.
> 
> Signed-off-by: Kent Yoder 

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next


-- 
James Morris


Re: [PATCH 16/16] ARM: spear: move platform_data definitions

2012-09-11 Thread viresh kumar

On Tue, Sep 11, 2012 at 6:32 PM, Arnd Bergmann  wrote:
> Platform data for device drivers should be defined in
> include/linux/platform_data/*.h, not in the architecture
> and platform specific directories.
>
> This moves such data out of the spear include directories
>
> Signed-off-by: Arnd Bergmann 
>
> Cc: Viresh Kumar 
> Cc: Shiraz Hashim 
> Cc: spear-de...@list.st.com
> Cc: Dmitry Torokhov 

Acked-by: Viresh Kumar 

make tar*-pkg considered dangerous

2012-09-11 Thread Andi Kleen


Hi,

We've had some incidents with people destroying Fedora 17 installs
(to the point of reinstall) by installing a kernel tarball generated with
make tar*-pkg.

The problem is that the tarball includes /lib/{modules,firmware},
but on FC17 /lib is a symlink. tar, when it unpacks the tarball,
replaces the symlink with the directory. So they end up
with a /lib which only contains the new kernel files, but nothing else.

And then the system doesn't boot anymore.

I'm not sure there is a good fix for this. I don't know of a way to
convince tar to not do that. And putting everything into /usr 
would be very incompatible.

Disable these make targets or add warnings?  If we disable them, should
people use rpms or dpkgs instead?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.

[PATCH] CodingStyle: Add networking specific block comment style

2012-09-11 Thread Joe Perches

The block comment style in net/ and drivers/net is non-standard.
Document it.

Signed-off-by: Joe Perches 
---
> This conflicts with the preferred style for long (multi-line) comments
> documented in ./Documentation/CodingStyle.  If this is the way comments
> should be done in the networking code this patch should also include an
> update to Chapter 8 in CodingStyle documenting the networking specific
> style to avoid confusion.

 Documentation/CodingStyle |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle
index cb9258b..495e5ba 100644
--- a/Documentation/CodingStyle
+++ b/Documentation/CodingStyle
@@ -454,6 +454,16 @@ The preferred style for long (multi-line) comments is:
 * with beginning and ending almost-blank lines.
 */
 
+For files in net/ and drivers/net/ the preferred style for long (multi-line)
+comments is a little different.
+
+   /* The preferred comment style for files in net/ and drivers/net
+* looks like this.
+*
+* It is nearly the same as the generally preferred comment style,
+* but there is no initial almost-blank line.
+*/
+
 It's also important to comment data, whether they are basic types or derived
 types.  To this end, use just one data declaration per line (no commas for
 multiple data declarations).  This leaves you room for a small comment on each





Re: [PATCH 1/2] module: add syscall to load module from fd

2012-09-11 Thread James Morris

On Mon, 10 Sep 2012, Rusty Russell wrote:

> Kees Cook  writes:
> > On Fri, Sep 7, 2012 at 10:12 AM, Mimi Zohar  
> > wrote:
> >> This method is a consistent and extensible approach to verifying the
> >> integrity of file data/metadata, including kernel modules. The only
> >> downside to this approach, I think, is that it requires changes to the
> >> userspace tool.
> >
> > I'm fine with this -- it's an expected change that I'll pursue with
> > glibc, kmod, etc. Without the userspace changes, nothing will use the
> > new syscall. :) I've already got kmod (and older module-init-tools)
> > patched to do this locally.
> 
> A syscall is the right way to do this.  But does it need to be done?
> 
> 1) Do the LSM guys really want this hook?

Yes.

Acked-by: James Morris 




Re: [PATCH] samples/seccomp: fix 31 bit build on s390

2012-09-11 Thread James Morris

On Sat, 8 Sep 2012, Heiko Carstens wrote:

> >From cea999ef4e68e23c70e64baf054768bdebe15e1b Mon Sep 17 00:00:00 2001
> From: Heiko Carstens 
> Date: Sat, 8 Sep 2012 10:23:42 +0200
> Subject: [PATCH] samples/seccomp: fix 31 bit build on s390
> 
> On s390 the flag to force 31 bit builds is -m31 instead of -m32 unlike
> on all (?) other architectures.
> 
> Fixes this compile error:
> 
>   HOSTCC  samples/seccomp/bpf-direct.o
> cc1: error: unrecognized command line option "-m32"
> make[2]: *** [samples/seccomp/bpf-direct.o] Error 1
> 
> Signed-off-by: Heiko Carstens 

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next

Re: [alsa-devel] [PATCH] ASoC: codecs: Add DA9055 codec driver

2012-09-11 Thread Mark Brown

On Tue, Sep 11, 2012 at 08:33:43PM +0530, Ashish Chavan wrote:

> +/* LDO voltage level value */
> +static const char * const da9055_ldo_lvl_select_txt[] = {
> + "1.05V", "1.1V", "1.2V", "1.4V"
> +};

> +static const struct soc_enum da9055_ldo_lvl_select =
> + SOC_ENUM_SINGLE(DA9055_LDO_CTRL, 4, 4, da9055_ldo_lvl_select_txt);

There's a regulator API, if configuration is required we should be using
that.

> +/* Digital MIC clock rate select */
> +static const char * const da9055_dmic_clk_rate_txt[] = {
> + "3MHz", "1.5MHz"
> +};
> +
> +static const struct soc_enum da9055_dmic_clk_rate =
> + SOC_ENUM_SINGLE(DA9055_MIC_CONFIG, 2, 2, da9055_dmic_clk_rate_txt);
> +
> +/* Digital MIC sample phase select */
> +static const char * const da9055_dmic_phase_txt[] = {
> + "Sample on DMICCLK edges", "Sample between DMICCLK edges"
> +};
> +
> +static const struct soc_enum da9055_dmic_phase =
> + SOC_ENUM_SINGLE(DA9055_MIC_CONFIG, 1, 2, da9055_dmic_phase_txt);
> +
> +/* Digital MIC channel select */
> +static const char * const da9055_dmic_channel_select_txt[] = {
> + "Rising Left Falling Right", "Falling Left Rising Right"
> +};

Why is any of this being exposed to userspace?  If this should be
configured I'd expect it to be static platform data, not something that
gets changed at runtime.

> +/* MIC bias voltage level select */
> +static const char * const da9055_mic_bias_level_txt[] = {
> + "1.6V", "1.8V", "2.1V", "2.2V"
> +};

Again, why is this being exposed to userspace?  I'm fairly sure we went
through similar stuff with your last driver...

> + SOC_DOUBLE_R_TLV("HeadPhone Volume",
> +  DA9055_HP_L_GAIN, DA9055_HP_R_GAIN,
> +  0, 0x3f, 0, hp_vol_tlv),

Headphone.

> + /* Mute controls */
> + SOC_DOUBLE_R("Mic Mute Switch", DA9055_MIC_L_CTRL,
> +  DA9055_MIC_R_CTRL, 6, 1, 0),
> + SOC_DOUBLE_R("Aux Mute Switch", DA9055_AUX_L_CTRL,
> +  DA9055_AUX_R_CTRL, 6, 1, 0),
> + SOC_DOUBLE_R("Mixin PGA Mute Switch", DA9055_MIXIN_L_CTRL,
> +  DA9055_MIXIN_R_CTRL, 6, 1, 0),
> + SOC_DOUBLE_R("ADC Mute Switch", DA9055_ADC_L_CTRL,
> +  DA9055_ADC_R_CTRL, 6, 1, 0),
> + SOC_DOUBLE_R("HeadPhone Mute Switch", DA9055_HP_L_CTRL,
> +  DA9055_HP_R_CTRL, 6, 1, 0),
> + SOC_SINGLE("Lineout Mute Switch", DA9055_LINE_CTRL, 6, 1, 0),
> + SOC_SINGLE("DAC Soft Mute Switch", DA9055_DAC_FILTERS5, 7, 1, 0),

No "Mute".  Again, I'm fairly sure we had the same issue last time.

> + /* LDO control */
> + SOC_SINGLE("LDO Enable", DA9055_LDO_CTRL, 7, 1, 0),
> + SOC_ENUM("LDO Level Select", da9055_ldo_lvl_select),

The LDO enable should absolutely not be being exposed to userspace!

> + /* DMIC controls */
> + SOC_DOUBLE_R("DMIC Enable", DA9055_MIXIN_L_SELECT,
> +  DA9055_MIXIN_R_SELECT, 7, 1, 0),

Switch if this is a mute.

> + /* MIC PGA input selection controls */
> + SOC_ENUM("Mic Left Input Select", da9055_mic_l_select),
> + SOC_ENUM("Mic Right Input Select", da9055_mic_r_select),

DAPM.

> +
> + /* ALC Controls */
> + SOC_DOUBLE_EXT("ALC Enable", DA9055_ALC_CTRL1, 3, 7, 1, 0,
> +snd_soc_get_volsw, da9055_put_alc_sw),

ALC Switch.  All enable controls should be switches.

> +static int da9055_hw_params(struct snd_pcm_substream *substream,
> + struct snd_pcm_hw_params *params,
> + struct snd_soc_dai *dai)
> +{
> + struct snd_soc_codec *codec = dai->codec;
> + struct da9055_priv *da9055 = snd_soc_codec_get_drvdata(codec);
> + u8 aif_ctrl, fs;
> + u32 sysclk;
> +
> + /* Set AIF source to Left and Right ADC */
> + snd_soc_write(codec, DA9055_DIG_ROUTING_AIF,
> +   DA9055_AIF_L_SRC | DA9055_AIF_R_SRC);

This should be in DAPM.

> + aif_ctrl = snd_soc_read(codec, DA9055_AIF_CTRL) & 0xf3;

Use snd_soc_update_bits() later on.

> + aif_ctrl |= (DA9055_AIF_OE | DA9055_AIF_EN);

DAPM.

> + /* In slave mode, there is only one set of divisors */
> + if (!da9055->master)
> + fout = 2822400;

Should check the user supplied this value - also, what happens if the
user sets the device to slave mode after setting up the PLL?

> + /* Enable VMID reference & master bias */
> + snd_soc_write(codec, DA9055_REFERENCES,
> +   DA9055_VMID_EN | DA9055_BIAS_EN);

set_bias_level()

> + /* Enable Mic Bias */
> + snd_soc_write(codec, DA9055_MIC_BIAS_CTRL, DA9055_MIC_BIAS_EN);

DAPM for this and most of the rest of this function.

> + da9055->regmap = regmap_init_i2c(i2c, _regmap_config);

devm_regmap_init_i2c()
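
The earlier suggestion to use snd_soc_update_bits() instead of an open
coded read, mask and write amounts to a masked read-modify-write. The
following is a userspace sketch of that idea only: toy_regs and
toy_update_bits are hypothetical stand-ins, not the ASoC API, and the
register number in the usage note is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy register file standing in for the codec's register map. */
static uint8_t toy_regs[256];

/*
 * Masked read-modify-write: only the bits set in mask are replaced
 * by the corresponding bits of val. Returns true if the register
 * value actually changed.
 */
static bool toy_update_bits(uint8_t reg, uint8_t mask, uint8_t val)
{
	uint8_t old = toy_regs[reg];
	uint8_t updated = (uint8_t)((old & ~mask) | (val & mask));

	if (updated == old)
		return false;	/* skip the redundant write */
	toy_regs[reg] = updated;
	return true;
}
```

For the quoted "& 0xf3" case the mask would be 0x0c, so a call such as
toy_update_bits(0x10, 0x0c, 0x04) touches only those two bits and leaves
the rest of the register as it was.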

linux-next: manual merge of the mfd tree with Linus' tree

2012-09-11 Thread Stephen Rothwell

Hi Samuel,

Today's linux-next merge of the mfd tree got a conflict in
Documentation/devicetree/bindings/regulator/tps6586x.txt between commit
7f852e0584f6 ("regulator: tps6586x: correct vin pin for sm0/sm1/sm2")
from Linus' tree and commit 566a725dff0d ("mfd: dt: tps6586x: Add power
off control") from the mfd tree.

Just context changes.  I fixed it up (see below) and can carry the fix as
necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc Documentation/devicetree/bindings/regulator/tps6586x.txt
index da80c2a,03dfa4e..000
--- a/Documentation/devicetree/bindings/regulator/tps6586x.txt
+++ b/Documentation/devicetree/bindings/regulator/tps6586x.txt
@@@ -30,9 -34,11 +34,11 @@@ Example
#gpio-cells = <2>;
gpio-controller;
  
+   ti,system-power-controller;
+ 
 -  sm0-supply = <_reg>;
 -  sm1-supply = <_reg>;
 -  sm2-supply = <_reg>;
 +  vin-sm0-supply = <_reg>;
 +  vin-sm1-supply = <_reg>;
 +  vin-sm2-supply = <_reg>;
vinldo01-supply = <...>;
vinldo23-supply = <...>;
vinldo4-supply = <...>;



[PATCH 2/3] tracing: reset ring buffer when changing trace_clocks

2012-09-11 Thread David Sharp

Because the "tsc" clock isn't in nanoseconds, the ring buffer must be
reset when changing clocks so that incomparable timestamps don't end up
in the same trace.

Tested: Confirmed switching clocks resets the trace buffer.
Signed-off-by: David Sharp 
---
 kernel/trace/trace.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index dc1f1fa..6911f35 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4014,6 +4014,14 @@ static ssize_t tracing_clock_write(struct file *filp, 
const char __user *ubuf,
if (max_tr.buffer)
ring_buffer_set_clock(max_tr.buffer, trace_clocks[i].func);
 
+   /*
+* New clock may not be consistent with the previous clock.
+* Reset the buffer so that it doesn't have incomparable timestamps.
+*/
+   tracing_reset_online_cpus(_trace);
+   if (max_tr.buffer)
+   tracing_reset_online_cpus(_tr);
+
mutex_unlock(_types_lock);
 
*fpos += cnt;
-- 
1.7.7.3


Re: [PATCH] Fix queueing work if !bdi_cap_writeback_dirty()

2012-09-11 Thread Fengguang Wu

On Wed, Sep 12, 2012 at 03:28:42AM +0900, OGAWA Hirofumi wrote:
> 
> If bdi has BDI_CAP_NO_WRITEBACK, bdi_forker_thread() doesn't start
> writeback thread. This means there is no consumer of work item made
> by bdi_queue_work().
> 
> This adds a check of !bdi_cap_writeback_dirty(sb->s_bdi) before
> calling bdi_queue_work(); otherwise the queued work is never consumed.

Thanks for catching this! Does this bug have any side effects other
than leaking memory? Some caller may actually expect the queued work
to run in order to make progress, and would otherwise eventually
block.  If so, we'll need to fix the caller.

Thanks,
Fengguang

> Signed-off-by: OGAWA Hirofumi 
> ---
> 
>  fs/fs-writeback.c |7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff -puN fs/fs-writeback.c~noop_backing_dev_info-check-fix fs/fs-writeback.c
> --- linux/fs/fs-writeback.c~noop_backing_dev_info-check-fix   2012-09-11 
> 06:12:30.0 +0900
> +++ linux-hirofumi/fs/fs-writeback.c  2012-09-11 06:12:30.0 +0900
> @@ -120,6 +120,9 @@ __bdi_start_writeback(struct backing_dev
>  {
>   struct wb_writeback_work *work;
>  
> + if (!bdi_cap_writeback_dirty(bdi))
> + return;
> +
>   /*
>* This is WB_SYNC_NONE writeback, so if allocation fails just
>* wakeup the thread for old dirty data writeback
> @@ -1310,7 +1313,7 @@ void writeback_inodes_sb_nr(struct super
>   .reason = reason,
>   };
>  
> - if (sb->s_bdi == _backing_dev_info)
> + if (!bdi_cap_writeback_dirty(sb->s_bdi))
>   return;
>   WARN_ON(!rwsem_is_locked(>s_umount));
>   bdi_queue_work(sb->s_bdi, );
> @@ -1396,7 +1399,7 @@ void sync_inodes_sb(struct super_block *
>   };
>  
>   /* Nothing to do? */
> - if (sb->s_bdi == _backing_dev_info)
> + if (!bdi_cap_writeback_dirty(sb->s_bdi))
>   return;
>   WARN_ON(!rwsem_is_locked(>s_umount));
>  
> _
> 
> -- 
> OGAWA Hirofumi 

[PATCH 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.

2012-09-11 Thread David Sharp

With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ cat trace
  -0 [000]  6330.52: sched_switch: prev_comm=swapper 
prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 
next_prio=120
   sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash 
prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 
next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  -0   0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 
prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 
prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ echo 0 > options/latency-format
$ cat trace
  -0 [000] 16490053398357: sched_switch: prev_comm=swapper 
prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 
next_prio=120
   sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash 
prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 
next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  -0   0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 
prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 
prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
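
The formatting rule described above (fractional seconds for nanosecond
clocks, a plain integer otherwise) can be sketched in userspace as
follows. This is an illustration only: toy_format_ts is a hypothetical
helper, not the code in trace_output.c, and it ignores field widths and
the latency format.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a timestamp into buf. If the clock counts nanoseconds,
 * print seconds.microseconds; otherwise print the raw counter value
 * as a decimal integer, since it has no known time unit.
 */
static int toy_format_ts(char *buf, size_t len, uint64_t ts, int in_ns)
{
	if (in_ns) {
		uint64_t usecs = ts / 1000;	/* ns -> us */

		return snprintf(buf, len, "%" PRIu64 ".%06" PRIu64,
				usecs / 1000000, usecs % 1000000);
	}
	return snprintf(buf, len, "%" PRIu64, ts);
}
```

Printing a raw cycle count with a decimal point would suggest a unit of
seconds that the value does not have, which is the misleading output the
patch sets out to avoid.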

Google-Bug-Id: 6980623
Signed-off-by: David Sharp 
---
 include/linux/ftrace_event.h |6 +++
 kernel/trace/trace.c |   17 +++--
 kernel/trace/trace.h |4 --
 kernel/trace/trace_output.c  |   82 +-
 4 files changed, 76 insertions(+), 33 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..c760670 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -86,6 +86,12 @@ struct trace_iterator {
cpumask_var_t   started;
 };
 
+enum trace_iter_flags {
+   TRACE_FILE_LAT_FMT  = 1,
+   TRACE_FILE_ANNOTATE = 2,
+   TRACE_FILE_TIME_IN_NS   = 4,
+};
+
 
 struct trace_event;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6911f35..4b78ce2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -476,12 +476,13 @@ static const char *trace_options[] = {
 static struct {
u64 (*func)(void);
const char *name;
+   int in_ns; /* is this clock in nanoseconds? */
 } trace_clocks[] = {
-   { trace_clock_local,"local" },
-   { trace_clock_global,   "global" },
-   { trace_clock_counter,  "counter" },
+   { trace_clock_local,"local",1 },
+   { trace_clock_global,   "global",   1 },
+   { trace_clock_counter,  "counter",  0 },
 #ifdef CONFIG_X86_TSC
-   { trace_clock_tsc,  "tsc" },
+   { trace_clock_tsc,  "tsc",  0 },
 #endif
 };
 
@@ -2427,6 +2428,10 @@ __tracing_open(struct inode *inode, struct file *file)
if (ring_buffer_overruns(iter->tr->buffer))
iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
+   /* Output in nanoseconds only if we are using a clock in nanoseconds. */
+   if (trace_clocks[trace_clock_id].in_ns)
+   iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
/* stop the trace while dumping */
tracing_stop();
 
@@ -3326,6 +3331,10 @@ static int tracing_open_pipe(struct inode *inode, struct 
file *filp)
if (trace_flags & TRACE_ITER_LATENCY_FMT)
iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
+   /* Output in nanoseconds only if we are using a clock in nanoseconds. */
+   if (trace_clocks[trace_clock_id].in_ns)
+   iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
iter->cpu_file = cpu_file;
iter->tr = _trace;
mutex_init(>mutex);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..84fefed 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -409,10 +409,6 @@ void tracing_start_sched_switch_record(void);
 int register_tracer(struct tracer *type);
 void unregister_tracer(struct tracer *type);
 int is_tracing_stopped(void);
-enum trace_file_type {
-   TRACE_FILE_LAT_FMT  = 1,
-   TRACE_FILE_ANNOTATE = 2,
-};
 
 extern cpumask_var_t __read_mostly tracing_buffer_mask;
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 123b189..c86d62a 100644
--- a/kernel/trace/trace_output.c
+++

[PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change

2012-09-11 Thread David Sharp

In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp 
---
 include/linux/trace_clock.h |3 +++
 kernel/trace/trace.c|3 +++
 kernel/trace/trace_clock.c  |   16 
 3 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
index 4eb4902..b86c7363 100644
--- a/include/linux/trace_clock.h
+++ b/include/linux/trace_clock.h
@@ -16,5 +16,8 @@ extern u64 notrace trace_clock_local(void);
 extern u64 notrace trace_clock(void);
 extern u64 notrace trace_clock_global(void);
 extern u64 notrace trace_clock_counter(void);
+#ifdef CONFIG_X86_TSC
+extern u64 notrace trace_clock_tsc(void);
+#endif
 
 #endif /* _LINUX_TRACE_CLOCK_H */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5c38c81..dc1f1fa 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -480,6 +480,9 @@ static struct {
{ trace_clock_local,"local" },
{ trace_clock_global,   "global" },
{ trace_clock_counter,  "counter" },
+#ifdef CONFIG_X86_TSC
+   { trace_clock_tsc,  "tsc" },
+#endif
 };
 
 int trace_clock_id;
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 3947835..1770737 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -125,3 +125,19 @@ u64 notrace trace_clock_counter(void)
 {
return atomic64_add_return(1, _counter);
 }
+
+#ifdef CONFIG_X86_TSC
+/*
+ * trace_clock_tsc(): A clock that is just the cycle counter.
+ *
+ * Unlike the other clocks, this is not in nanoseconds.
+ */
+u64 notrace trace_clock_tsc(void)
+{
+   u64 ret;
+   rdtsc_barrier();
+   rdtscll(ret);
+
+   return ret;
+}
+#endif
-- 
1.7.7.3


[PATCH 0/3] TSC trace_clock

2012-09-11 Thread David Sharp

As discussed at Plumbers, here are my patches to add a TSC clock to ftrace.
Also noticeable is that the formatting patch improves the output of the
"counter" clock.

David Sharp (3):
  tracing,x86: add a TSC trace_clock; reset buffer on clock change
  tracing: reset ring buffer when changing trace_clocks
  tracing: format non-nanosec times from tsc clock without a decimal
point.

 include/linux/ftrace_event.h |6 +++
 include/linux/trace_clock.h  |3 ++
 kernel/trace/trace.c |   26 --
 kernel/trace/trace.h |4 --
 kernel/trace/trace_clock.c   |   16 
 kernel/trace/trace_output.c  |   82 +-
 6 files changed, 105 insertions(+), 32 deletions(-)

-- 
1.7.7.3


Query of zram/zsmalloc promotion

2012-09-11 Thread Minchan Kim

Hi all,

I would like to promote zram/zsmalloc from staging tree.
I already tried it https://lkml.org/lkml/2012/8/8/37 but I didn't get
any response from you guys.

I think zram/zsmalloc's code quality is good and they
have been used by many embedded vendors for a long time.
So it's the proper time to promote them.

zram should be put under drivers/block/. I don't think that is
arguable, but the issue is which directory we should keep *zsmalloc* in.

Now Nitin wants to keep it with zram so it would be in drivers/block/zram/.
But I don't like that because zsmalloc touches several fields of struct page
freely (and AFAIRC, Andrew had the same concern as me) so I want to put
it under mm/.

In addition, zcache now uses it too, so it's rather awkward if we put it
under drivers/block/zram/.

So questions.

To Andrew:
Is it okay to put it under mm/ ? Or /lib?

To Jens:
Is it okay to put zram under drivers/block/? If you are okay, I will start
sending a patchset after I sort out zsmalloc's location issue.

-- 
Kind regards,
Minchan Kim

Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page

2012-09-11 Thread Xiao Guangrong

On 09/12/2012 10:03 AM, Hugh Dickins wrote:
> On Mon, 13 Aug 2012, Xiao Guangrong wrote:
> 
>> They are used to abstract the difference between NUMA enabled and NUMA
>> disabled to make the code more readable
>>
>> Signed-off-by: Xiao Guangrong 
>> ---
>>  mm/huge_memory.c |  166 
>> --
>>  1 files changed, 98 insertions(+), 68 deletions(-)
> 
> Hmm, that in itself is not necessarily an improvement.
> 
> I'm a bit sceptical about this patch,
> thp-introduce-khugepaged_prealloc_page-and-khugepaged_alloc_page.patch
> in last Thursday's mmotm 2012-09-06-16-46.
> 
> What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
> running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
> forcing out to swap), while I happened to have CONFIG_NUMA=y.
> 
> That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().

I will look into it, thanks for pointing it out.

> 
> (If I'm honest, I'll admit I have Michel's "interval trees for anon rmap"
> patches in on top, and so the line number was actually shifted to 1839:
> but I don't believe his patches were in any way involved here, and
> indeed I've not yet found a problem with them: they look very good.)
> 
> I expect the BUG could quite easily be fixed up by making another call
> to khugepaged_prealloc_page() from somewhere to free up the hpage;
> but forgive me if I dislike using "prealloc" to free.
> 
> I do agree with you that the several CONFIG_NUMA ifdefs dotted around
> mm/huge_memory.c are regrettable, but I'm not at all sure that you're
> improving the situation with this patch, which gives misleading names
> to functions and moves the mmap_sem upping out of line.
> 
> I think you need to revisit it: maybe not go so far (leaving a few
> CONFIG_NUMAs behind, if they're not too bad), or maybe go further
> (add a separate function for freeing in the NUMA case, instead of
> using "prealloc").  I don't know what's best: have a play and see.

Sorry about that, I will find a better way to do this.

> 
> That's what I was intending to write yesterday.  But overnight I
> was running with this 9/12 backed out (I think 10,11,12 should be
> independent), and found "BUG at mm/huge_memory.c:1835!" this morning.
> 
> That's the VM_BUG_ON(*hpage) below #else in collapse_huge_page()
> when 9/12 is reverted.
> 
> So maybe 9/12 is just obscuring what was already a BUG, either earlier
> in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
> earlier releases, nor without CONFIG_NUMA).  I've not spent any time
> looking for it, maybe it's obvious - can you spot and fix it?

Sure, will fix it as soon as possible. Thanks!



Re: [ 34/95] USB: option: add ZTE K5006-Z

2012-09-11 Thread Ben Hutchings

On Tue, 2012-09-11 at 09:43 +0200, Bjørn Mork wrote:
> Thomas Schäfer  writes:
> 
> > Am Montag, 10. September 2012 um 00:42:24 schrieb Ben Hutchings:
> >> 3.2-stable review patch.  If anyone has any objections, please let me know.
> >
> >
> >
> > referring to
> >
> > http://www.spinics.net/lists/linux-usb/msg70131.html
> >
> > it should not be done with ff ff ff
> 
> 
> Yes, sorry about not getting around to fixing that in time.  I've just
> submitted a proposed fix.  But as my fix removes the bogus entry, you
> may just as well go ahead and add this patch to stable so that the fixup
> applies cleanly later. If accepted...
[...]

If this addition is 'bogus' then I'll defer it.

Ben.

-- 
Ben Hutchings
Make three consecutive correct guesses and you will be considered an expert.



Re: [PATCH 16/18] perf evsel: Introduce perf_evsel__{str,int}val methods

2012-09-11 Thread Namhyung Kim

On Tue, 11 Sep 2012 20:53:08 -0300, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo 
>
> Wrappers to the libtraceevent routines, so that we can further reduce
> the surface contact perf builtins have with it.

I just realized that this breaks the python extension:

namhyung@sejong:perf$ python/twatch.py 
Traceback (most recent call last):
  File "python/twatch.py", line 16, in 
import perf
ImportError: /home/namhyung/project/linux/tools/perf/python/perf.so: undefined 
symbol: pevent_find_field


Thanks,
Namhyung

RE: [PATCH 15/16] ARM: samsung: move platform_data definitions

2012-09-11 Thread Kukjin Kim

Arnd Bergmann wrote:
> 
> Platform data for device drivers should be defined in
> include/linux/platform_data/*.h, not in the architecture
> and platform specific directories.
> 
> This moves such data out of the samsung include directories
> 
> Signed-off-by: Arnd Bergmann 
> Cc: Kukjin Kim 

Yeah, basically looks OK on this.

Acked-by: Kukjin Kim 

BTW, how about re-ordering inclusion  after
 rather than just replacing like following?

---
#include 
...
#include 
...
#include 
...
#include 
...
---

And there are small comments...

> Cc: Kyungmin Park 
> Cc: Ben Dooks 
> Cc: Mark Brown 
> Cc: Jeff Garzik 
> Cc: Guenter Roeck 
> Cc: "Wolfram Sang (embedded platforms)" 
> Cc: Dmitry Torokhov 
> Cc: Bryan Wu 
> Cc: Richard Purdie 
> Cc: Sylwester Nawrocki 
> Cc: Mauro Carvalho Chehab 
> Cc: Chris Ball 
> Cc: David Woodhouse 
> Cc: Grant Likely 
> Cc: Felipe Balbi 
> Cc: Greg Kroah-Hartman 
> Cc: Alan Stern 
> Cc: Sangbeom Kim 
> Cc: Liam Girdwood 
> Cc: linux-samsung-...@vger.kernel.org
> ---
>  arch/arm/mach-exynos/dev-audio.c   |2 +-
>  arch/arm/mach-exynos/dev-ohci.c|2 +-
>  arch/arm/mach-exynos/mach-nuri.c   |6 +++---
>  arch/arm/mach-exynos/mach-origen.c |6 +++---
>  arch/arm/mach-exynos/mach-smdk4x12.c   |2 +-
>  arch/arm/mach-exynos/mach-smdkv310.c   |6 +++---
>  arch/arm/mach-exynos/mach-universal_c210.c |4 ++--
>  arch/arm/mach-exynos/setup-i2c0.c  |2 +-
>  arch/arm/mach-exynos/setup-i2c1.c  |2 +-
>  arch/arm/mach-exynos/setup-i2c2.c  |2 +-
>  arch/arm/mach-exynos/setup-i2c3.c  |2 +-
>  arch/arm/mach-exynos/setup-i2c4.c  |2 +-
>  arch/arm/mach-exynos/setup-i2c5.c  |2 +-
>  arch/arm/mach-exynos/setup-i2c6.c  |2 +-
>  arch/arm/mach-exynos/setup-i2c7.c  |2 +-
>  arch/arm/mach-s3c24xx/common-smdk.c|4 ++--
>  arch/arm/mach-s3c24xx/mach-amlm5900.c  |2 +-
>  arch/arm/mach-s3c24xx/mach-anubis.c|6 +++---
>  arch/arm/mach-s3c24xx/mach-at2440evb.c |6 +++---
>  arch/arm/mach-s3c24xx/mach-bast.c  |8 
>  arch/arm/mach-s3c24xx/mach-gta02.c |   10 +-
>  arch/arm/mach-s3c24xx/mach-h1940.c |8 
>  arch/arm/mach-s3c24xx/mach-jive.c  |6 +++---
>  arch/arm/mach-s3c24xx/mach-mini2440.c  |   10 +-
>  arch/arm/mach-s3c24xx/mach-n30.c   |8 
>  arch/arm/mach-s3c24xx/mach-nexcoder.c  |2 +-
>  arch/arm/mach-s3c24xx/mach-osiris.c|4 ++--
>  arch/arm/mach-s3c24xx/mach-otom.c  |2 +-
>  arch/arm/mach-s3c24xx/mach-qt2410.c|8 
>  arch/arm/mach-s3c24xx/mach-rx1950.c|   10 +-
>  arch/arm/mach-s3c24xx/mach-rx3715.c|2 +-
>  arch/arm/mach-s3c24xx/mach-smdk2410.c  |2 +-
>  arch/arm/mach-s3c24xx/mach-smdk2413.c  |4 ++--
>  arch/arm/mach-s3c24xx/mach-smdk2416.c  |8 
>  arch/arm/mach-s3c24xx/mach-smdk2440.c  |2 +-
>  arch/arm/mach-s3c24xx/mach-smdk2443.c  |2 +-
>  arch/arm/mach-s3c24xx/mach-tct_hammer.c|2 +-
>  arch/arm/mach-s3c24xx/mach-vr1000.c|6 +++---
>  arch/arm/mach-s3c24xx/mach-vstms.c |4 ++--
>  arch/arm/mach-s3c24xx/setup-i2c.c  |2 +-
>  arch/arm/mach-s3c24xx/simtec-audio.c   |2 +-
>  arch/arm/mach-s3c24xx/simtec-usb.c |2 +-
>  arch/arm/mach-s3c64xx/dev-audio.c  |2 +-
>  arch/arm/mach-s3c64xx/mach-anw6410.c   |2 +-
>  arch/arm/mach-s3c64xx/mach-crag6410-module.c   |2 +-
>  arch/arm/mach-s3c64xx/mach-crag6410.c  |4 ++--
>  arch/arm/mach-s3c64xx/mach-hmt.c   |4 ++--
>  arch/arm/mach-s3c64xx/mach-mini6410.c  |4 ++--
>  arch/arm/mach-s3c64xx/mach-ncp.c   |2 +-
>  arch/arm/mach-s3c64xx/mach-real6410.c  |4 ++--
>  arch/arm/mach-s3c64xx/mach-smartq.c|8 
>  arch/arm/mach-s3c64xx/mach-smdk6400.c  |2 +-
>  arch/arm/mach-s3c64xx/mach-smdk6410.c  |6 +++---
>  arch/arm/mach-s3c64xx/setup-i2c0.c |2 +-
>  arch/arm/mach-s3c64xx/setup-i2c1.c |2 +-
>  arch/arm/mach-s3c64xx/setup-ide.c  |2 +-
>  arch/arm/mach-s5p64x0/dev-audio.c  |2 +-
>  arch/arm/mach-s5p64x0/mach-smdk6440.c  |4 ++--
>  arch/arm/mach-s5p64x0/mach-smdk6450.c  |4 ++--
>  arch/arm/mach-s5p64x0/setup-i2c0.c |2 +-
>  arch/arm/mach-s5p64x0/setup-i2c1.c

[PATCH 2/2] perf sched: Fixup for the die() removal

2012-09-11 Thread Namhyung Kim

From: Namhyung Kim 

The commit a116e05dcf61 ("perf sched: Remove die() calls") replaced the
die() calls with pr_debug + return -1, but it should be pr_err; otherwise
the message will not show up unless the -v option is given.  Fix it.

Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-sched.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index af305f57bd22..9b9e32eaa805 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -438,8 +438,8 @@ static int self_open_counters(void)
fd = sys_perf_event_open(, 0, -1, -1, 0);
 
if (fd < 0)
-   pr_debug("Error: sys_perf_event_open() syscall returned"
-"with %d (%s)\n", fd, strerror(errno));
+   pr_err("Error: sys_perf_event_open() syscall returned "
+  "with %d (%s)\n", fd, strerror(errno));
return fd;
 }
 
@@ -700,7 +700,7 @@ static int replay_switch_event(struct perf_sched *sched,
delta = 0;
 
if (delta < 0) {
-   pr_debug("hm, delta: %" PRIu64 " < 0 ?\n", delta);
+   pr_err("hm, delta: %" PRIu64 " < 0 ?\n", delta);
return -1;
}
 
@@ -990,7 +990,7 @@ static int latency_runtime_event(struct perf_sched *sched,
return -1;
atoms = thread_atoms_search(>atom_root, thread, 
>cmp_pid);
if (!atoms) {
-   pr_debug("in-event: Internal tree error");
+   pr_err("in-event: Internal tree error");
return -1;
}
if (add_sched_out_event(atoms, 'R', timestamp))
@@ -1024,7 +1024,7 @@ static int latency_wakeup_event(struct perf_sched *sched,
return -1;
atoms = thread_atoms_search(>atom_root, wakee, 
>cmp_pid);
if (!atoms) {
-   pr_debug("wakeup-event: Internal tree error");
+   pr_err("wakeup-event: Internal tree error");
return -1;
}
if (add_sched_out_event(atoms, 'S', timestamp))
@@ -1079,7 +1079,7 @@ static int latency_migrate_task_event(struct perf_sched 
*sched,
register_pid(sched, migrant->pid, migrant->comm);
atoms = thread_atoms_search(>atom_root, migrant, 
>cmp_pid);
if (!atoms) {
-   pr_debug("migration-event: Internal tree error");
+   pr_err("migration-event: Internal tree error");
return -1;
}
if (add_sched_out_event(atoms, 'R', timestamp))
@@ -1286,7 +1286,7 @@ static int map_switch_event(struct perf_sched *sched, 
struct perf_evsel *evsel,
delta = 0;
 
if (delta < 0) {
-   pr_debug("hm, delta: %" PRIu64 " < 0 ?\n", delta);
+   pr_err("hm, delta: %" PRIu64 " < 0 ?\n", delta);
return -1;
}
 
-- 
1.7.11.4


[PATCH 1/2] perf test: Fixup for the die() removal

2012-09-11 Thread Namhyung Kim

From: Namhyung Kim 

The commit 32c7f7383a09 ("perf test: Remove die() calls") replaced the
die() calls with pr_debug + return -1, but it should be pr_err; otherwise
the message will not show up unless the -v option is given.  Fix it.

Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-test.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
index d33143efefce..4aed1553db56 100644
--- a/tools/perf/builtin-test.c
+++ b/tools/perf/builtin-test.c
@@ -1026,15 +1026,15 @@ static int __test__rdpmc(void)
 
fd = sys_perf_event_open(, 0, -1, -1, 0);
if (fd < 0) {
-   pr_debug("Error: sys_perf_event_open() syscall returned "
-"with %d (%s)\n", fd, strerror(errno));
+   pr_err("Error: sys_perf_event_open() syscall returned "
+  "with %d (%s)\n", fd, strerror(errno));
return -1;
}
 
addr = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
if (addr == (void *)(-1)) {
-   pr_debug("Error: mmap() syscall returned with (%s)\n",
-strerror(errno));
+   pr_err("Error: mmap() syscall returned with (%s)\n",
+  strerror(errno));
goto out_close;
}
 
-- 
1.7.11.4


Re: [PATCH 09/12] thp: introduce khugepaged_prealloc_page and khugepaged_alloc_page

2012-09-11 Thread Hugh Dickins

On Mon, 13 Aug 2012, Xiao Guangrong wrote:

> They are used to abstract the difference between NUMA enabled and NUMA 
> disabled
> to make the code more readable
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  mm/huge_memory.c |  166 
> --
>  1 files changed, 98 insertions(+), 68 deletions(-)

Hmm, that in itself is not necessarily an improvement.

I'm a bit sceptical about this patch,
thp-introduce-khugepaged_prealloc_page-and-khugepaged_alloc_page.patch
in last Thursday's mmotm 2012-09-06-16-46.

What brought me to look at it was hitting "BUG at mm/huge_memory.c:1842!"
running tmpfs kbuild swapping load (with memcg's memory.limit_in_bytes
forcing out to swap), while I happened to have CONFIG_NUMA=y.

That's the VM_BUG_ON(*hpage) on entry to khugepaged_alloc_page().

(If I'm honest, I'll admit I have Michel's "interval trees for anon rmap"
patches in on top, and so the line number was actually shifted to 1839:
but I don't believe his patches were in any way involved here, and
indeed I've not yet found a problem with them: they look very good.)

I expect the BUG could quite easily be fixed up by making another call
to khugepaged_prealloc_page() from somewhere to free up the hpage;
but forgive me if I dislike using "prealloc" to free.

I do agree with you that the several CONFIG_NUMA ifdefs dotted around
mm/huge_memory.c are regrettable, but I'm not at all sure that you're
improving the situation with this patch, which gives misleading names
to functions and moves the mmap_sem upping out of line.

I think you need to revisit it: maybe not go so far (leaving a few
CONFIG_NUMAs behind, if they're not too bad), or maybe go further
(add a separate function for freeing in the NUMA case, instead of
using "prealloc").  I don't know what's best: have a play and see.

That's what I was intending to write yesterday.  But overnight I
was running with this 9/12 backed out (I think 10,11,12 should be
independent), and found "BUG at mm/huge_memory.c:1835!" this morning.

That's the VM_BUG_ON(*hpage) below #else in collapse_huge_page()
when 9/12 is reverted.

So maybe 9/12 is just obscuring what was already a BUG, either earlier
in your series or elsewhere in mmotm (I've never seen it on 3.6-rc or
earlier releases, nor without CONFIG_NUMA).  I've not spent any time
looking for it, maybe it's obvious - can you spot and fix it?

Hugh

Re: [PATCH 16/18] perf evsel: Introduce perf_evsel__{str,int}val methods

2012-09-11 Thread Namhyung Kim

On Tue, 11 Sep 2012 20:53:08 -0300, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo 
>
> Wrappers to the libtraceevent routines, so that we can further reduce
> the surface contact perf builtins have with it.
[snip]
> +char *perf_evsel__strval(struct perf_evsel *evsel, struct perf_sample 
> *sample,
> +  const char *name)
> +{
> + struct format_field *field = pevent_find_field(evsel->tp_format, name);
> + int offset;
> +
> +if (!field)
> +return NULL;

Whitespace problem?

Btw, as a generic wrapper shouldn't it handle common fields also?  If
you care about performance, how about switching the order of finding
fields in question, i.e.:

struct format_field *field = pevent_find_field(evsel->tp_format, name);

if (!field) {
field = pevent_find_common_field(evsel->tp_format, name);
if (!field)
return NULL;
}

> +
> + offset = field->offset;
> +
> + if (field->flags & FIELD_IS_DYNAMIC) {
> + offset = *(int *)(sample->raw_data + field->offset);
> + offset &= 0x;
> + }
> +
> + return sample->raw_data + offset;
> +}
> +
> +u64 perf_evsel__intval(struct perf_evsel *evsel, struct perf_sample *sample,
> +const char *name)
> +{
> + struct format_field *field = pevent_find_field(evsel->tp_format, name);
> + u64 val;
> +
> +if (!field)
> +return 0;

Ditto.

Thanks,
Namhyung

> +
> + val = pevent_read_number(evsel->tp_format->pevent,
> +  sample->raw_data + field->offset, field->size);
> + return val;
> +
> +}

Re: [V4 PATCH 27/27] memory,hotplug: Don't modify the zone_start_pfn outside of zone_span_writelock()

2012-09-11 Thread Lai Jiangshan

On 09/11/2012 06:18 PM, Yasuaki Ishimatsu wrote:
> Hi Lai,
> 
> 2012/09/11 18:44, Lai Jiangshan wrote:
>> On 09/11/2012 08:40 AM, Yasuaki Ishimatsu wrote:
>>> Hi Lai,
>>>
>>> Using memory_online to hot-added node's memory, the following kernel 
>>> messages
>>> were shown. Is this a known issue?
>>
>> Fixed.
>>
>> Subject: Don't modify the zone_start_pfn outside of zone_span_writelock()
>>
>> Original __add_zone() and new online_movable/online_kernel
>> maybe call sleep-able init_currently_empty_zone() to init wait_table,
>>
>> but this function also modifies the zone_start_pfn without lock.
>> so we move this code out, and ensure the modification of zone_start_pfn is 
>> done
>> with zone_span_writelock() held or booting.
>>
>> Since zone_start_pfn is not modified by init_currently_empty_zone()
>> grow_zone_span() needs to be updated to be aware of empty zone.
>>
>> Signed-off-by: Lai Jiangshan 
>> Reported-by: Yasuaki ISIMATU 
>> Tested-by: Wen Congyang 
> 
> Applying the patch, the kernel messages disappeared. Thanks.
> But I have a question. Using online_movable, the following messages are shown.
> 
> [  608.314608] Built 3 zonelists in Node order, mobility grouping on.  Total 
> pages: 7844412
> [  608.411478] Policy zone: Normal
> 
> I think memory is allocated to ZONE_MOVABLE by using online_movable.
> So why is "Policy zone: Normal" shown? It should be "Policy zone: Movable"
> 
>


I don't know the meaning of "Policy zone" here, but:

-
/* Highest zone. An specific allocation for a zone below that is not
   policied. */
enum zone_type policy_zone = 0;



extern enum zone_type policy_zone;

static inline void check_highest_zone(enum zone_type k)
{
if (k > policy_zone && k != ZONE_MOVABLE)
policy_zone = k;
}

--

so I think the output is correct.
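
The quoted check can be tried outside the kernel. Below is a minimal
standalone sketch, assuming a simplified three-zone enum (the real enum
depends on the kernel configuration) and a hypothetical helper
scan_zones() that stands in for the zonelist rebuild:

```c
#include <assert.h>

/* Simplified zone ordering; the real enum depends on the kernel config. */
enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_MOVABLE };

static enum zone_type policy_zone = ZONE_DMA;

/* Same test as the quoted code: ZONE_MOVABLE never raises policy_zone. */
static void check_highest_zone(enum zone_type k)
{
	if (k > policy_zone && k != ZONE_MOVABLE)
		policy_zone = k;
}

/* Hypothetical helper: feed every populated zone through the check. */
static enum zone_type scan_zones(const enum zone_type *zones, int n)
{
	int i;

	policy_zone = ZONE_DMA;
	for (i = 0; i < n; i++)
		check_highest_zone(zones[i]);
	return policy_zone;
}
```

With both a Normal and a Movable zone populated, scan_zones() in this
sketch still ends at ZONE_NORMAL, which matches the "Policy zone: Normal"
message reported above.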

Thanks,
Lai

Re: + mm-mmapc-replace-find_vma_prepare-with-clearer-find_vma_links.patch added to -mm tree

2012-09-11 Thread David Rientjes

On Tue, 11 Sep 2012, Hugh Dickins wrote:

> > > This does revert 2.6.27's dfe195fb79e88 ("mm: fix uninitialized variables
> > > for find_vma_prepare callers"), but it looks like gcc 4.3.0 was one of
> > > those releases too eager to shout about uninitialized variables: only
> > > copy_vma() warns with 4.5.1 and 4.7.1, which a BUG on error silences.
> > > 
> > 
> > That trick doesn't work if CONFIG_BUG=n.
> > 
> > mm/mmap.c: In function 'copy_vma':
> > mm/mmap.c:2344: warning: 'prev' may be used uninitialized in this function
> > mm/mmap.c:2345: warning: 'rb_link' may be used uninitialized in this 
> > function
> > mm/mmap.c:2345: warning: 'rb_parent' may be used uninitialized in this 
> > function
> 
> Hmm, right: not an option I ever choose, and I hadn't given it a thought.
> 
> But do we care?  If this introduced the only such warning, I would care.

It now has the distinction of being the only warning for CONFIG_BUG=n in 
mm/*.

> But that seems to be far from the case - building my usual config without
> CONFIG_BUG gives me 36 warnings, of uninitialized and unused and
> control reaches end of non-void varieties.
> 

What's so special about copy_vma() that we can't fix it all up to use 
uninitialized_var() so this is handled the proper way?  If CONFIG_BUG 
exists, then we should support it, right?
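
For readers unfamiliar with it, a rough userspace sketch of the
uninitialized_var() idiom follows. The macro body is the self-assignment
form used by the gcc headers of that era; find_links() and copy_sketch()
are hypothetical stand-ins, not the mm/mmap.c code:

```c
#include <assert.h>

/* Self-assignment idiom: marks the variable as deliberately left unset. */
#define uninitialized_var(x) x = x

/* Hypothetical lookup that fills *prev only on success. */
static int find_links(int key, int **prev, int *slot)
{
	if (key < 0)
		return -1;	/* error path leaves *prev untouched */
	*prev = slot;
	return 0;
}

/* Caller that only reads prev after a successful lookup. */
static int copy_sketch(int key)
{
	int slot = 42;
	int *uninitialized_var(prev);

	if (find_links(key, &prev, &slot))
		return -1;
	return *prev;
}
```

The annotation only silences the warning; it is the caller's error check
that keeps prev from being read unset.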

[PATCH 2/2] memory-hotplug: don't replace lowmem pages with highmem

2012-09-11 Thread Minchan Kim

[1] reported that lowmem pages could be replaced by
highmem pages during CMA migration, and fixed it.

Quote from [1]'s description
"
The filesystem layer expects pages in the block device's mapping to not
be in highmem (the mapping's gfp mask is set in bdget()), but CMA can
currently replace lowmem pages with highmem pages, leading to crashes in
filesystem code such as the one below:

  Unable to handle kernel NULL pointer dereference at virtual address 
0400
  pgd = c0c98000
  [0400] *pgd=00c91831, *pte=, *ppte=
  Internal error: Oops: 817 [#1] PREEMPT SMP ARM
  CPU: 0Not tainted  (3.5.0-rc5+ #80)
  PC is at __memzero+0x24/0x80
  ...
  Process fsstress (pid: 323, stack limit = 0xc0cbc2f0)
  Backtrace:
  [] (ext4_getblk+0x0/0x180) from [] 
(ext4_bread+0x1c/0x98)
  [] (ext4_bread+0x0/0x98) from [] 
(ext4_mkdir+0x160/0x3bc)
   r4:c15337f0
  [] (ext4_mkdir+0x0/0x3bc) from [] 
(vfs_mkdir+0x8c/0x98)
  [] (vfs_mkdir+0x0/0x98) from [] 
(sys_mkdirat+0x74/0xac)
   r6: r5:c152eb40 r4:01ff r3:c14b43f0
  [] (sys_mkdirat+0x0/0xac) from [] 
(sys_mkdir+0x20/0x24)
   r6:beccdcf0 r5:00074000 r4:beccdbbc
  [] (sys_mkdir+0x0/0x24) from [] 
(ret_fast_syscall+0x0/0x30)
"

Memory-hotplug has the same problem as CMA, so [1]'s fix can be applied
to memory-hotplug, too.

Fix it by reusing alloc_migrate_target().

[1] 6a6dccba2, mm: cma: don't replace lowmem pages with highmem

Cc: Kamezawa Hiroyuki 
Cc: Yasuaki Ishimatsu 
Cc: Michal Nazarewicz 
Cc: Marek Szyprowski 
Cc: Wen Congyang 
Signed-off-by: Minchan Kim 
---
 mm/memory_hotplug.c |   15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4491a6b..fb71e5c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -752,13 +752,6 @@ static unsigned long scan_lru_pages(unsigned long start, 
unsigned long end)
return 0;
 }
 
-static struct page *
-hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
-{
-   /* This should be improved!! */
-   return alloc_page(GFP_HIGHUSER_MOVABLE);
-}
-
 #define NR_OFFLINE_AT_ONCE_PAGES   (256)
 static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
@@ -809,8 +802,12 @@ do_migrate_range(unsigned long start_pfn, unsigned long 
end_pfn)
putback_lru_pages();
goto out;
}
-   /* this function returns # of failed pages */
-   ret = migrate_pages(, hotremove_migrate_alloc, 0,
+
+   /*
+* alloc_migrate_target should be improved!!
+* migrate_pages returns # of failed pages.
+*/
+   ret = migrate_pages(, alloc_migrate_target, 0,
true, MIGRATE_SYNC);
if (ret)
putback_lru_pages();
-- 
1.7.9.5
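
The mask selection that this fix relies on can be sketched in plain C.
The SK_* flag values and migrate_target_mask() below are illustrative
assumptions only; they are not the kernel's gfp_t bits or API:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits only; the kernel's real gfp_t values differ. */
#define SK_GFP_USER	0x1u
#define SK_GFP_MOVABLE	0x2u
#define SK_GFP_HIGHMEM	0x4u

/*
 * Pick the mask for the replacement page: highmem is allowed only
 * when the page being migrated was already in highmem.
 */
static unsigned int migrate_target_mask(bool src_is_highmem)
{
	unsigned int mask = SK_GFP_USER | SK_GFP_MOVABLE;

	if (src_is_highmem)
		mask |= SK_GFP_HIGHMEM;
	return mask;
}
```

A lowmem source page therefore never gets a highmem replacement, which is
the property the removed hotremove_migrate_alloc() did not guarantee.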


[PATCH 1/2] mm: refactor out __alloc_contig_migrate_alloc

2012-09-11 Thread Minchan Kim

__alloc_contig_migrate_alloc can be used by memory-hotplug, so
factor it out (move it and give it a common name) into
page_isolation.c.

Cc: Kamezawa Hiroyuki 
Cc: Yasuaki Ishimatsu 
Cc: Michal Nazarewicz 
Cc: Marek Szyprowski 
Cc: Wen Congyang 
Signed-off-by: Minchan Kim 
---

This patch prepares for the next bug fix patch.

 include/linux/page-isolation.h |3 ++-
 mm/page_alloc.c|   14 +-
 mm/page_isolation.c|   11 +++
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 105077a..1c82261 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -37,6 +37,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn);
  */
 int set_migratetype_isolate(struct page *page);
 void unset_migratetype_isolate(struct page *page, unsigned migratetype);
-
+struct page *alloc_migrate_target(struct page *page, unsigned long private,
+   int **resultp);
 
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a4ff74e..6716023 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5648,18 +5648,6 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
pageblock_nr_pages));
 }
 
-static struct page *
-__alloc_contig_migrate_alloc(struct page *page, unsigned long private,
-int **resultp)
-{
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;
-
-   if (PageHighMem(page))
-   gfp_mask |= __GFP_HIGHMEM;
-
-   return alloc_page(gfp_mask);
-}
-
 /* [start, end) must belong to a single zone. */
 static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 {
@@ -5700,7 +5688,7 @@ static int __alloc_contig_migrate_range(unsigned long 
start, unsigned long end)
}
 
ret = migrate_pages(,
-   __alloc_contig_migrate_alloc,
+   alloc_migrate_target,
0, false, MIGRATE_SYNC);
}
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 247d1f1..6936545 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -233,3 +233,14 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn)
spin_unlock_irqrestore(>lock, flags);
return ret ? 0 : -EBUSY;
 }
+
+struct page *alloc_migrate_target(struct page *page, unsigned long private,
+ int **resultp)
+{
+gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;
+
+if (PageHighMem(page))
+gfp_mask |= __GFP_HIGHMEM;
+
+return alloc_page(gfp_mask);
+}
-- 
1.7.9.5


Re: [PATCH] clk: Make the managed clk functions generically available

2012-09-11 Thread Greg Ungerer


On 12/09/12 00:50, Artem Bityutskiy wrote:
> On Tue, 2012-09-11 at 16:44 +0200, Lars-Peter Clausen wrote:
>> On 09/10/2012 02:39 AM, Russell King - ARM Linux wrote:
>>> On Mon, Sep 10, 2012 at 08:20:21AM +0800, Mark Brown wrote:
>>>> [...]
>>>> OK, that's what I'd thought was going on - it was the fact that you'd
>>>> just acked the patch rather than asked for it to go to the patch tracker
>>>> or something which made me wonder if things had changed.
>>>
>>> I kind'a forgot because it's been soo long since I took any of those
>>> patches...
>>
>> Ok, what's the plan? Should I add this patch to the patch tracker?
>
> I'd propose to send it to Linus for v3.6 even.


If we do this, can we make a decision on it quickly?

I need to fix a regression on some ColdFire parts for 3.6. I either need
to use this clk patch, which fixes my problem, or a specific patch for the
ColdFire clk code http://marc.info/?l=linux-m68k=134725225823437=2

Regards
Greg


--

Greg Ungerer  --  Principal Engineer        EMAIL:   g...@snapgear.com
SnapGear Group, McAfee                      PHONE:   +61 7 3435 2888
8 Gardner Close                             FAX:     +61 7 3217 5323
Milton, QLD, 4064, Australia                WEB:     http://www.SnapGear.com

Re: [RFC PATCH 3/3] perf tool: Allow wildcard in PMU name

2012-09-11 Thread Yan, Zheng

On 09/11/2012 10:05 PM, Jiri Olsa wrote:
> On Mon, Sep 10, 2012 at 03:53:51PM +0800, Yan, Zheng wrote:
>> From: "Yan, Zheng" 
>>
> 
> SNIP
> 
>> +int parse_events_add_pmu(struct list_head **_list, int *idx,
>>   char *name, struct list_head *head_config)
>>  {
>>  struct perf_event_attr attr;
>> -struct perf_pmu *pmu;
>> +struct list_head *list;
>> +struct perf_pmu *pmu = NULL;
>> +struct perf_evsel *evsel, *first = NULL;
>> +int orig_idx = *idx;
>>  
>> -pmu = perf_pmu__find(name);
>> -if (!pmu)
>> -return -EINVAL;
>> +list = malloc(sizeof(*list));
>> +if (!list)
>> +return -ENOMEM;
>> +INIT_LIST_HEAD(list);
> 
> list should be allocated only if (!*_list)) same as in add_event function
> 
> I haven't test, but I think you'll leak/loose events if there's another pmu
> event defined after ','
> 

I think *_list is always NULL, because the code in parse-event.y is:

---
PE_NAME '/' event_config '/'
{
struct parse_events_data__events *data = _data;
struct list_head *list = NULL;

ABORT_ON(parse_events_add_pmu(, >idx, $1, $3));
parse_events__free_terms($3);
$$ = list;
}

---

Regards
Yan, Zheng

Re: [PATCH v2 1/2] cgroups: add documentation on extended attributes usage

2012-09-11 Thread Li Zefan

> v2: update cgroups.txt instead of creating a new file
> 
> Cc: Li Zefan 
> Cc: Tejun Heo 
> Cc: Hugh Dickins 
> Cc: Hillf Danton 
> Cc: Lennart Poettering 
> Signed-off-by: Aristeu Rozanski 
> 

Acked-by: Li Zefan 

> ---
>  Documentation/cgroups/cgroups.txt |   24 ++--
>  1 file changed, 22 insertions(+), 2 deletions(-)

Re: [PATCH 01/18] perf test: Remove die() calls

2012-09-11 Thread Namhyung Kim

On Tue, 11 Sep 2012 18:03:10 -0700, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 12, 2012 at 09:24:33AM +0900, Namhyung Kim escreveu:
>> Hi, Arnaldo
>> 
>> On Tue, 11 Sep 2012 20:52:53 -0300, Arnaldo Carvalho de Melo wrote:
>> > From: Arnaldo Carvalho de Melo 
>> >
>> > Just use pr_err() + return -1 and let the other tests run as well and
>> > then the perf's main() exit doing whatever it needs.
>> [snip]
>> > diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
>> > index cf33e50..6ae102e 100644
>> > --- a/tools/perf/builtin-test.c
>> > +++ b/tools/perf/builtin-test.c
>> > @@ -1023,14 +1023,16 @@ static int __test__rdpmc(void)
>> >  
>> >fd = sys_perf_event_open(, 0, -1, -1, 0);
>> >if (fd < 0) {
>> > -  die("Error: sys_perf_event_open() syscall returned "
>> > -  "with %d (%s)\n", fd, strerror(errno));
>> > +  pr_debug("Error: sys_perf_event_open() syscall returned "
>> > +   "with %d (%s)\n", fd, strerror(errno));
>> 
>> s/pr_debug/pr_err/ ?
>> 
>> Otherwise the message will not be shown unless -v option is given
>> - and it's not sync with the changelog ;-)
>
> Oops :-\ Can you sent a fixup patch for this and the other case?

Will send it right away.

Thanks,
Namhyung
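
The difference discussed above can be modelled in a few lines. This is
only a sketch, assuming an error-level message is always printed while a
debug-level message needs at least one -v; the *_sketch names are
hypothetical and are not perf's actual macros:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int verbose;	/* stands in for the -v count; 0 means no -v */

/*
 * Print only when the verbosity reaches the message level.
 * Returns the number of characters written, 0 if suppressed.
 */
static int eprintf_sketch(int level, const char *fmt, ...)
{
	va_list ap;
	int ret = 0;

	if (verbose >= level) {
		va_start(ap, fmt);
		ret = vfprintf(stderr, fmt, ap);
		va_end(ap);
	}
	return ret;
}

#define pr_err_sketch(...)	eprintf_sketch(0, __VA_ARGS__)
#define pr_debug_sketch(...)	eprintf_sketch(1, __VA_ARGS__)
```

With verbose left at 0, pr_debug_sketch() prints nothing while
pr_err_sketch() still reaches stderr, which is why a failure path that
replaced die() wants the error-level call.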

RE: [PATCH] checkpatch: Check networking specific block comment style

2012-09-11 Thread Allan, Bruce W

> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of Joe Perches
> Sent: Tuesday, September 11, 2012 5:48 PM
> To: Andrew Morton
> Cc: Andy Whitcroft; David Miller; LKML; netdev
> Subject: [PATCH] checkpatch: Check networking specific block comment
> style
> 
> In an effort to get fewer checkpatch reviewer corrections,
> add a networking specific style test for the preferred
> networking comment style.
> 
>   /* The preferred style for block comments in
>* drivers/net/... and net/... is like this
>*/
> 
> These tests are only used in net/ and drivers/net/
> 
> Tested with:
> 
> $ cat drivers/net/t.c
> 
> /* foo */
> 
> /*
>  * foo
>  */
> 
> /* foo
>  */
> 
> /* foo
>  * bar */
> $ ./scripts/checkpatch.pl -f drivers/net/t.c
> WARNING: networking block comments don't use an empty /* line, use /* 
> Comment...

This conflicts with the preferred style for long (multi-line) comments
documented in ./Documentation/CodingStyle.  If this is the way comments should
be done in the networking code, this patch should also include an update to
Chapter 8 in CodingStyle documenting the networking-specific style to avoid
confusion.

Re: [PATCH signal#execve2] syscalls,x86: Add execveat() system call (v3)

2012-09-11 Thread Al Viro

On Wed, Sep 12, 2012 at 01:55:52AM +0100, Meredydd Luff wrote:
> Al (in particular): I've reworked this on top of your generic
> execve() changes, as well as incorporating feedback from HPA.
> Could you take another look please (and merge if all is well)?
> 
> [v3: now rebased onto signal.git#execve2, and takes a flags
> parameter which understands AT_SYMLINK_NOFOLLOW; all thanks to
> feedback from https://lkml.org/lkml/2012/8/1/418]
> 
> HPA is already on record calling for an execveat() which also does
> fexecve()'s job: https://lkml.org/lkml/2006/7/11/556.
> And the current glibc hack for fexecve() is already causing problems
> in the wild. Eg: https://bugzilla.redhat.com/show_bug.cgi?id=241609,
> https://lkml.org/lkml/2006/12/27/123, and as recounted at
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043.

Please, declare open_execat(), leaving open_exec() as it is (i.e. a
trivial wrapper for open_execat()).  Would cut down on the patch
footprint a bit...

> + bprm->filename = filename ?:
> + (const char *) file->f_path.dentry->d_name.name;

Absolutely not.  If nothing else, ->d_name can change on rename() *and*
get underlying memory freed.  At zero notice.

RE: [PATCH 1/1] pci-iov: Fix warnings when CONFIG_PCI_IOV is enabled

2012-09-11 Thread Jiang Wang

-Original Message-
From: Bjorn Helgaas [mailto:bhelg...@google.com] 
Sent: Tuesday, September 11, 2012 4:35 PM
To: Jiang Wang
Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Chaitanya Lala; 
Francis St. Amant; Jiang Wang
Subject: Re: [PATCH 1/1] pci-iov: Fix warnings when CONFIG_PCI_IOV is enabled

On Tue, Sep 4, 2012 at 5:11 PM, Jiang Wang  wrote:
> When CONFIG_PCI_IOV is enabled, the kernel will call sriov_init().
> This function tries to allocate virtual resources even if the virtual 
> function of a PCI device is not enabled by the BIOS.
>
> This sometimes causes following warning messages during boot up:
> pci :02:00.0: BAR 9: can't allocate mem resource [0x00-0x3f]
> pci :02:00.0: BAR 7: can't allocate mem resource [0x00-0x03]
>
> Tested on three Mitac motherboards.
>
> Signed-off-by: Jiang Wang 
> ---
>  drivers/pci/iov.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 
> 6554e1a..0ca8cb2 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -441,7 +441,8 @@ static int sriov_init(struct pci_dev *dev, int pos)
> if (ctrl & PCI_SRIOV_CTRL_VFE) {
> pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0);
> ssleep(1);
> -   }
> +   } else
> +   return 0;

But this would mean that Linux can't *ever* enable VFs unless the BIOS enabled 
them, right?  I don't think that's what we want -- there's code in 
sriov_enable() to turn on VFE, assuming we have enough resources for the VFs.

--- I see. I didn't find the code in sriov_enable() before I sent out the 
patch. Thanks for the info.

It's certainly possible that the BIOS didn't allocate large enough apertures in 
the bridges leading to the SR-IOV device to accommodate all the VFs, and Linux 
may not be smart enough to enlarge them.  And probably our warning messages in 
that case are not very enlightening.

--- My purpose here is to remove those warnings. I also tried disabling 
CONFIG_PCI_IOV, but that causes another strange problem on one of my machines: 
the LSI controller cannot get memory resources, and 32 LSI controllers show up 
even though only one actually exists.
I will drop this patch and do more debugging to see what that problem is. Thanks.

Bjorn

> pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &total);
> if (!total)
> --
> 1.7.1
>

Regards,

Jiang

Re: [PATCH 15/18] perf sched: Use perf_tool as ancestor

2012-09-11 Thread Namhyung Kim

On Tue, 11 Sep 2012 20:53:07 -0300, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo 
>
> So that we can remove all the globals.
>
> Before:
>
>    text    data     bss     dec     hex filename
> 1586833  110368 1438600 3135801  2fd939 /tmp/oldperf
>
> After:
>
>    text    data     bss     dec     hex filename
> 1629329   93568  848328 2571225  273bd9 /root/bin/perf
>

Just a nitpick below.

[snip]
> -static void print_bad_events(void)
> +static void print_bad_events(struct perf_sched *sched)
>  {
> - if (nr_unordered_timestamps && nr_timestamps) {
> + if (sched->nr_unordered_timestamps && sched->nr_timestamps) {
>   printf("  INFO: %.3f%% unordered timestamps (%ld out of %ld)\n",
> - 
> (double)nr_unordered_timestamps/(double)nr_timestamps*100.0,
> - nr_unordered_timestamps, nr_timestamps);
> + 
> (double)sched->nr_unordered_timestamps/(double)sched->nr_timestamps*100.0,

Isn't it sufficient to use this?

    100.0 * sched->nr_unordered_timestamps / sched->nr_timestamps

Ditto for the ones below.

Thanks,
Namhyung


> + sched->nr_unordered_timestamps, sched->nr_timestamps);
>   }
> - if (nr_lost_events && nr_events) {
> + if (sched->nr_lost_events && sched->nr_events) {
>   printf("  INFO: %.3f%% lost events (%ld out of %ld, in %ld 
> chunks)\n",
> - (double)nr_lost_events/(double)nr_events*100.0,
> - nr_lost_events, nr_events, nr_lost_chunks);
> + (double)sched->nr_lost_events/(double)sched->nr_events 
> * 100.0,
> + sched->nr_lost_events, sched->nr_events, 
> sched->nr_lost_chunks);
>   }
> - if (nr_state_machine_bugs && nr_timestamps) {
> + if (sched->nr_state_machine_bugs && sched->nr_timestamps) {
>   printf("  INFO: %.3f%% state machine bugs (%ld out of %ld)",
> - 
> (double)nr_state_machine_bugs/(double)nr_timestamps*100.0,
> - nr_state_machine_bugs, nr_timestamps);
> - if (nr_lost_events)
> + 
> (double)sched->nr_state_machine_bugs/(double)sched->nr_timestamps*100.0,
> + sched->nr_state_machine_bugs, sched->nr_timestamps);
> + if (sched->nr_lost_events)
>   printf(" (due to lost events?)");
>   printf("\n");
>   }
> - if (nr_context_switch_bugs && nr_timestamps) {
> + if (sched->nr_context_switch_bugs && sched->nr_timestamps) {
>   printf("  INFO: %.3f%% context switch bugs (%ld out of %ld)",
> - 
> (double)nr_context_switch_bugs/(double)nr_timestamps*100.0,
> - nr_context_switch_bugs, nr_timestamps);
> - if (nr_lost_events)
> + 
> (double)sched->nr_context_switch_bugs/(double)sched->nr_timestamps*100.0,
> + sched->nr_context_switch_bugs, sched->nr_timestamps);
> + if (sched->nr_lost_events)
>   printf(" (due to lost events?)");
>   printf("\n");
>   }
>  }
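
The nitpick above relies on C's usual arithmetic conversions: once one operand
is a double literal, the long counters are promoted without explicit casts.
A minimal user-space sketch follows; the function names are hypothetical and
nothing here is taken from perf itself. The two spellings can differ in the
last bits because the operations are ordered differently, which is invisible
at %.3f precision.

```c
#include <assert.h>

/* Hypothetical helper: the explicit-cast form used in the patch. */
static double percent_with_casts(long part, long total)
{
    assert(total != 0);
    return (double)part / (double)total * 100.0;
}

/* Hypothetical helper: the shorter form; 100.0 * part is already a double. */
static double percent_promoted(long part, long total)
{
    assert(total != 0);
    return 100.0 * part / total;
}

/* Pitfall for contrast: with an integer literal the division truncates. */
static long percent_truncated(long part, long total)
{
    assert(total != 0);
    return 100 * part / total;
}
```

The shorter form only works because the double literal comes first; writing
100 instead of 100.0 silently switches to integer division.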

Re: [PATCH tip/core/rcu 04/15] rcu: Permit RCU_NONIDLE() to be used from interrupt context

2012-09-11 Thread Paul E. McKenney

On Fri, Sep 07, 2012 at 11:16:07AM -0400, Steven Rostedt wrote:
> On Fri, 2012-09-07 at 07:47 -0700, Josh Triplett wrote:
> > On Fri, Sep 07, 2012 at 07:24:41AM -0700, Paul E. McKenney wrote:
> > > On Thu, Sep 06, 2012 at 11:09:40PM -0700, Josh Triplett wrote:
> > > > On Thu, Sep 06, 2012 at 03:54:04PM -0400, Steven Rostedt wrote:
> > > > > On Thu, 2012-09-06 at 11:54 -0700, Josh Triplett wrote:
> > > > > > Not sure I see much difference in aesthetics between the three 
> > > > > > approaches,
> > > > > > > but am willing to switch over to a generally agreed-upon scheme.
> > > > > > 
> > > > > > Steve, could I get an ack from you on the patch I sent?
> > > > > 
> > > > > I acked it, but do you just want me to take the patch? I'm getting 
> > > > > ready
> > > > > for another 3.7 push to tip.
> > > > 
> > > > Up to Paul.  What would make it easiest to coordinate that patch and the
> > > > corresponding bits in the RCU patch series?
> > > 
> > > All I need to do is to eventually remove the exports, correct?
> > > If so, full speed ahead!
> > 
> > Sounds like you could go ahead and remove the exports now, and just make
> > sure Steve's push goes in before yours.
> > 
> 
> Is there any rush to do this? I just plan on pushing it for 3.7.
> 
> Paul, you just push your changes through tip, right? Then we can just
> let Ingo know. I could even make the patch a separate branch, that Ingo
> can pull into the RCU branch too.

Yep!  But we also need to worry about -next and -fengguang.

How about if you push your change into 3.7 and I push mine into 3.8?

Thanx, Paul


Re: [PATCH 01/18] perf test: Remove die() calls

2012-09-11 Thread Arnaldo Carvalho de Melo

Em Wed, Sep 12, 2012 at 09:24:33AM +0900, Namhyung Kim escreveu:
> Hi, Arnaldo
> 
> On Tue, 11 Sep 2012 20:52:53 -0300, Arnaldo Carvalho de Melo wrote:
> > From: Arnaldo Carvalho de Melo 
> >
> > Just use pr_err() + return -1 and let the other tests run as well and
> > then the perf's main() exit doing whatever it needs.
> [snip]
> > diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
> > index cf33e50..6ae102e 100644
> > --- a/tools/perf/builtin-test.c
> > +++ b/tools/perf/builtin-test.c
> > @@ -1023,14 +1023,16 @@ static int __test__rdpmc(void)
> >  
> > fd = sys_perf_event_open(, 0, -1, -1, 0);
> > if (fd < 0) {
> > -   die("Error: sys_perf_event_open() syscall returned "
> > -   "with %d (%s)\n", fd, strerror(errno));
> > +   pr_debug("Error: sys_perf_event_open() syscall returned "
> > +"with %d (%s)\n", fd, strerror(errno));
> 
> s/pr_debug/pr_err/ ?
> 
> Otherwise the message will not be shown unless the -v option is given
> - and it's not in sync with the changelog ;-)

Oops :-\ Can you send a fixup patch for this and the other case?

Thanks for the review!

- Arnaldo
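
For readers following the pr_debug vs pr_err point: the message disappears
because debug-level output is gated on verbosity. The sketch below is a toy
user-space model with hypothetical names (sketch_verbose, sketch_eprintf); it
assumes a "print when verbosity >= level" rule and is not perf's actual
implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the verbosity set by -v on the command line. */
static int sketch_verbose;

/* Print to stderr only when the current verbosity reaches `level`.
 * Returns the number of characters written, or 0 when suppressed. */
static int sketch_eprintf(int level, const char *fmt, ...)
{
    va_list ap;
    int n = 0;

    assert(fmt != NULL);
    if (sketch_verbose >= level) {
        va_start(ap, fmt);
        n = vfprintf(stderr, fmt, ap);
        va_end(ap);
    }
    return n;
}

/* Errors are level 0 (always shown); debug output is level 1 (needs -v). */
#define sketch_pr_err(...)   sketch_eprintf(0, __VA_ARGS__)
#define sketch_pr_debug(...) sketch_eprintf(1, __VA_ARGS__)
```

With sketch_verbose left at 0, a sketch_pr_debug() call writes nothing, which
is the behaviour the review comment warns about.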

Re: [PATCH 2/2] ARM: OMAP2+: Enable pinctrl dummy states

2012-09-11 Thread Tony Lindgren

* Matt Porter  [120911 12:05]:
> On Tue, Sep 11, 2012 at 11:35:22AM -0700, Tony Lindgren wrote:
> > Added Linus Walleij to Cc as well.

Now I think I really managed to add Linus W to Cc, sent too fast
earlier.
...

> > But do you get an error then if the desired pins are not found?
> > If you do get an error, then sounds like it's OK to do.
>
> Hrm, no. In that case, it will be completely silent (assuming we took
> care of the pinmuxing in the bootloader) as it uses the dummy state.
> Only with debug on will you see the information that mcspi has used
> the dummy state as is the case with !DT.
...
 
> > Well I think we should consider at least the following:
> > 
> > 1. Always see warnings when device tree is populated with board-generic.
> >If somebody wants to use bootloader only muxing with DT, they can patch
> >in pinctrl_provide_dummies() somewhere. But let's assume we always
> >want to see the warnings with board-generic.c and DT.
> 
> Ok, this is clear.
> 
> > 2. For legacy booting without DT, we should not see any warnings
> >from pinctrl-single.c as it's DT based.
> 
> Right, except anything legacy booting without DT will require that
> dummy states be present otherwise it will fail probe.

But I guess we should enable the dummy states only for other
board-*.c files, not board-generic.c? 

> > 3. There may be other non-pinctrl drivers too that are not DT
> >based, and in those cases we should see the warnings as well
> >for in the non-DT case.
> 
> I'm not sure what you mean here. "non-pinctrl drivers" means any driver
> that is not yet pinctrl or DT enabled? It's unclear to me how this
> case has a bearing on mcspi and pinctrl enablement across legacy
> board-foo.c !DT booting platforms.

Right, sorry I meant "non DT pinctrl drivers"..
 
> However, I think if the approach was modified by only calling
> pinctrl_provide_dummies() when we are booting with DT populated
> and using board-generic.c then it will satisfy all of your
> concerns. Thoughts?

Hmm, but shouldn't we call pinctrl_provide_dummies() only
for boards other than board-generic.c? And that is assuming
we don't have any other "non DT pinctrl drivers" around.
 
> i.e. the legacy !DT booting will have dummy states and continue
> along through mcspi the way it does today, relying on board-foo level
> pinmux calls (or bootloader pinmuxing). Meanwhile DT booting will now
> require that a mcspi instance also require pinctrl entry in this dts.

Yes agreed, except let's just produce a warning for the pinctrl
errors..
 
> The only worrisome thing is the pinctrl requirement on DT booting is
> now an implicit requirement.

..as otherwise not much will work at this point :)
 
> > > > For board-generic.c we always want to see the warnings. And some boards
> > > > insist on doing all the muxing only in the bootloader.
> > > 
> > > Which warnings are you saying we should see in the board-generic.c
> > > case?  Sure, there's plenty of cases where this will be unused due to
> > > somebody setting all the muxes in the bootloader and then not using
> > > pinctrl data. I'll have to doublecheck but I believe that case is also
> > > fine as the -single driver can't override the dummy state if the DT has
> > > no pinctrl data for the spi driver.

I suggest all pinctrl errors should show up as warnings with
board-generic.c, but we should not exit out of the driver probe
on errors.

Regards,

Tony

[PATCH signal#execve2] syscalls,x86: Add execveat() system call (v3)

2012-09-11 Thread Meredydd Luff

Al (in particular): I've reworked this on top of your generic
execve() changes, as well as incorporating feedback from HPA.
Could you take another look please (and merge if all is well)?

[v3: now rebased onto signal.git#execve2, and takes a flags
parameter which understands AT_SYMLINK_NOFOLLOW; all thanks to
feedback from https://lkml.org/lkml/2012/8/1/418]

HPA is already on record calling for an execveat() which also does
fexecve()'s job: https://lkml.org/lkml/2006/7/11/556.
And the current glibc hack for fexecve() is already causing problems
in the wild. Eg: https://bugzilla.redhat.com/show_bug.cgi?id=241609,
https://lkml.org/lkml/2006/12/27/123, and as recounted at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043.

So here's an attempt at just that:

--

This patch adds a new system call, execveat(2). execveat() is to
execve() as openat() is to open(): it takes a file descriptor that
refers to a directory, and resolves the filename relative to that.

In addition, if the filename is NULL, execveat() executes the file
to which the file descriptor refers. This replicates the functionality
of fexecve(), which is a system call in other UNIXen, but in Linux
glibc v2.16 it's a gross hack that depends on /proc being mounted.
That hack does not work in chrooted sandboxes, or stripped-down
systems without /proc mounted. execveat() does.
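
To make the /proc dependency concrete, here is a rough sketch of the kind of
user-space fallback described above: build a /proc/self/fd/N path and hand it
to execve(). The function names are hypothetical and this is not glibc's
actual code; it only illustrates why the approach fails when /proc is absent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format the /proc path for an open descriptor.
 * Returns the path length, or -1 if the buffer is too small. */
static int sketch_proc_fd_path(int fd, char *buf, size_t len)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "/proc/self/fd/%d", fd);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}

/* Hypothetical fexecve-style wrapper: only works if /proc is mounted,
 * because the kernel must resolve the path back to the open file. */
static int sketch_fexecve(int fd, char *const argv[], char *const envp[])
{
    char path[64];

    if (sketch_proc_fd_path(fd, path, sizeof(path)) < 0)
        return -1;
    return execve(path, argv, envp);
}
```

In a chroot without /proc the execve() call fails because the path does not
resolve, which is the failure mode the proposed system call avoids.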

Only x86-64 and i386 ABIs are supported in this patch.

Signed-off-by: Meredydd Luff 
---
 arch/alpha/kernel/binfmt_loader.c |2 +-
 arch/x86/ia32/ia32entry.S |3 +-
 arch/x86/kernel/entry_64.S|   14 +
 arch/x86/syscalls/syscall_32.tbl  |1 +
 arch/x86/syscalls/syscall_64.tbl  |1 +
 arch/x86/um/sys_call_table_64.c   |1 +
 fs/binfmt_elf.c   |2 +-
 fs/binfmt_elf_fdpic.c |2 +-
 fs/binfmt_em86.c  |2 +-
 fs/binfmt_flat.c  |2 +-
 fs/binfmt_misc.c  |2 +-
 fs/binfmt_script.c|2 +-
 fs/exec.c |  117 +
 include/asm-generic/syscalls.h|7 ++
 include/linux/compat.h|3 +
 include/linux/fs.h|2 +-
 include/linux/sched.h |4 +
 17 files changed, 147 insertions(+), 20 deletions(-)

diff --git a/arch/alpha/kernel/binfmt_loader.c 
b/arch/alpha/kernel/binfmt_loader.c
index d1f474d..7968491 100644
--- a/arch/alpha/kernel/binfmt_loader.c
+++ b/arch/alpha/kernel/binfmt_loader.c
@@ -24,7 +24,7 @@ static int load_binary(struct linux_binprm *bprm, struct 
pt_regs *regs)
 
loader = bprm->vma->vm_end - sizeof(void *);
 
-   file = open_exec("/sbin/loader");
+   file = open_exec(AT_FDCWD, "/sbin/loader");
retval = PTR_ERR(file);
if (IS_ERR(file))
return retval;
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index e75f941..8f60920 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -460,7 +460,8 @@ GLOBAL(\label)
PTREGSCALL stub32_sigreturn, sys32_sigreturn, %rdi
PTREGSCALL stub32_sigaltstack, sys32_sigaltstack, %rdx
PTREGSCALL stub32_execve, compat_sys_execve, %rcx
-   PTREGSCALL stub32_fork, sys_fork, %rdi
+PTREGSCALL stub32_execveat, compat_sys_execveat, %r9
+PTREGSCALL stub32_fork, sys_fork, %rdi
PTREGSCALL stub32_clone, sys32_clone, %rdx
PTREGSCALL stub32_vfork, sys_vfork, %rdi
PTREGSCALL stub32_iopl, sys_iopl, %rsi
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 58f8543..3fa3ed2 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -765,6 +765,20 @@ ENTRY(stub_execve)
CFI_ENDPROC
 END(stub_execve)
 
+ENTRY(stub_execveat)
+   CFI_STARTPROC
+   addq $8, %rsp
+   PARTIAL_FRAME 0
+   SAVE_REST
+   FIXUP_TOP_OF_STACK %r11
+   call sys_execveat
+   RESTORE_TOP_OF_STACK %r11
+   movq %rax,RAX(%rsp)
+   RESTORE_REST
+   jmp int_ret_from_sys_call
+   CFI_ENDPROC
+END(stub_execveat)
+
 /*
  * sigreturn is special because it needs to restore all registers on return.
  * This cannot be done with SYSRET, so use the IRET return path instead.
diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index a47103f..8fd4d3f 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -356,3 +356,4 @@
 347  i386  process_vm_readv   sys_process_vm_readv   compat_sys_process_vm_readv
 348  i386  process_vm_writev  sys_process_vm_writev  compat_sys_process_vm_writev
 349  i386  kcmp               sys_kcmp
+350  i386  execveat           sys_execveat           stub32_execveat
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index a582bfe..f5e2558 100644
--- a/arch/x86/syscalls/syscall_64.tbl

[PATCH 1/3] 3.0.y: time: Improve sanity checking of timekeeping inputs

2012-09-11 Thread John Stultz

This is a -stable backport of 4e8b14526ca7fb046a81c94002c1c43b6fdf0e9b

Unexpected behavior could occur if the time is set to a value large
enough to overflow a 64bit ktime_t (which is something larger than the
year 2262).
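
The 2262 figure can be checked with a little arithmetic: a signed 64-bit
nanosecond count holds about 9.2 billion seconds, or roughly 292 years past
1970. A small sketch, using hypothetical names and assuming an average
Gregorian year of 365.2425 days:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL
/* Assumption: average Gregorian year, 365.2425 days * 86400 s. */
#define SKETCH_SEC_PER_YEAR 31556952LL

/* Largest whole number of seconds a signed 64-bit nanosecond count can hold. */
static int64_t sketch_ktime_sec_max(void)
{
    return INT64_MAX / SKETCH_NSEC_PER_SEC;
}

/* Approximate calendar year in which that many seconds after 1970 falls. */
static int sketch_ktime_overflow_year(void)
{
    int64_t years = sketch_ktime_sec_max() / SKETCH_SEC_PER_YEAR;

    assert(years > 0);
    return 1970 + (int)years;
}
```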

Also unexpected behavior could occur if large negative offsets are
injected via adjtimex.

So this patch improves the sanity checking of timekeeping inputs by
improving the timespec_valid() check, and then makes better use of
timespec_valid() to make sure we don't set the time to an invalid
negative value or one that overflows ktime_t.

Note: This does not protect from setting the time close to overflowing
ktime_t and then letting natural accumulation cause the overflow.

Reported-by: CAI Qian 
Reported-by: Sasha Levin 
Signed-off-by: John Stultz 
Cc: Peter Zijlstra 
Cc: Prarit Bhargava 
Cc: Zhouping Liu 
Cc: Ingo Molnar 
Cc: sta...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1344454580-17031-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
Cc: Linux Kernel 
Signed-off-by: John Stultz 
---
 include/linux/ktime.h |7 ---
 include/linux/time.h  |   22 --
 kernel/time/timekeeping.c |   26 --
 3 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index 603bec2..06177ba10 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -58,13 +58,6 @@ union ktime {
 
 typedef union ktime ktime_t;   /* Kill this */
 
-#define KTIME_MAX  ((s64)~((u64)1 << 63))
-#if (BITS_PER_LONG == 64)
-# define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)
-#else
-# define KTIME_SEC_MAX LONG_MAX
-#endif
-
 /*
  * ktime_t definitions when using the 64-bit scalar representation:
  */
diff --git a/include/linux/time.h b/include/linux/time.h
index b306178..c8f7233 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -107,11 +107,29 @@ static inline struct timespec timespec_sub(struct 
timespec lhs,
return ts_delta;
 }
 
+#define KTIME_MAX  ((s64)~((u64)1 << 63))
+#if (BITS_PER_LONG == 64)
+# define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)
+#else
+# define KTIME_SEC_MAX LONG_MAX
+#endif
+
 /*
  * Returns true if the timespec is norm, false if denorm:
  */
-#define timespec_valid(ts) \
-   (((ts)->tv_sec >= 0) && (((unsigned long) (ts)->tv_nsec) < 
NSEC_PER_SEC))
+static inline bool timespec_valid(const struct timespec *ts)
+{
+   /* Dates before 1970 are bogus */
+   if (ts->tv_sec < 0)
+   return false;
+   /* Can't have more nanoseconds then a second */
+   if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
+   return false;
+   /* Disallow values that could overflow ktime_t */
+   if ((unsigned long long)ts->tv_sec >= KTIME_SEC_MAX)
+   return false;
+   return true;
+}
 
 extern void read_persistent_clock(struct timespec *ts);
 extern void read_boot_clock(struct timespec *ts);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 678ae31..bfa6be73 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -382,7 +382,7 @@ int do_settimeofday(const struct timespec *tv)
struct timespec ts_delta;
unsigned long flags;
 
-   if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
+   if (!timespec_valid(tv))
return -EINVAL;
 
write_seqlock_irqsave(_lock, flags);
@@ -417,6 +417,8 @@ EXPORT_SYMBOL(do_settimeofday);
 int timekeeping_inject_offset(struct timespec *ts)
 {
unsigned long flags;
+   struct timespec tmp;
+   int ret = 0;
 
if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
return -EINVAL;
@@ -425,9 +427,16 @@ int timekeeping_inject_offset(struct timespec *ts)
 
timekeeping_forward_now();
 
+   tmp = timespec_add(xtime,  *ts);
+   if (!timespec_valid(&tmp)) {
+   ret = -EINVAL;
+   goto error;
+   }
+
xtime = timespec_add(xtime, *ts);
wall_to_monotonic = timespec_sub(wall_to_monotonic, *ts);
 
+error: /* even if we error out, we forwarded the time, so call update */
timekeeping_update(true);
 
write_sequnlock_irqrestore(_lock, flags);
@@ -435,7 +444,7 @@ int timekeeping_inject_offset(struct timespec *ts)
/* signal hrtimers about time change */
clock_was_set();
 
-   return 0;
+   return ret;
 }
 EXPORT_SYMBOL(timekeeping_inject_offset);
 
@@ -582,7 +591,20 @@ void __init timekeeping_init(void)
struct timespec now, boot;
 
    read_persistent_clock(&now);
+   if (!timespec_valid(&now)) {
+   pr_warn("WARNING: Persistent clock returned invalid value!\n"
+   " Check your CMOS/BIOS settings.\n");
+   now.tv_sec = 0;
+   now.tv_nsec = 0;
+   }
+
    read_boot_clock(&boot);
+   if (!timespec_valid(&boot)) {
+

[PATCH 3/3] 3.0.y: time: Move ktime_t overflow checking into timespec_valid_strict

2012-09-11 Thread John Stultz

This is a -stable backport of cee58483cf56e0ba355fdd97ff5e8925329aa936

Andreas Bombe reported that the ktime_t overflow checking added to
timespec_valid in commit 4e8b14526ca7 ("time: Improve sanity checking of
timekeeping inputs") was causing problems with X.org because it caused
timeouts larger than KTIME_T to be invalid.

Previously, these large timeouts would be clamped to KTIME_MAX and would
never expire, which is valid.

This patch splits the ktime_t overflow checking into a new
timespec_valid_strict function, and converts the timekeeping code's
internal checking to use this stricter function.
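
The split can be illustrated outside the kernel. The sketch below mirrors the
two checks with a hypothetical struct and function names (it does not use the
kernel's struct timespec or headers): a very large timeout passes the basic
check, so callers can still clamp it, while the strict check rejects it for
code that sets the clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC  1000000000LL
#define SKETCH_KTIME_SEC_MAX (INT64_MAX / SKETCH_NSEC_PER_SEC)

/* Hypothetical stand-in for struct timespec with a fixed-width seconds field. */
struct sketch_timespec {
    int64_t tv_sec;
    long    tv_nsec;
};

/* Basic check: non-negative seconds and a normalized nanosecond field. */
static bool sketch_timespec_valid(const struct sketch_timespec *ts)
{
    assert(ts != NULL);
    if (ts->tv_sec < 0)
        return false;
    /* The unsigned cast also rejects negative nanoseconds. */
    if ((unsigned long)ts->tv_nsec >= SKETCH_NSEC_PER_SEC)
        return false;
    return true;
}

/* Strict check: additionally reject values that would overflow a
 * signed 64-bit nanosecond count. */
static bool sketch_timespec_valid_strict(const struct sketch_timespec *ts)
{
    if (!sketch_timespec_valid(ts))
        return false;
    if ((unsigned long long)ts->tv_sec >= SKETCH_KTIME_SEC_MAX)
        return false;
    return true;
}
```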

Reported-and-tested-by: Andreas Bombe 
Cc: Zhouping Liu 
Cc: Ingo Molnar 
Cc: Prarit Bhargava 
Cc: Thomas Gleixner 
Cc: sta...@vger.kernel.org
Signed-off-by: John Stultz 
Signed-off-by: Linus Torvalds 
Cc: Linux Kernel 
Signed-off-by: John Stultz 
---
 include/linux/time.h  |7 +++
 kernel/time/timekeeping.c |   14 ++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index c8f7233..8c0216e 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -125,6 +125,13 @@ static inline bool timespec_valid(const struct timespec 
*ts)
/* Can't have more nanoseconds then a second */
if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
return false;
+   return true;
+}
+
+static inline bool timespec_valid_strict(const struct timespec *ts)
+{
+   if (!timespec_valid(ts))
+   return false;
/* Disallow values that could overflow ktime_t */
if ((unsigned long long)ts->tv_sec >= KTIME_SEC_MAX)
return false;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 3bbaf2d0..c3cbd8c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -382,7 +382,7 @@ int do_settimeofday(const struct timespec *tv)
struct timespec ts_delta;
unsigned long flags;
 
-   if (!timespec_valid(tv))
+   if (!timespec_valid_strict(tv))
return -EINVAL;
 
write_seqlock_irqsave(_lock, flags);
@@ -428,7 +428,7 @@ int timekeeping_inject_offset(struct timespec *ts)
timekeeping_forward_now();
 
tmp = timespec_add(xtime,  *ts);
-   if (!timespec_valid(&tmp)) {
+   if (!timespec_valid_strict(&tmp)) {
ret = -EINVAL;
goto error;
}
@@ -591,7 +591,7 @@ void __init timekeeping_init(void)
struct timespec now, boot;
 
    read_persistent_clock(&now);
-   if (!timespec_valid(&now)) {
+   if (!timespec_valid_strict(&now)) {
pr_warn("WARNING: Persistent clock returned invalid value!\n"
" Check your CMOS/BIOS settings.\n");
now.tv_sec = 0;
@@ -599,7 +599,7 @@ void __init timekeeping_init(void)
}
 
    read_boot_clock(&boot);
-   if (!timespec_valid(&boot)) {
+   if (!timespec_valid_strict(&boot)) {
pr_warn("WARNING: Boot clock returned invalid value!\n"
" Check your CMOS/BIOS settings.\n");
boot.tv_sec = 0;
@@ -649,6 +649,12 @@ static void update_sleep_time(struct timespec t)
  */
 static void __timekeeping_inject_sleeptime(struct timespec *delta)
 {
+   if (!timespec_valid_strict(delta)) {
+   printk(KERN_WARNING "__timekeeping_inject_sleeptime: Invalid "
+   "sleep delta value!\n");
+   return;
+   }
+
xtime = timespec_add(xtime, *delta);
wall_to_monotonic = timespec_sub(wall_to_monotonic, *delta);
update_sleep_time(timespec_add(total_sleep_time, *delta));
-- 
1.7.9.5


[PATCH 0/3] 3.0-stable timekeeping fixes merged in 3.6

2012-09-11 Thread John Stultz

Just wanted to send out a few timekeeping fixes that were merged
in 3.6 which are appropriate for -stable.

This queue backports the following fixes:
-
cee58483cf56e0ba355fdd97ff5e8925329aa936 time: Move ktime_t overflow checking into timespec_valid_strict
bf2ac312195155511a0f79325515cbb61929898a time: Avoid making adjustments if we haven't accumulated anything
4e8b14526ca7fb046a81c94002c1c43b6fdf0e9b time: Improve sanity checking of timekeeping inputs

I've run these through my timetest suite w/ kvm on both i386
& x86_64. But more testing would be of course appreciated.
https://github.com/johnstultz-work/timetests

I also have patch queues for all the -stable trees that I'll be
sending out as my testing completes for those trees.

Cc: Prarit Bhargava 
Cc: Thomas Gleixner 
Cc: Linux Kernel 

John Stultz (3):
  3.0.y: time: Improve sanity checking of timekeeping inputs
  3.0.y: time: Avoid making adjustments if we haven't accumulated
anything
  3.0.y: time: Move ktime_t overflow checking into
timespec_valid_strict

 include/linux/ktime.h |7 ---
 include/linux/time.h  |   29 +++--
 kernel/time/timekeeping.c |   36 ++--
 3 files changed, 61 insertions(+), 11 deletions(-)

-- 
1.7.9.5


[PATCH 2/3] 3.0.y: time: Avoid making adjustments if we haven't accumulated anything

2012-09-11 Thread John Stultz

This is a -stable backport of bf2ac312195155511a0f79325515cbb61929898a

If update_wall_time() is called and the current offset isn't large
enough to accumulate, avoid re-calling timekeeping_adjust which may
change the clock freq and can cause 1ns inconsistencies with
CLOCK_REALTIME_COARSE/CLOCK_MONOTONIC_COARSE.
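
The early return can be pictured with a toy model. The struct and function
below are hypothetical and far simpler than the kernel's timekeeper; they only
show the shape of the fix: when less than one full interval has elapsed,
nothing is accumulated and the adjustment step is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified timekeeper state. */
struct sketch_timekeeper {
    uint64_t cycle_last;      /* cycle count at the last accumulation */
    uint64_t cycle_interval;  /* cycles per accumulation interval */
    uint64_t mask;            /* counter width mask */
    unsigned accumulated;     /* intervals accumulated so far */
    unsigned adjustments;     /* times the adjust step ran */
};

/* Returns 1 if anything was accumulated, 0 if there was nothing to do. */
static int sketch_update_wall_time(struct sketch_timekeeper *tk, uint64_t now)
{
    uint64_t offset;

    assert(tk != NULL && tk->cycle_interval > 0);
    offset = (now - tk->cycle_last) & tk->mask;

    /* Check if there's really nothing to do: skip the adjust step too. */
    if (offset < tk->cycle_interval)
        return 0;

    while (offset >= tk->cycle_interval) {
        offset -= tk->cycle_interval;
        tk->cycle_last += tk->cycle_interval;
        tk->accumulated++;
    }
    tk->adjustments++;  /* stands in for the frequency adjustment */
    return 1;
}
```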

Signed-off-by: John Stultz 
Cc: Prarit Bhargava 
Cc: Ingo Molnar 
Cc: sta...@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1345595449-34965-5-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
Cc: Linux Kernel 
Signed-off-by: John Stultz 
---
 kernel/time/timekeeping.c |4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index bfa6be73..3bbaf2d0 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -919,6 +919,10 @@ static void update_wall_time(void)
 #else
offset = (clock->read(clock) - clock->cycle_last) & clock->mask;
 #endif
+   /* Check if there's really nothing to do */
+   if (offset < timekeeper.cycle_interval)
+   return;
+
timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.shift;
 
/*
-- 
1.7.9.5


[PATCH] checkpatch: Check networking specific block comment style

2012-09-11 Thread Joe Perches

In an effort to get fewer checkpatch reviewer corrections,
add a networking specific style test for the preferred
networking comment style.

/* The preferred style for block comments in
 * drivers/net/... and net/... is like this
 */

These tests are only used in net/ and drivers/net/

Tested with:

$ cat drivers/net/t.c

/* foo */

/*
 * foo
 */

/* foo
 */

/* foo
 * bar */
$ ./scripts/checkpatch.pl -f drivers/net/t.c
WARNING: networking block comments don't use an empty /* line, use /* Comment...
#4: FILE: net/t.c:4:
+
+/*

WARNING: networking block comments put the trailing */ on a separate line
#12: FILE: net/t.c:12:
+ * bar */

total: 0 errors, 2 warnings, 12 lines checked
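
For readers who do not read Perl regular expressions, the two tests can be
restated as plain string checks. This is a sketch in C with hypothetical
function names, operating on diff-style lines that start with '+' and carry no
trailing newline; it is meant to mirror the intent of the patterns in the
patch, not to replace them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Skip spaces and tabs. */
static const char *sketch_skip_blanks(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    return s;
}

/* First test: an added line holding only the comment opener, directly
 * after an added blank line. */
static bool sketch_warns_empty_opener(const char *prev, const char *line)
{
    const char *p;

    assert(prev != NULL && line != NULL);
    if (prev[0] != '+' || *sketch_skip_blanks(prev + 1) != '\0')
        return false;
    if (line[0] != '+')
        return false;
    p = sketch_skip_blanks(line + 1);
    if (strncmp(p, "/*", 2) != 0)
        return false;
    return *sketch_skip_blanks(p + 2) == '\0';
}

/* Second test: an added line that ends with the comment closer but has
 * other text before it, and does not itself start with an opener or closer. */
static bool sketch_warns_trailing_close(const char *line)
{
    const char *p;
    size_t len;

    assert(line != NULL);
    if (line[0] != '+')
        return false;
    p = sketch_skip_blanks(line + 1);
    if (strncmp(p, "/*", 2) == 0 || strncmp(p, "*/", 2) == 0)
        return false;
    len = strlen(p);
    while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
        len--;
    return len >= 3 && p[len - 2] == '*' && p[len - 1] == '/';
}
```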

Signed-off-by: Joe Perches 
---
 scripts/checkpatch.pl |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index ca05ba2..7165516 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1873,6 +1873,20 @@ sub process {
"No space is necessary after a cast\n" . $hereprev);
}
 
+   if ($realfile =~ m@^(drivers/net/|net/)@ &&
+   $rawline =~ /^\+[ \t]*\/\*[ \t]*$/ &&
+   $prevrawline =~ /^\+[ \t]*$/) {
+   WARN("NETWORKING_BLOCK_COMMENT_STYLE",
+"networking block comments don't use an empty /* 
line, use /* Comment...\n" . $hereprev);
+   }
+
+   if ($realfile =~ m@^(drivers/net/|net/)@ &&
+   $rawline !~ m@^\+[ \t]*(\/\*|\*\/)@ &&
+   $rawline =~ m@^\+[ \t]*.+\*\/[ \t]*$@) {
+   WARN("NETWORKING_BLOCK_COMMENT_STYLE",
+    "networking block comments put the trailing */ on a separate line\n" . $herecurr);
+   }
+
 # check for spaces at the beginning of a line.
 # Exceptions:
 #  1) within comments



Re: [PATCH 02/18] perf sched: Remove die() calls

2012-09-11 Thread Namhyung Kim

On Tue, 11 Sep 2012 20:52:54 -0300, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo 
>
> Just use pr_err() + return -1 and perf_session__process_events to abort
> when some event would call die(), then let the perf's main() exit doing
> whatever it needs.

It looks like this patch has the same problem as patch 1/18.

Thanks,
Namhyung

Re: [PATCH 08/12] tty/serial: Add kgdb_nmi driver

2012-09-11 Thread Anton Vorontsov

On Tue, Sep 11, 2012 at 03:14:20PM +0100, Alan Cox wrote:
> > +struct kgdb_nmi_tty_priv {
> > +   struct tty_port port;
> > +   int opened;
> > +   struct tasklet_struct tlet;
> > +   STRUCT_KFIFO(char, KGDB_NMI_FIFO_SIZE) fifo;
> 
> I don't see where "opened" is used.

Yup, a leftover, with tty_port no longer needed. Thanks for noticing.

> > +static const struct tty_operations kgdb_nmi_tty_ops = {
> > +   .open   = kgdb_nmi_tty_open,
> > +   .close  = kgdb_nmi_tty_close,
> > +   .install= kgdb_nmi_tty_install,
> > +   .cleanup= kgdb_nmi_tty_cleanup,
> > +   .write_room = kgdb_nmi_tty_write_room,
> > +   .write  = kgdb_nmi_tty_write,
> 
> And a hangup method (just using tty_port helpers will do the job - it's
> needed so vhangup() works as expected on a port)

Will add. Thanks a lot!

Re: [PATCH v3] trace: Move trace event enable from fs_initcall to core_initcall

2012-09-11 Thread Steven Rostedt

On Sat, 2012-09-08 at 17:01 -0300, Ezequiel Garcia wrote:
> This patch splits trace event initialization in two stages:
>  * ftrace enable
>  * sysfs event entry creation
> 
> This allows to capture trace events from an earlier point
> by using 'trace_event' kernel parameter and is important
> to trace boot-up allocations.
> 
> Note that, in order to enable events at core_initcall,
> it's necessary to move init_ftrace_syscalls() from
> core_initcall to early_initcall.

Found another issue...

> 
> Cc: Steven Rostedt 
> Signed-off-by: Ezequiel Garcia 
> ---
> Changes from v1:
>   * Rework code as requested by Steven.
> 
> Changes from v2:
>   * Move init_ftrace_syscalls() to early_initcall, 
> so syscalls self-test pass.
> 
>  kernel/trace/trace_events.c   |  104 +++-
>  kernel/trace/trace_syscalls.c |2 +-
>  2 files changed, 71 insertions(+), 35 deletions(-)
> 
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 29111da..4eaf86e 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -1199,6 +1199,31 @@ event_create_dir(struct ftrace_event_call *call, 
> struct dentry *d_events,
>   return 0;
>  }
>  
> +static void event_remove(struct ftrace_event_call *call)
> +{
> + ftrace_event_enable_disable(call, 0);
> + if (call->event.funcs)
> + __unregister_ftrace_event(&call->event);
> + list_del(&call->list);
> +}
> +
> +static int event_init(struct ftrace_event_call *call)
> +{
> + int ret = 0;
> +
> + if (WARN_ON(!call->name))
> + return -EINVAL;
> +
> + if (call->class->raw_init) {
> + ret = call->class->raw_init(call);

If raw_init() returns a failure, we skip this event.

> + if (ret < 0 && ret != -ENOSYS)
> + pr_warn("Could not initialize trace events/%s\n",
> + call->name);
> + }
> +
> + return ret;
> +}
> +
>  static int
>  __trace_add_event_call(struct ftrace_event_call *call, struct module *mod,
>  const struct file_operations *id,
> @@ -1209,19 +1234,9 @@ __trace_add_event_call(struct ftrace_event_call *call, 
> struct module *mod,
>   struct dentry *d_events;
>   int ret;
>  
> - /* The linker may leave blanks */
> - if (!call->name)
> - return -EINVAL;
> -
> - if (call->class->raw_init) {
> - ret = call->class->raw_init(call);
> - if (ret < 0) {
> - if (ret != -ENOSYS)
> - pr_warning("Could not initialize trace 
> events/%s\n",
> -call->name);
> - return ret;
> - }
> - }
> + ret = event_init(call);
> + if (ret < 0)
> + return ret;
>  
>   d_events = event_trace_events_dir();
>   if (!d_events)
> @@ -1272,13 +1287,10 @@ static void remove_subsystem_dir(const char *name)
>   */
>  static void __trace_remove_event_call(struct ftrace_event_call *call)
>  {
> - ftrace_event_enable_disable(call, 0);
> - if (call->event.funcs)
> - __unregister_ftrace_event(&call->event);
> - debugfs_remove_recursive(call->dir);
> - list_del(&call->list);
> + event_remove(call);
>   trace_destroy_fields(call);
>   destroy_preds(call);
> + debugfs_remove_recursive(call->dir);
>   remove_subsystem_dir(call->class->system);
>  }
>  
> @@ -1450,6 +1462,36 @@ static __init int setup_trace_event(char *str)
>  }
>  __setup("trace_event=", setup_trace_event);
>  
> +static __init int event_trace_enable(void)
> +{
> + struct ftrace_event_call **iter, *call;
> + char *buf = bootup_event_buf;
> + char *token;
> + int ret;
> +
> + for_each_event(iter, __start_ftrace_events, __stop_ftrace_events) {
> +
> + call = *iter;
> + ret = event_init(call);
> + if (!ret)
> + list_add(&call->list, &ftrace_events);
> + }
> +
> + while (true) {
> + token = strsep(&buf, ",");
> +
> + if (!token)
> + break;
> + if (!*token)
> + continue;
> +
> + ret = ftrace_set_clr_event(token, 1);
> + if (ret)
> + pr_warn("Failed to enable trace event: %s\n", token);
> + }
> + return 0;
> +}
> +
>  static __init int event_trace_init(void)
>  {
>   struct ftrace_event_call **call;
> @@ -1457,8 +1499,6 @@ static __init int event_trace_init(void)
>   struct dentry *entry;
>   struct dentry *d_events;
>   int ret;
> - char *buf = bootup_event_buf;
> - char *token;
>  
>   d_tracer = tracing_init_dentry();
>   if (!d_tracer)
> @@ -1497,24 +1537,19 @@ static __init int event_trace_init(void)
>   if (trace_define_common_fields())
>   pr_warning("tracing: Failed to allocate common fields");
>  
> + /*
> +  * Early initialization already enabled ftrace

Re: [PATCH 01/18] perf test: Remove die() calls

2012-09-11 Thread Namhyung Kim

Hi, Arnaldo

On Tue, 11 Sep 2012 20:52:53 -0300, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo 
>
> Just use pr_err() + return -1 and let the other tests run as well and
> then the perf's main() exit doing whatever it needs.
[snip]
> diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
> index cf33e50..6ae102e 100644
> --- a/tools/perf/builtin-test.c
> +++ b/tools/perf/builtin-test.c
> @@ -1023,14 +1023,16 @@ static int __test__rdpmc(void)
>  
>   fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
>   if (fd < 0) {
> - die("Error: sys_perf_event_open() syscall returned "
> - "with %d (%s)\n", fd, strerror(errno));
> + pr_debug("Error: sys_perf_event_open() syscall returned "
> +  "with %d (%s)\n", fd, strerror(errno));

s/pr_debug/pr_err/ ?

Otherwise the message will not be shown unless the -v option is given,
and it's not in sync with the changelog ;-)


> + return -1;
>   }
>  
>   addr = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
>   if (addr == (void *)(-1)) {
> - die("Error: mmap() syscall returned "
> - "with (%s)\n", strerror(errno));
> + pr_debug("Error: mmap() syscall returned with (%s)\n",
> +  strerror(errno));

Ditto.

Thanks,
Namhyung


> + goto out_close;
>   }
>  
>   for (n = 0; n < 6; n++) {
> @@ -1051,9 +1053,9 @@ static int __test__rdpmc(void)
>   }
>  
>   munmap(addr, page_size);
> - close(fd);
> -
>   pr_debug("   ");
> +out_close:
> + close(fd);
>  
>   if (!delta_sum)
>   return -1;
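
The difference raised in the review comments above can be sketched as a verbosity-gated print helper. This is an illustration under assumptions: `verbose`, `eprintf_sketch()` and the two macros are hypothetical stand-ins modeled loosely on how level-based logging usually works, not perf's actual definitions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the tool's -v counter. */
static int verbose;

/*
 * Print only when the message level does not exceed the verbosity.
 * Returns the number of characters written, or 0 when suppressed.
 */
static int eprintf_sketch(int level, const char *fmt, ...)
{
	va_list args;
	int ret = 0;

	if (verbose >= level) {
		va_start(args, fmt);
		ret = vfprintf(stderr, fmt, args);
		va_end(args);
	}
	return ret;
}

/* Level 0 is always shown; level 1 needs at least one -v. */
#define pr_err_sketch(...)   eprintf_sketch(0, __VA_ARGS__)
#define pr_debug_sketch(...) eprintf_sketch(1, __VA_ARGS__)
```

Under this scheme a failure reported at debug level disappears in a default run, which is why an error-level call is the better fit for a message that replaces die().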

RE: [Resend][PATCH V3] trace,x86: add x86 irq vector tracepoints

2012-09-11 Thread Seiji Aguchi

Thomas,

Please review my patch, as we discussed at Plumbers.

Seiji

> -----Original Message-----
> From: Seiji Aguchi
> Sent: Friday, August 24, 2012 11:22 AM
> To: Thomas Gleixner (t...@linutronix.de)
> Cc: linux-kernel@vger.kernel.org; rost...@goodmis.org; 'mi...@elte.hu' 
> (mi...@elte.hu); x...@kernel.org; dle-
> deve...@lists.sourceforge.net; Satoru Moriya
> Subject: [Resend][PATCH V3] trace,x86: add x86 irq vector tracepoints
> 
> Thomas,
> 
> It is helpful if you can review this patch.
> 
> Change log
>  v2 -> v3
>  - Remove an invalidate_tlb_vector event because it was replaced by a call 
> function vector
>in a following commit.
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4
> 
>  v1 -> v2
>  - Modify variable name from irq to vector.
>  - Merge arch-specific tracepoints below to an arch_irq_vector_entry/exit.
>- error_apic_vector
>- thermal_apic_vector
>- threshold_apic_vector
>- spurious_apic_vector
>- x86_platform_ipi_vector
> 
> As Vaibhav explained in the thread below, tracepoints for irq vectors are 
> useful.
> 
> http://www.spinics.net/lists/mm-commits/msg85707.html
> 
> 
> The current interrupt traces from irq_handler_entry and irq_handler_exit 
> provide when an interrupt is handled.  They provide good
> data about when the system has switched to kernel space and how it affects 
> the currently running processes.
> 
> There are some IRQ vectors which trigger the system into kernel space, which 
> are not handled in generic IRQ handlers.  Tracing such
> events gives us the information about IRQ interaction with other system 
> events.
> 
> The trace also tells where the system is spending its time.  We want to know 
> which cores are handling interrupts and how they are
> affecting other processes in the system.  Also, the trace provides 
> information about when the cores are idle and which interrupts are
> changing that state.
> 
> 
> On the other hand, my use case is tracing just the local timer event and
> getting the value of the instruction pointer.
> 
>   I previously suggested adding an argument to the local timer event to get
>   the instruction pointer.
>   But there is another way to get it, using an external module such as systemtap.
>   So I don't need to add any argument to the irq vector tracepoints now.
> 
> Vaibhav's patch shared one tracepoint, irq_vector_entry/irq_vector_exit,
> across all events.
> But as described above, there is a use case for tracing a specific irq vector
> rather than all events.
> In that case, we are concerned about the overhead due to unwanted events.
> 
> This patch modifies Vaibhav's one as follows.
>  - Separate generic, and across-architecture tracepoints to enable 
> independently.
>- nmi_vector
>- local_timer_vector
>- reschedule_vector
>- call_function_vector
>- call_function_single_vector
>- irq_work_entry_vector
> 
>  - Rename architecture-specific tracepoints from irq_vector_entry/exit to
>arch_irq_vector_entry/exit.
>- error_apic_vector
>- thermal_apic_vector
>- threshold_apic_vector
>- spurious_apic_vector
>- x86_platform_ipi_vector
> 
>Those x86 specific ones are not really frequently raised vectors, so
>enabling them all won't affect performance and readability of the
>traces too much.
> 
>  Signed-off-by: Seiji Aguchi 
> 
> ---
>  arch/x86/include/asm/irq_vectors.h   |9 ++
>  arch/x86/kernel/apic/apic.c  |7 +
>  arch/x86/kernel/cpu/mcheck/therm_throt.c |3 +
>  arch/x86/kernel/cpu/mcheck/threshold.c   |3 +
>  arch/x86/kernel/irq.c|5 +
>  arch/x86/kernel/irq_work.c   |3 +
>  arch/x86/kernel/nmi.c|3 +
>  arch/x86/kernel/smp.c|7 +
>  include/trace/events/irq_vectors.h       |  209 ++
>  9 files changed, 249 insertions(+), 0 deletions(-)
>  create mode 100644 include/trace/events/irq_vectors.h
> 
> diff --git a/arch/x86/include/asm/irq_vectors.h 
> b/arch/x86/include/asm/irq_vectors.h
> index 1508e51..510ced5 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -158,4 +158,13 @@ static inline int invalid_vm86_irq(int irq)
>  # define NR_IRQS NR_IRQS_LEGACY
>  #endif
> 
> +#define irq_vector_name(vector) { vector, #vector }
> +
> +#define irq_vector_name_table \
> + irq_vector_name(ERROR_APIC_VECTOR), \
> + irq_vector_name(THERMAL_APIC_VECTOR),   \
> + irq_vector_name(THRESHOLD_APIC_VECTOR), \
> + irq_vector_name(SPURIOUS_APIC_VECTOR),  \
> + irq_vector_name(X86_PLATFORM_IPI_VECTOR)
> +
>  #endif /* _ASM_X86_IRQ_VECTORS_H */
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 24deb30..b9cdd8f 100644
> ---
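
The irq_vector_name() macro quoted above relies on preprocessor stringification to build a value/name table. A minimal userspace sketch of that idea follows; the vector values, `struct vector_name` and `vector_to_name()` are assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative values only; the real ones come from irq_vectors.h. */
#define ERROR_APIC_VECTOR	0xfe
#define SPURIOUS_APIC_VECTOR	0xff

/* Hypothetical table entry: a vector number and its printable name. */
struct vector_name {
	int vector;
	const char *name;
};

/*
 * "vector" expands to the number, while "#vector" stringifies the
 * argument as written, so each entry carries both forms.
 */
#define irq_vector_name(vector) { vector, #vector }

static const struct vector_name vector_names[] = {
	irq_vector_name(ERROR_APIC_VECTOR),
	irq_vector_name(SPURIOUS_APIC_VECTOR),
};

/* Look up the printable name for a vector number. */
static const char *vector_to_name(int vector)
{
	size_t i;

	for (i = 0; i < sizeof(vector_names) / sizeof(vector_names[0]); i++)
		if (vector_names[i].vector == vector)
			return vector_names[i].name;
	return "unknown";
}
```

A trace printer can use such a table to show a symbolic name rather than a raw number for each arch-specific vector.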

[PATCH 14/18] perf sched: Remove unused thread parameter

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

From the tracepoint handling routines.

Cc: David Ahern 
Cc: Frederic Weisbecker 
Cc: Jiri Olsa 
Cc: Mike Galbraith 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/n/tip-mcqd9mv34z6he0wqiz4a3...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-sched.c |   23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 82e8ec2..af11b1a 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1372,8 +1372,7 @@ static struct trace_sched_handler *trace_handler;
 static int process_sched_wakeup_event(struct perf_tool *tool __maybe_unused,
  struct event_format *event,
  struct perf_sample *sample,
- struct machine *machine,
- struct thread *thread __maybe_unused)
+ struct machine *machine)
 {
void *data = sample->raw_data;
struct trace_wakeup_event wakeup_event;
@@ -1489,8 +1488,7 @@ map_switch_event(struct trace_switch_event *switch_event,
 static int process_sched_switch_event(struct perf_tool *tool __maybe_unused,
  struct event_format *event,
  struct perf_sample *sample,
- struct machine *machine,
- struct thread *thread __maybe_unused)
+ struct machine *machine)
 {
int this_cpu = sample->cpu, err = 0;
void *data = sample->raw_data;
@@ -1524,8 +1522,7 @@ static int process_sched_switch_event(struct perf_tool 
*tool __maybe_unused,
 static int process_sched_runtime_event(struct perf_tool *tool __maybe_unused,
   struct event_format *event,
   struct perf_sample *sample,
-  struct machine *machine,
-  struct thread *thread __maybe_unused)
+  struct machine *machine)
 {
void *data = sample->raw_data;
struct trace_runtime_event runtime_event;
@@ -1545,8 +1542,7 @@ static int process_sched_runtime_event(struct perf_tool 
*tool __maybe_unused,
 static int process_sched_fork_event(struct perf_tool *tool __maybe_unused,
struct event_format *event,
struct perf_sample *sample,
-   struct machine *machine __maybe_unused,
-   struct thread *thread __maybe_unused)
+   struct machine *machine __maybe_unused)
 {
void *data = sample->raw_data;
struct trace_fork_event fork_event;
@@ -1568,8 +1564,7 @@ static int process_sched_fork_event(struct perf_tool 
*tool __maybe_unused,
 static int process_sched_exit_event(struct perf_tool *tool __maybe_unused,
struct event_format *event,
struct perf_sample *sample __maybe_unused,
-   struct machine *machine __maybe_unused,
-   struct thread *thread __maybe_unused)
+   struct machine *machine __maybe_unused)
 {
if (verbose)
printf("sched_exit event %p\n", event);
@@ -1580,8 +1575,7 @@ static int process_sched_exit_event(struct perf_tool 
*tool __maybe_unused,
 static int process_sched_migrate_task_event(struct perf_tool *tool 
__maybe_unused,
struct event_format *event,
struct perf_sample *sample,
-   struct machine *machine,
-   struct thread *thread 
__maybe_unused)
+   struct machine *machine)
 {
void *data = sample->raw_data;
struct trace_migrate_task_event migrate_task_event;
@@ -1603,8 +1597,7 @@ static int process_sched_migrate_task_event(struct 
perf_tool *tool __maybe_unuse
 typedef int (*tracepoint_handler)(struct perf_tool *tool,
  struct event_format *tp_format,
  struct perf_sample *sample,
- struct machine *machine,
- struct thread *thread);
+ struct machine *machine);
 
 static int perf_sched__process_tracepoint_sample(struct perf_tool *tool 
__maybe_unused,
 union perf_event *event 
__maybe_unused,
@@ -1626,7 +1619,7 @@ static int

[PATCH 01/18] perf test: Remove die() calls

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

Just use pr_err() + return -1 and let the other tests run as well and
then the perf's main() exit doing whatever it needs.

Cc: David Ahern 
Cc: Frederic Weisbecker 
Cc: Jiri Olsa 
Cc: Mike Galbraith 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/n/tip-n5ahw26e94klmde9cz6rx...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-test.c |   14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
index cf33e50..6ae102e 100644
--- a/tools/perf/builtin-test.c
+++ b/tools/perf/builtin-test.c
@@ -1023,14 +1023,16 @@ static int __test__rdpmc(void)
 
    fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
if (fd < 0) {
-   die("Error: sys_perf_event_open() syscall returned "
-   "with %d (%s)\n", fd, strerror(errno));
+   pr_debug("Error: sys_perf_event_open() syscall returned "
+"with %d (%s)\n", fd, strerror(errno));
+   return -1;
}
 
addr = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
if (addr == (void *)(-1)) {
-   die("Error: mmap() syscall returned "
-   "with (%s)\n", strerror(errno));
+   pr_debug("Error: mmap() syscall returned with (%s)\n",
+strerror(errno));
+   goto out_close;
}
 
for (n = 0; n < 6; n++) {
@@ -1051,9 +1053,9 @@ static int __test__rdpmc(void)
}
 
munmap(addr, page_size);
-   close(fd);
-
pr_debug("   ");
+out_close:
+   close(fd);
 
if (!delta_sum)
return -1;
-- 
1.7.9.2.358.g22243
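
The error handling introduced by this patch follows the usual "return early before anything is acquired, jump to a cleanup label afterwards" shape. A self-contained sketch of that pattern, with a hypothetical `first_byte_of()` helper that is not part of perf:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the start of a file read-only and return its first byte, or -1 on
 * failure. The descriptor is closed on every path after a successful open.
 */
static int first_byte_of(const char *path)
{
	int ret = -1;
	void *addr;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return -1;		/* nothing to clean up yet */
	}

	addr = mmap(NULL, 1, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		goto out_close;		/* fd is open: release it */
	}

	ret = *(unsigned char *)addr;
	munmap(addr, 1);
out_close:
	close(fd);
	return ret;
}
```

Compared with die(), the caller gets an error code back and can carry on with the remaining tests, and the descriptor is not leaked when the mapping fails.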


[PATCH 16/18] perf evsel: Introduce perf_evsel__{str,int}val methods

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

Wrappers to the libtraceevent routines, so that we can further reduce
the surface contact perf builtins have with it.

Cc: David Ahern 
Cc: Frederic Weisbecker 
Cc: Jiri Olsa 
Cc: Mike Galbraith 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/n/tip-rtmgzptvrifzjxqwb9vs6...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/evsel.c |   35 +++
 tools/perf/util/evsel.h |7 +++
 2 files changed, 42 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 06f7644..1506ba0 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include "asm/bug.h"
+#include "event-parse.h"
 #include "evsel.h"
 #include "evlist.h"
 #include "util.h"
@@ -1000,3 +1001,37 @@ int perf_event__synthesize_sample(union perf_event 
*event, u64 type,
 
return 0;
 }
+
+char *perf_evsel__strval(struct perf_evsel *evsel, struct perf_sample *sample,
+const char *name)
+{
+   struct format_field *field = pevent_find_field(evsel->tp_format, name);
+   int offset;
+
+if (!field)
+return NULL;
+
+   offset = field->offset;
+
+   if (field->flags & FIELD_IS_DYNAMIC) {
+   offset = *(int *)(sample->raw_data + field->offset);
+   offset &= 0xffff;
+   }
+
+   return sample->raw_data + offset;
+}
+
+u64 perf_evsel__intval(struct perf_evsel *evsel, struct perf_sample *sample,
+  const char *name)
+{
+   struct format_field *field = pevent_find_field(evsel->tp_format, name);
+   u64 val;
+
+if (!field)
+return 0;
+
+   val = pevent_read_number(evsel->tp_format->pevent,
+sample->raw_data + field->offset, field->size);
+   return val;
+
+}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 390690e..dc40fe3 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -120,6 +120,13 @@ int perf_evsel__open(struct perf_evsel *evsel, struct 
cpu_map *cpus,
 struct thread_map *threads);
 void perf_evsel__close(struct perf_evsel *evsel, int ncpus, int nthreads);
 
+struct perf_sample;
+
+char *perf_evsel__strval(struct perf_evsel *evsel, struct perf_sample *sample,
+const char *name);
+u64 perf_evsel__intval(struct perf_evsel *evsel, struct perf_sample *sample,
+  const char *name);
+
 #define perf_evsel__match(evsel, t, c) \
(evsel->attr.type == PERF_TYPE_##t &&   \
 evsel->attr.config == PERF_COUNT_##c)
-- 
1.7.9.2.358.g22243
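
The two accessors can be illustrated without libtraceevent. The sketch below makes several assumptions: `struct field_sketch`, the flag value and both function names are hypothetical, values are read in host byte order only, and a dynamic field is assumed to store a 32-bit descriptor whose low 16 bits give the payload offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FIELD_IS_DYNAMIC_SKETCH 0x1	/* hypothetical flag value */

/* Hypothetical description of one field inside a raw sample. */
struct field_sketch {
	int offset;		/* where the field (or its descriptor) lives */
	int size;		/* size in bytes, for integer fields */
	unsigned int flags;
};

/* Return a pointer to a string field inside the raw sample data. */
static const char *sketch_strval(const void *raw, const struct field_sketch *f)
{
	const char *base = raw;
	int offset = f->offset;

	if (f->flags & FIELD_IS_DYNAMIC_SKETCH) {
		uint32_t desc;

		/* The descriptor's low 16 bits hold the payload offset. */
		memcpy(&desc, base + f->offset, sizeof(desc));
		offset = desc & 0xffff;
	}
	return base + offset;
}

/* Read an integer field of 1, 2, 4 or 8 bytes; 0 for any other size. */
static uint64_t sketch_intval(const void *raw, const struct field_sketch *f)
{
	const char *p = (const char *)raw + f->offset;
	uint8_t v8;
	uint16_t v16;
	uint32_t v32;
	uint64_t v64;

	switch (f->size) {
	case 1: memcpy(&v8, p, sizeof(v8)); return v8;
	case 2: memcpy(&v16, p, sizeof(v16)); return v16;
	case 4: memcpy(&v32, p, sizeof(v32)); return v32;
	case 8: memcpy(&v64, p, sizeof(v64)); return v64;
	default: return 0;
	}
}
```

A caller then asks for a field by description and gets a value back, without handling offsets itself.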


[PATCH 05/18] perf tools: Add missing perf_regs.h file to MANIFEST

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

Commit 2bcd355 broke builds from the tarballs generated by
perf-tar*-src-pkg; fix it.

Cc: David Ahern 
Cc: Frederic Weisbecker 
Cc: Jiri Olsa 
Cc: Mike Galbraith 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/n/tip-2ndz2o636rn4q175fwn18...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/MANIFEST |1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
index b4b572e..0518079 100644
--- a/tools/perf/MANIFEST
+++ b/tools/perf/MANIFEST
@@ -10,6 +10,7 @@ include/linux/stringify.h
 lib/rbtree.c
 include/linux/swab.h
 arch/*/include/asm/unistd*.h
+arch/*/include/asm/perf_regs.h
 arch/*/lib/memcpy*.S
 arch/*/lib/memset*.S
 include/linux/poison.h
-- 
1.7.9.2.358.g22243


[PATCH 02/18] perf sched: Remove die() calls

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

Just use pr_err() + return -1 and perf_session__process_events to abort
when some event would call die(), then let the perf's main() exit doing
whatever it needs.

Cc: David Ahern 
Cc: Frederic Weisbecker 
Cc: Jiri Olsa 
Cc: Mike Galbraith 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/n/tip-88cwdogxqomsy9tfr8r0a...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-sched.c |  281 
 1 file changed, 179 insertions(+), 102 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index a25a023..782f66d 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -423,8 +423,8 @@ static int self_open_counters(void)
    fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
 
if (fd < 0)
-   die("Error: sys_perf_event_open() syscall returned"
-   "with %d (%s)\n", fd, strerror(errno));
+   pr_debug("Error: sys_perf_event_open() syscall returned"
+"with %d (%s)\n", fd, strerror(errno));
return fd;
 }
 
@@ -450,7 +450,8 @@ static void *thread_func(void *ctx)
sprintf(comm2, ":%s", this_task->comm);
prctl(PR_SET_NAME, comm2);
fd = self_open_counters();
-
+   if (fd < 0)
+   return NULL;
 again:
ret = sem_post(_task->ready_for_work);
BUG_ON(ret);
@@ -726,30 +727,30 @@ struct trace_migrate_task_event {
 };
 
 struct trace_sched_handler {
-   void (*switch_event)(struct trace_switch_event *,
-struct machine *,
-struct event_format *,
-struct perf_sample *sample);
-
-   void (*runtime_event)(struct trace_runtime_event *,
- struct machine *,
- struct perf_sample *sample);
+   int (*switch_event)(struct trace_switch_event *event,
+   struct machine *machine,
+   struct event_format *tp_format,
+   struct perf_sample *sample);
 
-   void (*wakeup_event)(struct trace_wakeup_event *,
-struct machine *,
-struct event_format *,
+   int (*runtime_event)(struct trace_runtime_event *event,
+struct machine *machine,
 struct perf_sample *sample);
 
-   void (*fork_event)(struct trace_fork_event *,
-  struct event_format *event);
+   int (*wakeup_event)(struct trace_wakeup_event *event,
+   struct machine *machine,
+   struct event_format *tp_format,
+   struct perf_sample *sample);
 
-   void (*migrate_task_event)(struct trace_migrate_task_event *,
-  struct machine *machine,
-  struct perf_sample *sample);
+   int (*fork_event)(struct trace_fork_event *event,
+ struct event_format *tp_format);
+
+   int (*migrate_task_event)(struct trace_migrate_task_event *event,
+ struct machine *machine,
+ struct perf_sample *sample);
 };
 
 
-static void
+static int
 replay_wakeup_event(struct trace_wakeup_event *wakeup_event,
struct machine *machine __used,
struct event_format *event, struct perf_sample *sample)
@@ -769,11 +770,12 @@ replay_wakeup_event(struct trace_wakeup_event 
*wakeup_event,
wakee = register_pid(wakeup_event->pid, wakeup_event->comm);
 
add_sched_event_wakeup(waker, sample->time, wakee);
+   return 0;
 }
 
 static u64 cpu_last_switched[MAX_CPUS];
 
-static void
+static int
 replay_switch_event(struct trace_switch_event *switch_event,
struct machine *machine __used,
struct event_format *event,
@@ -788,7 +790,7 @@ replay_switch_event(struct trace_switch_event *switch_event,
printf("sched_switch event %p\n", event);
 
if (cpu >= MAX_CPUS || cpu < 0)
-   return;
+   return 0;
 
timestamp0 = cpu_last_switched[cpu];
if (timestamp0)
@@ -796,8 +798,10 @@ replay_switch_event(struct trace_switch_event 
*switch_event,
else
delta = 0;
 
-   if (delta < 0)
-   die("hm, delta: %" PRIu64 " < 0 ?\n", delta);
+   if (delta < 0) {
+   pr_debug("hm, delta: %" PRIu64 " < 0 ?\n", delta);
+   return -1;
+   }
 
if (verbose) {
printf(" ... switch from %s/%d to %s/%d [ran %" PRIu64 " 
nsecs]\n",
@@ -813,10 +817,12 @@ replay_switch_event(struct trace_switch_event 
*switch_event,
 
add_sched_event_run(prev, timestamp, delta);
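
The change from void to int handlers lets an error travel back to the event loop so that it can stop, instead of the process exiting from deep inside a callback. A reduced sketch of that shape, assuming hypothetical types and names (`switch_event_sketch`, `handler_sketch`, `process_events()`), none of them from builtin-sched.c:

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CPUS_SKETCH 4

/* Hypothetical, reduced stand-in for a decoded sched_switch event. */
struct switch_event_sketch {
	int cpu;
	unsigned long long timestamp;
};

/* Handlers return 0 on success and a negative value to abort processing. */
struct handler_sketch {
	int (*switch_event)(const struct switch_event_sketch *ev);
};

static unsigned long long last_switched[MAX_CPUS_SKETCH];

static int replay_switch(const struct switch_event_sketch *ev)
{
	if (ev->cpu < 0 || ev->cpu >= MAX_CPUS_SKETCH)
		return 0;	/* out-of-range CPU: ignore the event */

	/* Compare before subtracting: an unsigned difference is never negative. */
	if (last_switched[ev->cpu] > ev->timestamp) {
		fprintf(stderr, "timestamp went backwards on cpu %d\n", ev->cpu);
		return -1;
	}

	last_switched[ev->cpu] = ev->timestamp;
	return 0;
}

/* The event loop stops at the first handler error and reports it. */
static int process_events(const struct handler_sketch *h,
			  const struct switch_event_sketch *evs, int n)
{
	int i, err;

	for (i = 0; i < n; i++) {
		err = h->switch_event(&evs[i]);
		if (err)
			return err;
	}
	return 0;
}
```

The caller decides what to do with the failure, for example printing a summary and returning a non-zero exit status from main().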

[PATCH 11/18] perf symbols: Make dsos__find function globally available

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Jiri Olsa 

Change the dsos__find function from static to globally available.

Signed-off-by: Jiri Olsa 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Link: 
http://lkml.kernel.org/r/1347295819-23177-4-git-send-email-jo...@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/symbol.c |2 +-
 tools/perf/util/symbol.h |1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ba85d4f..bbb24e9 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1596,7 +1596,7 @@ void dsos__add(struct list_head *head, struct dso *dso)
    list_add_tail(&dso->node, head);
 }
 
-static struct dso *dsos__find(struct list_head *head, const char *name)
+struct dso *dsos__find(struct list_head *head, const char *name)
 {
struct dso *pos;
 
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 41a15da..dde8a26 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -294,6 +294,7 @@ static inline void dso__set_loaded(struct dso *dso, enum 
map_type type)
 void dso__sort_by_name(struct dso *dso, enum map_type type);
 
 void dsos__add(struct list_head *head, struct dso *dso);
+struct dso *dsos__find(struct list_head *head, const char *name);
 struct dso *__dsos__findnew(struct list_head *head, const char *name);
 
 int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter);
-- 
1.7.9.2.358.g22243


[PATCH 07/18] perf tools: include __WORDSIZE definition

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Irina Tirdea 

__WORDSIZE is GLibC-specific and is not defined on all systems or glibc
versions (e.g. Android's bionic does not define it).

In file included from util/include/linux/bitmap.h:5:0,
 from util/header.h:10,
 from util/session.h:6,
 from util/build-id.h:4,
 from util/annotate.c:11:
util/include/linux/bitops.h: In function 'set_bit':
util/include/linux/bitops.h:25:12: error:
'__WORDSIZE' undeclared (first use in this function)
util/include/linux/bitops.h:25:12: note:
each undeclared identifier is reported only once for each function it appears in
util/include/linux/bitops.h:23:51: error:
parameter 'addr' set but not used [-Werror=unused-but-set-parameter]
util/include/linux/bitops.h: In function 'clear_bit':
util/include/linux/bitops.h:30:12: error:
'__WORDSIZE' undeclared (first use in this function)
util/include/linux/bitops.h:28:53: error:
parameter 'addr' set but not used [-Werror=unused-but-set-parameter]
In file included from util/header.h:10:0,
 from util/session.h:6,
 from util/build-id.h:4,
 from util/annotate.c:11:
util/include/linux/bitmap.h: In function 'bitmap_zero':
util/include/linux/bitmap.h:22:6: error:
'__WORDSIZE' undeclared (first use in this function)

Define __WORDSIZE in perf's headers if it is not already defined.

Suggested-by: Peter Zijlstra 
Suggested-by: Pekka Enberg 
Signed-off-by: Irina Tirdea 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Irina Tirdea 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Link: 
http://lkml.kernel.org/r/1347315303-29906-4-git-send-email-irina.tir...@intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/include/linux/bitops.h |4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/include/linux/bitops.h 
b/tools/perf/util/include/linux/bitops.h
index 587a230..a55d8cf 100644
--- a/tools/perf/util/include/linux/bitops.h
+++ b/tools/perf/util/include/linux/bitops.h
@@ -5,6 +5,10 @@
 #include 
 #include 
 
+#ifndef __WORDSIZE
+#define __WORDSIZE (__SIZEOF_LONG__ * 8)
+#endif
+
 #define BITS_PER_LONG __WORDSIZE
 #define BITS_PER_BYTE   8
 #define BITS_TO_LONGS(nr)   DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
-- 
1.7.9.2.358.g22243
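
The fallback can be tried outside perf with a small bitmap sketch. It assumes a compiler that predefines `__SIZEOF_LONG__` (GCC and Clang do); `set_bit_sketch()` and `test_bit_sketch()` are hypothetical helpers, not perf's bitops.

```c
#include <assert.h>
#include <limits.h>

/* Same fallback as in the patch: derive the word size from sizeof(long). */
#ifndef __WORDSIZE
#define __WORDSIZE (__SIZEOF_LONG__ * 8)
#endif

#define BITS_PER_LONG __WORDSIZE

/* Set bit nr in a bitmap made of unsigned longs. */
static inline void set_bit_sketch(int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Return non-zero when bit nr is set. */
static inline int test_bit_sketch(int nr, const unsigned long *addr)
{
	return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}
```

On a libc that already defines __WORDSIZE the #ifndef leaves it alone; elsewhere the value falls back to the size of long, which is what BITS_PER_LONG is meant to describe.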


[PATCH 18/18] perf sched: Don't read all tracepoint variables in advance

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

Do it just at the actual consumer of these fields, that way we avoid
needless lookups:

  [root@sandy ~]# perf sched record sleep 30s
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 8.585 MB perf.data (~375063 samples) ]

Before:

  [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null

   Performance counter stats for 'perf sched lat' (10 runs):

        103.592215 task-clock                #    0.993 CPUs utilized            ( +-  0.33% )
                12 context-switches          #    0.114 K/sec                    ( +-  3.29% )
                 0 cpu-migrations            #    0.000 K/sec
             7,605 page-faults               #    0.073 M/sec                    ( +-  0.00% )
       345,796,112 cycles                    #    3.338 GHz                      ( +-  0.07% ) [82.90%]
       106,876,796 stalled-cycles-frontend   #   30.91% frontend cycles idle     ( +-  0.38% ) [83.23%]
        62,060,877 stalled-cycles-backend    #   17.95% backend  cycles idle     ( +-  0.80% ) [67.14%]
       628,246,586 instructions              #    1.82  insns per cycle
                                             #    0.17  stalled cycles per insn  ( +-  0.04% ) [83.64%]
       134,962,057 branches                  # 1302.820 M/sec                    ( +-  0.10% ) [83.64%]
         1,233,037 branch-misses             #    0.91% of all branches          ( +-  0.29% ) [83.41%]

       0.104333272 seconds time elapsed                                          ( +-  0.33% )

After:

  [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null

   Performance counter stats for 'perf sched lat' (10 runs):

         98.848272 task-clock                #    0.993 CPUs utilized            ( +-  0.48% )
                11 context-switches          #    0.112 K/sec                    ( +-  2.83% )
                 0 cpu-migrations            #    0.003 K/sec                    ( +- 50.92% )
             7,604 page-faults               #    0.077 M/sec                    ( +-  0.00% )
       332,216,085 cycles                    #    3.361 GHz                      ( +-  0.14% ) [82.87%]
       100,623,710 stalled-cycles-frontend   #   30.29% frontend cycles idle     ( +-  0.53% ) [82.95%]
        58,788,692 stalled-cycles-backend    #   17.70% backend  cycles idle     ( +-  0.59% ) [67.15%]
       609,402,433 instructions              #    1.83  insns per cycle
                                             #    0.17  stalled cycles per insn  ( +-  0.04% ) [83.76%]
       131,277,138 branches                  # 1328.067 M/sec                    ( +-  0.06% ) [83.77%]
         1,117,871 branch-misses             #    0.85% of all branches          ( +-  0.32% ) [83.51%]

       0.099580430 seconds time elapsed                                          ( +-  0.48% )

  [root@sandy ~]#
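
The idea of reading fields only at their consumer can be sketched in a
few lines of C. This is an illustration only, not code from the patch:
struct field, struct sample, sample__intval() and the two
switched_task__*() helpers are hypothetical names, with sample__intval()
standing in for a by-name accessor such as perf_evsel__intval().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one decoded tracepoint field. */
struct field {
	const char *name;
	long value;
};

/* Hypothetical stand-in for a sample carrying named fields. */
struct sample {
	const struct field *fields;
	size_t nr_fields;
};

/* Counts by-name searches so the two styles can be compared. */
static unsigned long nr_lookups;

/* Plays the role of a by-name accessor such as perf_evsel__intval(). */
static long sample__intval(const struct sample *s, const char *name)
{
	size_t i;

	nr_lookups++;
	for (i = 0; i < s->nr_fields; i++) {
		if (strcmp(s->fields[i].name, name) == 0)
			return s->fields[i].value;
	}
	return -1;
}

/* Old style: fetch every field up front, whether or not it is used. */
static int switched_task__eager(const struct sample *s)
{
	long prev_pid   = sample__intval(s, "prev_pid");
	long prev_prio  = sample__intval(s, "prev_prio");
	long prev_state = sample__intval(s, "prev_state");
	long next_pid   = sample__intval(s, "next_pid");
	long next_prio  = sample__intval(s, "next_prio");

	(void)prev_prio;
	(void)prev_state;
	(void)next_prio;
	return prev_pid != next_pid;
}

/* New style: the consumer asks only for the fields it needs. */
static int switched_task__lazy(const struct sample *s)
{
	return sample__intval(s, "prev_pid") != sample__intval(s, "next_pid");
}
```

Both helpers return the same answer for a given sample; the lazy one
simply performs fewer by-name searches, which is the kind of saving the
numbers above are measuring.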

Cc: David Ahern 
Cc: Frederic Weisbecker 
Cc: Jiri Olsa 
Cc: Mike Galbraith 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/n/tip-kracdpw8wqlr0xjh75uk8...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-sched.c |  277 
 1 file changed, 97 insertions(+), 180 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 0df5e7a..af305f5 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -97,73 +97,25 @@ struct work_atoms {
 
 typedef int (*sort_fn_t)(struct work_atoms *, struct work_atoms *);
 
-struct trace_switch_event {
-   char *prev_comm;
-   u32  prev_pid;
-   u32  prev_prio;
-   u64  prev_state;
-   char *next_comm;
-   u32  next_pid;
-   u32  next_prio;
-};
-
-struct trace_runtime_event {
-   char *comm;
-   u32  pid;
-   u64  runtime;
-   u64  vruntime;
-};
+struct perf_sched;
 
-struct trace_wakeup_event {
-   char *comm;
-   u32  pid;
-   u32  prio;
-   u32  success;
-   u32  cpu;
-};
+struct trace_sched_handler {
+   int (*switch_event)(struct perf_sched *sched, struct perf_evsel *evsel,
+   struct perf_sample *sample, struct machine *machine);
 
-struct trace_fork_event {
-   char *parent_comm;
-   u32  parent_pid;
-   char *child_comm;
-   u32   child_pid;
-};
+   int (*runtime_event)(struct perf_sched *sched, struct perf_evsel *evsel,
+struct perf_sample *sample, struct machine *machine);
 
-struct trace_migrate_task_event {
-   char *comm;
-   u32  pid;
-   u32  prio;
-   u32  cpu;
-};
+   int (*wakeup_event)(struct perf_sched *sched, struct perf_evsel *evsel,
+   struct perf_sample *sample, struct machine *machine);
 
-struct perf_sched;
-
-struct trace_sched_handler {
-   int (*switch_event)(struct perf_sched *sched,
-

[PATCH 10/18] perf tools: Add memdup function

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Jiri Olsa 

Adding memdup function to duplicate region of memory.

  void *memdup(const void *src, size_t len)
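
A minimal usage sketch, not part of the patch. memdup() is reproduced
with the same shape as in the diff below; struct int_array and
int_array__init() are hypothetical names used only to show a caller
taking a private copy of a buffer it does not own.

```c
#include <stdlib.h>
#include <string.h>

/* Same shape as the helper added by the patch. */
void *memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);

	return p;
}

/* Hypothetical container that owns its own copy of the values. */
struct int_array {
	size_t nr;
	int *vals;
};

/* Returns 0 on success, -1 if the copy could not be allocated. */
static int int_array__init(struct int_array *a, const int *vals, size_t nr)
{
	a->vals = memdup(vals, nr * sizeof(*vals));
	if (a->vals == NULL)
		return -1;

	a->nr = nr;
	return 0;
}
```

The caller owns the returned memory and is expected to free() it. Note
that malloc(0) may return NULL, so a zero-length copy can look like a
failure with this sketch.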

Signed-off-by: Jiri Olsa 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1347295819-23177-3-git-send-email-jo...@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/include/linux/string.h |2 ++
 tools/perf/util/string.c   |   18 +-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/include/linux/string.h b/tools/perf/util/include/linux/string.h
index 3b2f590..6f19c54 100644
--- a/tools/perf/util/include/linux/string.h
+++ b/tools/perf/util/include/linux/string.h
@@ -1 +1,3 @@
 #include 
+
+void *memdup(const void *src, size_t len);
diff --git a/tools/perf/util/string.c b/tools/perf/util/string.c
index 199bc4d..3217059 100644
--- a/tools/perf/util/string.c
+++ b/tools/perf/util/string.c
@@ -1,5 +1,5 @@
 #include "util.h"
-#include "string.h"
+#include "linux/string.h"
 
 #define K 1024LL
 /*
@@ -335,3 +335,19 @@ char *rtrim(char *s)
 
return s;
 }
+
+/**
+ * memdup - duplicate region of memory
+ * @src: memory region to duplicate
+ * @len: memory region length
+ */
+void *memdup(const void *src, size_t len)
+{
+   void *p;
+
+   p = malloc(len);
+   if (p)
+   memcpy(p, src, len);
+
+   return p;
+}
-- 
1.7.9.2.358.g22243


[GIT PULL 00/20] perf/core improvements and fixes

2012-09-11 Thread Arnaldo Carvalho de Melo

Hi Ingo,

Please consider pulling,

Best Regards,

- Arnaldo

The following changes since commit d5cb2aef4fda355fbafe8db4f425b73ea94d2019:

  Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2012-09-09 10:39:14 +0200)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux tags/perf-core-for-mingo

for you to fetch changes up to 9ec3f4e437ede2f3b5087d412abe16a0219b3b99:

  perf sched: Don't read all tracepoint variables in advance (2012-09-11 20:39:19 -0300)


perf/core improvements and fixes

. Remove die()/exit() calls from several tools.

. Add missing perf_regs.h file to MANIFEST

. Clean up and improve 'perf sched' performance by eliminating lots of
  needless calls to libtraceevent.

. More patches to make perf build on Android, from Irina Tirdea

. Resolve vdso callchains, from Jiri Olsa

Signed-off-by: Arnaldo Carvalho de Melo 


Arnaldo Carvalho de Melo (9):
  perf test: Remove die() calls
  perf sched: Remove die() calls
  perf kmem: Remove die() calls
  perf tools: Add missing perf_regs.h file to MANIFEST
  perf sched: Remove unused thread parameter
  perf sched: Use perf_tool as ancestor
  perf evsel: Introduce perf_evsel__{str,int}val methods
  perf sched: Use perf_evsel__{int,str}val
  perf sched: Don't read all tracepoint variables in advance

Irina Tirdea (5):
  perf tools: include wrapper for magic.h
  perf tools: Update types definitions for Android
  perf tools: include __WORDSIZE definition
  perf tools: fix ALIGN redefinition in system headers
  perf tools: Use __maybe_unused for unused variables

Jiri Olsa (4):
  perf tools: Do backtrace post unwind only if we regs and stack were captured
  perf tools: Add memdup function
  perf symbols: Make dsos__find function globally available
  perf tools: Back [vdso] DSO with real data

 tools/lib/traceevent/event-parse.c |8 +-
 tools/lib/traceevent/event-parse.h |4 +-
 tools/perf/MANIFEST|1 +
 tools/perf/Makefile|3 +
 tools/perf/bench/bench.h   |3 +-
 tools/perf/bench/mem-memcpy.c  |2 +-
 tools/perf/bench/mem-memset.c  |2 +-
 tools/perf/bench/sched-messaging.c |2 +-
 tools/perf/bench/sched-pipe.c  |6 +-
 tools/perf/builtin-annotate.c  |2 +-
 tools/perf/builtin-bench.c |2 +-
 tools/perf/builtin-buildid-cache.c |   10 +-
 tools/perf/builtin-buildid-list.c  |3 +-
 tools/perf/builtin-diff.c  |4 +-
 tools/perf/builtin-evlist.c|2 +-
 tools/perf/builtin-help.c  |2 +-
 tools/perf/builtin-inject.c|   24 +-
 tools/perf/builtin-kmem.c  |  130 +-
 tools/perf/builtin-kvm.c   |2 +-
 tools/perf/builtin-list.c  |2 +-
 tools/perf/builtin-lock.c  |4 +-
 tools/perf/builtin-probe.c |   24 +-
 tools/perf/builtin-record.c|   10 +-
 tools/perf/builtin-report.c|   11 +-
 tools/perf/builtin-sched.c | 1446 +---
 tools/perf/builtin-script.c|   29 +-
 tools/perf/builtin-stat.c  |   40 +-
 tools/perf/builtin-test.c  |   23 +-
 tools/perf/builtin-timechart.c |   30 +-
 tools/perf/builtin-top.c   |7 +-
 tools/perf/ui/browser.c|7 +-
 tools/perf/ui/browsers/annotate.c  |6 +-
 tools/perf/ui/gtk/browser.c|5 +-
 tools/perf/ui/gtk/setup.c  |2 +-
 tools/perf/ui/gtk/util.c   |4 +-
 tools/perf/ui/helpline.c   |2 +-
 tools/perf/ui/helpline.h   |8 +-
 tools/perf/ui/hist.c   |   21 +-
 tools/perf/ui/tui/setup.c  |4 +-
 tools/perf/util/alias.c|3 +-
 tools/perf/util/annotate.c |6 +-
 tools/perf/util/annotate.h |   13 +-
 tools/perf/util/build-id.c |   11 +-
 tools/perf/util/cache.h|6 +-
 tools/perf/util/callchain.c|6 +-
 tools/perf/util/cgroup.c   |4 +-
 tools/perf/util/config.c

[PATCH 04/18] perf tools: include wrapper for magic.h

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Irina Tirdea 

perf is currently including magic.h directly from the kernel. If the
glibc magic.h is also included, this leads to warnings that the
constants are redefined. This happens on some systems (e.g. Android).

Redefinition errors on Android:
In file included from util/util.h:79:0,
 from util/cache.h:5,
 from util/abspath.c:1:
util/../../../include/linux/magic.h:5:0:
error: "AFFS_SUPER_MAGIC" redefined [-Werror]
bionic/libc/include/sys/vfs.h:53:0:
note: this is the location of the previous definition
util/../../../include/linux/magic.h:19:0:
error: "EFS_SUPER_MAGIC" redefined [-Werror]
bionic/libc/include/sys/vfs.h:61:0:
note: this is the location of the previous definition
util/../../../include/linux/magic.h:26:0:
error: "HPFS_SUPER_MAGIC" redefined [-Werror]
bionic/libc/include/sys/vfs.h:67:0:
note: this is the location of the previous definition

Only two constants from magic.h are used by perf (DEBUGFS_MAGIC and
SYSFS_MAGIC). This fix provides a wrapper for magic.h that includes only
these constants instead of including the kernel header file directly.
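
The wrapper approach can be sketched as follows. The two guarded
defines use the values from the diff below; fs_has_magic() is a
hypothetical helper, not perf code, and the example assumes a Linux
libc that provides statfs() in <sys/vfs.h>.

```c
#include <sys/vfs.h>

/* Define each constant only if no other header already did. */
#ifndef DEBUGFS_MAGIC
#define DEBUGFS_MAGIC	0x64626720
#endif

#ifndef SYSFS_MAGIC
#define SYSFS_MAGIC	0x62656572
#endif

/*
 * Hypothetical helper: returns 1 if 'path' lives on a filesystem with
 * the given magic number, 0 if it does not, and -1 if statfs() fails.
 */
static int fs_has_magic(const char *path, long magic)
{
	struct statfs st;

	if (statfs(path, &st) < 0)
		return -1;

	return (long)st.f_type == magic;
}
```

Because every define is wrapped in #ifndef, the header stays harmless
when libc (bionic's sys/vfs.h in the log above) already provides the
same names.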

Signed-off-by: Irina Tirdea 
Acked-by: Pekka Enberg 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Irina Tirdea 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Link: http://lkml.kernel.org/r/1347315303-29906-2-git-send-email-irina.tir...@intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile   |1 +
 tools/perf/util/include/linux/magic.h |   12 
 tools/perf/util/util.h|2 +-
 3 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/include/linux/magic.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index e4b2e8f..1d2723c 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -264,6 +264,7 @@ LIB_H += util/include/linux/ctype.h
 LIB_H += util/include/linux/kernel.h
 LIB_H += util/include/linux/list.h
 LIB_H += util/include/linux/export.h
+LIB_H += util/include/linux/magic.h
 LIB_H += util/include/linux/poison.h
 LIB_H += util/include/linux/prefetch.h
 LIB_H += util/include/linux/rbtree.h
diff --git a/tools/perf/util/include/linux/magic.h b/tools/perf/util/include/linux/magic.h
new file mode 100644
index 000..58b64ed
--- /dev/null
+++ b/tools/perf/util/include/linux/magic.h
@@ -0,0 +1,12 @@
+#ifndef _PERF_LINUX_MAGIC_H_
+#define _PERF_LINUX_MAGIC_H_
+
+#ifndef DEBUGFS_MAGIC
+#define DEBUGFS_MAGIC  0x64626720
+#endif
+
+#ifndef SYSFS_MAGIC
+#define SYSFS_MAGIC0x62656572
+#endif
+
+#endif
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 67a3713..70fa70b 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -70,7 +70,7 @@
 #include 
 #include 
 #include 
-#include "../../../include/linux/magic.h"
+#include <linux/magic.h>
 #include "types.h"
 #include 
 
-- 
1.7.9.2.358.g22243


[PATCH 06/18] perf tools: Update types definitions for Android

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Irina Tirdea 

Some type definitions are missing from Android or are already defined in
bionic and lead to redefinition errors.

Android defines __le32 in types.h. Since perf is wrapping <linux/types.h> with a
local version, we need to define this type in the local version too.
Error in Android:
In file included from bionic/libc/include/unistd.h:36:0,
 from external/perf/tools/perf/util/util.h:46,
 from external/perf/tools/perf/util/cache.h:5,
 from external/perf/tools/perf/util/abspath.c:1:
bionic/libc/kernel/common/linux/capability.h:60:2:
error: unknown type name '__le32'

roundup() definition is missing:
util/symbol.c: In function 'symbols__fixup_end':
util/symbol.c:106: warning: implicit declaration of function 'roundup'
util/symbol.c:106: warning: nested extern declaration of 'roundup'

The __force macro defined in perf is also defined in libc, which leads to
redefinition errors. To avoid these, we guard the definition with #ifndef.
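
A small sketch of the guarded fallbacks this patch relies on, for
illustration only. The roundup() body follows the diff below but uses
__typeof__ so it also builds in strict ISO modes; it still needs the
GNU statement-expression extension (gcc or clang). page_roundup() and
to_host32() are hypothetical helpers.

```c
/* Only provide a fallback when the C library did not define one. */
#ifndef roundup
#define roundup(x, y) (				\
{						\
	const __typeof__(y) __y = y;		\
	(((x) + (__y - 1)) / __y) * __y;	\
}						\
)
#endif

/* Sparse annotation; expands to nothing for a normal compiler. */
#ifndef __force
#define __force
#endif

/* Hypothetical user of roundup(): round up to a 4096-byte boundary. */
static unsigned long page_roundup(unsigned long addr)
{
	return roundup(addr, 4096UL);
}

/* Hypothetical user of __force: the cast is a no-op outside sparse. */
static unsigned int to_host32(unsigned int v)
{
	return (__force unsigned int)v;
}
```

Guarding each fallback keeps the local header usable both on systems
where libc already supplies the name and on systems where it does not.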

Signed-off-by: Irina Tirdea 
Acked-by: Pekka Enberg 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Irina Tirdea 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Link: http://lkml.kernel.org/r/1347315303-29906-3-git-send-email-irina.tir...@intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/include/linux/compiler.h |4 
 tools/perf/util/include/linux/kernel.h   |9 +
 tools/perf/util/include/linux/types.h|8 
 3 files changed, 21 insertions(+)

diff --git a/tools/perf/util/include/linux/compiler.h b/tools/perf/util/include/linux/compiler.h
index 2dc8671..ce2367b 100644
--- a/tools/perf/util/include/linux/compiler.h
+++ b/tools/perf/util/include/linux/compiler.h
@@ -12,4 +12,8 @@
 #define __used __attribute__((__unused__))
 #define __packed   __attribute__((__packed__))
 
+#ifndef __force
+#define __force
+#endif
+
 #endif
diff --git a/tools/perf/util/include/linux/kernel.h b/tools/perf/util/include/linux/kernel.h
index 4af9a10..a978f26 100644
--- a/tools/perf/util/include/linux/kernel.h
+++ b/tools/perf/util/include/linux/kernel.h
@@ -46,6 +46,15 @@
_min1 < _min2 ? _min1 : _min2; })
 #endif
 
+#ifndef roundup
+#define roundup(x, y) (\
+{  \
+   const typeof(y) __y = y;   \
+   (((x) + (__y - 1)) / __y) * __y;   \
+}  \
+)
+#endif
+
 #ifndef BUG_ON
 #ifdef NDEBUG
 #define BUG_ON(cond) do { if (cond) {} } while (0)
diff --git a/tools/perf/util/include/linux/types.h b/tools/perf/util/include/linux/types.h
index 12de3b8..eb46478 100644
--- a/tools/perf/util/include/linux/types.h
+++ b/tools/perf/util/include/linux/types.h
@@ -3,6 +3,14 @@
 
 #include 
 
+#ifndef __bitwise
+#define __bitwise
+#endif
+
+#ifndef __le32
+typedef __u32 __bitwise __le32;
+#endif
+
 #define DECLARE_BITMAP(name,bits) \
unsigned long name[BITS_TO_LONGS(bits)]
 
-- 
1.7.9.2.358.g22243


[PATCH 03/18] perf kmem: Remove die() calls

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Arnaldo Carvalho de Melo 

Just use pr_err() + return -1 and perf_session__process_events to abort
when some event would call die(), then let the perf's main() exit doing
whatever it needs.
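
The pattern can be sketched outside perf as below. This is a
hypothetical helper modelled on init_cpunode_map() in the diff that
follows, with fprintf(stderr, ...) standing in for pr_err();
read_int_or_default() is not a perf function.

```c
#include <stdio.h>

/*
 * Read one integer from 'path'. A missing file is not an error: the
 * fallback is used instead. A file that cannot be parsed is reported
 * and -1 is returned, so the caller decides how to unwind instead of
 * the helper terminating the process.
 */
static int read_int_or_default(const char *path, int fallback, int *val)
{
	FILE *fp;
	int err = -1;

	fp = fopen(path, "r");
	if (!fp) {
		*val = fallback;
		return 0;
	}

	if (fscanf(fp, "%d", val) < 1) {
		fprintf(stderr, "Failed to read an integer from %s\n", path);
		goto out_close;
	}

	err = 0;
out_close:
	fclose(fp);	/* single exit path, so the file is always closed */
	return err;
}
```

Callers then test the return value and pass the failure up, which is
what lets main() do its cleanup before exiting.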

Cc: David Ahern 
Cc: Frederic Weisbecker 
Cc: Jiri Olsa 
Cc: Mike Galbraith 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: http://lkml.kernel.org/n/tip-i7rhuqfwshjiwc9gr9m1v...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/builtin-kmem.c |  108 -
 1 file changed, 67 insertions(+), 41 deletions(-)

diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index fc6607b..ad9f520 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -58,41 +58,52 @@ static unsigned long nr_allocs, nr_cross_allocs;
 
 #define PATH_SYS_NODE  "/sys/devices/system/node"
 
-static void init_cpunode_map(void)
+static int init_cpunode_map(void)
 {
FILE *fp;
-   int i;
+   int i, err = -1;
 
fp = fopen("/sys/devices/system/cpu/kernel_max", "r");
if (!fp) {
max_cpu_num = 4096;
-   return;
+   return 0;
+   }
+
+   if (fscanf(fp, "%d", &max_cpu_num) < 1) {
+   pr_err("Failed to read 'kernel_max' from sysfs");
+   goto out_close;
}
 
-   if (fscanf(fp, "%d", &max_cpu_num) < 1)
-   die("Failed to read 'kernel_max' from sysfs");
max_cpu_num++;
 
cpunode_map = calloc(max_cpu_num, sizeof(int));
-   if (!cpunode_map)
-   die("calloc");
+   if (!cpunode_map) {
+   pr_err("%s: calloc failed\n", __func__);
+   goto out_close;
+   }
+
for (i = 0; i < max_cpu_num; i++)
cpunode_map[i] = -1;
+
+   err = 0;
+out_close:
fclose(fp);
+   return err;
 }
 
-static void setup_cpunode_map(void)
+static int setup_cpunode_map(void)
 {
struct dirent *dent1, *dent2;
DIR *dir1, *dir2;
unsigned int cpu, mem;
char buf[PATH_MAX];
 
-   init_cpunode_map();
+   if (init_cpunode_map())
+   return -1;
 
dir1 = opendir(PATH_SYS_NODE);
if (!dir1)
-   return;
+   return -1;
 
while ((dent1 = readdir(dir1)) != NULL) {
if (dent1->d_type != DT_DIR ||
@@ -112,10 +123,11 @@ static void setup_cpunode_map(void)
closedir(dir2);
}
closedir(dir1);
+   return 0;
 }
 
-static void insert_alloc_stat(unsigned long call_site, unsigned long ptr,
- int bytes_req, int bytes_alloc, int cpu)
+static int insert_alloc_stat(unsigned long call_site, unsigned long ptr,
+int bytes_req, int bytes_alloc, int cpu)
 {
struct rb_node **node = &root_alloc_stat.rb_node;
struct rb_node *parent = NULL;
@@ -139,8 +151,10 @@ static void insert_alloc_stat(unsigned long call_site, unsigned long ptr,
data->bytes_alloc += bytes_alloc;
} else {
data = malloc(sizeof(*data));
-   if (!data)
-   die("malloc");
+   if (!data) {
+   pr_err("%s: malloc failed\n", __func__);
+   return -1;
+   }
data->ptr = ptr;
data->pingpong = 0;
data->hit = 1;
@@ -152,9 +166,10 @@ static void insert_alloc_stat(unsigned long call_site, unsigned long ptr,
}
data->call_site = call_site;
data->alloc_cpu = cpu;
+   return 0;
 }
 
-static void insert_caller_stat(unsigned long call_site,
+static int insert_caller_stat(unsigned long call_site,
  int bytes_req, int bytes_alloc)
 {
struct rb_node **node = &root_caller_stat.rb_node;
@@ -179,8 +194,10 @@ static void insert_caller_stat(unsigned long call_site,
data->bytes_alloc += bytes_alloc;
} else {
data = malloc(sizeof(*data));
-   if (!data)
-   die("malloc");
+   if (!data) {
+   pr_err("%s: malloc failed\n", __func__);
+   return -1;
+   }
data->call_site = call_site;
data->pingpong = 0;
data->hit = 1;
@@ -190,11 +207,12 @@ static void insert_caller_stat(unsigned long call_site,
rb_link_node(&data->node, parent, node);
rb_insert_color(&data->node, &root_caller_stat);
}
+
+   return 0;
 }
 
-static void perf_evsel__process_alloc_event(struct perf_evsel *evsel,
-   struct perf_sample *sample,
-   int node)
+static int perf_evsel__process_alloc_event(struct perf_evsel *evsel,
+  struct perf_sample *sample, int node)
 {
struct event_format *event =

[PATCH 12/18] perf tools: Back [vdso] DSO with real data

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Jiri Olsa 

Storing data for VDSO shared object, because we need it for the post
unwind processing.

The VDSO shared object is the same for all processes on a running system, so
it makes no difference when we store it inside the tracer - perf.

When [vdso] map memory is hit, we retrieve [vdso] DSO image and store it
into temporary file.

During the build-id processing phase, the [vdso] DSO image is stored in
build-id db, and build-id reference is made inside perf.data. The
build-id vdso file object is called '[vdso]'. We don't use temporary
file name which gets removed when record is finished.

During report phase the vdso build-id object is treated as any other
build-id DSO object.

Adding following API for vdso object:

  bool is_vdso_map(const char *filename)
- returns true if the filename matches vdso map name

  struct dso *vdso__dso_findnew(struct list_head *head)
- find/create proper vdso DSO object

  vdso__exit(void)
- removes temporary VDSO image if there's any
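
As an illustration of the first entry in that API, a name check of this
kind could look like the sketch below. The macro name VDSO_MAP_NAME and
the function body are assumptions made for this example; the actual
implementation lives in util/vdso.h, which is not shown in this excerpt.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed name of the mapping as it appears in /proc/<pid>/maps. */
#define VDSO_MAP_NAME "[vdso]"

/* Returns true only for an exact match on the vdso map name. */
static bool is_vdso_map(const char *filename)
{
	return strcmp(filename, VDSO_MAP_NAME) == 0;
}
```

An exact comparison keeps ordinary files whose names merely contain
"[vdso]" from being treated as the vdso mapping.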

This change makes backtrace dwarf post unwind possible from [vdso] maps.

Following output is current report of [vdso] sample dwarf backtrace:

  # Overhead  Command  Shared Object Symbol
  #   ...  .  .
  #
  99.52%   ex  [vdso] [.] 0x7fff3ace89af
   |
   --- 0x7fff3ace89af

Following output is new report of [vdso] sample dwarf backtrace:

  # Overhead  Command  Shared Object Symbol
  #   ...  .  .
  #
  99.52%   ex  [vdso] [.] 0x09af
   |
   --- 0x7fff3ace89af
   main
   __libc_start_main
   _start

Signed-off-by: Jiri Olsa 
Acked-by: Peter Zijlstra 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1347295819-23177-5-git-send-email-jo...@redhat.com
[ committer note: s/ALIGN/PERF_ALIGN/g to cope with the android build changes ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Makefile|2 +
 tools/perf/builtin-buildid-cache.c |3 +-
 tools/perf/util/header.c   |   70 ---
 tools/perf/util/header.h   |2 +-
 tools/perf/util/map.c  |   12 +++-
 tools/perf/util/session.c  |2 +
 tools/perf/util/vdso.c |  111 
 tools/perf/util/vdso.h |   18 ++
 8 files changed, 194 insertions(+), 26 deletions(-)
 create mode 100644 tools/perf/util/vdso.c
 create mode 100644 tools/perf/util/vdso.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 1d2723c..209774b 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -337,6 +337,7 @@ LIB_H += util/intlist.h
 LIB_H += util/perf_regs.h
 LIB_H += util/unwind.h
 LIB_H += ui/helpline.h
+LIB_H += util/vdso.h
 
 LIB_OBJS += $(OUTPUT)util/abspath.o
 LIB_OBJS += $(OUTPUT)util/alias.o
@@ -404,6 +405,7 @@ LIB_OBJS += $(OUTPUT)util/cgroup.o
 LIB_OBJS += $(OUTPUT)util/target.o
 LIB_OBJS += $(OUTPUT)util/rblist.o
 LIB_OBJS += $(OUTPUT)util/intlist.o
+LIB_OBJS += $(OUTPUT)util/vdso.o
 
 LIB_OBJS += $(OUTPUT)ui/helpline.o
 LIB_OBJS += $(OUTPUT)ui/hist.o
diff --git a/tools/perf/builtin-buildid-cache.c b/tools/perf/builtin-buildid-cache.c
index 29ad20e..995368e 100644
--- a/tools/perf/builtin-buildid-cache.c
+++ b/tools/perf/builtin-buildid-cache.c
@@ -43,7 +43,8 @@ static int build_id_cache__add_file(const char *filename, const char *debugdir)
}
 
build_id__sprintf(build_id, sizeof(build_id), sbuild_id);
-   err = build_id_cache__add_s(sbuild_id, debugdir, filename, false);
+   err = build_id_cache__add_s(sbuild_id, debugdir, filename,
+   false, false);
if (verbose)
pr_info("Adding %s %s: %s\n", sbuild_id, filename,
err ? "FAIL" : "Ok");
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 974e758..87996ca 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -21,6 +21,7 @@
 #include "debug.h"
 #include "cpumap.h"
 #include "pmu.h"
+#include "vdso.h"
 
 static bool no_buildid_cache = false;
 
@@ -207,6 +208,29 @@ perf_header__set_cmdline(int argc, const char **argv)
continue;   \
else
 
+static int write_buildid(char *name, size_t name_len, u8 *build_id,
+pid_t pid, u16 misc, int fd)
+{
+   int err;
+   struct build_id_event b;
+   size_t len;
+
+   len = name_len + 1;
+   len = PERF_ALIGN(len, NAME_ALIGN);
+
+   memset(&b, 0, sizeof(b));
+   memcpy(&b.build_id, build_id, BUILD_ID_SIZE);
+   b.pid = pid;
+   b.header.misc = misc;
+   b.header.size = sizeof(b) + len;
+
+   err = do_write(fd,

[PATCH 09/18] perf tools: Do backtrace post unwind only if we regs and stack were captured

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Jiri Olsa 

Bail out without error if we want to do backtrace post unwind but were
not able to capture user registers or user stack during the record
phase, which is a possible and valid case.
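
The early return can be sketched in isolation like this. The struct
layouts and the unwind_stub() counter are hypothetical stand-ins chosen
for the example; only the shape of the check mirrors the diff below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced versions of the sample's register/stack dumps. */
struct regs_dump {
	uint64_t *regs;
};

struct stack_dump {
	uint64_t size;
	char *data;
};

struct sample {
	struct regs_dump user_regs;
	struct stack_dump user_stack;
};

/* Counts how often the (stubbed) unwinder is actually entered. */
static int nr_unwinds;

static int unwind_stub(const struct sample *sample)
{
	(void)sample;
	nr_unwinds++;
	return 0;
}

static int resolve_callchain(const struct sample *sample)
{
	/* Nothing was captured at record time: valid, so skip quietly. */
	if (!sample->user_regs.regs || !sample->user_stack.size)
		return 0;

	return unwind_stub(sample);
}
```

Returning 0 rather than an error keeps samples without captured state
from aborting the whole report.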

Signed-off-by: Jiri Olsa 
Cc: Frederic Weisbecker 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1347295819-23177-2-git-send-email-jo...@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/session.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 3806ea4..0ecd62b 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -388,6 +388,11 @@ int machine__resolve_callchain(struct machine *machine,
  (evsel->attr.sample_type & PERF_SAMPLE_STACK_USER)))
return 0;
 
+   /* Bail out if nothing was captured. */
+   if ((!sample->user_regs.regs) ||
+   (!sample->user_stack.size))
+   return 0;
+
return unwind__get_entries(unwind_entry, &callchain_cursor, machine,
   thread, evsel->attr.sample_regs_user,
   sample);
-- 
1.7.9.2.358.g22243


[PATCH 08/18] perf tools: fix ALIGN redefinition in system headers

2012-09-11 Thread Arnaldo Carvalho de Melo

From: Irina Tirdea 

On some systems (e.g. Android), ALIGN is defined in system headers as
ALIGN(p).  The definition of ALIGN used in perf takes 2 parameters:
ALIGN(x,a).  This leads to redefinition conflicts.

Redefinition error on Android:
In file included from util/include/linux/list.h:1:0,
from util/callchain.h:5,
from util/hist.h:6,
from util/session.h:4,
from util/build-id.h:4,
from util/annotate.c:11:
util/include/linux/kernel.h:11:0: error: "ALIGN" redefined [-Werror]
bionic/libc/include/sys/param.h:38:0: note: this is the location of
the previous definition

Conflicts with system defined ALIGN in Android:
util/event.c: In function 'perf_event__synthesize_comm':
util/event.c:115:32: error: macro "ALIGN" passed 2 arguments, but takes just 1
util/event.c:115:9: error: 'ALIGN' undeclared (first use in this function)
util/event.c:115:9: note: each undeclared identifier is reported only once for
each function it appears in

In order to avoid this redefinition, ALIGN is renamed to PERF_ALIGN.
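
A sketch of why the rename helps, for illustration only. The
one-argument ALIGN(p) below is a made-up stand-in for a system macro,
and the PERF_ALIGN body is an assumed power-of-two round-up, since the
kernel.h hunk is not shown in this excerpt; comm_record_size() is a
hypothetical helper mirroring the size computation in event.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a one-argument system macro (exact definition assumed). */
#ifndef ALIGN
#define ALIGN(p) \
	(((uintptr_t)(p) + (sizeof(long) - 1)) & ~(uintptr_t)(sizeof(long) - 1))
#endif

/* Two-argument round-up under its own name; 'a' must be a power of two. */
#define PERF_ALIGN(x, a) \
	(((x) + ((__typeof__(x))(a) - 1)) & ~((__typeof__(x))(a) - 1))

/* Size of a NUL-terminated name once padded to a u64 boundary. */
static size_t comm_record_size(const char *comm)
{
	size_t size = strlen(comm) + 1;	/* include the trailing NUL */

	return PERF_ALIGN(size, sizeof(uint64_t));
}
```

With distinct names the two macros can coexist in one translation unit,
whatever argument count the system version takes.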

Signed-off-by: Irina Tirdea 
Acked-by: Pekka Enberg 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Irina Tirdea 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Link: http://lkml.kernel.org/r/1347315303-29906-5-git-send-email-irina.tir...@intel.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/event.c|   10 +-
 tools/perf/util/event.h|2 +-
 tools/perf/util/header.c   |   16 
 tools/perf/util/include/linux/kernel.h |4 ++--
 tools/perf/util/session.c  |4 ++--
 tools/perf/util/symbol.c   |2 +-
 6 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 84ff6f16..f7f4805 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -112,7 +112,7 @@ static pid_t perf_event__synthesize_comm(struct perf_tool *tool,
event->comm.header.type = PERF_RECORD_COMM;
 
size = strlen(event->comm.comm) + 1;
-   size = ALIGN(size, sizeof(u64));
+   size = PERF_ALIGN(size, sizeof(u64));
memset(event->comm.comm + size, 0, machine->id_hdr_size);
event->comm.header.size = (sizeof(event->comm) -
(sizeof(event->comm.comm) - size) +
@@ -145,7 +145,7 @@ static pid_t perf_event__synthesize_comm(struct perf_tool *tool,
 sizeof(event->comm.comm));
 
size = strlen(event->comm.comm) + 1;
-   size = ALIGN(size, sizeof(u64));
+   size = PERF_ALIGN(size, sizeof(u64));
memset(event->comm.comm + size, 0, machine->id_hdr_size);
event->comm.header.size = (sizeof(event->comm) -
  (sizeof(event->comm.comm) - size) +
@@ -228,7 +228,7 @@ static int perf_event__synthesize_mmap_events(struct perf_tool *tool,
size = strlen(execname);
execname[size - 1] = '\0'; /* Remove \n */
memcpy(event->mmap.filename, execname, size);
-   size = ALIGN(size, sizeof(u64));
+   size = PERF_ALIGN(size, sizeof(u64));
event->mmap.len -= event->mmap.start;
event->mmap.header.size = (sizeof(event->mmap) -
(sizeof(event->mmap.filename) - size));
@@ -282,7 +282,7 @@ int perf_event__synthesize_modules(struct perf_tool *tool,
if (pos->dso->kernel)
continue;
 
-   size = ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
+   size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
event->mmap.header.type = PERF_RECORD_MMAP;
event->mmap.header.size = (sizeof(event->mmap) -
(sizeof(event->mmap.filename) - size));
@@ -494,7 +494,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
map = machine->vmlinux_maps[MAP__FUNCTION];
size = snprintf(event->mmap.filename, sizeof(event->mmap.filename),
"%s%s", mmap_name, symbol_name) + 1;
-   size = ALIGN(size, sizeof(u64));
+   size = PERF_ALIGN(size, sizeof(u64));
event->mmap.header.type = PERF_RECORD_MMAP;
event->mmap.header.size = (sizeof(event->mmap) -
(sizeof(event->mmap.filename) - size) + machine->id_hdr_size);
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 0e088d0..21b99e7 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -101,7 +101,7 @@ struct perf_sample {
 struct build_id_event {
struct perf_event_header header;
pid_tpid;
-   u8   build_id[ALIGN(BUILD_ID_SIZE, sizeof(u64))];
+   u8   build_id[PERF_ALIGN(BUILD_ID_SIZE, sizeof(u64))];
