[PATCH] leds: wm8350: Complain if we fail to reenable DCDC

2013-03-01 Thread Mark Brown
Provide some trace, though the hardware is most likely non-functional if
this happens.

Signed-off-by: Mark Brown 
---
 drivers/leds/leds-wm8350.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/leds/leds-wm8350.c b/drivers/leds/leds-wm8350.c
index ed15157..86bdab3 100644
--- a/drivers/leds/leds-wm8350.c
+++ b/drivers/leds/leds-wm8350.c
@@ -129,7 +129,11 @@ static void wm8350_led_disable(struct wm8350_led *led)
ret = regulator_disable(led->isink);
if (ret != 0) {
dev_err(led->cdev.dev, "Failed to disable ISINK: %d\n", ret);
-   regulator_enable(led->dcdc);
+   ret = regulator_enable(led->dcdc);
+   if (ret != 0) {
+   dev_err(led->cdev.dev, "Failed to reenable DCDC: %d\n",
+   ret);
+   }
return;
}
 
-- 
1.7.10.4



Re: [PATCH v2] gpio: palmas: add in GPIO support for palmas charger

2013-03-01 Thread Laxman Dewangan

On Friday 01 March 2013 10:36 PM, Ian Lartey wrote:

Palmas charger has 16 GPIOs. Add palmas_gpio_[read|write|update] APIs
to take account of the second bank of GPIOs.

Signed-off-by: Ian Lartey 
Signed-off-by: Graeme Gregory 
---


Looks good.
Acked-by: Laxman Dewangan
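
For readers unfamiliar with banked GPIO registers, the usual pattern
behind such [read|write|update] helpers is to split the linear GPIO
offset into a bank index and an in-bank bit, as in the sketch below.
This is only an illustration; the base address and bank stride here are
made-up placeholders, not the real Palmas register map.

#include <stdio.h>

/* Assumed layout for illustration only: 16 GPIOs as two 8-bit banks,
 * with bank 1 registers at a fixed (made-up) stride above bank 0. */
#define GPIOS_PER_BANK 8
#define BANK_STRIDE    0x10

static unsigned int gpio_reg(unsigned int base, unsigned int offset)
{
    return base + (offset / GPIOS_PER_BANK) * BANK_STRIDE;
}

static unsigned int gpio_mask(unsigned int offset)
{
    return 1u << (offset % GPIOS_PER_BANK);
}

int main(void)
{
    /* GPIO 11 lands in bank 1, bit 3 */
    printf("reg=0x%02x mask=0x%02x\n", gpio_reg(0x80, 11), gpio_mask(11));
    return 0;
}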




[PATCH] mfd: twl4030-audio: Fix argument type for twl4030_audio_disable_resource()

2013-03-01 Thread Mark Brown
Looks like the conversion to enum was missed for the definition of this
function; the declaration had already been updated.

Signed-off-by: Mark Brown 
---
 drivers/mfd/twl4030-audio.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mfd/twl4030-audio.c b/drivers/mfd/twl4030-audio.c
index e16edca..d2ab222 100644
--- a/drivers/mfd/twl4030-audio.c
+++ b/drivers/mfd/twl4030-audio.c
@@ -118,7 +118,7 @@ EXPORT_SYMBOL_GPL(twl4030_audio_enable_resource);
  * Disable the resource.
  * The function returns with error or the content of the register
  */
-int twl4030_audio_disable_resource(unsigned id)
+int twl4030_audio_disable_resource(enum twl4030_audio_res id)
 {
struct twl4030_audio *audio = platform_get_drvdata(twl4030_audio_dev);
int val;
-- 
1.7.10.4



[PATCH] mfd: tps65912: Declare and use tps65912_irq_exit()

2013-03-01 Thread Mark Brown
Clean up interrupts on exit and, while at it, silence a sparse warning
caused by tps65912_irq_exit() being defined but not prototyped.

Signed-off-by: Mark Brown 
---
 drivers/mfd/tps65912-core.c  |1 +
 include/linux/mfd/tps65912.h |1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/mfd/tps65912-core.c b/drivers/mfd/tps65912-core.c
index 4658b5b..aeb8e40 100644
--- a/drivers/mfd/tps65912-core.c
+++ b/drivers/mfd/tps65912-core.c
@@ -169,6 +169,7 @@ err:
 void tps65912_device_exit(struct tps65912 *tps65912)
 {
mfd_remove_devices(tps65912->dev);
+   tps65912_irq_exit(tps65912);
kfree(tps65912);
 }
 
diff --git a/include/linux/mfd/tps65912.h b/include/linux/mfd/tps65912.h
index aaceab4..6d30903 100644
--- a/include/linux/mfd/tps65912.h
+++ b/include/linux/mfd/tps65912.h
@@ -323,5 +323,6 @@ int tps65912_device_init(struct tps65912 *tps65912);
 void tps65912_device_exit(struct tps65912 *tps65912);
 int tps65912_irq_init(struct tps65912 *tps65912, int irq,
struct tps65912_platform_data *pdata);
+int tps65912_irq_exit(struct tps65912 *tps65912);
 
 #endif /*  __LINUX_MFD_TPS65912_H */
-- 
1.7.10.4



[PATCH] epoll: preserve ordering of events from ovflist

2013-03-01 Thread Eric Wong
Events arriving in ovflist are stored in LIFO order, so
we should account for that when inserting them into rdllist.

Signed-off-by: Eric Wong 
Cc: Davide Libenzi 
Cc: Al Viro 
Cc: Andrew Morton 
---
  I think this can lead to starvation in some rare cases, but I have not
  been able to trigger it.  The window for ovflist insertion is tiny.

 fs/eventpoll.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 9fec183..5a1a596 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -598,7 +598,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
 * contain them, and the list_splice() below takes care of them.
 */
if (!ep_is_linked(&epi->rdllink)) {
-   list_add_tail(&epi->rdllink, &ep->rdllist);
+   list_add(&epi->rdllink, &ep->rdllist);
__pm_stay_awake(epi->ws);
}
}
-- 
Eric Wong
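
To see why list_add() rather than list_add_tail() is the right call
here, the following minimal userspace sketch (plain C arrays standing
in for the kernel list API, illustration only) drains a LIFO overflow
stack with prepend semantics and recovers the original arrival order;
appending would leave the ready list reversed.

#include <stdio.h>

#define N 4

int main(void)
{
    int arrival[N] = { 1, 2, 3, 4 };    /* order the events arrived */
    int ovflist[N], rdllist[N];
    int top = 0, i;

    for (i = 0; i < N; i++)             /* ovflist insertion is a push */
        ovflist[top++] = arrival[i];

    /* drain newest-first; placing each popped event at the front
     * (list_add semantics) restores FIFO order */
    for (i = 0; i < N; i++)
        rdllist[N - 1 - i] = ovflist[--top];

    for (i = 0; i < N; i++)
        printf("%d ", rdllist[i]);      /* prints: 1 2 3 4 */
    printf("\n");
    return 0;
}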


[PATCH v6 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices

2013-03-01 Thread Vijay Mohan Pandarathil
- Create eventfd per vfio device assigned to a guest and register an
  event handler

- This fd is passed to the vfio_pci driver through the SET_IRQ ioctl

- When the device encounters an error, the eventfd is signalled
  and the qemu eventfd handler gets invoked.

- In the handler, decide what action to take. The current action taken
  is to stop the guest.

Signed-off-by: Vijay Mohan Pandarathil 
---
 hw/vfio_pci.c  | 123 +
 linux-headers/linux/vfio.h |   1 +
 2 files changed, 124 insertions(+)

diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index ad9ae36..3c78771 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -38,6 +38,7 @@
 #include "qemu/error-report.h"
 #include "qemu/queue.h"
 #include "qemu/range.h"
+#include "sysemu/sysemu.h"
 
 /* #define DEBUG_VFIO */
 #ifdef DEBUG_VFIO
@@ -129,7 +130,9 @@ typedef struct VFIODevice {
 PCIHostDeviceAddress host;
 QLIST_ENTRY(VFIODevice) next;
 struct VFIOGroup *group;
+EventNotifier err_notifier;
 bool reset_works;
+bool pci_aer;
 } VFIODevice;
 
 typedef struct VFIOGroup {
@@ -1802,6 +1805,7 @@ static int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vdev)
 {
 struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
 struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
 int ret, i;
 
 ret = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
@@ -1904,6 +1908,18 @@ static int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vdev)
 }
 vdev->config_offset = reg_info.offset;
 
+irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
+
+ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+if (ret) {
+/* This can fail for an old kernel or legacy PCI dev */
+DPRINTF("VFIO_DEVICE_GET_IRQ_INFO failure ret=%d\n", ret);
+ret = 0;
+} else if (irq_info.count == 1) {
+vdev->pci_aer = true;
+} else {
+error_report("vfio: Warning: Could not enable error recovery for the 
device\n");
+}
 error:
 if (ret) {
 QLIST_REMOVE(vdev, next);
@@ -1925,6 +1941,110 @@ static void vfio_put_device(VFIODevice *vdev)
 }
 }
 
+static void vfio_err_notifier_handler(void *opaque)
+{
+VFIODevice *vdev = opaque;
+
+if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
+return;
+}
+
+/*
+ * TBD. Retrieve the error details and decide what action
+ * needs to be taken. One of the actions could be to pass
+ * the error to the guest and have the guest driver recover
+ * from the error. This requires that PCIe capabilities be
+ * exposed to the guest. For now, we just terminate the
+ * guest to contain the error.
+ */
+
+error_report("%s (%04x:%02x:%02x.%x)"
+"Unrecoverable error detected...\n"
+"Please collect any data possible and then kill the guest",
+__func__, vdev->host.domain, vdev->host.bus,
+vdev->host.slot, vdev->host.function);
+
+vm_stop(RUN_STATE_IO_ERROR);
+}
+
+/*
+ * Registers error notifier for devices supporting error recovery.
+ * If we encounter a failure in this function, we report an error
+ * and continue after disabling error recovery support for the
+ * device.
+ */
+static void vfio_register_err_notifier(VFIODevice *vdev)
+{
+int ret;
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+
+if (!vdev->pci_aer) {
+return;
+}
+
+if (event_notifier_init(>err_notifier, 0)) {
+error_report("vfio: Warning: Unable to init event notifier for error 
detection\n");
+vdev->pci_aer = false;
+return;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = VFIO_PCI_ERR_IRQ_INDEX;
+irq_set->start = 0;
+irq_set->count = 1;
+pfd = (int32_t *)&irq_set->data;
+
+*pfd = event_notifier_get_fd(&vdev->err_notifier);
+qemu_set_fd_handler(*pfd, vfio_err_notifier_handler, NULL, vdev);
+
+ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
+if (ret) {
+error_report("vfio: Failed to set up error notification\n");
+qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
+event_notifier_cleanup(&vdev->err_notifier);
+vdev->pci_aer = false;
+}
+g_free(irq_set);
+}
+
+static void vfio_unregister_err_notifier(VFIODevice *vdev)
+{
+int argsz;
+struct vfio_irq_set *irq_set;
+int32_t *pfd;
+int ret;
+
+if (!vdev->pci_aer) {
+return;
+}
+
+argsz = sizeof(*irq_set) + sizeof(*pfd);
+
+irq_set = g_malloc0(argsz);
+irq_set->argsz = argsz;
+irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+irq_set->index = 

[PATCH v6 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER

2013-03-01 Thread Vijay Mohan Pandarathil
- New VFIO_SET_IRQ ioctl option to pass the eventfd that is signaled when
  an error occurs in the vfio_pci_device

- Register pci_error_handler for the vfio_pci driver

- When the device encounters an error, the error handler registered by
  the vfio_pci driver gets invoked by the AER infrastructure

- In the error handler, signal the eventfd registered for the device.

- This results in the qemu eventfd handler getting invoked and
  appropriate action taken for the guest.

Signed-off-by: Vijay Mohan Pandarathil 
---
 drivers/vfio/pci/vfio_pci.c | 44 -
 drivers/vfio/pci/vfio_pci_intrs.c   | 49 +
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 include/uapi/linux/vfio.h   |  1 +
 4 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 8189cb6..acfcb1a 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -201,7 +201,9 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 
return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
}
-   }
+   } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX)
+   if (pci_is_pcie(vdev->pdev))
+   return 1;
 
return 0;
 }
@@ -317,6 +319,17 @@ static long vfio_pci_ioctl(void *device_data,
if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
return -EINVAL;
 
+   switch (info.index) {
+   case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
+   break;
+   case VFIO_PCI_ERR_IRQ_INDEX:
+   if (pci_is_pcie(vdev->pdev))
+   break;
+   /* pass thru to return error */
+   default:
+   return -EINVAL;
+   }
+
info.flags = VFIO_IRQ_INFO_EVENTFD;
 
info.count = vfio_pci_get_irq_count(vdev, info.index);
@@ -551,11 +564,40 @@ static void vfio_pci_remove(struct pci_dev *pdev)
kfree(vdev);
 }
 
+static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+   struct vfio_pci_device *vdev;
+   struct vfio_device *device;
+
+   device = vfio_device_get_from_dev(&pdev->dev);
+   if (device == NULL)
+   return PCI_ERS_RESULT_DISCONNECT;
+
+   vdev = vfio_device_data(device);
+   if (vdev == NULL) {
+   vfio_device_put(device);
+   return PCI_ERS_RESULT_DISCONNECT;
+   }
+
+   if (vdev->err_trigger)
+   eventfd_signal(vdev->err_trigger, 1);
+
+   vfio_device_put(device);
+
+   return PCI_ERS_RESULT_CAN_RECOVER;
+}
+
+static struct pci_error_handlers vfio_err_handlers = {
+   .error_detected = vfio_pci_aer_err_detected,
+};
+
 static struct pci_driver vfio_pci_driver = {
.name   = "vfio-pci",
.id_table   = NULL, /* only dynamic ids */
.probe  = vfio_pci_probe,
.remove = vfio_pci_remove,
+   .err_handler= &vfio_err_handlers,
 };
 
 static void __exit vfio_pci_cleanup(void)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 3639371..4a29830 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -745,6 +745,48 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_device *vdev,
return 0;
 }
 
+static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
+   unsigned index, unsigned start,
+   unsigned count, uint32_t flags, void *data)
+{
+   int32_t fd = *(int32_t *)data;
+
+   if ((index != VFIO_PCI_ERR_IRQ_INDEX) ||
+   !(flags & VFIO_IRQ_SET_DATA_TYPE_MASK))
+   return -EINVAL;
+
+   /* DATA_NONE/DATA_BOOL enables loopback testing */
+
+   if (flags & VFIO_IRQ_SET_DATA_NONE) {
+   if (vdev->err_trigger)
+   eventfd_signal(vdev->err_trigger, 1);
+   return 0;
+   } else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
+   uint8_t trigger = *(uint8_t *)data;
+   if (trigger && vdev->err_trigger)
+   eventfd_signal(vdev->err_trigger, 1);
+   return 0;
+   }
+
+   /* Handle SET_DATA_EVENTFD */
+
+   if (fd == -1) {
+   if (vdev->err_trigger)
+   eventfd_ctx_put(vdev->err_trigger);
+   vdev->err_trigger = NULL;
+   return 0;
+   } else if (fd >= 0) {
+   struct eventfd_ctx *efdctx;
+   efdctx = eventfd_ctx_fdget(fd);
+   if (IS_ERR(efdctx))
+   return PTR_ERR(efdctx);
+   if 

[PATCH v6 1/3] VFIO: Wrapper for getting reference to vfio_device from device

2013-03-01 Thread Vijay Mohan Pandarathil
- Added vfio_device_get_from_dev() as wrapper to get
  reference to vfio_device from struct device.

- Added vfio_device_data() as a wrapper to get device_data from
  vfio_device.

Signed-off-by: Vijay Mohan Pandarathil 
---
 drivers/vfio/vfio.c  | 27 ++-
 include/linux/vfio.h |  3 +++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 28e2d5b..eec6674 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -407,12 +407,13 @@ static void vfio_device_release(struct kref *kref)
 }
 
 /* Device reference always implies a group reference */
-static void vfio_device_put(struct vfio_device *device)
+void vfio_device_put(struct vfio_device *device)
 {
struct vfio_group *group = device->group;
kref_put_mutex(&device->kref, vfio_device_release, &group->device_lock);
vfio_group_put(group);
 }
+EXPORT_SYMBOL_GPL(vfio_device_put);
 
 static void vfio_device_get(struct vfio_device *device)
 {
@@ -642,6 +643,30 @@ int vfio_add_group_dev(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(vfio_add_group_dev);
 
+/**
+ * This does a get on the vfio_device from device.
+ * Callers of this function will have to call vfio_device_put() to
+ * remove the reference.
+ */
+struct vfio_device *vfio_device_get_from_dev(struct device *dev)
+{
+   struct vfio_device *device = dev_get_drvdata(dev);
+
+   vfio_device_get(device);
+
+   return device;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
+
+/*
+ * Caller must hold a reference to the vfio_device
+ */
+void *vfio_device_data(struct vfio_device *device)
+{
+   return device->device_data;
+}
+EXPORT_SYMBOL_GPL(vfio_device_data);
+
 /* Given a referenced group, check if it contains the device */
 static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
 {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ab9e862..ac8d488 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -45,6 +45,9 @@ extern int vfio_add_group_dev(struct device *dev,
  void *device_data);
 
 extern void *vfio_del_group_dev(struct device *dev);
+extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
+extern void vfio_device_put(struct vfio_device *device);
+extern void *vfio_device_data(struct vfio_device *device);
 
 /**
  * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
-- 
1.7.11.3



[PATCH v6 0/3] AER-KVM: Error containment of VFIO devices assigned to KVM guests

2013-03-01 Thread Vijay Mohan Pandarathil
Add support for error containment when a VFIO device assigned to a KVM
guest encounters an error. This is for PCIe devices/drivers that support
AER functionality. When the host OS is notified of an error in a device,
either through the firmware-first approach or through an interrupt
handled by the AER root port driver, the error handler registered by the
vfio-pci driver gets invoked. The qemu process is then signaled through
the eventfd it registered for that VFIO device. In the eventfd handler,
qemu decides what action to take. In this implementation, the guest is
brought down to contain the error.
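
The eventfd handshake described above reduces to the sketch below. This
is illustrative only: in the real flow the fd is handed to the kernel
through the VFIO_DEVICE_SET_IRQS ioctl, and the write side is the
kernel's eventfd_signal() rather than a userspace write().

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
    uint64_t one = 1, count;
    int efd = eventfd(0, 0);        /* qemu creates one fd per device */

    if (efd < 0) {
        perror("eventfd");
        return 1;
    }

    /* stands in for the kernel's eventfd_signal() on an AER error */
    write(efd, &one, sizeof(one));

    /* qemu's handler wakes up, reads the count, and reacts */
    read(efd, &count, sizeof(count));
    printf("error events signalled: %llu\n", (unsigned long long)count);

    close(efd);
    return 0;   /* the real handler stops the guest instead */
}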


v6:
 - Rebased to latest upstream
 - Resolved merge conflict with vfio_dev_present()
v5:
 - Rebased to latest upstream stable bits
 - Incorporated v4 feedback
v4:
 - Stop the guest instead of terminating
 - Remove unwanted returns from functions
 - Incorporate other feedback
v3:
 - Removed PCI_AER* flags from device info ioctl.
 - Incorporated feedback
v2:
 - Rebased to latest upstream stable bits
 - Changed the new ioctl to be part of VFIO_SET_IRQs ioctl
 - Added a new patch to get/put reference to a vfio device from struct device
 - Incorporated all other feedback.

---

Vijay Mohan Pandarathil(3):

[PATCH 1/3] VFIO: Wrapper to get reference to vfio_device from device 
[PATCH 2/3] VFIO-AER: Vfio-pci driver changes for supporting AER
[PATCH 3/3] QEMU-AER: Qemu changes to support AER for VFIO-PCI devices

Kernel files changed


 drivers/vfio/vfio.c  | 27 ++-
 include/linux/vfio.h |  3 +++
 2 files changed, 29 insertions(+), 1 deletion(-)

 drivers/vfio/pci/vfio_pci.c | 44 -
 drivers/vfio/pci/vfio_pci_intrs.c   | 49 +
 drivers/vfio/pci/vfio_pci_private.h |  1 +
 include/uapi/linux/vfio.h   |  1 +
 4 files changed, 94 insertions(+), 1 deletion(-)

Qemu files changed

 hw/vfio_pci.c  | 123 +
 linux-headers/linux/vfio.h |   1 +
 2 files changed, 124 insertions(+)


[PATCH] btrfs: return EPERM in btrfs_rm_device()

2013-03-01 Thread Jerry Snitselaar
Currently there are error paths in btrfs_rm_device() where EINVAL is
returned telling the user they passed an invalid argument even though
they passed a valid device. Change to return EPERM instead as the
operation is not permitted.

Signed-off-by: Jerry Snitselaar 
---
 fs/btrfs/volumes.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5cbb7f4..3e1586c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1392,14 +1392,14 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
if ((all_avail & BTRFS_BLOCK_GROUP_RAID10) && num_devices <= 4) {
printk(KERN_ERR "btrfs: unable to go below four devices "
   "on raid10\n");
-   ret = -EINVAL;
+   ret = -EPERM;
goto out;
}
 
if ((all_avail & BTRFS_BLOCK_GROUP_RAID1) && num_devices <= 2) {
printk(KERN_ERR "btrfs: unable to go below two "
   "devices on raid1\n");
-   ret = -EINVAL;
+   ret = -EPERM;
goto out;
}
 
@@ -1449,14 +1449,14 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 
if (device->is_tgtdev_for_dev_replace) {
pr_err("btrfs: unable to remove the dev_replace target dev\n");
-   ret = -EINVAL;
+   ret = -EPERM;
goto error_brelse;
}
 
if (device->writeable && root->fs_info->fs_devices->rw_devices == 1) {
printk(KERN_ERR "btrfs: unable to remove the only writeable "
   "device\n");
-   ret = -EINVAL;
+   ret = -EPERM;
goto error_brelse;
}
 
-- 
1.8.2.rc1
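
The user-visible difference is only the reported error string; a
trivial illustration (hypothetical program, not btrfs-progs code):

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* before: "Invalid argument"; after: "Operation not permitted" */
    printf("old: %s\n", strerror(EINVAL));
    printf("new: %s\n", strerror(EPERM));
    return 0;
}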



Re: [RFC PATCH 0/2] ipc: do not hold ipc lock more than necessary

2013-03-01 Thread Michel Lespinasse
On Sat, Mar 2, 2013 at 12:43 PM, Emmanuel Benisty  wrote:
> Hi,
>
> On Sat, Mar 2, 2013 at 7:16 AM, Davidlohr Bueso  wrote:
>> The following set of not-thoroughly-tested patches are based on the
>> discussion of holding the ipc lock unnecessarily, such as for permissions
>> and security checks:
>>
>> https://lkml.org/lkml/2013/2/28/540
>>
>> Patch 0/1: Introduces new functions, analogous to ipc_lock and ipc_lock_check
>> in the ipc utility code, allowing to obtain the ipc object without holding 
>> the lock.
>>
>> Patch 0/2: Use the new functions and only acquire the ipc lock when needed.
>
> Not sure how much a work in progress this is but my machine dies
> immediately when I start chromium, crappy mobile phone picture here:
> http://i.imgur.com/S0hfPz3.jpg

We are missing the top of the trace there, so it's hard to be sure;
however, this could well be caused by the if (!out) check (instead of
if (IS_ERR(out))) that I noticed in patch 1/2.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


[GIT PULL] arch/arc (with fixes) for v3.9-rc1

2013-03-01 Thread Vineet Gupta

On Friday 22 February 2013 12:28 PM, Vineet Gupta wrote:
> Hi Linus,
> 
> I would like to introduce the Linux port to ARC Processors (from Synopsys)
> for 3.9-rc1. The patch-set has been discussed on the public lists since Nov
> and has received a fair bit of review, specially from Arnd, tglx, Al and
> other subsystem maintainers for DeviceTree, kgdb .
> 
> The arch bits are in arch/arc, some asm-generic changes (acked by Arnd), a
> minor change to PARISC (acked by Helge).
> 
> The series is a touch bigger for a new port for 2 main reasons: 1. It enables
> a basic kernel in first sub-series and adds ptrace/kgdb/.. later 2. Some of
> the fallout of review (DeviceTree support, multi-platform-image support) were
> added on top of orig series, primarily to record the revision history.
> 
> Please consider pulling.
> 
> Thanks, Vineet

Hi Linus,

This updated pull request contains the
- original ARC port (1st pull request)
- fixes due to our GNU tools catching up with the new syscall/ptrace ABI
- some (minor) cross-arch Kconfig updates.

Please consider pulling.

Thanks,
Vineet

The following changes since commit 949db153b6466c6f7cad5a427ecea94985927311:

  Linux 3.8-rc5 (2013-01-25 11:57:28 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc.git
tags/arc-v3.9-rc1-late

for you to fetch changes up to 8ccfe6675fa974bd06d64f74d0fdee6a5267d2aa:

  ARC: split elf.h into uapi and export it for userspace (2013-02-27 20:00:26 +0530)

----------------------------------------------------------------
Initial ARC Linux port with some fixes on top for 3.9-rc1

----------------------------------------------------------------
Gilad Ben-Yossef (1):
  ARC: Add support for ioremap_prot API

Mischa Jonker (1):
  ARC: kgdb support

Vineet Gupta (80):
  ARC: Generic Headers
  ARC: Build system: Makefiles, Kconfig, Linker script
  ARC: irqflags - Interrupt enabling/disabling at in-core intc
  ARC: Atomic/bitops/cmpxchg/barriers
  asm-generic headers: uaccess.h to conditionally define segment_eq()
  ARC: uaccess friends
  asm-generic: uaccess: Allow arches to over-ride __{get,put}_user_fn()
  ARC: [optim] uaccess __{get,put}_user() optimised
  asm-generic headers: Allow yet more arch overrides in checksum.h
  ARC: Checksum/byteorder/swab routines
  ARC: Fundamental ARCH data-types/defines
  ARC: Spinlock/rwlock/mutex primitives
  ARC: String library
  ARC: Low level IRQ/Trap/Exception Handling
  ARC: Interrupt Handling
  ARC: Non-MMU Exception Handling
  ARC: Syscall support (no-legacy-syscall ABI)
  ARC: Process-creation/scheduling/idle-loop
  ARC: Timers/counters/delay management
  ARC: Signal handling
  ARC: [Review] Preparing to fix incorrect syscall restarts due to signals
  ARC: [Review] Prevent incorrect syscall restarts
  ARC: Cache Flush Management
  ARC: Page Table Management
  ARC: MMU Context Management
  ARC: MMU Exception Handling
  ARC: TLB flush Handling
  ARC: Page Fault handling
  ARC: I/O and DMA Mappings
  ARC: Boot #1: low-level, setup_arch(), /proc/cpuinfo, mem init
  ARC: [plat-arcfpga] Static platform device for CONFIG_SERIAL_ARC
  ARC: [DeviceTree] Basic support
  ARC: [DeviceTree] Convert some Kconfig items to runtime values
  ARC: [plat-arcfpga]: Enabling DeviceTree for Angel4 board
  ARC: Last bits (stubs) to get to a running kernel with UART
  ARC: [plat-arcfpga] defconfig
  ARC: [optim] Cache "current" in Register r25
  ARC: ptrace support
  ARC: Futex support
  ARC: OProfile support
  ARC: Support for high priority interrupts in the in-core intc
  ARC: Module support
  ARC: Diagnostics: show_regs() etc
  ARC: SMP support
  ARC: DWARF2 .debug_frame based stack unwinder
  ARC: stacktracing APIs based on dw2 unwinder
  ARC: disassembly (needed by kprobes/kgdb/unaligned-access-emul)
  ARC: kprobes support
  sysctl: Enable PARISC "unaligned-trap" to be used cross-arch
  ARC: Unaligned access emulation
  ARC: Boot #2: Verbose Boot reporting / feature verification
  ARC: [plat-arfpga] BVCI Latency Unit setup
  perf, ARC: Enable building perf tools for ARC
  ARC: perf support (software counters only)
  ARC: Support for single cycle Close Coupled Mem (CCM)
  ARC: Hostlink Pseudo-Driver for Metaware Debugger
  ARC: UAPI Disintegrate arch/arc/include/asm
  ARC: [Review] Multi-platform image #1: Kconfig enablement
  ARC: Fold boards sub-menu into platform/SoC menu
  ARC: [Review] Multi-platform image #2: Board callback Infrastructure
  ARC: [Review] Multi-platform image #3: switch to board callback
  ARC: [Review] Multi-platform image #4: Isolate platform headers
  ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core
  ARC: 

Re: WARNING at tty_buffer.c:428 process_one_work()

2013-03-01 Thread David Miller
From: Al Viro 
Date: Sat, 2 Mar 2013 05:23:30 +

> On Sat, Mar 02, 2013 at 03:49:35PM +1100, Stephen Rothwell wrote:
>> On Fri, 01 Mar 2013 17:10:08 -0500 (EST) David Miller  wrote:
> 
>> > Ok, next I'm hitting some regression in Al Viro's signal changes when 
>> > userland
>> > starts up. :-)
>> 
>> If only we had some way of testing this stuff before it gets into Linus'
>> tree ... ;-)
> 
> Dave, my deep apologies.  I only now realized that you hadn't been on Cc of
> the discussion of that crap; see signal.git#for-linus (commit aee41fe) for
> fix...  I thought I'd Cc'd you back then (about a week ago) ;-/

No apology necessary.


Re: Boot failure on Origen board using latest kernel

2013-03-01 Thread Padma Venkat
Hi,

On Sat, Mar 2, 2013 at 10:26 AM, Sachin Kamat  wrote:
> Hi Alim,
>
> On 2 March 2013 10:18, Alim Akhtar  wrote:
>> Hi Sachin,
>>
>> Looks like exynos4 is not yet moved to the generic dma binding recently
>> merged.
>>
>> Could you try out below:
>
> I forgot to mention in the previous mail that the problem was in non-dt case.
> With the below entries added in exynos4 dtsi file, it boots fine in DT case.

For the non-dt case I tested on a 6410 board, which uses a different DMA
controller. I will send a patch to fix this.

Thanks
Padma
>
> --
> With warm regards,
> Sachin


Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!

2013-03-01 Thread Yinghai Lu
On Fri, Mar 1, 2013 at 7:03 PM, chen tang  wrote:
>
> Thank you for your suggestion and fix work. :)
> I would prefer your Plan b. But one last thing I want to confirm:
>
> Will "allocating pgdat and zone on local node" prevent node hot-removing ?
> Or is it safe to free all node data when removing a node ?
> AFAIK, no way to ensure node data is not on thread stack.

Not sure. I need to go over the code.
That is slub's limitation.

If it is not, it should be fixed.

>
> If it is OK, I think Plan B is OK, and we can improve movablemem_map more in
> the future.
>
> BTW, I didn't mean to deny your idea and work. NUMA performance is
> always under our consideration.
> It's just we plan it as a long way development in the future.
> movablemem_map is very important to us. And we do hope to keep it in kernel
> now, and improve it later.

That does not look like the right way to do development of new features
against the mainline tree.

You don't need to put development/testing support patches in the mainline.
Just put those support patches in your local tree.

Everyone has a bunch of development/debug/test-stub patches on their own
hard disk for their working area, but they don't need to go into the
mainline tree.

Good practice should be:
Have the feature completely done in your local tree, then send out
several patchsets and get them reviewed and merged one by one.

Sometimes it will turn out that your whole patchset has a problem that
cannot be fixed during review, and it has to be redesigned.

Mainline tree is NOT testbed.

For pci-root-bus hotplug, I already had the code done completely, then
sent out the patchsets one by one to get them completely reviewed.
One patchset, about acpi-scan, was totally rewritten by Rafael with a
better and cleaner design after he understood our needs.
ioapic and iommu are still left; those patchsets have been in my local
tree for more than 6 months and I keep optimizing them.

BTW, Please do not top-post later.

Thanks

Yinghai


Re: Boot failure on Origen board using latest kernel

2013-03-01 Thread Alim Akhtar
On Sat, Mar 2, 2013 at 10:26 AM, Sachin Kamat  wrote:
> Hi Alim,
>
> On 2 March 2013 10:18, Alim Akhtar  wrote:
>> Hi Sachin,
>>
>> Looks like exynos4 is not yet moved to the generic dma binding recently
>> merged.
>>
>> Could you try out below:
>
> I forgot to mention in the previous mail that the problem was in non-dt case.
> With the below entries added in exynos4 dtsi file, it boots fine in DT case.
>

Aha, I understand.
But I think the current mainline kernel is broken for the __DT case__
also for exynos4, and that is what my suggested patch fixes.
Thanks for your confirmation.

> --
> With warm regards,
> Sachin

--
Regards,
Alim


Re: WARNING at tty_buffer.c:428 process_one_work()

2013-03-01 Thread Al Viro
On Sat, Mar 02, 2013 at 03:49:35PM +1100, Stephen Rothwell wrote:
> On Fri, 01 Mar 2013 17:10:08 -0500 (EST) David Miller  wrote:

> > Ok, next I'm hitting some regression in Al Viro's signal changes when 
> > userland
> > starts up. :-)
> 
> If only we had some way of testing this stuff before it gets into Linus'
> tree ... ;-)

Dave, my deep apologies.  I only now realized that you hadn't been on Cc of
the discussion of that crap; see signal.git#for-linus (commit aee41fe) for
fix...  I thought I'd Cc'd you back then (about a week ago) ;-/


Re: upgrade to 3.8.1 : BUG Scheduling while atomic in bonding driver:

2013-03-01 Thread Linda Walsh




Linda Walsh wrote:
>
>
> This patch is not in the latest kernel.  I don't know if it is the
> 'best' way, but it does stop BUG error messages.
---
Update -- it *used* to stop the messages in 3.6.7.

It no longer stops the messages in 3.8.1 (and isn't present by default);
I tried adding the unlock/lock, with no difference.

Weird. *sigh*
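
The rule behind the quoted workaround is: never call a sleeping
function while holding a lock taken in atomic context; drop the lock
around the sleeping call and retake it afterwards. A userspace
analogue of that shape (illustration only; unlike a pthread rwlock, a
kernel rwlock really does forbid sleeping while held):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

static void update_speed_duplex(void)
{
    usleep(1000);   /* stands in for the driver's usleep_range() */
}

int main(void)
{
    pthread_rwlock_rdlock(&lock);
    /* ... non-sleeping enslave work ... */
    pthread_rwlock_unlock(&lock);   /* drop the lock before sleeping */
    update_speed_duplex();
    pthread_rwlock_rdlock(&lock);   /* retake it afterwards */
    /* ... remainder of the work ... */
    pthread_rwlock_unlock(&lock);
    puts("done");
    return 0;
}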

>
>
>  Original Message 
> Subject:  Re: BUG: scheduling while atomic:
> ifup-bonding/3711/0x0002 -- V3.6.7
> Date: Wed, 28 Nov 2012 13:17:31 -0800
> From: Linda Walsh 
> To:   Cong Wang 
> CC:   LKML , Linux Kernel Network
> Developers 
> References:   <50b5248a.5010...@tlinx.org>
> 
>
>
>
> Cong Wang wrote:
>> Does this quick fix help?
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 5f5b69f..4a4d9eb 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1785,7 +1785,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>> new_slave->link == BOND_LINK_DOWN ? "DOWN" :
>> (new_slave->link == BOND_LINK_UP ? "UP" : "BACK"));
>>
>> +   read_unlock(&bond->lock);
>> bond_update_speed_duplex(new_slave);
>> +   read_lock(&bond->lock);
>>
>> if (USES_PRIMARY(bond->params.mode) && bond->params.primary[0]) {
>> /* if there is a primary slave, remember it */
>>
>> Thanks!
>>   
>
>
>
>
> Eric Dumazet wrote:
>> On Fri, 2013-03-01 at 00:15 -0800, Linda Walsh wrote:
>>   
>>> Just installed 3.8.1
>>>
>>> Thought this had been fixed?  Note it causes the kernel to
>>> show up as tainted after the 1st...
>>>
>>> 
>>
>> CC netdev & Jay Vosburgh & Jeff Kirsher
>>
>>   
>>> As the system was coming up and initializing the bond0 driver:
>>>
>>>
>>> [   19.847743] ixgbe 0000:06:00.0: registered PHC device on eth_s2_0
>>> [   20.258245] BUG: scheduling while atomic: ifup-bonding/2003/0x0002
>>> [   20.264812] 4 locks held by ifup-bonding/2003:
>>> [   20.269298]  #0:  (&buffer->mutex){..}, at: [] sysfs_write_file+0x3f/0x150
>>> [   20.278319]  #1:  (s_active#59){..}, at: [] sysfs_write_file+0xbb/0x150
>>> [   20.287088]  #2:  (rtnl_mutex){..}, at: [] rtnl_trylock+0x10/0x20
>>> [   20.295373]  #3:  (&bond->lock){..}, at: [] bond_enslave+0x4ef/0xb80
>>> [   20.303912] Modules linked in: iptable_filter kvm_intel kvm acpi_cpufreq mperf button processor mousedev iTCO_wdt
>>> [   20.314695] Pid: 2003, comm: ifup-bonding Not tainted 3.8.1-Isht-Van #5
>>> [   20.321340] Call Trace:
>>> [   20.323833]  [] __schedule_bug+0x5e/0x6c
>>> [   20.329356]  [] __schedule+0x762/0x7f0
>>> [   20.334701]  [] schedule+0x24/0x70
>>> [   20.339703]  [] schedule_hrtimeout_range_clock+0xa4/0x130
>>> [   20.346699]  [] ? update_rmtp+0x60/0x60
>>> [   20.352130]  [] ? hrtimer_start_range_ns+0xf/0x20
>>> [   20.358434]  [] schedule_hrtimeout_range+0xe/0x10
>>> [   20.364734]  [] usleep_range+0x3b/0x40
>>> [   20.370082]  [] ixgbe_acquire_swfw_sync_X540+0xbc/0x100
>>> [   20.376905]  [] ixgbe_read_phy_reg_generic+0x3d/0x140
>>> [   20.383553]  [] ixgbe_get_copper_link_capabilities_generic+0x2c/0x60
>>> [   20.391499]  [] ? bond_enslave+0x4ef/0xb80
>>> [   20.397194]  [] ixgbe_get_settings+0x34/0x340
>>> [   20.403148]  [] __ethtool_get_settings+0x88/0x130
>>> [   20.409448]  [] bond_update_speed_duplex+0x23/0x60
>>> [   20.415833]  [] bond_enslave+0x559/0xb80
>>> [   20.421356]  [] bonding_store_slaves+0x16f/0x1c0
>>> [   20.427569]  [] dev_attr_store+0x13/0x30
>>> [   20.433091]  [] sysfs_write_file+0xd4/0x150
>>> [   20.438872]  [] vfs_write+0xb1/0x190
>>> [   20.444047]  [] sys_write+0x50/0xa0
>>> [   20.449137]  [] system_call_fastpath+0x16/0x1b
>>> [   20.455264] BUG: scheduling while atomic: ifup-bonding/2003/0x0002
>>> [   20.461851] 4 locks held by ifup-bonding/2003:
>>> [   20.466334]  #0:  (&buffer->mutex){..}, at: [] sysfs_write_file+0x3f/0x150
>>> [   20.475356]  #1:  (s_active#59){..}, at: [] sysfs_write_file+0xbb/0x150
>>> [   20.484117]  #2:  (rtnl_mutex){..}, at: [] rtnl_trylock+0x10/0x20
>>> [   20.492403]  #3:  (&bond->lock){..}, at: [] bond_enslave+0x4ef/0xb80
>>> [   20.500902] Modules linked in: iptable_filter kvm_intel kvm acpi_cpufreq mperf button processor mousedev iTCO_wdt
>>> [   20.511640] Pid: 2003, comm: ifup-bonding Tainted: GW 3.8.1-Isht-Van #5
>>> [   20.519240] Call Trace:
>>> [   20.521729]  [] __schedule_bug+0x5e/0x6c
>>> [   20.527251]  [] __schedule+0x762/0x7f0
>>> [   20.532599]  [] schedule+0x24/0x70
>>> [   20.537599]  [] schedule_hrtimeout_range_clock+0xa4/0x130
>>> [   20.544592]  [] ? update_rmtp+0x60/0x60
>>> [   20.550026]  [] ? update_rmtp+0x60/0x60
>>> [   20.555462]  [] ? hrtimer_start_range_ns+0xf/0x20
>>> [   20.561763]  [] schedule_hrtimeout_range+0xe/0x10
>>> [   20.568064]  [] usleep_range+0x3b/0x40
>>> [   20.573415]  [] ixgbe_release_swfw_sync_X540+0x4e/0x60

Re: mmotm 2013-03-01-15-50 uploaded (strict user copy)

2013-03-01 Thread Randy Dunlap
On 03/01/13 20:16, Stephen Boyd wrote:
> On 03/01/13 19:42, Stephen Boyd wrote:
>> On 03/01/13 19:00, Randy Dunlap wrote:
>>> On 03/01/13 15:51, a...@linux-foundation.org wrote:
 The mm-of-the-moment snapshot 2013-03-01-15-50 has been uploaded to

http://www.ozlabs.org/~akpm/mmotm/

>>> on i386:
>>>
>>> ERROR: "copy_from_user_overflow" [fs/binfmt_misc.ko] undefined!
>>>
>>> which I don't understand.
>>> lib/usercopy.o is built and building binfmt_misc.c says:
>>>
>>>   CC [M]  fs/binfmt_misc.o
>>> In file included from arch/x86/include/asm/uaccess.h:537:0,
>>>  from include/linux/uaccess.h:5,
>>>  from include/linux/highmem.h:8,
>>>  from include/linux/pagemap.h:10,
>>>  from fs/binfmt_misc.c:27:
>>> arch/x86/include/asm/uaccess_32.h: In function 'parse_command.part.1':
>>> arch/x86/include/asm/uaccess_32.h:211:26: warning: call to 
>>> 'copy_from_user_overflow' declared with attribute warning: copy_from_user() 
>>> buffer size is not provably correct [enabled by default]
>> Hm.. That's because it's part of lib and not obj, right?

Yes, this fixes the build error.

>> diff --git a/lib/Makefile b/lib/Makefile
>> index 59fabd0..4c55104 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -15,7 +15,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
>>  is_single_threaded.o plist.o decompress.o kobject_uevent.o \
>>  earlycpio.o percpu-refcount.o
>>  
>> -lib-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
>> +obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
>>  lib-$(CONFIG_MMU) += ioremap.o
>>  lib-$(CONFIG_SMP) += cpumask.o
>>  
>>
> 
> I'm a little confused though because it is lib-y on x86 before my patch.

binfmt_misc is built as a loadable module in my config.
It must be the only user of copy_from_user_overflow() in this config.
That also explains the failure: lib-y objects are only pulled out of
lib.a when built-in code references them, so a symbol needed solely by
a module never makes it into vmlinux; obj-y links it unconditionally.
I guess that it would also fail prior to your patch, but I haven't
tested it.  Anyway, your patch above is correct and needed.

thanks,
-- 
~Randy


Re: [ 00/77] 3.8.2-stable review

2013-03-01 Thread Greg Kroah-Hartman
On Fri, Mar 01, 2013 at 08:59:55PM -0700, Shuah Khan wrote:
> On Fri, Mar 1, 2013 at 12:43 PM, Greg Kroah-Hartman  wrote:
> > This is the start of the stable review cycle for the 3.8.2 release.
> > There are 77 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sun Mar  3 19:42:25 UTC 2013.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.8.2-rc1.gz
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
> 
> Patches applied cleanly to 3.0.67, 3.4.34, and 3.8.1
> 
> Compiled and booted on the following systems:

Thanks for testing and letting us know.

greg k-h


[PATCH v2] [libata] Avoid specialized TLA's in ZPODD's Kconfig

2013-03-01 Thread Aaron Lu
ODD is not a common TLA for non-ATA people, so they will get confused
by its meaning when they are configuring the kernel. This patch fixes
the problem by using ODD only after stating what it stands for.

Signed-off-by: Aaron Lu 
---
v2:
Add a space before open paren as suggested by Sergei Shtylyov.

 drivers/ata/Kconfig | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index 3e751b7..a5a3ebc 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -59,15 +59,16 @@ config ATA_ACPI
  option libata.noacpi=1
 
 config SATA_ZPODD
-   bool "SATA Zero Power ODD Support"
+   bool "SATA Zero Power Optical Disc Drive (ZPODD) support"
depends on ATA_ACPI
default n
help
- This option adds support for SATA ZPODD. It requires both
- ODD and the platform support, and if enabled, will automatically
- power on/off the ODD when certain condition is satisfied. This
- does not impact user's experience of the ODD, only power is saved
- when ODD is not in use(i.e. no disc inside).
+ This option adds support for SATA Zero Power Optical Disc
+ Drive (ZPODD). It requires both the ODD and the platform
+ support, and if enabled, will automatically power on/off the
+ ODD when certain condition is satisfied. This does not impact
+ end user's experience of the ODD, only power is saved when
+ the ODD is not in use (i.e. no disc inside).
 
  If unsure, say N.
 
-- 
1.8.1.2



Re: Boot failure on Origen board using latest kernel

2013-03-01 Thread Sachin Kamat
Hi Alim,

On 2 March 2013 10:18, Alim Akhtar  wrote:
> Hi Sachin,
>
> Looks like exynos4 is not yet moved to the generic dma binding recently
> merged.
>
> Could you try out below:

I forgot to mention in the previous mail that the problem was in non-dt case.
With the below entries added in exynos4 dtsi file, it boots fine in DT case.

-- 
With warm regards,
Sachin


Re: linux-next: build failure after merge of the ftrace tree

2013-03-01 Thread Stephen Rothwell
Hi Steve,

On Fri, 01 Mar 2013 23:22:58 -0500 Steven Rostedt  wrote:
>
> On Sat, 2013-03-02 at 15:17 +1100, Stephen Rothwell wrote:
> >  
> > > I made some major changes and wanted testing on it as soon as possible.
> > > I've done tons of testing on my own but I wanted a broader audience. Do
> > > you want me to pull my tree and reset it to what is only for 3.9?
> > 
> > Yes, please.  The reasoning is that until -rc1 is out we don't want to
> > complicate life for people whose code has not yet been merged by
> > reporting problems in code that will not be merged until the next merge
> > window.
> 
> OK, I reset it to what I based my work on, and that commit is in Linus's
> tree.

Thanks.  It should only be for a few days anyway.

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




Re: Boot failure on Origen board using latest kernel

2013-03-01 Thread Sachin Kamat
Hi Padma,

Here is the short boot log with earlyprintk enabled. This is in the non-dt case.

EXYNOS4210 PMU Initialize
EXYNOS: Initializing architecture
s3c24xx-pwm s3c24xx-pwm.0: tin at 5000, tdiv at 5000, tin=divclk, base 0
bio: create slab  at 0
SCSI subsystem initialized
Switching to clocksource mct-frc
ROMFS MTD (C) 2007 Red Hat, Inc.
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
dma-pl330 dma-pl330.0: Loaded driver for PL330 DMAC-267056
dma-pl330 dma-pl330.0:  DBUFF-32x4bytes Num_Chans-8 Num_Peri-32 Num_Events-32
of_dma_controller_register: not enough information provided
dma-pl330 dma-pl330.0: unable to register DMA to the generic DT DMA helpers
dma-pl330: probe of dma-pl330.0 failed with error -22



On 2 March 2013 09:40, Sachin Kamat  wrote:
> Hi Padma,
>
> While trying to boot the latest mainline kernel (Linus tree tip at
> commit b0af9cd9) on Exynos4210 based Origen board, it stops at
> "Uncompressing Linux... done, booting the kernel."
>
> Git bisect pointed to the commit 421da89aa ("DMA: PL330: Register the
> DMA controller with the generic DMA helpers").
> Reverting this commit helped me boot the board again.
> Could you please look into this?
>
> --
> With warm regards,
> Sachin



-- 
With warm regards,
Sachin


Re: WARNING at tty_buffer.c:428 process_one_work()

2013-03-01 Thread Stephen Rothwell
On Fri, 01 Mar 2013 17:10:08 -0500 (EST) David Miller  wrote:
>
> From: Greg KH 
> Date: Fri, 1 Mar 2013 13:56:09 -0800
> 
> > On Fri, Mar 01, 2013 at 04:47:11PM -0500, David Miller wrote:
> >> 
> >> I'm getting these non-stop right when the hypervisor console registers
> >> on sparc64, and the machine won't boot up properly.  This is with
> >> Linus's current tree.
> >> 
> >> [511865.556835] console [ttyHV0] enabled
> >> [511865.564555] [ cut here ]
> >> [511865.612410] WARNING: at drivers/tty/tty_buffer.c:428 
> >> process_one_work+0x164/0x420()
> >> [511865.627846] tty is NULL
> > 
> > Sorry about this, I have a patch, from Jiri, to get to Linus after
> > 3.9-rc1 to remove the warning.  It's safe to ignore.  Maybe I should
> > just push it today, I wasn't aware that it was being hit so easily.
> 
> Ok, next I'm hitting some regression in Al Viro's signal changes when userland
> starts up. :-)

If only we had some way of testing this stuff before it gets into Linus'
tree ... ;-)

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




Re: [RFC PATCH 0/2] ipc: do not hold ipc lock more than necessary

2013-03-01 Thread Emmanuel Benisty
Hi,

On Sat, Mar 2, 2013 at 7:16 AM, Davidlohr Bueso  wrote:
> The following set of not-thoroughly-tested patches are based on the
> discussion of holding the ipc lock unnecessarily, such as for permissions
> and security checks:
>
> https://lkml.org/lkml/2013/2/28/540
>
> Patch 0/1: Introduces new functions, analogous to ipc_lock and ipc_lock_check
> in the ipc utility code, allowing to obtain the ipc object without holding 
> the lock.
>
> Patch 0/2: Use the new functions and only acquire the ipc lock when needed.

Not sure how much a work in progress this is but my machine dies
immediately when I start chromium, crappy mobile phone picture here:
http://i.imgur.com/S0hfPz3.jpg

Thanks.


Re: [PATCH 2/2] ipc: semaphores: do not hold ipc lock more than necessary

2013-03-01 Thread Michel Lespinasse
On Sat, Mar 2, 2013 at 8:16 AM, Davidlohr Bueso  wrote:
> Instead of holding the ipc lock for permissions and security
> checks, among others, only acquire it when necessary.
>
> Signed-off-by: Davidlohr Bueso 

You got some really great test results on this; I think they deserve
to be mentioned in the commit message.

Code looks fine to me otherwise, but I only had a quick look.

Nice work!

Acked-by: Michel Lespinasse 

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


Re: [RFC PATCH 1/2] ipc: introduce obtaining a lockless ipc object

2013-03-01 Thread Michel Lespinasse
On Sat, Mar 2, 2013 at 8:16 AM, Davidlohr Bueso  wrote:
> Through ipc_lock() and, therefore, ipc_lock_check() we currently
> return the locked ipc object. This is not necessary for all situations,
> thus introduce, analogous, ipc_obtain_object and ipc_obtain_object_check
> functions that only mark the RCU read critical region without acquiring
> the lock and return the ipc object.
>
> Signed-off-by: Davidlohr Bueso 
> ---
>  ipc/util.c | 42 --
>  ipc/util.h |  2 ++
>  2 files changed, 34 insertions(+), 10 deletions(-)
>
> diff --git a/ipc/util.c b/ipc/util.c
> index 464a8ab..902f282 100644
> --- a/ipc/util.c
> +++ b/ipc/util.c
> @@ -667,6 +667,21 @@ void ipc64_perm_to_ipc_perm (struct ipc64_perm *in, struct ipc_perm *out)
> out->seq= in->seq;
>  }
>
> +struct kern_ipc_perm *ipc_obtain_object(struct ipc_ids *ids, int id)
> +{
> +   struct kern_ipc_perm *out;
> +   int lid = ipcid_to_idx(id);
> +
> +   rcu_read_lock();
> +   out = idr_find(&ids->ipcs_idr, lid);
> +   if (!out) {
> +   rcu_read_unlock();
> +   return ERR_PTR(-EINVAL);
> +   }
> +
> +   return out;
> +}

I think it may be nicer to take the rcu read lock at the call site
rather than in ipc_obtain_object(), to make the rcu read lock/unlock
sites pair up more nicely. Either that or make an inline
ipc_release_object function that pairs up with ipc_obtain_object() and
just does an rcu_read_unlock().
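
Concretely, the suggested call-site pairing would look something like
this sketch (kernel-style illustration; it assumes a variant of
ipc_obtain_object() that no longer takes the RCU read lock itself, and
example_ipc_check() is a made-up name):

static int example_ipc_check(struct ipc_ids *ids, int id)
{
        struct kern_ipc_perm *perm;

        rcu_read_lock();
        perm = ipc_obtain_object(ids, id);      /* lockless lookup */
        if (IS_ERR(perm)) {
                rcu_read_unlock();
                return PTR_ERR(perm);
        }
        /* permission/security checks under RCU only, no ipc lock */
        rcu_read_unlock();
        return 0;
}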

> +
>  /**
>   * ipc_lock - Lock an ipc structure without rw_mutex held
>   * @ids: IPC identifier set
> @@ -679,18 +694,13 @@ void ipc64_perm_to_ipc_perm (struct ipc64_perm *in, struct ipc_perm *out)
>
>  struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
>  {
> -   struct kern_ipc_perm *out;
> -   int lid = ipcid_to_idx(id);
> +   struct kern_ipc_perm *out = ipc_obtain_object(ids, id);
>
> -   rcu_read_lock();
> -   out = idr_find(&ids->ipcs_idr, lid);
> -   if (out == NULL) {
> -   rcu_read_unlock();
> +   if (!out)

I think this should be if (IS_ERR(out)) ?

Looks great otherwise.

Acked-by: Michel Lespinasse 

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


Re: linux-next: build failure after merge of the ftrace tree

2013-03-01 Thread Steven Rostedt
On Sat, 2013-03-02 at 15:17 +1100, Stephen Rothwell wrote:
>  
> > I made some major changes and wanted testing on it as soon as possible.
> > I've done tons of testing on my own but I wanted a broader audience. Do
> > you want me to pull my tree and reset it to what is only for 3.9?
> 
> Yes, please.  The reasoning is that until -rc1 is out we don't want to
> complicate life for people whose code has not yet been merged by
> reporting problems in code that will not be merged until the next merge
> window.

OK, I reset it to what I based my work on, and that commit is in Linus's
tree.

> 
> That is why my daily reports during the merge window start with:
> 
> "Please do not add any work destined for v3.10 to your -next included
> branches until after Linus has release v3.9-rc1."
> 
> But I am beginning to realise that not many actually read those reports (I

/me is guilty :-(

> had the date wrong on two of them this week and it was only noticed after
> the second one :-().
> 

I'm try to be better next time.

-- Steve




Re: [PATCH 4/4] tracing/syscalls: Annotate field-defining functions with __init

2013-03-01 Thread Steven Rostedt
On Thu, 2013-02-21 at 10:33 +0800, Li Zefan wrote:
> These two functions are called during kernel boot only.
> 

Applied, thanks Li!

-- Steve

> Signed-off-by: Li Zefan 
> ---
>  kernel/trace/trace_syscalls.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 5329e13..a70fa19 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -232,7 +232,7 @@ static void free_syscall_print_fmt(struct ftrace_event_call *call)
>   kfree(call->print_fmt);
>  }
>  
> -static int syscall_enter_define_fields(struct ftrace_event_call *call)
> +static int __init syscall_enter_define_fields(struct ftrace_event_call *call)
>  {
>   struct syscall_trace_enter trace;
>   struct syscall_metadata *meta = call->data;
> @@ -255,7 +255,7 @@ static int syscall_enter_define_fields(struct ftrace_event_call *call)
>   return ret;
>  }
>  
> -static int syscall_exit_define_fields(struct ftrace_event_call *call)
> +static int __init syscall_exit_define_fields(struct ftrace_event_call *call)
>  {
>   struct syscall_trace_exit trace;
>   int ret;




Re: linux-next: build failure after merge of the ftrace tree

2013-03-01 Thread Stephen Rothwell
Hi Steve,

On Fri, 01 Mar 2013 22:55:29 -0500 Steven Rostedt  wrote:
>
> Oh crap, sorry. Quite the contrary. I should have waited for -rc1 as
> this is intended for v3.10.
> 
> I made some major changes and wanted testing on it as soon as possible.
> I've done tons of testing on my own but I wanted a broader audience. Do
> you want me to pull my tree and reset it to what is only for 3.9?

Yes, please.  The reasoning is that until -rc1 is out we don't want to
complicate life for people whose code has not yet been merged by
reporting problems in code that will not be merged until the next merge
window.

That is why my daily reports during the merge window start with:

"Please do not add any work destined for v3.10 to your -next included
branches until after Linus has release v3.9-rc1."

But I am beginning to realise that not many actually read those reports (I
had the date wrong on two of them this week and it was only noticed after
the second one :-().

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




Re: mmotm 2013-03-01-15-50 uploaded (strict user copy)

2013-03-01 Thread Stephen Boyd
On 03/01/13 19:42, Stephen Boyd wrote:
> On 03/01/13 19:00, Randy Dunlap wrote:
>> On 03/01/13 15:51, a...@linux-foundation.org wrote:
>>> The mm-of-the-moment snapshot 2013-03-01-15-50 has been uploaded to
>>>
>>>http://www.ozlabs.org/~akpm/mmotm/
>>>
>> on i386:
>>
>> ERROR: "copy_from_user_overflow" [fs/binfmt_misc.ko] undefined!
>>
>> which I don't understand.
>> lib/usercopy.o is built and building binfmt_misc.c says:
>>
>>   CC [M]  fs/binfmt_misc.o
>> In file included from arch/x86/include/asm/uaccess.h:537:0,
>>  from include/linux/uaccess.h:5,
>>  from include/linux/highmem.h:8,
>>  from include/linux/pagemap.h:10,
>>  from fs/binfmt_misc.c:27:
>> arch/x86/include/asm/uaccess_32.h: In function 'parse_command.part.1':
>> arch/x86/include/asm/uaccess_32.h:211:26: warning: call to 
>> 'copy_from_user_overflow' declared with attribute warning: copy_from_user() 
>> buffer size is not provably correct [enabled by default]
> Hm.. That's because it's part of lib and not obj, right?
>
> diff --git a/lib/Makefile b/lib/Makefile
> index 59fabd0..4c55104 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -15,7 +15,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
>  is_single_threaded.o plist.o decompress.o kobject_uevent.o \
>  earlycpio.o percpu-refcount.o
>  
> -lib-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
> +obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
>  lib-$(CONFIG_MMU) += ioremap.o
>  lib-$(CONFIG_SMP) += cpumask.o
>  
>

I'm a little confused though because it is lib-y on x86 before my patch.
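For context, the kbuild semantics in play here (a sketch of the usual
behaviour, not something spelled out in the thread): objects in lib-y are
archived into lib.a and linked into vmlinux only when built-in code
references their symbols, while obj-y objects are always linked in.
lib/usercopy.c is, condensed from memory rather than quoted verbatim,
essentially:

	/* lib/usercopy.c, condensed sketch */
	#include <linux/export.h>
	#include <linux/bug.h>

	void copy_from_user_overflow(void)
	{
		WARN(1, "Buffer overflow detected!\n");
	}
	EXPORT_SYMBOL(copy_from_user_overflow);

With nothing built-in calling copy_from_user_overflow(), the linker never
pulls usercopy.o out of lib.a, so the EXPORT_SYMBOL() never reaches vmlinux
and a module such as binfmt_misc.ko cannot resolve the symbol. Moving the
object to obj-y forces it into vmlinux unconditionally, which is why the
hunk above helps.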

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] mfd: as3711: add OF support

2013-03-01 Thread Mark Brown
On Mon, Feb 18, 2013 at 10:57:44AM +0100, Guennadi Liakhovetski wrote:
> Add device-tree bindings to the AS3711 regulator and backlight drivers.

Reviewed-by: Mark Brown 




Boot failure on Origen board using latest kernel

2013-03-01 Thread Sachin Kamat
Hi Padma,

While trying to boot the latest mainline kernel (Linus tree tip at
commit b0af9cd9) on Exynos4210 based Origen board, it stops at
"Uncompressing Linux... done, booting the kernel."

Git bisect pointed to the commit 421da89aa ("DMA: PL330: Register the
DMA controller with the generic DMA helpers").
Reverting this commit helped me boot the board again.
Could you please look into this?

-- 
With warm regards,
Sachin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] documentation: add palmas dts definition

2013-03-01 Thread Mark Brown
On Wed, Feb 27, 2013 at 11:16:57AM -0700, Stephen Warren wrote:

> I believe what Mark wants is something like the following in this file
> (take from Documentation/devicetree/bindings/regulator/tps6586x.txt):

> > - regulators: A node that houses a sub-node for each regulator within the
> >   device. Each sub-node is identified using the node's name (or the 
> > deprecated
> >   regulator-compatible property if present), with valid values listed below.
> >   The content of each sub-node is defined by the standard binding for
> >   regulators; see regulator.txt.
> >   sys, sm[0-2], ldo[0-9] and ldo_rtc

Yes, exactly.
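
To make that concrete, a hypothetical fragment for the tps6586x binding
quoted above, with the sub-node names taken from the quoted list and the
voltage values invented for illustration:

	regulators {
		sys {
			regulator-always-on;
		};
		sm0 {
			regulator-min-microvolt = <1200000>;
			regulator-max-microvolt = <1200000>;
		};
		ldo0 {
			regulator-min-microvolt = <3300000>;
			regulator-max-microvolt = <3300000>;
		};
	};

Each sub-node is identified by its name, and its contents follow the
standard regulator binding.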




Re: [PATCH] add extra free kbytes tunable

2013-03-01 Thread Simon Jeons

On 03/02/2013 11:08 AM, Hugh Dickins wrote:

On Sat, 2 Mar 2013, Simon Jeons wrote:

On 03/02/2013 09:42 AM, Hugh Dickins wrote:

On Sat, 2 Mar 2013, Simon Jeons wrote:

In __add_to_swap_cache, a successful add to the radix tree results in an
increase of NR_FILE_PAGES. Why? This is an anonymous page, not a
file-backed page.

Right, that's hard to understand without historical background.

I think the quick answer would be that we used to (and still do) think
of file-cache and swap-cache as two halves of page-cache.  And then when

Should a shmem page be treated as file-cache or swap-cache? It is strange,
since it consists of anonymous pages and yet these pages make up files.

A shmem page is swap-backed file-cache, and it may get transferred to or
from swap-cache: yes, it's a difficult and confusing case, as I said below.

I would never call it "anonymous", but it is counted in /proc/meminfo's
Active(anon) or Inactive(anon) rather than in (file), because "anon"
there is shorthand for "swap-backed".


Oh, I see. Thanks. :)




So you'll find that shmem and swap are counted as file in some places
and anon in others, and it's hard to grasp which is where and why,
without remembering the history.

Hugh
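
For anyone following the accounting question above, a condensed sketch of
__add_to_swap_cache() (reconstructed from memory, not a verbatim quote of
mm/swap_state.c) showing where the counter is bumped:

	error = radix_tree_insert(&swapper_space.page_tree,
				  entry.val, page);
	if (!error) {
		total_swapcache_pages++;
		/* an anonymous page, but swap-cache is accounted
		 * as page-cache, hence NR_FILE_PAGES */
		__inc_zone_page_state(page, NR_FILE_PAGES);
	}

This is the increment the question was about: swap-cache insertions count
toward NR_FILE_PAGES because swap-cache is treated as one half of the
page-cache.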


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Platform: x86: chromeos_laptop : Add basic platform data for atmel devices

2013-03-01 Thread Olof Johansson
2013/3/1 Benson Leung :
> Add basic platform data to get the current upstream driver working
> with the 224s touchpad and 1664s touchscreen.
> We will be using NULL config so we will use the settings from the
> devices' NVRAMs.
>
> Signed-off-by: Benson Leung 
>
> Change-Id: I712bf4726fb4b194fbde44ad200c54d13dc3bdb9

Same thing, please remove Change-Id.

I've tested this patch, and had issues getting touchscreen to work.
Seems to be a driver-side issue, so this patch is still valid.

There was also some flakiness with the touchpad device going
unresponsive after some time that needs to be debugged. Again, driver
side.

With those caveats:

Tested-by: Olof Johansson 


-Olof
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ 00/77] 3.8.2-stable review

2013-03-01 Thread Shuah Khan
On Fri, Mar 1, 2013 at 12:43 PM, Greg Kroah-Hartman
 wrote:
> This is the start of the stable review cycle for the 3.8.2 release.
> There are 77 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Mar  3 19:42:25 UTC 2013.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.8.2-rc1.gz
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Patches applied cleanly to 3.0.67, 3.4.34, and 3.8.1

Compiled and booted on the following systems:

HP EliteBook 6930p Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz
HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics

Special test this cycle:
HP ProLiant DL385p Gen8: Tested all three releases for the following
commit: iommu/amd: Initialize device table after dma_ops

dmesgs for all releases look good. No regressions compared to the previous
dmesgs for each of these releases.

Cross-compile tests results:

alpha: defconfig passed on all
arm: defconfig passed on all
arm64: not applicable to 3.0.y, 3.4.y. defconfig passed on 3.8.y
c6x: not applicable to 3.0.y, defconfig passed on 3.4.y, and 3.8.y.
mips: defconfig passed on all
mipsel: defconfig passed on all
powerpc: wii_defconfig passed on all
sh: defconfig passed on all
sparc: defconfig passed on all
tile: tilegx_defconfig passed on all

-- Shuah
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ 00/46] 3.4.35-stable review

2013-03-01 Thread Shuah Khan
On Fri, Mar 1, 2013 at 12:44 PM, Greg Kroah-Hartman
 wrote:
> This is the start of the stable review cycle for the 3.4.35 release.
> There are 46 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Mar  3 19:44:00 UTC 2013.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.4.35-rc1.gz
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Patches applied cleanly to 3.0.67, 3.4.34, and 3.8.1

Compiled and booted on the following systems:

HP EliteBook 6930p Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz
HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics

Special test this cycle:
HP ProLiant DL385p Gen8: Tested all three releases for the following
commit: iommu/amd: Initialize device table after dma_ops

dmesgs for all releases look good. No regressions compared to the previous
dmesgs for each of these releases.

Cross-compile tests results:

alpha: defconfig passed on all
arm: defconfig passed on all
arm64: not applicable to 3.0.y, 3.4.y. defconfig passed on 3.8.y
c6x: not applicable to 3.0.y, defconfig passed on 3.4.y, and 3.8.y.
mips: defconfig passed on all
mipsel: defconfig passed on all
powerpc: wii_defconfig passed on all
sh: defconfig passed on all
sparc: defconfig passed on all
tile: tilegx_defconfig passed on all

-- Shuah
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -next] spi: fix return value check in ce4100_spi_probe()

2013-03-01 Thread Mark Brown
On Fri, Feb 22, 2013 at 10:52:35AM +0800, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> In case of error, the function platform_device_register_full()
> returns ERR_PTR() and never returns NULL. The NULL test in the
> return value check should be replaced with IS_ERR().

Applied, thanks.




Re: [ 00/30] 3.0.68-stable review

2013-03-01 Thread Shuah Khan
On Fri, Mar 1, 2013 at 12:45 PM, Greg Kroah-Hartman
 wrote:
> This is the start of the stable review cycle for the 3.0.68 release.
> There are 30 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Mar  3 19:44:54 UTC 2013.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.0.68-rc1.gz
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Patches applied cleanly to 3.0.67, 3.4.34, and 3.8.1

Compiled and booted on the following systems:

HP EliteBook 6930p Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz
HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics

Special test this cycle:
HP ProLiant DL385p Gen8: Tested all three releases for the following
commit: iommu/amd: Initialize device table after dma_ops

dmesgs for all releases look good. No regressions compared to the previous
dmesgs for each of these releases.

Cross-compile tests results:

alpha: defconfig passed on all
arm: defconfig passed on all
arm64: not applicable to 3.0.y, 3.4.y. defconfig passed on 3.8.y
c6x: not applicable to 3.0.y, defconfig passed on 3.4.y, and 3.8.y.
mips: defconfig passed on all
mipsel: defconfig passed on all
powerpc: wii_defconfig passed on all
sh: defconfig passed on all
sparc: defconfig passed on all
tile: tilegx_defconfig passed on all

-- Shuah
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] Input: atmel_mxt_ts - add device id for touchpad variant

2013-03-01 Thread Olof Johansson
(sorry for duplicate post to some of you, my mail client defaulted to
HTML email for some reason).


2013/3/1 Benson Leung 
>
> From: Daniel Kurtz 
>
> This same driver can be used by atmel based touchscreens and touchpads
> (buttonpads) by instantiating the i2c device as a "atmel_mxt_tp".
>
> This will cause the driver to perform some touchpad specific
> initializations, such as:
>   * register input device name "Atmel maXTouch Touchpad" instead of
>   Touchscreen.
>   * register BTN_LEFT & BTN_TOOL_* event types.
>   * register axis resolution (as a fixed constant, for now)
>   * register BUTTONPAD property
>   * process GPIO buttons using reportid T19
>
> For now, the left mouse button is mapped to GPIO3. Going forward,
> platform data should specify the configuration of the buttons.
> They can be configured via a future platform data change to
> specify optional middle and right buttons, as well as other possible
> uses for the GPIO object T19.
>
> Signed-off-by: Daniel Kurtz 
> Signed-off-by: Benson Leung 
>
> Change-Id: Ia82e75d85111c94f6c3fb423181df0fa4b964fc4

Please remember to remove Change-Id on future patch postings.

Tested with native Linux Mint +mainline kernel earlier today, so:

Tested-by: Olof Johansson 
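
As an aside, for anyone instantiating the device from board code, the
touchpad variant would be wired up roughly as below; the bus number and
the 0x4a slave address are invented for illustration:

	#include <linux/i2c.h>
	#include <linux/init.h>

	static struct i2c_board_info mxt_tp_device __initdata = {
		I2C_BOARD_INFO("atmel_mxt_tp", 0x4a),
	};

	static int __init example_register_touchpad(void)
	{
		/* board info must be registered before the adapter probes */
		return i2c_register_board_info(0, &mxt_tp_device, 1);
	}

With NULL platform config, as the commit message notes, the driver falls
back to the settings stored in the device's NVRAM.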
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: build failure after merge of the ftrace tree

2013-03-01 Thread Steven Rostedt
On Sat, 2013-03-02 at 14:42 +1100, Stephen Rothwell wrote:
> Hi Steve,
> 
> On Fri, 01 Mar 2013 21:45:18 -0500 Steven Rostedt  wrote:
> >
> > On Fri, 2013-03-01 at 13:47 +1100, Stephen Rothwell wrote:
> > > 
> > > After merging the ftrace tree, today's linux-next build (x86_64
> > > allmodconfig) failed like this:
> > > 
> > > kernel/trace/trace_kdb.c: In function 'ftrace_dump_buf':
> > > kernel/trace/trace_kdb.c:29:33: error: invalid type argument of '->' 
> > > (have 'struct trace_array_cpu')
> > > kernel/trace/trace_kdb.c:86:33: error: invalid type argument of '->' 
> > > (have 'struct trace_array_cpu')
> > > 
> > > Caused by commit eaac1836c10e ("tracing: Replace the static global
> > > per_cpu arrays with allocated per_cpu").
> > > 
> > > I have used the ftrace tree from next-20130228 for today.
> > 
> > I rebased, and it should all be good now. I also fixed breakage to the
> > new snapshot feature. Here's my diff:
> 
> Of course, I wonder if this is intended for the current merge window
> (since it turned up in linux-next so late).  If not, then please don't
> put it in your -next included branch until after -rc1 is released.  If
> you are aiming for this merge window, then good luck!  ;-)
> 

Oh crap, sorry. Quite the contrary. I should have waited for -rc1 as
this is intended for v3.10.

I made some major changes and wanted testing on it as soon as possible.
I've done tons of testing on my own but I wanted a broader audience. Do
you want me to pull my tree and reset it to what is only for 3.9?

All changes I wanted for 3.9 have already made it into Linus's tree.
Everything I've been pushing is destined for 3.10. But as I said. They
are big changes and wanted testing on them early. But it can wait for an
-rc1 window.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] tracing: Annotate event field-defining functions with __init

2013-03-01 Thread Steven Rostedt
On Thu, 2013-02-21 at 10:33 +0800, Li Zefan wrote:
> Those functions are called either during kernel boot or module init.
> 
> Before:
> 
> $ dmesg | grep 'Freeing unused kernel memory'
> Freeing unused kernel memory: 1208k freed
> Freeing unused kernel memory: 1360k freed
> Freeing unused kernel memory: 1960k freed
> 
> After:
> 
> $ dmesg | grep 'Freeing unused kernel memory'
> Freeing unused kernel memory: 1236k freed
> Freeing unused kernel memory: 1388k freed
> Freeing unused kernel memory: 1960k freed

Also nice :-)

Here's my numbers (again with lots of debug enabled):

old:
[0.027087] Freeing SMP alternatives: 12k freed
[6.835298] Freeing initrd memory: 8004k freed
[   18.820837] Freeing unused kernel memory: 1092k freed
[   18.838487] Freeing unused kernel memory: 1456k freed
[   18.850665] Freeing unused kernel memory: 1544k freed

new:
[0.025087] Freeing SMP alternatives: 12k freed
[6.775349] Freeing initrd memory: 8004k freed
[   18.753519] Freeing unused kernel memory: 1144k freed
[   18.771447] Freeing unused kernel memory: 1508k freed
[   18.783637] Freeing unused kernel memory: 1544k freed

Lets hope these pass all my tests.

-- Steve



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v7 7/7] Documentation: update nfs option in filesystem/vfat.txt

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

Add descriptions about 'stale_rw' and 'nostale_ro' nfs options in
filesystem/vfat.txt

Signed-off-by: Namjae Jeon 
Signed-off-by: Ravishankar N 
Signed-off-by: Amit Sahrawat 
---
 Documentation/filesystems/vfat.txt |   26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt
index d230dd9..4a93e98 100644
--- a/Documentation/filesystems/vfat.txt
+++ b/Documentation/filesystems/vfat.txt
-discard   -- If set, issues discard/TRIM commands to the block
 device when blocks are freed. This is useful for SSD devices
 and sparse/thinly-provisoned LUNs.
 
-nfs   -- This option maintains an index (cache) of directory
-inodes by i_logstart which is used by the nfs-related code to
-improve look-ups.
+nfs=stale_rw|nostale_ro
+   Enable this only if you want to export the FAT filesystem
+   over NFS.
+
+   stale_rw: This option maintains an index (cache) of directory
+   inodes by i_logstart which is used by the nfs-related code to
+   improve look-ups. Full file operations (read/write) over NFS is
+   supported but with cache eviction at NFS server, this could
+   result in ESTALE issues.
+
+   nostale_ro: This option bases the inode number and filehandle
+   on the on-disk location of a file in the MS-DOS directory entry.
+   This ensures that ESTALE will not be returned after a file is
+   evicted from the inode cache. However, it means that operations
+   such as rename, create and unlink could cause filehandles that
+   previously pointed at one file to point at a different file,
+   potentially causing data corruption. For this reason, this
+   option also mounts the filesystem readonly.
+
+   To maintain backward compatibility, '-o nfs' is also accepted,
+   defaulting to stale_rw
 
-Enable this only if you want to export the FAT filesystem
-over NFS
 
 : 0,1,yes,no,true,false
 
-- 
1.7.9.5
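
As a usage illustration (the device and mount point are examples, not
taken from the patch):

	# read-write export; may return ESTALE after cache eviction
	mount -t vfat -o nfs=stale_rw /dev/sdb1 /mnt/flash

	# stale-proof export; forces the filesystem read-only
	mount -t vfat -o nfs=nostale_ro /dev/sdb1 /mnt/flash

Plain '-o nfs' is still accepted and behaves like nfs=stale_rw.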

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v7 5/7] fat (exportfs): rebuild inode if ilookup() fails

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

If the cache lookups fail, use the i_pos value to find the
directory entry of the inode and rebuild the inode. Since this
involves accessing the FAT media, do this only if the nostale_ro nfs
mount option is specified.

Signed-off-by: Namjae Jeon 
Signed-off-by: Ravishankar N 
Signed-off-by: Amit Sahrawat 
---
 fs/fat/fat.h   |1 +
 fs/fat/inode.c |   15 +++
 fs/fat/nfs.c   |   41 -
 3 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index c517fc0..413eaaf 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -75,6 +75,7 @@ struct msdos_sb_info {
unsigned long root_cluster;   /* first cluster of the root directory */
unsigned long fsinfo_sector;  /* sector number of FAT32 fsinfo */
struct mutex fat_lock;
+   struct mutex nfs_build_inode_lock;
struct mutex s_lock;
unsigned int prev_free;  /* previously allocated cluster number */
unsigned int free_clusters;  /* -1 if undefined */
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index a42f2c2..ee3f3b9 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -444,12 +444,25 @@ static int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de)
return 0;
 }
 
+static inline void fat_lock_build_inode(struct msdos_sb_info *sbi)
+{
+   if (sbi->options.nfs == FAT_NFS_NOSTALE_RO)
+   mutex_lock(&sbi->nfs_build_inode_lock);
+}
+
+static inline void fat_unlock_build_inode(struct msdos_sb_info *sbi)
+{
+   if (sbi->options.nfs == FAT_NFS_NOSTALE_RO)
+   mutex_unlock(&sbi->nfs_build_inode_lock);
+}
+
 struct inode *fat_build_inode(struct super_block *sb,
struct msdos_dir_entry *de, loff_t i_pos)
 {
struct inode *inode;
int err;
 
+   fat_lock_build_inode(MSDOS_SB(sb));
inode = fat_iget(sb, i_pos);
if (inode)
goto out;
@@ -469,6 +482,7 @@ struct inode *fat_build_inode(struct super_block *sb,
fat_attach(inode, i_pos);
insert_inode_hash(inode);
 out:
+   fat_unlock_build_inode(MSDOS_SB(sb));
return inode;
 }
 
@@ -1248,6 +1262,7 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat,
sb->s_magic = MSDOS_SUPER_MAGIC;
sb->s_op = &fat_sops;
sb->s_export_op = &fat_export_ops;
+   mutex_init(&sbi->nfs_build_inode_lock);
ratelimit_state_init(&sbi->ratelimit, DEFAULT_RATELIMIT_INTERVAL,
 DEFAULT_RATELIMIT_BURST);
 
diff --git a/fs/fat/nfs.c b/fs/fat/nfs.c
index d59c025..0748196 100644
--- a/fs/fat/nfs.c
+++ b/fs/fat/nfs.c
@@ -50,19 +50,50 @@ static struct inode *fat_dget(struct super_block *sb, int i_logstart)
return inode;
 }
 
+static struct inode *fat_ilookup(struct super_block *sb, u64 ino, loff_t i_pos)
+{
+   if (MSDOS_SB(sb)->options.nfs == FAT_NFS_NOSTALE_RO)
+   return fat_iget(sb, i_pos);
+
+   else {
+   if ((ino < MSDOS_ROOT_INO) || (ino == MSDOS_FSINFO_INO))
+   return NULL;
+   return ilookup(sb, ino);
+   }
+}
+
 static struct inode *__fat_nfs_get_inode(struct super_block *sb,
   u64 ino, u32 generation, loff_t i_pos)
 {
-   struct inode *inode;
-
-   if ((ino < MSDOS_ROOT_INO) || (ino == MSDOS_FSINFO_INO))
-   return NULL;
+   struct inode *inode = fat_ilookup(sb, ino, i_pos);
 
-   inode = ilookup(sb, ino);
if (inode && generation && (inode->i_generation != generation)) {
iput(inode);
inode = NULL;
}
+   if (inode == NULL && MSDOS_SB(sb)->options.nfs == FAT_NFS_NOSTALE_RO) {
+   struct buffer_head *bh = NULL;
+   struct msdos_dir_entry *de ;
+   sector_t blocknr;
+   int offset;
+   fat_get_blknr_offset(MSDOS_SB(sb), i_pos, &blocknr, &offset);
+   bh = sb_bread(sb, blocknr);
+   if (!bh) {
+   fat_msg(sb, KERN_ERR,
+   "unable to read block(%llu) for building NFS 
inode",
+   (llu)blocknr);
+   return inode;
+   }
+   de = (struct msdos_dir_entry *)bh->b_data;
+   /* If a file is deleted on server and client is not updated
+* yet, we must not build the inode upon a lookup call.
+*/
+   if (IS_FREE(de[offset].name))
+   inode = NULL;
+   else
+   inode = fat_build_inode(sb, &de[offset], i_pos);
+   brelse(bh);
+   }
 
return inode;
 }
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v7 6/7] fat (exportfs): rebuild directory-inode if fat_dget()

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

This patch enables rebuilding of directory inodes which are not present
in the cache. This is done by traversing the disk clusters to find the
directory entry of the parent directory and using its i_pos to build the
inode.

The traversal is done by fat_scan_logstart(), which is similar to
fat_scan() but matches i_pos values instead of names. fat_scan_logstart()
needs an inode parameter to work, for which a dummy inode is created by
its caller fat_rebuild_parent(). This dummy inode is destroyed after
the traversal completes.

All this is done only if the nostale_ro nfs mount option is specified.

Signed-off-by: Namjae Jeon 
Signed-off-by: Ravishankar N 
Signed-off-by: Amit Sahrawat 
---
 fs/fat/dir.c   |   23 +++
 fs/fat/fat.h   |3 +++
 fs/fat/inode.c |2 +-
 fs/fat/nfs.c   |   52 +++-
 4 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index 165012e..7a6f02c 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -964,6 +964,29 @@ int fat_scan(struct inode *dir, const unsigned char *name,
 }
 EXPORT_SYMBOL_GPL(fat_scan);
 
+/*
+ * Scans a directory for a given logstart.
+ * Returns an error code or zero.
+ */
+int fat_scan_logstart(struct inode *dir, int i_logstart,
+ struct fat_slot_info *sinfo)
+{
+   struct super_block *sb = dir->i_sb;
+
+   sinfo->slot_off = 0;
+   sinfo->bh = NULL;
+   while (fat_get_short_entry(dir, &sinfo->slot_off, &sinfo->bh,
+  &sinfo->de) >= 0) {
+   if (fat_get_start(MSDOS_SB(sb), sinfo->de) == i_logstart) {
+   sinfo->slot_off -= sizeof(*sinfo->de);
+   sinfo->nr_slots = 1;
+   sinfo->i_pos = fat_make_i_pos(sb, sinfo->bh, sinfo->de);
+   return 0;
+   }
+   }
+   return -ENOENT;
+}
+
 static int __fat_remove_entries(struct inode *dir, loff_t pos, int nr_slots)
 {
struct super_block *sb = dir->i_sb;
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 413eaaf..21664fc 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -296,6 +296,8 @@ extern int fat_dir_empty(struct inode *dir);
 extern int fat_subdirs(struct inode *dir);
 extern int fat_scan(struct inode *dir, const unsigned char *name,
struct fat_slot_info *sinfo);
+extern int fat_scan_logstart(struct inode *dir, int i_logstart,
+struct fat_slot_info *sinfo);
 extern int fat_get_dotdot_entry(struct inode *dir, struct buffer_head **bh,
struct msdos_dir_entry **de);
 extern int fat_alloc_new_dir(struct inode *dir, struct timespec *ts);
@@ -373,6 +375,7 @@ extern struct inode *fat_build_inode(struct super_block *sb,
 extern int fat_sync_inode(struct inode *inode);
 extern int fat_fill_super(struct super_block *sb, void *data, int silent,
  int isvfat, void (*setup)(struct super_block *));
+extern int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de);
 
 extern int fat_flush_inodes(struct super_block *sb, struct inode *i1,
struct inode *i2);
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index ee3f3b9..dfce656 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -385,7 +385,7 @@ static int fat_calc_dir_size(struct inode *inode)
 }
 
 /* doesn't deal with root inode */
-static int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de)
+int fat_fill_inode(struct inode *inode, struct msdos_dir_entry *de)
 {
struct msdos_sb_info *sbi = MSDOS_SB(inode->i_sb);
int error;
diff --git a/fs/fat/nfs.c b/fs/fat/nfs.c
index 0748196..93e1493 100644
--- a/fs/fat/nfs.c
+++ b/fs/fat/nfs.c
@@ -216,6 +216,53 @@ static struct dentry *fat_fh_to_parent_nostale(struct super_block *sb,
 }
 
 /*
+ * Rebuild the parent for a directory that is not connected
+ *  to the filesystem root
+ */
+static
+struct inode *fat_rebuild_parent(struct super_block *sb, int parent_logstart)
+{
+   int search_clus, clus_to_match;
+   struct msdos_dir_entry *de;
+   struct inode *parent = NULL;
+   struct inode *dummy_grand_parent = NULL;
+   struct fat_slot_info sinfo;
+   struct msdos_sb_info *sbi = MSDOS_SB(sb);
+   sector_t blknr = fat_clus_to_blknr(sbi, parent_logstart);
+   struct buffer_head *parent_bh = sb_bread(sb, blknr);
+   if (!parent_bh) {
+   fat_msg(sb, KERN_ERR,
+   "unable to read cluster of parent directory");
+   return NULL;
+   }
+
+   de = (struct msdos_dir_entry *) parent_bh->b_data;
+   clus_to_match = fat_get_start(sbi, &de[0]);
+   search_clus = fat_get_start(sbi, &de[1]);
+
+   dummy_grand_parent = fat_dget(sb, search_clus);
+   if (!dummy_grand_parent) {
+   dummy_grand_parent = new_inode(sb);
+   if (!dummy_grand_parent) {
+   brelse(parent_bh);
+ 

[PATCH v7 4/7] fat: restructure export_operations

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

Define two nfs export_operation structures, one for 'stale_rw' mounts and
the other for 'nostale_ro'. The latter uses i_pos as a basis for encoding
and decoding file handles.

Also, assign i_pos to kstat->ino. The logic for rebuilding the inode is
added in the subsequent patches.

Signed-off-by: Namjae Jeon 
Signed-off-by: Ravishankar N 
Signed-off-by: Amit Sahrawat 
---
 fs/fat/fat.h |8 +--
 fs/fat/file.c|5 ++
 fs/fat/inode.c   |   13 ++---
 fs/fat/nfs.c |  130 --
 include/linux/exportfs.h |   11 
 5 files changed, 147 insertions(+), 20 deletions(-)

diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 980c034..c517fc0 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -406,12 +406,8 @@ int fat_cache_init(void);
 void fat_cache_destroy(void);
 
 /* fat/nfs.c */
-struct fid;
-extern struct dentry *fat_fh_to_dentry(struct super_block *sb, struct fid *fid,
-  int fh_len, int fh_type);
-extern struct dentry *fat_fh_to_parent(struct super_block *sb, struct fid *fid,
-  int fh_len, int fh_type);
-extern struct dentry *fat_get_parent(struct dentry *child_dir);
+extern const struct export_operations fat_export_ops;
+extern const struct export_operations fat_export_ops_nostale;
 
 /* helper for printk */
 typedef unsigned long long llu;
diff --git a/fs/fat/file.c b/fs/fat/file.c
index 3978f8c..b0b632e 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -306,6 +306,11 @@ int fat_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
struct inode *inode = dentry->d_inode;
generic_fillattr(inode, stat);
stat->blksize = MSDOS_SB(inode->i_sb)->cluster_size;
+
+   if (MSDOS_SB(inode->i_sb)->options.nfs == FAT_NFS_NOSTALE_RO) {
+   /* Use i_pos for ino. This is used as fileid of nfs. */
+   stat->ino = fat_i_pos_read(MSDOS_SB(inode->i_sb), inode);
+   }
return 0;
 }
 EXPORT_SYMBOL_GPL(fat_getattr);
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 8356a05..a42f2c2 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -18,7 +18,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -749,12 +748,6 @@ static const struct super_operations fat_sops = {
.show_options   = fat_show_options,
 };
 
-static const struct export_operations fat_export_ops = {
-   .fh_to_dentry   = fat_fh_to_dentry,
-   .fh_to_parent   = fat_fh_to_parent,
-   .get_parent = fat_get_parent,
-};
-
 static int fat_show_options(struct seq_file *m, struct dentry *root)
 {
struct msdos_sb_info *sbi = MSDOS_SB(root->d_sb);
@@ -1178,8 +1171,10 @@ out:
opts->allow_utime = ~opts->fs_dmask & (S_IWGRP | S_IWOTH);
if (opts->unicode_xlate)
opts->utf8 = 0;
-   if (opts->nfs == FAT_NFS_NOSTALE_RO)
+   if (opts->nfs == FAT_NFS_NOSTALE_RO) {
sb->s_flags |= MS_RDONLY;
+   sb->s_export_op = &fat_export_ops_nostale;
+   }
 
return 0;
 }
@@ -1190,7 +1185,7 @@ static int fat_read_root(struct inode *inode)
struct msdos_sb_info *sbi = MSDOS_SB(sb);
int error;
 
-   MSDOS_I(inode)->i_pos = 0;
+   MSDOS_I(inode)->i_pos = MSDOS_ROOT_INO;
inode->i_uid = sbi->options.fs_uid;
inode->i_gid = sbi->options.fs_gid;
inode->i_version++;
diff --git a/fs/fat/nfs.c b/fs/fat/nfs.c
index 499c104..d59c025 100644
--- a/fs/fat/nfs.c
+++ b/fs/fat/nfs.c
@@ -14,6 +14,18 @@
 #include 
 #include "fat.h"
 
+struct fat_fid {
+   u32 i_gen;
+   u32 i_pos_low;
+   u16 i_pos_hi;
+   u16 parent_i_pos_hi;
+   u32 parent_i_pos_low;
+   u32 parent_i_gen;
+};
+
+#define FAT_FID_SIZE_WITHOUT_PARENT 3
+#define FAT_FID_SIZE_WITH_PARENT (sizeof(struct fat_fid)/sizeof(u32))
+
 /**
  * Look up a directory inode given its starting cluster.
  */
@@ -38,8 +50,8 @@ static struct inode *fat_dget(struct super_block *sb, int i_logstart)
return inode;
 }
 
-static struct inode *fat_nfs_get_inode(struct super_block *sb,
-  u64 ino, u32 generation)
+static struct inode *__fat_nfs_get_inode(struct super_block *sb,
+  u64 ino, u32 generation, loff_t i_pos)
 {
struct inode *inode;
 
@@ -55,35 +67,130 @@ static struct inode *fat_nfs_get_inode(struct super_block *sb,
return inode;
 }
 
+static struct inode *fat_nfs_get_inode(struct super_block *sb,
+  u64 ino, u32 generation)
+{
+
+   return __fat_nfs_get_inode(sb, ino, generation, 0);
+}
+
+static int
+fat_encode_fh_nostale(struct inode *inode, __u32 *fh, int *lenp,
+ struct inode *parent)
+{
+   int len = *lenp;
+   struct msdos_sb_info *sbi = MSDOS_SB(inode->i_sb);
+   struct fat_fid *fid = (struct fat_fid *) fh;
+   loff_t i_pos;
+  

[PATCH v7 3/7] fat: introduce a helper fat_get_blknr_offset()

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

Introduce helper function to get the block number and offset for a given
i_pos value. Use it in __fat_write_inode() now and later on in nfs.c

Signed-off-by: Namjae Jeon 
Signed-off-by: Ravishankar N 
Signed-off-by: Amit Sahrawat 
---
 fs/fat/fat.h   |7 +++
 fs/fat/inode.c |9 +
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index f16948e..980c034 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -218,6 +218,13 @@ static inline sector_t fat_clus_to_blknr(struct msdos_sb_info *sbi, int clus)
+ sbi->data_start;
 }
 
+static inline void fat_get_blknr_offset(struct msdos_sb_info *sbi,
+   loff_t i_pos, sector_t *blknr, int *offset)
+{
+   *blknr = i_pos >> sbi->dir_per_block_bits;
+   *offset = i_pos & (sbi->dir_per_block - 1);
+}
+
 static inline loff_t fat_i_pos_read(struct msdos_sb_info *sbi,
struct inode *inode)
 {
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index d89f79f..8356a05 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -663,7 +663,8 @@ static int __fat_write_inode(struct inode *inode, int wait)
struct buffer_head *bh;
struct msdos_dir_entry *raw_entry;
loff_t i_pos;
-   int err;
+   sector_t blocknr;
+   int err, offset;
 
if (inode->i_ino == MSDOS_ROOT_INO)
return 0;
@@ -673,7 +674,8 @@ retry:
if (!i_pos)
return 0;
 
-   bh = sb_bread(sb, i_pos >> sbi->dir_per_block_bits);
+   fat_get_blknr_offset(sbi, i_pos, &blocknr, &offset);
+   bh = sb_bread(sb, blocknr);
if (!bh) {
fat_msg(sb, KERN_ERR, "unable to read inode block "
   "for updating (i_pos %lld)", i_pos);
@@ -686,8 +688,7 @@ retry:
goto retry;
}
 
-   raw_entry = &((struct msdos_dir_entry *) (bh->b_data))
-   [i_pos & (sbi->dir_per_block - 1)];
+   raw_entry = &((struct msdos_dir_entry *) (bh->b_data))[offset];
if (S_ISDIR(inode->i_mode))
raw_entry->size = 0;
else
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v7 2/7] fat: move fat_i_pos_read to fat.h

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

Move fat_i_pos_read to fat.h so that it can be called from nfs.c in the
subsequent patches to encode the file handle.

Signed-off-by: Namjae Jeon 
Signed-off-by: Ravishankar N 
Signed-off-by: Amit Sahrawat 
---
 fs/fat/fat.h   |   14 ++
 fs/fat/inode.c |   14 --
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index a7b1d86..f16948e 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -218,6 +218,20 @@ static inline sector_t fat_clus_to_blknr(struct msdos_sb_info *sbi, int clus)
+ sbi->data_start;
 }
 
+static inline loff_t fat_i_pos_read(struct msdos_sb_info *sbi,
+   struct inode *inode)
+{
+   loff_t i_pos;
+#if BITS_PER_LONG == 32
+   spin_lock(&sbi->inode_hash_lock);
+#endif
+   i_pos = MSDOS_I(inode)->i_pos;
+#if BITS_PER_LONG == 32
+   spin_unlock(&sbi->inode_hash_lock);
+#endif
+   return i_pos;
+}
+
 static inline void fat16_towchar(wchar_t *dst, const __u8 *src, size_t len)
 {
 #ifdef __BIG_ENDIAN
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 82cef99..d89f79f 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -656,20 +656,6 @@ static int fat_statfs(struct dentry *dentry, struct kstatfs *buf)
return 0;
 }
 
-static inline loff_t fat_i_pos_read(struct msdos_sb_info *sbi,
-   struct inode *inode)
-{
-   loff_t i_pos;
-#if BITS_PER_LONG == 32
-   spin_lock(&sbi->inode_hash_lock);
-#endif
-   i_pos = MSDOS_I(inode)->i_pos;
-#if BITS_PER_LONG == 32
-   spin_unlock(&sbi->inode_hash_lock);
-#endif
-   return i_pos;
-}
-
 static int __fat_write_inode(struct inode *inode, int wait)
 {
struct super_block *sb = inode->i_sb;
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v7 1/7] fat: introduce 2 new values for the -o nfs mount option

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

Provide two possible values, 'stale_rw' and 'nostale_ro', for
the -o nfs mount option. The first one allows all file operations but
does not reduce ESTALE errors on memory constrained systems. The second
one eliminates ESTALE errors but mounts the filesystem as
read-only. Not specifying a value defaults to 'stale_rw'.

Signed-off-by: Namjae Jeon 
Signed-off-by: Ravishankar N 
Signed-off-by: Amit Sahrawat 
---
 fs/fat/fat.h   |7 +--
 fs/fat/inode.c |   23 ---
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index e9cc3f0..a7b1d86 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -23,6 +23,9 @@
 #define FAT_ERRORS_PANIC   2  /* panic on error */
 #define FAT_ERRORS_RO  3  /* remount r/o on error */
 
+#define FAT_NFS_STALE_RW   1  /* NFS RW support, can cause ESTALE */
+#define FAT_NFS_NOSTALE_RO 2  /* NFS RO support, no ESTALE issue */
+
 struct fat_mount_options {
kuid_t fs_uid;
kgid_t fs_gid;
@@ -34,6 +37,7 @@ struct fat_mount_options {
unsigned short shortname;  /* flags for shortname display/create rule */
unsigned char name_check;  /* r = relaxed, n = normal, s = strict */
unsigned char errors;  /* On error: continue, panic, remount-ro */
+   unsigned char nfs;  /* NFS support: nostale_ro, stale_rw */
unsigned short allow_utime;/* permission for setting the [am]time */
unsigned quiet:1,  /* set = fake successful chmods and chowns */
 showexec:1,   /* set = only set x bit for com/exe/bat */
@@ -48,8 +52,7 @@ struct fat_mount_options {
 usefree:1,/* Use free_clusters for FAT32 */
 tz_set:1, /* Filesystem timestamps' offset set */
 rodir:1,  /* allow ATTR_RO for directory */
-discard:1,/* Issue discard requests on deletions */
-nfs:1;/* Do extra work needed for NFS export */
+discard:1;/* Issue discard requests on deletions */
 };
 
 #define FAT_HASH_BITS  8
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index d1d502a..82cef99 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -815,8 +815,6 @@ static int fat_show_options(struct seq_file *m, struct dentry *root)
seq_puts(m, ",usefree");
if (opts->quiet)
seq_puts(m, ",quiet");
-   if (opts->nfs)
-   seq_puts(m, ",nfs");
if (opts->showexec)
seq_puts(m, ",showexec");
if (opts->sys_immutable)
@@ -850,6 +848,10 @@ static int fat_show_options(struct seq_file *m, struct dentry *root)
seq_puts(m, ",errors=panic");
else
seq_puts(m, ",errors=remount-ro");
+   if (opts->nfs == FAT_NFS_NOSTALE_RO)
+   seq_puts(m, ",nfs=nostale_ro");
+   else if (opts->nfs)
+   seq_puts(m, ",nfs=stale_rw");
if (opts->discard)
seq_puts(m, ",discard");
 
@@ -866,7 +868,7 @@ enum {
Opt_uni_xl_no, Opt_uni_xl_yes, Opt_nonumtail_no, Opt_nonumtail_yes,
Opt_obsolete, Opt_flush, Opt_tz_utc, Opt_rodir, Opt_err_cont,
Opt_err_panic, Opt_err_ro, Opt_discard, Opt_nfs, Opt_time_offset,
-   Opt_err,
+   Opt_nfs_stale_rw, Opt_nfs_nostale_ro, Opt_err,
 };
 
 static const match_table_t fat_tokens = {
@@ -896,7 +898,9 @@ static const match_table_t fat_tokens = {
{Opt_err_panic, "errors=panic"},
{Opt_err_ro, "errors=remount-ro"},
{Opt_discard, "discard"},
-   {Opt_nfs, "nfs"},
+   {Opt_nfs_stale_rw, "nfs"},
+   {Opt_nfs_stale_rw, "nfs=stale_rw"},
+   {Opt_nfs_nostale_ro, "nfs=nostale_ro"},
{Opt_obsolete, "conv=binary"},
{Opt_obsolete, "conv=text"},
{Opt_obsolete, "conv=auto"},
@@ -1093,6 +1097,12 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat,
case Opt_err_ro:
opts->errors = FAT_ERRORS_RO;
break;
+   case Opt_nfs_stale_rw:
+   opts->nfs = FAT_NFS_STALE_RW;
+   break;
+   case Opt_nfs_nostale_ro:
+   opts->nfs = FAT_NFS_NOSTALE_RO;
+   break;
 
/* msdos specific */
case Opt_dots:
@@ -1151,9 +1161,6 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat,
case Opt_discard:
opts->discard = 1;
break;
-   case Opt_nfs:
-   opts->nfs = 1;
-   break;
 
/* obsolete mount options */
case Opt_obsolete:
@@ -1184,6 +1191,8 @@ out:
opts->allow_utime = ~opts->fs_dmask & (S_IWGRP | S_IWOTH);
if (opts->unicode_xlate)
opts->utf8 = 0;
+   if (opts->nfs == FAT_NFS_NOSTALE_RO)
+   sb->s_flags |= MS_RDONLY;

[PATCH v7 0/7] fat (exportfs): support stale_rw and nostale_ro mount option.

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

This patch set eliminates the client side ESTALE errors when a FAT partition
exported over NFS has its dentries evicted from the cache. The idea is to
find the on-disk location 'i_pos' of the dirent of the inode that has been
evicted and use it to rebuild the inode.

Change log
v7:
Assign i_pos = MSDOS_ROOT_INO in fat_read_root() and drop check in fat_getattr().
Remove __packed attribute from struct fat_fid.
Make code in fat_encode_fh() more readable.

v6:
Dummy inode approach to eliminate custom function fat_traverse_cluster().

v5:
Modified  fat_ent_read() arguments so that the custom function 
fat_read_next_clus() can be eliminated.

v(no name):
Define two nfs export_operation structures, one for 'stale_rw' mounts and 
the other for 'nostale_ro'

fat_nfs_get_inode does not hold i_mutex of parent directory. So introduce
fat_lock_build_inode().

v4:
Instead of assigning i_pos to inode->i_ino, assign it to kstat->ino

v3:
Dropped busy-list approach and made the filesystem read only when
rebuilding evicted inodes, by providing stale_rw and nostale_ro mount options
 
v2:
Introduced a list of busy i_pos values for inodes that are unlinked but
having open file handles. Did this to avoid assigning such i_pos values to new
files created at same location.

v1:
Permanent inode number based approach by assigning i_pos to i_ino.
Added custom function fat_read_next_clus() and  fat_traverse_cluster()
to read disk entries.

Namjae Jeon (7):
  fat: Introduce 2 new values for the -o nfs mount option
  fat: move fat_i_pos_read to fat.h
  fat: introduce a helper fat_get_blknr_offset()
  fat: restructure export_operations
  fat (exportfs): rebuild inode if ilookup() fails
  fat (exportfs): rebuild directory-inode if fat_dget() fails
  Documentation: update nfs option in filesystem/vfat.txt
---
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: mmotm 2013-03-01-15-50 uploaded (strict user copy)

2013-03-01 Thread Stephen Boyd
On 03/01/13 19:00, Randy Dunlap wrote:
> On 03/01/13 15:51, a...@linux-foundation.org wrote:
>> The mm-of-the-moment snapshot 2013-03-01-15-50 has been uploaded to
>>
>>http://www.ozlabs.org/~akpm/mmotm/
>>
>
> on i386:
>
> ERROR: "copy_from_user_overflow" [fs/binfmt_misc.ko] undefined!
>
> which I don't understand.
> lib/usercopy.o is built and building binfmt_misc.c says:
>
>   CC [M]  fs/binfmt_misc.o
> In file included from arch/x86/include/asm/uaccess.h:537:0,
>  from include/linux/uaccess.h:5,
>  from include/linux/highmem.h:8,
>  from include/linux/pagemap.h:10,
>  from fs/binfmt_misc.c:27:
> arch/x86/include/asm/uaccess_32.h: In function 'parse_command.part.1':
> arch/x86/include/asm/uaccess_32.h:211:26: warning: call to 
> 'copy_from_user_overflow' declared with attribute warning: copy_from_user() 
> buffer size is not provably correct [enabled by default]

Hm.. That's because it's part of lib and not obj, right?

diff --git a/lib/Makefile b/lib/Makefile
index 59fabd0..4c55104 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -15,7 +15,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 earlycpio.o percpu-refcount.o
 
-lib-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
+obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: build failure after merge of the nfsd tree

2013-03-01 Thread J. Bruce Fields
On Fri, Mar 01, 2013 at 01:19:42AM +, Myklebust, Trond wrote:
> On Fri, 2013-03-01 at 12:04 +1100, Stephen Rothwell wrote:
> > Hi all,
> > 
> > After merging the nfsd tree, today's linux-next build (powerpc
> > ppc64_defconfig) failed like this:
> > 
> > net/sunrpc/xprtsock.c:1923:30: error: 'struct rpc_task' has no member named 
> > 'tk_xprt'
> > 
> > Caused by commit dc107402ae06 ("SUNRPC: make AF_LOCAL connect
> > synchronous") interacting with commit 77102893ae68 ("SUNRPC: Nuke the
> > tk_xprt macro") from Linus' tree.
> > 
> > I have no idea how to fix this, so I have used the version of the nfsd
> > tree from next-20130228 for today.
> 
> Hi Bruce,
> 
> The attached patch should suffice to fix this up.

Thanks!

Looks like Linus got this right in the upstream merge.

--b.

> 
> Cheers
>   Trond
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> trond.mykleb...@netapp.com
> www.netapp.com

> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 2e7e09c..c1d8476 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1918,9 +1918,8 @@ out:
>   return status;
>  }
>  
> -static void xs_local_connect(struct rpc_task *task)
> +static void xs_local_connect(struct rpc_xprt *xprt, struct rpc_task *task)
>  {
> - struct rpc_xprt *xprt = task->tk_xprt;
>   struct sock_xprt *transport = container_of(xprt, struct sock_xprt, 
> xprt);
>   int ret;
>  

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: build failure after merge of the ftrace tree

2013-03-01 Thread Stephen Rothwell
Hi Steve,

On Fri, 01 Mar 2013 21:45:18 -0500 Steven Rostedt  wrote:
>
> On Fri, 2013-03-01 at 13:47 +1100, Stephen Rothwell wrote:
> > 
> > After merging the ftrace tree, today's linux-next build (x86_64
> > allmodconfig) failed like this:
> > 
> > kernel/trace/trace_kdb.c: In function 'ftrace_dump_buf':
> > kernel/trace/trace_kdb.c:29:33: error: invalid type argument of '->' (have 
> > 'struct trace_array_cpu')
> > kernel/trace/trace_kdb.c:86:33: error: invalid type argument of '->' (have 
> > 'struct trace_array_cpu')
> > 
> > Caused by commit eaac1836c10e ("tracing: Replace the static global
> > per_cpu arrays with allocated per_cpu").
> > 
> > I have used the ftrace tree from next-20130228 for today.
> 
> I rebased, and it should all be good now. I also fixed breakage to the
> new snapshot feature. Here's my diff:

Of course, I wonder if this is intended for the current merge window
(since it turned up in linux-next so late).  If not, then please don't
put it in your -next included branch until after -rc1 is released.  If
you are aiming for this merge window, then good luck!  ;-)

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




[PATCH 5/5] f2fs: avoid extra ++ while returning from get_node_path

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

In all the breaking conditions in get_node_path, 'n' is used to
track the index in the offset[] array, but all the break paths also
post-increment it on the final store, where the incremented value is
never used. So remove the ++ from the break paths. Also avoid the
redundant reset of 'level = 0' in the first case, since level is
already initialized to 0.

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
---
 fs/f2fs/node.c |   13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index e24f747..d10085c 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -320,15 +320,14 @@ static int get_node_path(long block, int offset[4], unsigned int noffset[4])
noffset[0] = 0;
 
if (block < direct_index) {
-   offset[n++] = block;
-   level = 0;
+   offset[n] = block;
goto got;
}
block -= direct_index;
if (block < direct_blks) {
offset[n++] = NODE_DIR1_BLOCK;
noffset[n] = 1;
-   offset[n++] = block;
+   offset[n] = block;
level = 1;
goto got;
}
@@ -336,7 +335,7 @@ static int get_node_path(long block, int offset[4], unsigned int noffset[4])
if (block < direct_blks) {
offset[n++] = NODE_DIR2_BLOCK;
noffset[n] = 2;
-   offset[n++] = block;
+   offset[n] = block;
level = 1;
goto got;
}
@@ -346,7 +345,7 @@ static int get_node_path(long block, int offset[4], unsigned int noffset[4])
noffset[n] = 3;
offset[n++] = block / direct_blks;
noffset[n] = 4 + offset[n - 1];
-   offset[n++] = block % direct_blks;
+   offset[n] = block % direct_blks;
level = 2;
goto got;
}
@@ -356,7 +355,7 @@ static int get_node_path(long block, int offset[4], unsigned int noffset[4])
noffset[n] = 4 + dptrs_per_blk;
offset[n++] = block / direct_blks;
noffset[n] = 5 + dptrs_per_blk + offset[n - 1];
-   offset[n++] = block % direct_blks;
+   offset[n] = block % direct_blks;
level = 2;
goto got;
}
@@ -371,7 +370,7 @@ static int get_node_path(long block, int offset[4], unsigned int noffset[4])
noffset[n] = 7 + (dptrs_per_blk * 2) +
  offset[n - 2] * (dptrs_per_blk + 1) +
  offset[n - 1];
-   offset[n++] = block % direct_blks;
+   offset[n] = block % direct_blks;
level = 3;
goto got;
} else {
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/5] f2fs: align f2fs maximum name length to linux based filesystem

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

Since the maximum filename length supported in Linux is 255 characters,
there is no need to reserve space for 256 characters in the f2fs inode.
This also aligns the filename length with NFS requests, which have a
default limit of 255.

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
---
 include/linux/f2fs_fs.h |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index f9a12f6..f7bd27d 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -139,7 +139,7 @@ struct f2fs_extent {
__le32 len; /* lengh of the extent */
 } __packed;
 
-#define F2FS_MAX_NAME_LEN  256
+#define F2FS_MAX_NAME_LEN  255
 #define ADDRS_PER_INODE 923/* Address Pointers in an Inode */
 #define ADDRS_PER_BLOCK 1018   /* Address Pointers in a Direct Block */
 #define NIDS_PER_BLOCK  1018   /* Node IDs in an Indirect Block */
@@ -166,6 +166,7 @@ struct f2fs_inode {
__le32 i_pino;  /* parent inode number */
__le32 i_namelen;   /* file name length */
__u8 i_name[F2FS_MAX_NAME_LEN]; /* file name for SPOR */
+   __u8 i_reserved_2;  /* reserved for future use */
 
struct f2fs_extent i_ext;   /* caching a largest extent */
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/5] f2fs: optimize and change return path in lookup_free_nid_list

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

Optimize and change return path in lookup_free_nid_list

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
---
 fs/f2fs/node.c |7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index e275218..e24f747 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1195,14 +1195,13 @@ const struct address_space_operations f2fs_node_aops = {
 static struct free_nid *__lookup_free_nid_list(nid_t n, struct list_head *head)
 {
struct list_head *this;
-   struct free_nid *i = NULL;
+   struct free_nid *i;
list_for_each(this, head) {
i = list_entry(this, struct free_nid, list);
if (i->nid == n)
-   break;
-   i = NULL;
+   return i;
}
-   return i;
+   return NULL;
 }
 
 static void __del_from_free_nid_list(struct free_nid *i)
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/5] f2fs: move f2fs_balance_fs to correct place in unlink

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

Actual dirtying of pages will occur in f2fs_delete_entry, so move the
f2fs_balance_fs call to just before the deletion.

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
---
 fs/f2fs/namei.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 1a49b88..eaa86f5 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -223,8 +223,6 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
struct page *page;
int err = -ENOENT;
 
-   f2fs_balance_fs(sbi);
-
de = f2fs_find_entry(dir, >d_name, );
if (!de)
goto fail;
@@ -236,6 +234,8 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
goto fail;
}
 
+   f2fs_balance_fs(sbi);
+
f2fs_delete_entry(de, page, inode);
 
/* In order to evict this inode,  we set it dirty */
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/5] f2fs: change victim segmap test condition in get_victim_by_default

2013-03-01 Thread Namjae Jeon
From: Namjae Jeon 

Instead of checking for victim_segmap(FG_GC) OR
(gc_type == BG_GC) && victim_segmap(BG_GC) to decide whether
to skip a segment during victim selection, the two conditions
can simply be merged and the decision made directly using 'gc_type'.

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
---
 fs/f2fs/gc.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 94b8a0c..16b4148 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -266,10 +266,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
}
p.offset = ((segno / p.ofs_unit) * p.ofs_unit) + p.ofs_unit;
 
-   if (test_bit(segno, dirty_i->victim_segmap[FG_GC]))
-   continue;
-   if (gc_type == BG_GC &&
-   test_bit(segno, dirty_i->victim_segmap[BG_GC]))
+   if (test_bit(segno, dirty_i->victim_segmap[gc_type]))
continue;
if (IS_CURSEC(sbi, GET_SECNO(sbi, segno)))
continue;
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] tracing: Add a helper function for event print functions

2013-03-01 Thread Steven Rostedt
On Thu, 2013-02-21 at 10:32 +0800, Li Zefan wrote:
> Move duplicate code in event print functions to a helper function.
> 
> This shrinks the size of the kernel by ~13K.
> 
>    text    data      bss      dec     hex filename
> 6596137 1743966 10138672 18478775 119f6b7 vmlinux.o.old
> 6583002 1743849 10138672 18465523 119c2f3 vmlinux.o.new

Nice! I just tried it out and got:

   text    data     bss      dec    hex filename
7747813 1673816 1503232 10924861 a6b33d vmlinux.old
7715443 1673560 1499136 10888139 a623cb vmlinux.new

Of course I have a lot of debugging enabled at the moment, which
probably exaggerates these numbers a bit. But still, 36K in savings is
nice for even a debug kernel :-)

I'll start beating on this patch for a bit, and will most likely add it
to my v3.10 queue.

Thanks!

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mfd: palmas: provide irq flags through DT/platform data

2013-03-01 Thread Mark Brown
On Fri, Mar 01, 2013 at 12:34:40PM -0700, Stephen Warren wrote:

> Is Palmas a family of chips rather than a single chip then? That
> implies that the DT would need two compatible values, e.g.:

Yes.

> compatible = "ti,12345", "ti,palmas";

> ... where "12345" is the actual chip name.

> ... rather than just the following which IIRC was in the example in
> the DT binding document in another patch series:

> compatible = "ti,palmas";

Indeed, and in fact this has already been done for the I2C device ID
table.  We should have the same list of devices in the OF IDs.
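
As a hedged sketch of what "the same list" means in practice (the
variant names below are placeholders, not the real Palmas family
list), the OF table should mirror the I2C ID table entry for entry:

static const struct i2c_device_id palmas_i2c_id[] = {
	{ "palmas", },
	{ "tps659038", },			/* placeholder variant */
	{ /* sentinel */ },
};

static const struct of_device_id of_palmas_match_tbl[] = {
	{ .compatible = "ti,palmas", },
	{ .compatible = "ti,tps659038", },	/* placeholder variant */
	{ /* sentinel */ },
};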




Re: [PATCH -V1 15/24] mm/THP: HPAGE_SHIFT is not a #define on some arch

2013-03-01 Thread Hillf Danton
Hello Aneesh

[with lkml cced]

>-#if HPAGE_PMD_ORDER > MAX_ORDER
>-#error "hugepages can't be allocated by the buddy allocator"
>-#endif
...
>-  if (!has_transparent_hugepage()) {
>+  if (!has_transparent_hugepage() || (HPAGE_PMD_ORDER > MAX_ORDER)) {
>   transparent_hugepage_flags = 0;
>   return -EINVAL;
>   }

Is it fair to the other archs that support THP if you are changing a
build-time error into a runtime error?
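
To make the trade-off concrete, a minimal sketch of the two guards
(derived from the quoted hunk; the function name is illustrative):

/* before: any arch with a too-large order fails at build time */
#if HPAGE_PMD_ORDER > MAX_ORDER
#error "hugepages can't be allocated by the buddy allocator"
#endif

/* after: every arch compiles; the check only fires at init time */
static int hugepage_init_check(void)
{
	if (!has_transparent_hugepage() || HPAGE_PMD_ORDER > MAX_ORDER)
		return -EINVAL;	/* THP silently disabled at runtime */
	return 0;
}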


[PATCH 14/31] workqueue: replace POOL_MANAGING_WORKERS flag with worker_pool->manager_mutex

2013-03-01 Thread Tejun Heo
POOL_MANAGING_WORKERS is used to synchronize the manager role.
Synchronizing among workers doesn't need blocking and that's why it's
implemented as a flag.

It got converted to a mutex a while back to add blocking wait from CPU
hotplug path - 6037315269 ("workqueue: use mutex for global_cwq
manager exclusion").  Later it turned out that synchronization among
workers and cpu hotplug need to be done separately.  Eventually,
POOL_MANAGING_WORKERS is restored and workqueue->manager_mutex got
morphed into workqueue->assoc_mutex - 552a37e936 ("workqueue: restore
POOL_MANAGING_WORKERS") and b2eb83d123 ("workqueue: rename
manager_mutex to assoc_mutex").

Now, we're gonna need to be able to lock out managers from
destroy_workqueue() to support multiple unbound pools with custom
attributes making it again necessary to be able to block on the
manager role.  This patch replaces POOL_MANAGING_WORKERS with
worker_pool->manager_mutex.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2645218..68b3443 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -64,7 +64,6 @@ enum {
 * create_worker() is in progress.
 */
POOL_MANAGE_WORKERS = 1 << 0,   /* need to manage workers */
-   POOL_MANAGING_WORKERS   = 1 << 1,   /* managing workers */
POOL_DISASSOCIATED  = 1 << 2,   /* cpu can't serve workers */
POOL_FREEZING   = 1 << 3,   /* freeze in progress */
 
@@ -145,6 +144,7 @@ struct worker_pool {
DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
/* L: hash of busy workers */
 
+	struct mutex		manager_mutex;	/* the holder is the manager */
 	struct mutex		assoc_mutex;	/* protect POOL_DISASSOCIATED */
 	struct ida		worker_ida;	/* L: for worker IDs */
 
@@ -702,7 +702,7 @@ static bool need_to_manage_workers(struct worker_pool *pool)
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
-   bool managing = pool->flags & POOL_MANAGING_WORKERS;
+	bool managing = mutex_is_locked(&pool->manager_mutex);
int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
int nr_busy = pool->nr_workers - nr_idle;
 
@@ -2027,15 +2027,13 @@ static bool manage_workers(struct worker *worker)
struct worker_pool *pool = worker->pool;
bool ret = false;
 
-   if (pool->flags & POOL_MANAGING_WORKERS)
+	if (!mutex_trylock(&pool->manager_mutex))
return ret;
 
-   pool->flags |= POOL_MANAGING_WORKERS;
-
/*
 * To simplify both worker management and CPU hotplug, hold off
 * management while hotplug is in progress.  CPU hotplug path can't
-* grab %POOL_MANAGING_WORKERS to achieve this because that can
+* grab @pool->manager_mutex to achieve this because that can
 * lead to idle worker depletion (all become busy thinking someone
 * else is managing) which in turn can result in deadlock under
 * extreme circumstances.  Use @pool->assoc_mutex to synchronize
@@ -2075,8 +2073,8 @@ static bool manage_workers(struct worker *worker)
ret |= maybe_destroy_workers(pool);
ret |= maybe_create_worker(pool);
 
-   pool->flags &= ~POOL_MANAGING_WORKERS;
 	mutex_unlock(&pool->assoc_mutex);
+	mutex_unlock(&pool->manager_mutex);
return ret;
 }
 
@@ -3805,6 +3803,7 @@ static int __init init_workqueues(void)
 		setup_timer(&pool->mayday_timer, pool_mayday_timeout,
 			    (unsigned long)pool);
 
+		mutex_init(&pool->manager_mutex);
 		mutex_init(&pool->assoc_mutex);
 		ida_init(&pool->worker_ida);
 
-- 
1.8.1.2
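
A standalone sketch of the arbitration pattern this patch adopts: a
trylock'd mutex both elects the single manager and lets other paths
block until management finishes (pthread stand-ins, not kernel code;
assume the mutex was set up with pthread_mutex_init()):

#include <pthread.h>
#include <stdbool.h>

struct worker_pool {
	pthread_mutex_t manager_mutex;	/* holder is the manager */
};

static bool manage_workers(struct worker_pool *pool)
{
	/* non-blocking claim: losers simply skip managing this round */
	if (pthread_mutex_trylock(&pool->manager_mutex) != 0)
		return false;

	/* ... create/destroy workers as needed ... */

	pthread_mutex_unlock(&pool->manager_mutex);
	return true;
}

A destruction path can instead call pthread_mutex_lock() on the same
mutex to block until any in-flight manager finishes, which is exactly
what a flag cannot provide.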



[PATCH 13/31] workqueue: update synchronization rules on worker_pool_idr

2013-03-01 Thread Tejun Heo
Make worker_pool_idr protected by workqueue_lock for writes and
sched-RCU protected for reads.  Lockdep assertions are added to
for_each_pool() and get_work_pool() and all their users are converted
to either hold workqueue_lock or disable preemption/irq.

worker_pool_assign_id() is updated to hold workqueue_lock when
allocating a pool ID.  As idr_get_new() always performs RCU-safe
assignment, this is enough on the writer side.

As standard pools are never destroyed, there's nothing to do on that
side.

The locking is superfluous at this point; it is added now to help the
upcoming implementation of unbound pools/pwqs with custom attributes.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 69 ++
 1 file changed, 44 insertions(+), 25 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ff51c59..2645218 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -282,9 +282,16 @@ static inline int __next_wq_cpu(int cpu, const struct cpumask *mask,
  * for_each_pool - iterate through all worker_pools in the system
  * @pool: iteration cursor
  * @id: integer used for iteration
+ *
+ * This must be called either with workqueue_lock held or sched RCU read
+ * locked.  If the pool needs to be used beyond the locking in effect, the
+ * caller is responsible for guaranteeing that the pool stays online.
+ *
+ * The if clause exists only for the lockdep assertion and can be ignored.
  */
 #define for_each_pool(pool, id)					\
-	idr_for_each_entry(&worker_pool_idr, pool, id)
+	idr_for_each_entry(&worker_pool_idr, pool, id)			\
+		if (({ assert_rcu_or_wq_lock(); true; }))
 
 /**
  * for_each_pwq - iterate through all pool_workqueues of the specified workqueue
@@ -430,8 +437,10 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
 				     cpu_std_worker_pools);
 static struct worker_pool unbound_std_worker_pools[NR_STD_WORKER_POOLS];
 
-/* idr of all pools */
-static DEFINE_MUTEX(worker_pool_idr_mutex);
+/*
+ * idr of all pools.  Modifications are protected by workqueue_lock.
+ * Read accesses are sched-RCU protected.
+ */
 static DEFINE_IDR(worker_pool_idr);
 
 static int worker_thread(void *__worker);
@@ -454,23 +463,18 @@ static int worker_pool_assign_id(struct worker_pool *pool)
 {
int ret;
 
-	mutex_lock(&worker_pool_idr_mutex);
-	idr_pre_get(&worker_pool_idr, GFP_KERNEL);
-	ret = idr_get_new(&worker_pool_idr, pool, &pool->id);
-	mutex_unlock(&worker_pool_idr_mutex);
+	do {
+		if (!idr_pre_get(&worker_pool_idr, GFP_KERNEL))
+			return -ENOMEM;
+
+		spin_lock_irq(&workqueue_lock);
+		ret = idr_get_new(&worker_pool_idr, pool, &pool->id);
+		spin_unlock_irq(&workqueue_lock);
+	} while (ret == -EAGAIN);
+   } while (ret == -EAGAIN);
 
return ret;
 }
 
-/*
- * Lookup worker_pool by id.  The idr currently is built during boot and
- * never modified.  Don't worry about locking for now.
- */
-static struct worker_pool *worker_pool_by_id(int pool_id)
-{
-	return idr_find(&worker_pool_idr, pool_id);
-}
-
 static struct worker_pool *get_std_worker_pool(int cpu, bool highpri)
 {
struct worker_pool *pools = std_worker_pools(cpu);
@@ -584,13 +588,23 @@ static struct pool_workqueue *get_work_pwq(struct work_struct *work)
  * @work: the work item of interest
  *
  * Return the worker_pool @work was last associated with.  %NULL if none.
+ *
+ * Pools are created and destroyed under workqueue_lock, and read
+ * access is allowed under the sched-RCU read lock.  As such, this
+ * function should be called under workqueue_lock or with preemption
+ * disabled.
+ *
+ * All fields of the returned pool are accessible as long as the above
+ * mentioned locking is in effect.  If the returned pool needs to be used
+ * beyond the critical section, the caller is responsible for ensuring the
+ * returned pool is and stays online.
  */
 static struct worker_pool *get_work_pool(struct work_struct *work)
 {
 	unsigned long data = atomic_long_read(&work->data);
-   struct worker_pool *pool;
int pool_id;
 
+   assert_rcu_or_wq_lock();
+
if (data & WORK_STRUCT_PWQ)
return ((struct pool_workqueue *)
(data & WORK_STRUCT_WQ_DATA_MASK))->pool;
@@ -599,9 +613,7 @@ static struct worker_pool *get_work_pool(struct work_struct *work)
if (pool_id == WORK_OFFQ_POOL_NONE)
return NULL;
 
-   pool = worker_pool_by_id(pool_id);
-   WARN_ON_ONCE(!pool);
-   return pool;
+	return idr_find(&worker_pool_idr, pool_id);
 }
 
 /**
@@ -2767,11 +2779,15 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
struct pool_workqueue *pwq;
 
might_sleep();
+
+   local_irq_disable();
pool = get_work_pool(work);
-   if (!pool)
+   if (!pool) {
+  
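
The read-side rule this patch introduces, as a minimal usage sketch
(illustrative assembly, not a hunk from the patch; the helper names
are placeholders): disabling preemption opens a sched-RCU read
section, inside which the idr lookup is safe.

	int pool_id = work_to_pool_id(work);	/* illustrative helper */
	struct worker_pool *pool;

	preempt_disable();			/* sched-RCU read section */
	pool = idr_find(&worker_pool_idr, pool_id);
	if (pool)
		inspect_pool(pool);	/* safe only inside the section */
	preempt_enable();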

[PATCH 02/31] workqueue: make workqueue_lock irq-safe

2013-03-01 Thread Tejun Heo
workqueue_lock will be used to synchronize areas which require
irq-safety and there isn't much benefit in keeping it not irq-safe.
Make it irq-safe.

This patch doesn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 44 ++--
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a533e77..61f78ef 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2717,10 +2717,10 @@ void drain_workqueue(struct workqueue_struct *wq)
 * hotter than drain_workqueue() and already looks at @wq->flags.
 * Use WQ_DRAINING so that queue doesn't have to check nr_drainers.
 */
-	spin_lock(&workqueue_lock);
+	spin_lock_irq(&workqueue_lock);
 	if (!wq->nr_drainers++)
 		wq->flags |= WQ_DRAINING;
-	spin_unlock(&workqueue_lock);
+	spin_unlock_irq(&workqueue_lock);
 reflush:
flush_workqueue(wq);
 
@@ -2742,10 +2742,10 @@ reflush:
goto reflush;
}
 
-	spin_lock(&workqueue_lock);
+	spin_lock_irq(&workqueue_lock);
 	if (!--wq->nr_drainers)
 		wq->flags &= ~WQ_DRAINING;
-	spin_unlock(&workqueue_lock);
+	spin_unlock_irq(&workqueue_lock);
 }
 EXPORT_SYMBOL_GPL(drain_workqueue);
 
@@ -3235,7 +3235,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
 * list.  Grab it, set max_active accordingly and add the new
 * workqueue to workqueues list.
 */
-	spin_lock(&workqueue_lock);
+	spin_lock_irq(&workqueue_lock);
 
if (workqueue_freezing && wq->flags & WQ_FREEZABLE)
for_each_pwq_cpu(cpu, wq)
@@ -3243,7 +3243,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
 
 	list_add(&wq->list, &workqueues);
 
-	spin_unlock(&workqueue_lock);
+	spin_unlock_irq(&workqueue_lock);
 
return wq;
 err:
@@ -3287,9 +3287,9 @@ void destroy_workqueue(struct workqueue_struct *wq)
 * wq list is used to freeze wq, remove from list after
 * flushing is complete in case freeze races us.
 */
-	spin_lock(&workqueue_lock);
+	spin_lock_irq(&workqueue_lock);
 	list_del(&wq->list);
-	spin_unlock(&workqueue_lock);
+	spin_unlock_irq(&workqueue_lock);
 
if (wq->flags & WQ_RESCUER) {
kthread_stop(wq->rescuer->task);
@@ -3338,7 +3338,7 @@ void workqueue_set_max_active(struct workqueue_struct *wq, int max_active)
 
max_active = wq_clamp_max_active(max_active, wq->flags, wq->name);
 
-	spin_lock(&workqueue_lock);
+	spin_lock_irq(&workqueue_lock);
 
wq->saved_max_active = max_active;
 
@@ -3346,16 +3346,16 @@ void workqueue_set_max_active(struct workqueue_struct *wq, int max_active)
struct pool_workqueue *pwq = get_pwq(cpu, wq);
struct worker_pool *pool = pwq->pool;
 
-		spin_lock_irq(&pool->lock);
+		spin_lock(&pool->lock);
 
if (!(wq->flags & WQ_FREEZABLE) ||
!(pool->flags & POOL_FREEZING))
pwq_set_max_active(pwq, max_active);
 
-		spin_unlock_irq(&pool->lock);
+		spin_unlock(&pool->lock);
}
 
-	spin_unlock(&workqueue_lock);
+	spin_unlock_irq(&workqueue_lock);
 }
 EXPORT_SYMBOL_GPL(workqueue_set_max_active);
 
@@ -3602,7 +3602,7 @@ void freeze_workqueues_begin(void)
 {
unsigned int cpu;
 
-	spin_lock(&workqueue_lock);
+	spin_lock_irq(&workqueue_lock);
 
WARN_ON_ONCE(workqueue_freezing);
workqueue_freezing = true;
@@ -3612,7 +3612,7 @@ void freeze_workqueues_begin(void)
struct workqueue_struct *wq;
 
for_each_std_worker_pool(pool, cpu) {
-		spin_lock_irq(&pool->lock);
+		spin_lock(&pool->lock);
 
WARN_ON_ONCE(pool->flags & POOL_FREEZING);
pool->flags |= POOL_FREEZING;
@@ -3625,11 +3625,11 @@ void freeze_workqueues_begin(void)
pwq->max_active = 0;
}
 
-		spin_unlock_irq(&pool->lock);
+		spin_unlock(&pool->lock);
}
}
 
-	spin_unlock(&workqueue_lock);
+	spin_unlock_irq(&workqueue_lock);
 }
 
 /**
@@ -3650,7 +3650,7 @@ bool freeze_workqueues_busy(void)
unsigned int cpu;
bool busy = false;
 
-	spin_lock(&workqueue_lock);
+	spin_lock_irq(&workqueue_lock);
 
WARN_ON_ONCE(!workqueue_freezing);
 
@@ -3674,7 +3674,7 @@ bool freeze_workqueues_busy(void)
}
}
 out_unlock:
-	spin_unlock(&workqueue_lock);
+	spin_unlock_irq(&workqueue_lock);
return busy;
 }
 
@@ -3691,7 +3691,7 @@ void thaw_workqueues(void)
 {
unsigned int cpu;
 
-	spin_lock(&workqueue_lock);
+	spin_lock_irq(&workqueue_lock);
 
if (!workqueue_freezing)
goto out_unlock;
@@ -3701,7 +3701,7 @@ void thaw_workqueues(void)
struct workqueue_struct *wq;
 
for_each_std_worker_pool(pool, cpu) {
-		spin_lock_irq(&pool->lock);
+		spin_lock(&pool->lock);
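
The nesting this enables, as an illustrative sketch built from the
hunks above (for_each_pool() only arrives later in this series): once
workqueue_lock is taken with IRQs disabled, the inner pool locks can
drop their own _irq suffix.

	spin_lock_irq(&workqueue_lock);		/* outer: IRQs off */
	for_each_pool(pool, id) {
		spin_lock(&pool->lock);		/* inner: plain lock */
		/* ... inspect or update pool state ... */
		spin_unlock(&pool->lock);
	}
	spin_unlock_irq(&workqueue_lock);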
 
  

[PATCH 17/31] workqueue: implement attribute-based unbound worker_pool management

2013-03-01 Thread Tejun Heo
This patch makes unbound worker_pools reference counted and
dynamically created and destroyed as workqueues needing them come and
go.  All unbound worker_pools are hashed on unbound_pool_hash which is
keyed by the content of worker_pool->attrs.

When an unbound workqueue is allocated, get_unbound_pool() is called
with the attributes of the workqueue.  If there already is a matching
worker_pool, the reference count is bumped and the pool is returned.
If not, a new worker_pool with matching attributes is created and
returned.

When an unbound workqueue is destroyed, put_unbound_pool() is called
which decrements the reference count of the associated worker_pool.
If the refcnt reaches zero, the worker_pool is destroyed in sched-RCU
safe way.

Note that the standard unbound worker_pools - normal and highpri ones
with no specific cpumask affinity - are no longer created explicitly
during init_workqueues().  init_workqueues() only initializes
workqueue_attrs to be used for standard unbound pools -
unbound_std_wq_attrs[].  The pools are spawned on demand as workqueues
are created.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 230 ++---
 1 file changed, 218 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7eba824..fb91b67 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -80,6 +81,7 @@ enum {
 
NR_STD_WORKER_POOLS = 2,/* # standard pools per cpu */
 
+   UNBOUND_POOL_HASH_ORDER = 6,/* hashed by pool->attrs */
BUSY_WORKER_HASH_ORDER  = 6,/* 64 pointers */
 
MAX_IDLE_WORKERS_RATIO  = 4,/* 1/4 of busy can be idle */
@@ -149,6 +151,8 @@ struct worker_pool {
struct ida  worker_ida; /* L: for worker IDs */
 
struct workqueue_attrs  *attrs; /* I: worker attributes */
+   struct hlist_node   hash_node;  /* R: unbound_pool_hash node */
+	atomic_t		refcnt;		/* refcnt for unbound pools */
 
/*
 * The current concurrency level.  As it's likely to be accessed
@@ -156,6 +160,12 @@ struct worker_pool {
 * cacheline.
 */
 	atomic_t		nr_running ____cacheline_aligned_in_smp;
+
+   /*
+* Destruction of pool is sched-RCU protected to allow dereferences
+* from get_work_pool().
+*/
+   struct rcu_head rcu;
 } ____cacheline_aligned_in_smp;
 
 /*
@@ -218,6 +228,11 @@ struct workqueue_struct {
 
 static struct kmem_cache *pwq_cache;
 
+/* hash of all unbound pools keyed by pool->attrs */
+static DEFINE_HASHTABLE(unbound_pool_hash, UNBOUND_POOL_HASH_ORDER);
+
+static struct workqueue_attrs *unbound_std_wq_attrs[NR_STD_WORKER_POOLS];
+
 struct workqueue_struct *system_wq __read_mostly;
 EXPORT_SYMBOL_GPL(system_wq);
 struct workqueue_struct *system_highpri_wq __read_mostly;
@@ -1740,7 +1755,7 @@ static struct worker *create_worker(struct worker_pool *pool)
worker->pool = pool;
worker->id = id;
 
-   if (pool->cpu != WORK_CPU_UNBOUND)
+   if (pool->cpu >= 0)
worker->task = kthread_create_on_node(worker_thread,
worker, cpu_to_node(pool->cpu),
"kworker/%d:%d%s", pool->cpu, id, pri);
@@ -3159,6 +3174,54 @@ fail:
return NULL;
 }
 
+static void copy_workqueue_attrs(struct workqueue_attrs *to,
+const struct workqueue_attrs *from)
+{
+   to->nice = from->nice;
+   cpumask_copy(to->cpumask, from->cpumask);
+}
+
+/*
+ * Hacky implementation of jhash of bitmaps which only considers the
+ * specified number of bits.  We probably want a proper implementation in
+ * include/linux/jhash.h.
+ */
+static u32 jhash_bitmap(const unsigned long *bitmap, int bits, u32 hash)
+{
+   int nr_longs = bits / BITS_PER_LONG;
+   int nr_leftover = bits % BITS_PER_LONG;
+   unsigned long leftover = 0;
+
+   if (nr_longs)
+   hash = jhash(bitmap, nr_longs * sizeof(long), hash);
+   if (nr_leftover) {
+		bitmap_copy(&leftover, bitmap + nr_longs, nr_leftover);
+		hash = jhash(&leftover, sizeof(long), hash);
+   }
+   return hash;
+}
+
+/* hash value of the content of @attr */
+static u32 wqattrs_hash(const struct workqueue_attrs *attrs)
+{
+   u32 hash = 0;
+
+   hash = jhash_1word(attrs->nice, hash);
+   hash = jhash_bitmap(cpumask_bits(attrs->cpumask), nr_cpu_ids, hash);
+   return hash;
+}
+
+/* content equality test */
+static bool wqattrs_equal(const struct workqueue_attrs *a,
+ const struct workqueue_attrs *b)
+{
+   if (a->nice != b->nice)
+   return false;
+   if (!cpumask_equal(a->cpumask, b->cpumask))
+   return false;
+   return true;
+}
+
 

[PATCH 16/31] workqueue: introduce workqueue_attrs

2013-03-01 Thread Tejun Heo
Introduce struct workqueue_attrs which carries worker attributes -
currently the nice level and allowed cpumask along with helper
routines alloc_workqueue_attrs() and free_workqueue_attrs().

Each worker_pool now carries ->attrs describing the attributes of its
workers.  All functions dealing with cpumask and nice level of workers
are updated to follow worker_pool->attrs instead of determining them
from other characteristics of the worker_pool, and init_workqueues()
is updated to set worker_pool->attrs appropriately for all standard
pools.

Note that create_worker() is updated to always perform set_user_nice()
and use set_cpus_allowed_ptr() combined with manual assertion of
PF_THREAD_BOUND instead of kthread_bind().  This simplifies handling
random attributes without affecting the outcome.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo 
---
 include/linux/workqueue.h |  12 ++
 kernel/workqueue.c| 103 --
 2 files changed, 93 insertions(+), 22 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 899be66..2683e8e 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -115,6 +115,15 @@ struct delayed_work {
int cpu;
 };
 
+/*
+ * A struct for workqueue attributes.  This can be used to change
+ * attributes of an unbound workqueue.
+ */
+struct workqueue_attrs {
+   int nice;   /* nice level */
+   cpumask_var_t   cpumask;/* allowed CPUs */
+};
+
 static inline struct delayed_work *to_delayed_work(struct work_struct *work)
 {
return container_of(work, struct delayed_work, work);
@@ -399,6 +408,9 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, int max_active,
 
 extern void destroy_workqueue(struct workqueue_struct *wq);
 
+struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask);
+void free_workqueue_attrs(struct workqueue_attrs *attrs);
+
 extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
struct work_struct *work);
 extern bool queue_work(struct workqueue_struct *wq, struct work_struct *work);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f97539b..7eba824 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -148,6 +148,8 @@ struct worker_pool {
 	struct mutex		assoc_mutex;	/* protect POOL_DISASSOCIATED */
 	struct ida		worker_ida;	/* L: for worker IDs */
 
+   struct workqueue_attrs  *attrs; /* I: worker attributes */
+
/*
 * The current concurrency level.  As it's likely to be accessed
 * from other CPUs during try_to_wake_up(), put it in a separate
@@ -1563,14 +1565,13 @@ __acquires(&pool->lock)
 * against POOL_DISASSOCIATED.
 */
if (!(pool->flags & POOL_DISASSOCIATED))
-   set_cpus_allowed_ptr(current, get_cpu_mask(pool->cpu));
+   set_cpus_allowed_ptr(current, pool->attrs->cpumask);
 
 	spin_lock_irq(&pool->lock);
if (pool->flags & POOL_DISASSOCIATED)
return false;
if (task_cpu(current) == pool->cpu &&
-	    cpumask_equal(&current->cpus_allowed,
-			  get_cpu_mask(pool->cpu)))
+	    cpumask_equal(&current->cpus_allowed, pool->attrs->cpumask))
return true;
 	spin_unlock_irq(&pool->lock);
 
@@ -1677,7 +1678,7 @@ static void rebind_workers(struct worker_pool *pool)
 * wq doesn't really matter but let's keep @worker->pool
 * and @pwq->pool consistent for sanity.
 */
-   if (std_worker_pool_pri(worker->pool))
+   if (worker->pool->attrs->nice < 0)
wq = system_highpri_wq;
else
wq = system_wq;
@@ -1719,7 +1720,7 @@ static struct worker *alloc_worker(void)
  */
 static struct worker *create_worker(struct worker_pool *pool)
 {
-   const char *pri = std_worker_pool_pri(pool) ? "H" : "";
+   const char *pri = pool->attrs->nice < 0  ? "H" : "";
struct worker *worker = NULL;
int id = -1;
 
@@ -1749,24 +1750,23 @@ static struct worker *create_worker(struct worker_pool *pool)
if (IS_ERR(worker->task))
goto fail;
 
-   if (std_worker_pool_pri(pool))
-   set_user_nice(worker->task, HIGHPRI_NICE_LEVEL);
+   set_user_nice(worker->task, pool->attrs->nice);
+   set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
/*
-* Determine CPU binding of the new worker depending on
-* %POOL_DISASSOCIATED.  The caller is responsible for ensuring the
-* flag remains stable across this function.  See the comments
-* above the flag definition for details.
-*
-* As an unbound worker may later become a 
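
A hedged sketch of the allocator contract introduced here (close to,
but not copied from, the patch; the default-affinity choice shown is
an assumption):

struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask)
{
	struct workqueue_attrs *attrs;

	attrs = kzalloc(sizeof(*attrs), gfp_mask);
	if (!attrs)
		return NULL;
	if (!alloc_cpumask_var(&attrs->cpumask, gfp_mask)) {
		kfree(attrs);
		return NULL;
	}
	cpumask_setall(attrs->cpumask);	/* default: no affinity limit */
	return attrs;
}

free_workqueue_attrs() then just undoes both allocations in reverse
order.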

[PATCH 05/31] workqueue: replace for_each_pwq_cpu() with for_each_pwq()

2013-03-01 Thread Tejun Heo
Introduce for_each_pwq() which iterates all pool_workqueues of a
workqueue using the recently added workqueue->pwqs list and replace
for_each_pwq_cpu() usages with it.

This is primarily to remove the single unbound CPU assumption from pwq
iteration for the scheduled unbound pools with custom attributes
support which would introduce multiple unbound pwqs per workqueue;
however, it also simplifies iterator users.

Note that pwq->pool initialization is moved to alloc_and_link_pwqs()
as that now is the only place which is explicitly handling the two pwq
types.

This patch doesn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 53 ++---
 1 file changed, 22 insertions(+), 31 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d493293..0055a31 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -273,12 +273,6 @@ static inline int __next_wq_cpu(int cpu, const struct cpumask *mask,
return WORK_CPU_END;
 }
 
-static inline int __next_pwq_cpu(int cpu, const struct cpumask *mask,
-struct workqueue_struct *wq)
-{
-   return __next_wq_cpu(cpu, mask, !(wq->flags & WQ_UNBOUND) ? 1 : 2);
-}
-
 /*
  * CPU iterators
  *
@@ -289,8 +283,6 @@ static inline int __next_pwq_cpu(int cpu, const struct cpumask *mask,
  *
  * for_each_wq_cpu()   : possible CPUs + WORK_CPU_UNBOUND
  * for_each_online_wq_cpu(): online CPUs + WORK_CPU_UNBOUND
- * for_each_pwq_cpu()  : possible CPUs for bound workqueues,
- *   WORK_CPU_UNBOUND for unbound workqueues
  */
 #define for_each_wq_cpu(cpu)   \
for ((cpu) = __next_wq_cpu(-1, cpu_possible_mask, 3);   \
@@ -302,10 +294,13 @@ static inline int __next_pwq_cpu(int cpu, const struct cpumask *mask,
 (cpu) < WORK_CPU_END;  \
 (cpu) = __next_wq_cpu((cpu), cpu_online_mask, 3))
 
-#define for_each_pwq_cpu(cpu, wq)  \
-   for ((cpu) = __next_pwq_cpu(-1, cpu_possible_mask, (wq));   \
-(cpu) < WORK_CPU_END;  \
-(cpu) = __next_pwq_cpu((cpu), cpu_possible_mask, (wq)))
+/**
+ * for_each_pwq - iterate through all pool_workqueues of the specified workqueue
+ * @pwq: iteration cursor
+ * @wq: the target workqueue
+ */
+#define for_each_pwq(pwq, wq)  \
+   list_for_each_entry((pwq), &(wq)->pwqs, pwqs_node)
 
 #ifdef CONFIG_DEBUG_OBJECTS_WORK
 
@@ -2507,15 +2502,14 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
  int flush_color, int work_color)
 {
bool wait = false;
-   unsigned int cpu;
+   struct pool_workqueue *pwq;
 
if (flush_color >= 0) {
 		WARN_ON_ONCE(atomic_read(&wq->nr_pwqs_to_flush));
 		atomic_set(&wq->nr_pwqs_to_flush, 1);
}
 
-   for_each_pwq_cpu(cpu, wq) {
-   struct pool_workqueue *pwq = get_pwq(cpu, wq);
+   for_each_pwq(pwq, wq) {
struct worker_pool *pool = pwq->pool;
 
 		spin_lock_irq(&pool->lock);
@@ -2714,7 +2708,7 @@ EXPORT_SYMBOL_GPL(flush_workqueue);
 void drain_workqueue(struct workqueue_struct *wq)
 {
unsigned int flush_cnt = 0;
-   unsigned int cpu;
+   struct pool_workqueue *pwq;
 
/*
 * __queue_work() needs to test whether there are drainers, is much
@@ -2728,8 +2722,7 @@ void drain_workqueue(struct workqueue_struct *wq)
 reflush:
flush_workqueue(wq);
 
-   for_each_pwq_cpu(cpu, wq) {
-   struct pool_workqueue *pwq = get_pwq(cpu, wq);
+   for_each_pwq(pwq, wq) {
bool drained;
 
 		spin_lock_irq(&pwq->pool->lock);
@@ -3102,6 +3095,7 @@ int keventd_up(void)
 
 static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 {
+   bool highpri = wq->flags & WQ_HIGHPRI;
int cpu;
 
if (!(wq->flags & WQ_UNBOUND)) {
@@ -3112,6 +3106,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
for_each_possible_cpu(cpu) {
struct pool_workqueue *pwq = get_pwq(cpu, wq);
 
+   pwq->pool = get_std_worker_pool(cpu, highpri);
 			list_add_tail(&pwq->pwqs_node, &wq->pwqs);
}
} else {
@@ -3122,6 +3117,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
return -ENOMEM;
 
wq->pool_wq.single = pwq;
+   pwq->pool = get_std_worker_pool(WORK_CPU_UNBOUND, highpri);
 		list_add_tail(&pwq->pwqs_node, &wq->pwqs);
}
 
@@ -3156,7 +3152,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
 {
va_list args, args1;
struct workqueue_struct *wq;
-   unsigned int cpu;
+   struct 
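
Usage of the new iterator, as a short sketch assembled from the hunks
above (not itself a hunk from the patch): walking a workqueue's
pool_workqueues no longer involves CPU numbers at all.

	struct pool_workqueue *pwq;

	for_each_pwq(pwq, wq) {
		spin_lock_irq(&pwq->pool->lock);
		/* ... per-pwq work: flushing, draining, accounting ... */
		spin_unlock_irq(&pwq->pool->lock);
	}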

[PATCH 03/31] workqueue: introduce kmem_cache for pool_workqueues

2013-03-01 Thread Tejun Heo
pool_workqueues need to be aligned to 1 << WORK_STRUCT_FLAG_BITS as
the lower bits of work->data are used for flags when they're pointing
to pool_workqueues.

Due to historical reasons, unbound pool_workqueues are allocated using
kzalloc() with sufficient buffer area for alignment and aligned
manually.  The original pointer is stored at the end which free_pwqs()
retrieves when freeing it.

There's no reason for this hackery anymore.  Set alignment of struct
pool_workqueue to 1 << WORK_STRUCT_FLAG_BITS, add kmem_cache for
pool_workqueues with proper alignment and replace the hacky alloc and
free implementation with plain kmem_cache_zalloc/free().

In case WORK_STRUCT_FLAG_BITS gets shrunk too much and makes fields of
pool_workqueues misaligned, trigger WARN if the alignment of struct
pool_workqueue becomes smaller than that of long long.

Note that assertion on IS_ALIGNED() is removed from alloc_pwqs().  We
already have another one in pwq init loop in __alloc_workqueue_key().

This patch doesn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 43 ---
 1 file changed, 12 insertions(+), 31 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 61f78ef..69f1268 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -169,7 +169,7 @@ struct pool_workqueue {
int nr_active;  /* L: nr of active works */
int max_active; /* L: max active works */
 	struct list_head	delayed_works;	/* L: delayed works */
-};
+} __aligned(1 << WORK_STRUCT_FLAG_BITS);
 
 /*
  * Structure used to wait for workqueue flush.
@@ -233,6 +233,8 @@ struct workqueue_struct {
 	char			name[];		/* I: workqueue name */
 };
 
+static struct kmem_cache *pwq_cache;
+
 struct workqueue_struct *system_wq __read_mostly;
 EXPORT_SYMBOL_GPL(system_wq);
 struct workqueue_struct *system_highpri_wq __read_mostly;
@@ -3098,34 +3100,11 @@ int keventd_up(void)
 
 static int alloc_pwqs(struct workqueue_struct *wq)
 {
-   /*
-* pwqs are forced aligned according to WORK_STRUCT_FLAG_BITS.
-* Make sure that the alignment isn't lower than that of
-* unsigned long long.
-*/
-   const size_t size = sizeof(struct pool_workqueue);
-   const size_t align = max_t(size_t, 1 << WORK_STRUCT_FLAG_BITS,
-  __alignof__(unsigned long long));
-
if (!(wq->flags & WQ_UNBOUND))
-   wq->pool_wq.pcpu = __alloc_percpu(size, align);
-   else {
-   void *ptr;
-
-   /*
-* Allocate enough room to align pwq and put an extra
-* pointer at the end pointing back to the originally
-* allocated pointer which will be used for free.
-*/
-   ptr = kzalloc(size + align + sizeof(void *), GFP_KERNEL);
-   if (ptr) {
-   wq->pool_wq.single = PTR_ALIGN(ptr, align);
-   *(void **)(wq->pool_wq.single + 1) = ptr;
-   }
-   }
+   wq->pool_wq.pcpu = alloc_percpu(struct pool_workqueue);
+   else
+   wq->pool_wq.single = kmem_cache_zalloc(pwq_cache, GFP_KERNEL);
 
-   /* just in case, make sure it's actually aligned */
-   BUG_ON(!IS_ALIGNED(wq->pool_wq.v, align));
return wq->pool_wq.v ? 0 : -ENOMEM;
 }
 
@@ -3133,10 +3112,8 @@ static void free_pwqs(struct workqueue_struct *wq)
 {
if (!(wq->flags & WQ_UNBOUND))
free_percpu(wq->pool_wq.pcpu);
-   else if (wq->pool_wq.single) {
-   /* the pointer to free is stored right after the pwq */
-   kfree(*(void **)(wq->pool_wq.single + 1));
-   }
+   else
+   kmem_cache_free(pwq_cache, wq->pool_wq.single);
 }
 
 static int wq_clamp_max_active(int max_active, unsigned int flags,
@@ -3737,6 +3714,10 @@ static int __init init_workqueues(void)
BUILD_BUG_ON((1LU << (BITS_PER_LONG - WORK_OFFQ_POOL_SHIFT)) <
 WORK_CPU_END * NR_STD_WORKER_POOLS);
 
+   WARN_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));
+
+   pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
+
cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);
 
-- 
1.8.1.2
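
Why the cache-based approach is sufficient, as a short sketch built
from the hunks above (the init function name is illustrative):
KMEM_CACHE() derives the alignment from the type itself, so every
allocation already satisfies the low-bit requirement that work->data
flags impose.

struct pool_workqueue {
	/* ... fields ... */
} __aligned(1 << WORK_STRUCT_FLAG_BITS);

static struct kmem_cache *pwq_cache;

static void __init pwq_cache_init(void)
{
	/* alignment below that of long long would misalign u64 fields */
	WARN_ON(__alignof__(struct pool_workqueue) <
		__alignof__(long long));
	pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
}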



[PATCH 04/31] workqueue: add workqueue_struct->pwqs list

2013-03-01 Thread Tejun Heo
Add workqueue_struct->pwqs list and chain all pool_workqueues
belonging to a workqueue there.  This will be used to implement
generic pool_workqueue iteration and handle multiple pool_workqueues
for the scheduled unbound pools with custom attributes.

This patch doesn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 69f1268..d493293 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -169,6 +169,7 @@ struct pool_workqueue {
int nr_active;  /* L: nr of active works */
int max_active; /* L: max active works */
 	struct list_head	delayed_works;	/* L: delayed works */
+	struct list_head	pwqs_node;	/* I: node on wq->pwqs */
 } __aligned(1 << WORK_STRUCT_FLAG_BITS);
 
 /*
@@ -212,6 +213,7 @@ struct workqueue_struct {
struct pool_workqueue   *single;
unsigned long   v;
} pool_wq;  /* I: pwq's */
+	struct list_head	pwqs;		/* I: all pwqs of this wq */
 	struct list_head	list;		/* W: list of all workqueues */
 
 	struct mutex		flush_mutex;	/* protects wq flushing */
@@ -3098,14 +3100,32 @@ int keventd_up(void)
return system_wq != NULL;
 }
 
-static int alloc_pwqs(struct workqueue_struct *wq)
+static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 {
-   if (!(wq->flags & WQ_UNBOUND))
+   int cpu;
+
+   if (!(wq->flags & WQ_UNBOUND)) {
wq->pool_wq.pcpu = alloc_percpu(struct pool_workqueue);
-   else
-   wq->pool_wq.single = kmem_cache_zalloc(pwq_cache, GFP_KERNEL);
+   if (!wq->pool_wq.pcpu)
+   return -ENOMEM;
+
+   for_each_possible_cpu(cpu) {
+   struct pool_workqueue *pwq = get_pwq(cpu, wq);
 
-   return wq->pool_wq.v ? 0 : -ENOMEM;
+			list_add_tail(&pwq->pwqs_node, &wq->pwqs);
+   }
+   } else {
+   struct pool_workqueue *pwq;
+
+   pwq = kmem_cache_zalloc(pwq_cache, GFP_KERNEL);
+   if (!pwq)
+   return -ENOMEM;
+
+   wq->pool_wq.single = pwq;
+		list_add_tail(&pwq->pwqs_node, &wq->pwqs);
+   }
+
+   return 0;
 }
 
 static void free_pwqs(struct workqueue_struct *wq)
@@ -3167,13 +3187,14 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
wq->saved_max_active = max_active;
 	mutex_init(&wq->flush_mutex);
 	atomic_set(&wq->nr_pwqs_to_flush, 0);
+	INIT_LIST_HEAD(&wq->pwqs);
 	INIT_LIST_HEAD(&wq->flusher_queue);
 	INIT_LIST_HEAD(&wq->flusher_overflow);
 
 	lockdep_init_map(&wq->lockdep_map, lock_name, key, 0);
 	INIT_LIST_HEAD(&wq->list);
 
-   if (alloc_pwqs(wq) < 0)
+   if (alloc_and_link_pwqs(wq) < 0)
goto err;
 
for_each_pwq_cpu(cpu, wq) {
-- 
1.8.1.2



[PATCH 18/31] workqueue: remove unbound_std_worker_pools[] and related helpers

2013-03-01 Thread Tejun Heo
Workqueue no longer makes use of unbound_std_worker_pools[].  All
unbound worker_pools are created dynamically and there's nothing
special about the standard ones.  With unbound_std_worker_pools[]
unused, workqueue no longer has places where it needs to treat the
per-cpu pools-cpu and unbound pools together.

Remove unbound_std_worker_pools[] and the helpers wrapping it to
present unified per-cpu and unbound standard worker_pools.

* for_each_std_worker_pool() now only walks through per-cpu pools.

* for_each[_online]_wq_cpu() which don't have any users left are
  removed.

* std_worker_pools() and std_worker_pool_pri() are unused and removed.

* get_std_worker_pool() is removed.  Its only user -
  alloc_and_link_pwqs() - only used it for per-cpu pools anyway.  Open
  code per_cpu access in alloc_and_link_pwqs() instead.

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 66 +-
 1 file changed, 6 insertions(+), 60 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fb91b67..f7f627c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -253,48 +253,13 @@ EXPORT_SYMBOL_GPL(system_freezable_wq);
   "sched RCU or workqueue lock should be held")
 
 #define for_each_std_worker_pool(pool, cpu)				\
-	for ((pool) = &std_worker_pools(cpu)[0];			\
-	     (pool) < &std_worker_pools(cpu)[NR_STD_WORKER_POOLS]; (pool)++)
+	for ((pool) = &per_cpu(cpu_std_worker_pools, cpu)[0];		\
+	     (pool) < &per_cpu(cpu_std_worker_pools, cpu)[NR_STD_WORKER_POOLS]; \
+	     (pool)++)
 
 #define for_each_busy_worker(worker, i, pos, pool) \
hash_for_each(pool->busy_hash, i, pos, worker, hentry)
 
-static inline int __next_wq_cpu(int cpu, const struct cpumask *mask,
-   unsigned int sw)
-{
-   if (cpu < nr_cpu_ids) {
-   if (sw & 1) {
-   cpu = cpumask_next(cpu, mask);
-   if (cpu < nr_cpu_ids)
-   return cpu;
-   }
-   if (sw & 2)
-   return WORK_CPU_UNBOUND;
-   }
-   return WORK_CPU_END;
-}
-
-/*
- * CPU iterators
- *
- * An extra cpu number is defined using an invalid cpu number
- * (WORK_CPU_UNBOUND) to host workqueues which are not bound to any
- * specific CPU.  The following iterators are similar to for_each_*_cpu()
- * iterators but also considers the unbound CPU.
- *
- * for_each_wq_cpu()   : possible CPUs + WORK_CPU_UNBOUND
- * for_each_online_wq_cpu(): online CPUs + WORK_CPU_UNBOUND
- */
-#define for_each_wq_cpu(cpu)   \
-   for ((cpu) = __next_wq_cpu(-1, cpu_possible_mask, 3);   \
-(cpu) < WORK_CPU_END;  \
-(cpu) = __next_wq_cpu((cpu), cpu_possible_mask, 3))
-
-#define for_each_online_wq_cpu(cpu)\
-   for ((cpu) = __next_wq_cpu(-1, cpu_online_mask, 3); \
-(cpu) < WORK_CPU_END;  \
-(cpu) = __next_wq_cpu((cpu), cpu_online_mask, 3))
-
 /**
  * for_each_pool - iterate through all worker_pools in the system
  * @pool: iteration cursor
@@ -452,7 +417,6 @@ static bool workqueue_freezing;	/* W: have wqs started freezing? */
  */
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
 cpu_std_worker_pools);
-static struct worker_pool unbound_std_worker_pools[NR_STD_WORKER_POOLS];
 
 /*
  * idr of all pools.  Modifications are protected by workqueue_lock.  Read
@@ -462,19 +426,6 @@ static DEFINE_IDR(worker_pool_idr);
 
 static int worker_thread(void *__worker);
 
-static struct worker_pool *std_worker_pools(int cpu)
-{
-   if (cpu != WORK_CPU_UNBOUND)
-   return per_cpu(cpu_std_worker_pools, cpu);
-   else
-   return unbound_std_worker_pools;
-}
-
-static int std_worker_pool_pri(struct worker_pool *pool)
-{
-   return pool - std_worker_pools(pool->cpu);
-}
-
 /* allocate ID and assign it to @pool */
 static int worker_pool_assign_id(struct worker_pool *pool)
 {
@@ -492,13 +443,6 @@ static int worker_pool_assign_id(struct worker_pool *pool)
return ret;
 }
 
-static struct worker_pool *get_std_worker_pool(int cpu, bool highpri)
-{
-   struct worker_pool *pools = std_worker_pools(cpu);
-
-	return &pools[highpri];
-}
-
 /**
  * first_pwq - return the first pool_workqueue of the specified workqueue
  * @wq: the target workqueue
@@ -3390,8 +3334,10 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
for_each_possible_cpu(cpu) {
struct pool_workqueue *pwq =
per_cpu_ptr(wq->cpu_pwqs, cpu);
+  

[PATCH 19/31] workqueue: drop "std" from cpu_std_worker_pools and for_each_std_worker_pool()

2013-03-01 Thread Tejun Heo
All per-cpu pools are standard, so there's no need to use both "cpu"
and "std" and for_each_std_worker_pool() is confusing in that it can
be used only for per-cpu pools.

* s/cpu_std_worker_pools/cpu_worker_pools/

* s/for_each_std_worker_pool()/for_each_cpu_worker_pool()/

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f7f627c..95a3dcc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -252,9 +252,9 @@ EXPORT_SYMBOL_GPL(system_freezable_wq);
 			   lockdep_is_held(&workqueue_lock),		\
   "sched RCU or workqueue lock should be held")
 
-#define for_each_std_worker_pool(pool, cpu)				\
-	for ((pool) = &per_cpu(cpu_std_worker_pools, cpu)[0];		\
-	     (pool) < &per_cpu(cpu_std_worker_pools, cpu)[NR_STD_WORKER_POOLS]; \
+#define for_each_cpu_worker_pool(pool, cpu)				\
+	for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0];		\
+	     (pool) < &per_cpu(cpu_worker_pools, cpu)[NR_STD_WORKER_POOLS]; \
 	     (pool)++)
 
 #define for_each_busy_worker(worker, i, pos, pool) \
@@ -416,7 +416,7 @@ static bool workqueue_freezing;	/* W: have wqs started freezing? */
  * POOL_DISASSOCIATED set, and their workers have WORKER_UNBOUND set.
  */
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
-cpu_std_worker_pools);
+cpu_worker_pools);
 
 /*
  * idr of all pools.  Modifications are protected by workqueue_lock.  Read
@@ -3335,7 +3335,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
struct pool_workqueue *pwq =
per_cpu_ptr(wq->cpu_pwqs, cpu);
struct worker_pool *cpu_pools =
-   per_cpu(cpu_std_worker_pools, cpu);
+   per_cpu(cpu_worker_pools, cpu);
 
 		pwq->pool = &cpu_pools[highpri];
 		list_add_tail_rcu(&pwq->pwqs_node, &wq->pwqs);
@@ -3688,7 +3688,7 @@ static void wq_unbind_fn(struct work_struct *work)
struct hlist_node *pos;
int i;
 
-   for_each_std_worker_pool(pool, cpu) {
+   for_each_cpu_worker_pool(pool, cpu) {
WARN_ON_ONCE(cpu != smp_processor_id());
 
 		mutex_lock(&pool->assoc_mutex);
@@ -3731,7 +3731,7 @@ static void wq_unbind_fn(struct work_struct *work)
 * unbound chain execution of pending work items if other workers
 * didn't already.
 */
-   for_each_std_worker_pool(pool, cpu)
+   for_each_cpu_worker_pool(pool, cpu)
 		atomic_set(&pool->nr_running, 0);
 }
 
@@ -3748,7 +3748,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
 
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_UP_PREPARE:
-   for_each_std_worker_pool(pool, cpu) {
+   for_each_cpu_worker_pool(pool, cpu) {
struct worker *worker;
 
if (pool->nr_workers)
@@ -3766,7 +3766,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
 
case CPU_DOWN_FAILED:
case CPU_ONLINE:
-   for_each_std_worker_pool(pool, cpu) {
+   for_each_cpu_worker_pool(pool, cpu) {
 			mutex_lock(&pool->assoc_mutex);
 			spin_lock_irq(&pool->lock);
 
@@ -4006,7 +4006,7 @@ static int __init init_workqueues(void)
struct worker_pool *pool;
 
i = 0;
-   for_each_std_worker_pool(pool, cpu) {
+   for_each_cpu_worker_pool(pool, cpu) {
BUG_ON(init_worker_pool(pool));
pool->cpu = cpu;
cpumask_copy(pool->attrs->cpumask, cpumask_of(cpu));
@@ -4021,7 +4021,7 @@ static int __init init_workqueues(void)
for_each_online_cpu(cpu) {
struct worker_pool *pool;
 
-   for_each_std_worker_pool(pool, cpu) {
+   for_each_cpu_worker_pool(pool, cpu) {
struct worker *worker;
 
pool->flags &= ~POOL_DISASSOCIATED;
-- 
1.8.1.2



[PATCH 21/31] workqueue: drop WQ_RESCUER and test workqueue->rescuer for NULL instead

2013-03-01 Thread Tejun Heo
WQ_RESCUER is superfluous.  WQ_MEM_RECLAIM indicates that the user
wants a rescuer and testing wq->rescuer for NULL can answer whether a
given workqueue has a rescuer or not.  Drop WQ_RESCUER and test
wq->rescuer directly.

This will help simplifying __alloc_workqueue_key() failure path by
allowing it to use destroy_workqueue() on a partially constructed
workqueue, which in turn will help implementing dynamic management of
pool_workqueues.

While at it, clear wq->rescuer after freeing it in
destroy_workqueue().  This is a precaution as scheduled changes will
make destruction more complex.

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo 
---
 include/linux/workqueue.h |  1 -
 kernel/workqueue.c| 22 ++
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 2683e8e..0341403 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -294,7 +294,6 @@ enum {
 	WQ_CPU_INTENSIVE	= 1 << 5, /* cpu intensive workqueue */
 
WQ_DRAINING = 1 << 6, /* internal: workqueue is draining */
-   WQ_RESCUER  = 1 << 7, /* internal: workqueue has rescuer */
 
WQ_MAX_ACTIVE   = 512,/* I like 512, better ideas? */
WQ_MAX_UNBOUND_PER_CPU  = 4,  /* 4 * #cpus for unbound wq */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1695bd6..bcc02bb 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1825,7 +1825,7 @@ static void send_mayday(struct work_struct *work)
 
 	lockdep_assert_held(&workqueue_lock);
 
-   if (!(wq->flags & WQ_RESCUER))
+   if (!wq->rescuer)
return;
 
/* mayday mayday mayday */
@@ -2283,7 +2283,7 @@ sleep:
  * @__rescuer: self
  *
  * Workqueue rescuer thread function.  There's one rescuer for each
- * workqueue which has WQ_RESCUER set.
+ * workqueue which has WQ_MEM_RECLAIM set.
  *
  * Regular work processing on a pool may block trying to create a new
  * worker which uses GFP_KERNEL allocation which has slight chance of
@@ -2767,7 +2767,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
 * flusher is not running on the same workqueue by verifying write
 * access.
 */
-   if (pwq->wq->saved_max_active == 1 || pwq->wq->flags & WQ_RESCUER)
+   if (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer)
 		lock_map_acquire(&pwq->wq->lockdep_map);
 	else
 		lock_map_acquire_read(&pwq->wq->lockdep_map);
@@ -3405,13 +3405,6 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
va_end(args);
va_end(args1);
 
-   /*
-* Workqueues which may be used during memory reclaim should
-* have a rescuer to guarantee forward progress.
-*/
-   if (flags & WQ_MEM_RECLAIM)
-   flags |= WQ_RESCUER;
-
max_active = max_active ?: WQ_DFL_ACTIVE;
max_active = wq_clamp_max_active(max_active, flags, wq->name);
 
@@ -3442,7 +3435,11 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
}
local_irq_enable();
 
-   if (flags & WQ_RESCUER) {
+   /*
+* Workqueues which may be used during memory reclaim should
+* have a rescuer to guarantee forward progress.
+*/
+   if (flags & WQ_MEM_RECLAIM) {
struct worker *rescuer;
 
wq->rescuer = rescuer = alloc_worker();
@@ -3526,9 +3523,10 @@ void destroy_workqueue(struct workqueue_struct *wq)
 
 	spin_unlock_irq(&workqueue_lock);
 
-   if (wq->flags & WQ_RESCUER) {
+   if (wq->rescuer) {
kthread_stop(wq->rescuer->task);
kfree(wq->rescuer);
+   wq->rescuer = NULL;
}
 
/*
-- 
1.8.1.2



[PATCH 20/31] workqueue: add pool ID to the names of unbound kworkers

2013-03-01 Thread Tejun Heo
There are gonna be multiple unbound pools.  Include pool ID in the
name of unbound kworkers.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 95a3dcc..1695bd6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1705,7 +1705,8 @@ static struct worker *create_worker(struct worker_pool *pool)
"kworker/%d:%d%s", pool->cpu, id, pri);
else
worker->task = kthread_create(worker_thread, worker,
- "kworker/u:%d%s", id, pri);
+ "kworker/%du:%d%s",
+ pool->id, id, pri);
if (IS_ERR(worker->task))
goto fail;
 
-- 
1.8.1.2



[PATCH 08/31] workqueue: add workqueue_struct->maydays list to replace mayday cpu iterators

2013-03-01 Thread Tejun Heo
Similar to how pool_workqueue iteration used to be, raising and
servicing mayday requests is based on CPU numbers.  It's hairy because
cpumask_t may not be able to handle WORK_CPU_UNBOUND and cpumasks are
assumed to be always set on UP.  This is ugly and can't handle
multiple unbound pools to be added for unbound workqueues w/ custom
attributes.

Add workqueue_struct->maydays.  When a pool_workqueue needs rescuing,
it gets chained on the list through pool_workqueue->mayday_node and
rescuer_thread() consumes the list until it's empty.

This patch doesn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 77 --
 1 file changed, 28 insertions(+), 49 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9f195aa..8b38d1c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -170,6 +170,7 @@ struct pool_workqueue {
int max_active; /* L: max active works */
struct list_headdelayed_works;  /* L: delayed works */
 	struct list_head	pwqs_node;	/* I: node on wq->pwqs */
+	struct list_head	mayday_node;	/* W: node on wq->maydays */
 } __aligned(1 << WORK_STRUCT_FLAG_BITS);
 
 /*
@@ -182,27 +183,6 @@ struct wq_flusher {
 };
 
 /*
- * All cpumasks are assumed to be always set on UP and thus can't be
- * used to determine whether there's something to be done.
- */
-#ifdef CONFIG_SMP
-typedef cpumask_var_t mayday_mask_t;
-#define mayday_test_and_set_cpu(cpu, mask) \
-   cpumask_test_and_set_cpu((cpu), (mask))
-#define mayday_clear_cpu(cpu, mask)cpumask_clear_cpu((cpu), (mask))
-#define for_each_mayday_cpu(cpu, mask) for_each_cpu((cpu), (mask))
-#define alloc_mayday_mask(maskp, gfp)	zalloc_cpumask_var((maskp), (gfp))
-#define free_mayday_mask(mask) free_cpumask_var((mask))
-#else
-typedef unsigned long mayday_mask_t;
-#define mayday_test_and_set_cpu(cpu, mask) test_and_set_bit(0, &(mask))
-#define mayday_clear_cpu(cpu, mask)clear_bit(0, &(mask))
-#define for_each_mayday_cpu(cpu, mask) if ((cpu) = 0, (mask))
-#define alloc_mayday_mask(maskp, gfp)  true
-#define free_mayday_mask(mask) do { } while (0)
-#endif
-
-/*
  * The externally visible workqueue abstraction is an array of
  * per-CPU workqueues:
  */
@@ -224,7 +204,7 @@ struct workqueue_struct {
struct list_headflusher_queue;  /* F: flush waiters */
struct list_headflusher_overflow; /* F: flush overflow list */
 
-   mayday_mask_t   mayday_mask;/* cpus requesting rescue */
+	struct list_head	maydays;	/* W: pwqs requesting rescue */
struct worker   *rescuer;   /* I: rescue worker */
 
int nr_drainers;/* W: drain in progress */
@@ -1852,23 +1832,21 @@ static void idle_worker_timeout(unsigned long __pool)
spin_unlock_irq(>lock);
 }
 
-static bool send_mayday(struct work_struct *work)
+static void send_mayday(struct work_struct *work)
 {
struct pool_workqueue *pwq = get_work_pwq(work);
struct workqueue_struct *wq = pwq->wq;
-   unsigned int cpu;
+
+	lockdep_assert_held(&workqueue_lock);
 
if (!(wq->flags & WQ_RESCUER))
-   return false;
+   return;
 
/* mayday mayday mayday */
-   cpu = pwq->pool->cpu;
-   /* WORK_CPU_UNBOUND can't be set in cpumask, use cpu 0 instead */
-   if (cpu == WORK_CPU_UNBOUND)
-   cpu = 0;
-   if (!mayday_test_and_set_cpu(cpu, wq->mayday_mask))
+	if (list_empty(&pwq->mayday_node)) {
+		list_add_tail(&pwq->mayday_node, &wq->maydays);
wake_up_process(wq->rescuer->task);
-   return true;
+   }
 }
 
 static void pool_mayday_timeout(unsigned long __pool)
@@ -1876,7 +1854,8 @@ static void pool_mayday_timeout(unsigned long __pool)
struct worker_pool *pool = (void *)__pool;
struct work_struct *work;
 
-	spin_lock_irq(&pool->lock);
+	spin_lock_irq(&workqueue_lock);		/* for wq->maydays */
+	spin_lock(&pool->lock);
 
if (need_to_create_worker(pool)) {
/*
@@ -1889,7 +1868,8 @@ static void pool_mayday_timeout(unsigned long __pool)
send_mayday(work);
}
 
-	spin_unlock_irq(&pool->lock);
+	spin_unlock(&pool->lock);
+	spin_unlock_irq(&workqueue_lock);
 
 	mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INTERVAL);
 }
@@ -2338,8 +2318,6 @@ static int rescuer_thread(void *__rescuer)
struct worker *rescuer = __rescuer;
struct workqueue_struct *wq = rescuer->rescue_wq;
 	struct list_head *scheduled = &rescuer->scheduled;
-   bool is_unbound = wq->flags & WQ_UNBOUND;
-   unsigned int cpu;
 
set_user_nice(current, RESCUER_NICE_LEVEL);
 
@@ -2357,18 +2335,19 @@ repeat:
return 0;
}
 
-   /*
-* See whether 
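
A simplified sketch of the list-based protocol described above (the
workqueue_lock protection is elided; WQ_RESCUER is still the gate at
this point in the series):

static void send_mayday_sketch(struct pool_workqueue *pwq)
{
	struct workqueue_struct *wq = pwq->wq;

	if (!(wq->flags & WQ_RESCUER))
		return;

	/* queue each pwq at most once; the rescuer drains wq->maydays */
	if (list_empty(&pwq->mayday_node)) {
		list_add_tail(&pwq->mayday_node, &wq->maydays);
		wake_up_process(wq->rescuer->task);
	}
}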

[PATCH 22/31] workqueue: restructure __alloc_workqueue_key()

2013-03-01 Thread Tejun Heo
* Move initialization and linking of pool_workqueues into
  init_and_link_pwq().

* Make the failure path use destroy_workqueue() once pool_workqueue
  initialization succeeds.

These changes are to prepare for dynamic management of pool_workqueues
and don't introduce any functional changes.

While at it, convert list_del(>list) to list_del_init() as a
precaution as scheduled changes will make destruction more complex.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 67 +++---
 1 file changed, 38 insertions(+), 29 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bcc02bb..d0604ee 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3322,6 +3322,23 @@ fail:
return NULL;
 }
 
+/* initialize @pwq which interfaces with @pool for @wq and link it in */
+static void init_and_link_pwq(struct pool_workqueue *pwq,
+ struct workqueue_struct *wq,
+ struct worker_pool *pool)
+{
+   BUG_ON((unsigned long)pwq & WORK_STRUCT_FLAG_MASK);
+
+   pwq->pool = pool;
+   pwq->wq = wq;
+   pwq->flush_color = -1;
+   pwq->max_active = wq->saved_max_active;
+	INIT_LIST_HEAD(&pwq->delayed_works);
+	INIT_LIST_HEAD(&pwq->mayday_node);
+
+	list_add_tail_rcu(&pwq->pwqs_node, &wq->pwqs);
+}
+
 static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 {
bool highpri = wq->flags & WQ_HIGHPRI;
@@ -3338,23 +3355,23 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
struct worker_pool *cpu_pools =
per_cpu(cpu_worker_pools, cpu);
 
-			pwq->pool = &cpu_pools[highpri];
-			list_add_tail_rcu(&pwq->pwqs_node, &wq->pwqs);
+			init_and_link_pwq(pwq, wq, &cpu_pools[highpri]);
}
} else {
struct pool_workqueue *pwq;
+   struct worker_pool *pool;
 
pwq = kmem_cache_zalloc(pwq_cache, GFP_KERNEL);
if (!pwq)
return -ENOMEM;
 
-   pwq->pool = get_unbound_pool(unbound_std_wq_attrs[highpri]);
-   if (!pwq->pool) {
+   pool = get_unbound_pool(unbound_std_wq_attrs[highpri]);
+   if (!pool) {
kmem_cache_free(pwq_cache, pwq);
return -ENOMEM;
}
 
-		list_add_tail_rcu(&pwq->pwqs_node, &wq->pwqs);
+   init_and_link_pwq(pwq, wq, pool);
}
 
return 0;
@@ -3399,7 +3416,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
 
wq = kzalloc(sizeof(*wq) + namelen, GFP_KERNEL);
if (!wq)
-   goto err;
+   return NULL;
 
vsnprintf(wq->name, namelen, fmt, args1);
va_end(args);
@@ -3422,18 +3439,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
 	INIT_LIST_HEAD(&wq->list);
 
if (alloc_and_link_pwqs(wq) < 0)
-   goto err;
-
-   local_irq_disable();
-   for_each_pwq(pwq, wq) {
-   BUG_ON((unsigned long)pwq & WORK_STRUCT_FLAG_MASK);
-   pwq->wq = wq;
-   pwq->flush_color = -1;
-   pwq->max_active = max_active;
-		INIT_LIST_HEAD(&pwq->delayed_works);
-		INIT_LIST_HEAD(&pwq->mayday_node);
-   }
-   local_irq_enable();
+   goto err_free_wq;
 
/*
 * Workqueues which may be used during memory reclaim should
@@ -3442,16 +3448,19 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
if (flags & WQ_MEM_RECLAIM) {
struct worker *rescuer;
 
-   wq->rescuer = rescuer = alloc_worker();
+   rescuer = alloc_worker();
if (!rescuer)
-   goto err;
+   goto err_destroy;
 
rescuer->rescue_wq = wq;
rescuer->task = kthread_create(rescuer_thread, rescuer, "%s",
   wq->name);
-   if (IS_ERR(rescuer->task))
-   goto err;
+   if (IS_ERR(rescuer->task)) {
+   kfree(rescuer);
+   goto err_destroy;
+   }
 
+   wq->rescuer = rescuer;
rescuer->task->flags |= PF_THREAD_BOUND;
wake_up_process(rescuer->task);
}
@@ -3472,12 +3481,12 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
 	spin_unlock_irq(&workqueue_lock);
 
return wq;
-err:
-   if (wq) {
-   free_pwqs(wq);
-   kfree(wq->rescuer);
-   kfree(wq);
-   }
+
+err_free_wq:
+   kfree(wq);
+   return NULL;
+err_destroy:
+   destroy_workqueue(wq);
return NULL;
 }
 EXPORT_SYMBOL_GPL(__alloc_workqueue_key);
@@ -3519,7 +3528,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
 * wq 
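
The failure convention this introduces, as an illustrative sketch
(setup_rescuer() is a stand-in, not a real helper): before anything is
linked, a plain kfree() suffices; afterwards the partially constructed
workqueue goes through the full destroy_workqueue() teardown.

struct workqueue_struct *alloc_wq_sketch(void)
{
	struct workqueue_struct *wq = kzalloc(sizeof(*wq), GFP_KERNEL);

	if (!wq)
		return NULL;
	if (alloc_and_link_pwqs(wq) < 0)
		goto err_free_wq;	/* nothing linked yet */
	if (setup_rescuer(wq) < 0)
		goto err_destroy;	/* pwqs already linked */
	return wq;

err_free_wq:
	kfree(wq);
	return NULL;
err_destroy:
	destroy_workqueue(wq);
	return NULL;
}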

[PATCH 07/31] workqueue: restructure pool / pool_workqueue iterations in freeze/thaw functions

2013-03-01 Thread Tejun Heo
The three freeze/thaw related functions - freeze_workqueues_begin(),
freeze_workqueues_busy() and thaw_workqueues() - need to iterate
through all pool_workqueues of all freezable workqueues.  They did it
by first iterating pools and then visiting all pwqs (pool_workqueues)
of all workqueues and process it if its pwq->pool matches the current
pool.  This is rather backwards and done this way partly because
workqueue didn't have fitting iteration helpers and partly to avoid
the number of lock operations on pool->lock.

Workqueue now has fitting iterators and the locking operation overhead
isn't anything to worry about - those locks are unlikely to be
contended and the same CPU visiting the same set of locks multiple
times isn't expensive.

Restructure the three functions such that the flow better matches the
logical steps and pwq iteration is done using for_each_pwq() inside
workqueue iteration.

* freeze_workqueues_begin(): Setting of FREEZING is moved into a
  separate for_each_pool() iteration.  pwq iteration for clearing
  max_active is updated as described above.

* freeze_workqueues_busy(): pwq iteration updated as described above.

* thaw_workqueues(): The single for_each_wq_cpu() iteration is broken
  into three discrete steps - clearing FREEZING, restoring max_active,
  and kicking workers.  The first and last steps use for_each_pool()
  and the second step uses pwq iteration described above.

This makes the code easier to understand and removes the use of
for_each_wq_cpu() for walking pwqs, which can't support multiple
unbound pwqs which will be needed to implement unbound workqueues with
custom attributes.

This patch doesn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 87 --
 1 file changed, 45 insertions(+), 42 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 869dbcc..9f195aa 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3598,6 +3598,8 @@ EXPORT_SYMBOL_GPL(work_on_cpu);
 void freeze_workqueues_begin(void)
 {
struct worker_pool *pool;
+   struct workqueue_struct *wq;
+   struct pool_workqueue *pwq;
int id;
 
spin_lock_irq(&workqueue_lock);
@@ -3605,23 +3607,24 @@ void freeze_workqueues_begin(void)
WARN_ON_ONCE(workqueue_freezing);
workqueue_freezing = true;
 
+   /* set FREEZING */
for_each_pool(pool, id) {
-   struct workqueue_struct *wq;
-
spin_lock(&pool->lock);
-
WARN_ON_ONCE(pool->flags & POOL_FREEZING);
pool->flags |= POOL_FREEZING;
+   spin_unlock(&pool->lock);
+   }
 
-   list_for_each_entry(wq, &workqueues, list) {
-   struct pool_workqueue *pwq = get_pwq(pool->cpu, wq);
+   /* suppress further executions by setting max_active to zero */
+   list_for_each_entry(wq, &workqueues, list) {
+   if (!(wq->flags & WQ_FREEZABLE))
+   continue;
 
-   if (pwq && pwq->pool == pool &&
-   (wq->flags & WQ_FREEZABLE))
-   pwq->max_active = 0;
+   for_each_pwq(pwq, wq) {
+   spin_lock(&pwq->pool->lock);
+   pwq->max_active = 0;
+   spin_unlock(&pwq->pool->lock);
}
-
-   spin_unlock(&pool->lock);
}
 
spin_unlock_irq(&workqueue_lock);
@@ -3642,25 +3645,22 @@ void freeze_workqueues_begin(void)
  */
 bool freeze_workqueues_busy(void)
 {
-   unsigned int cpu;
bool busy = false;
+   struct workqueue_struct *wq;
+   struct pool_workqueue *pwq;
 
spin_lock_irq(&workqueue_lock);
 
WARN_ON_ONCE(!workqueue_freezing);
 
-   for_each_wq_cpu(cpu) {
-   struct workqueue_struct *wq;
+   list_for_each_entry(wq, &workqueues, list) {
+   if (!(wq->flags & WQ_FREEZABLE))
+   continue;
/*
 * nr_active is monotonically decreasing.  It's safe
 * to peek without lock.
 */
-   list_for_each_entry(wq, &workqueues, list) {
-   struct pool_workqueue *pwq = get_pwq(cpu, wq);
-
-   if (!pwq || !(wq->flags & WQ_FREEZABLE))
-   continue;
-
+   for_each_pwq(pwq, wq) {
WARN_ON_ONCE(pwq->nr_active < 0);
if (pwq->nr_active) {
busy = true;
@@ -3684,40 +3684,43 @@ out_unlock:
  */
 void thaw_workqueues(void)
 {
-   unsigned int cpu;
+   struct workqueue_struct *wq;
+   struct pool_workqueue *pwq;
+   struct worker_pool *pool;
+   int id;
 
spin_lock_irq(&workqueue_lock);
 
if (!workqueue_freezing)
goto out_unlock;
 
-   for_each_wq_cpu(cpu) {
-   struct worker_pool *pool;
-   struct workqueue_struct *wq;
-
-   

[PATCH 06/31] workqueue: introduce for_each_pool()

2013-03-01 Thread Tejun Heo
With the scheduled unbound pools with custom attributes, there will be
multiple unbound pools, so it will no longer be possible to use
for_each_wq_cpu() + for_each_std_worker_pool() to iterate through all
pools.

Introduce for_each_pool() which iterates through all pools using
worker_pool_idr and use it instead of for_each_wq_cpu() +
for_each_std_worker_pool() combination in freeze_workqueues_begin().
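As a usage sketch, a caller that needs to visit every pool - per-cpu and
unbound alike - now only needs the idr-based iterator (workqueue_lock,
which protects worker_pool_idr, is assumed held; the per-pool locking is
illustrative):

	struct worker_pool *pool;
	int id;

	for_each_pool(pool, id) {
		spin_lock(&pool->lock);
		/* ... inspect or update @pool ... */
		spin_unlock(&pool->lock);
	}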

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0055a31..869dbcc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -295,6 +295,14 @@ static inline int __next_wq_cpu(int cpu, const struct 
cpumask *mask,
 (cpu) = __next_wq_cpu((cpu), cpu_online_mask, 3))
 
 /**
+ * for_each_pool - iterate through all worker_pools in the system
+ * @pool: iteration cursor
+ * @id: integer used for iteration
+ */
+#define for_each_pool(pool, id)\
+   idr_for_each_entry(&worker_pool_idr, pool, id)
+
+/**
  * for_each_pwq - iterate through all pool_workqueues of the specified 
workqueue
  * @pwq: iteration cursor
  * @wq: the target workqueue
@@ -3589,33 +3597,31 @@ EXPORT_SYMBOL_GPL(work_on_cpu);
  */
 void freeze_workqueues_begin(void)
 {
-   unsigned int cpu;
+   struct worker_pool *pool;
+   int id;
 
spin_lock_irq(&workqueue_lock);
 
WARN_ON_ONCE(workqueue_freezing);
workqueue_freezing = true;
 
-   for_each_wq_cpu(cpu) {
-   struct worker_pool *pool;
+   for_each_pool(pool, id) {
struct workqueue_struct *wq;
 
-   for_each_std_worker_pool(pool, cpu) {
-   spin_lock(&pool->lock);
-
-   WARN_ON_ONCE(pool->flags & POOL_FREEZING);
-   pool->flags |= POOL_FREEZING;
+   spin_lock(&pool->lock);
 
-   list_for_each_entry(wq, &workqueues, list) {
-   struct pool_workqueue *pwq = get_pwq(cpu, wq);
+   WARN_ON_ONCE(pool->flags & POOL_FREEZING);
+   pool->flags |= POOL_FREEZING;
 
-   if (pwq && pwq->pool == pool &&
-   (wq->flags & WQ_FREEZABLE))
-   pwq->max_active = 0;
-   }
+   list_for_each_entry(wq, &workqueues, list) {
+   struct pool_workqueue *pwq = get_pwq(pool->cpu, wq);
 
-   spin_unlock(&pool->lock);
+   if (pwq && pwq->pool == pool &&
+   (wq->flags & WQ_FREEZABLE))
+   pwq->max_active = 0;
}
+
+   spin_unlock(&pool->lock);
}
 
spin_unlock_irq(&workqueue_lock);
-- 
1.8.1.2



[PATCH 23/31] workqueue: implement get/put_pwq()

2013-03-01 Thread Tejun Heo
Add pool_workqueue->refcnt along with get/put_pwq().  Both per-cpu and
unbound pwqs have refcnts and any work item inserted on a pwq
increments the refcnt which is dropped when the work item finishes.

For per-cpu pwqs the base ref is never dropped and destroy_workqueue()
frees the pwqs as before.  For unbound ones, destroy_workqueue()
simply drops the base ref on the first pwq.  When the refcnt reaches
zero, pwq_unbound_release_workfn() is scheduled on system_wq, which
unlinks the pwq, puts the associated pool and frees the pwq and wq as
necessary.  This needs to be done from a work item as put_pwq() needs
to be protected by pool->lock but release can't happen with the lock
held - e.g. put_unbound_pool() involves blocking operations.

Unbound pool->locks are marked with lockdep subclass 1 as put_pwq()
will schedule the release work item on system_wq while holding the
unbound pool's lock, which would spuriously trigger a recursive
locking warning.

This will be used to implement dynamic creation and destruction of
unbound pwqs.
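Roughly, the pairing described above looks like the following sketch -
the get side lives in insert_work() and the put side at the tail of
pwq_dec_nr_in_flight(), both under pool->lock:

	/* queueing side: one reference per queued work item */
	get_pwq(pwq);
	list_add_tail(&work->entry, head);

	/* completion side: dropped when the work item retires */
out_put:
	put_pwq(pwq);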

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 137 -
 1 file changed, 114 insertions(+), 23 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d0604ee..e092cd5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -179,6 +179,7 @@ struct pool_workqueue {
struct workqueue_struct *wq;/* I: the owning workqueue */
int work_color; /* L: current color */
int flush_color;/* L: flushing color */
+   int refcnt; /* L: reference count */
int nr_in_flight[WORK_NR_COLORS];
/* L: nr of in_flight works */
int nr_active;  /* L: nr of active works */
@@ -186,6 +187,15 @@ struct pool_workqueue {
struct list_head    delayed_works;  /* L: delayed works */
struct list_head    pwqs_node;  /* R: node on wq->pwqs */
struct list_head    mayday_node;    /* W: node on wq->maydays */
+
+   /*
+* Release of unbound pwq is punted to system_wq.  See put_pwq()
+* and pwq_unbound_release_workfn() for details.  pool_workqueue
+* itself is also sched-RCU protected so that the first pwq can be
+* determined without grabbing workqueue_lock.
+*/
+   struct work_struct  unbound_release_work;
+   struct rcu_head rcu;
 } __aligned(1 << WORK_STRUCT_FLAG_BITS);
 
 /*
@@ -936,6 +946,45 @@ static void move_linked_works(struct work_struct *work, 
struct list_head *head,
*nextp = n;
 }
 
+/**
+ * get_pwq - get an extra reference on the specified pool_workqueue
+ * @pwq: pool_workqueue to get
+ *
+ * Obtain an extra reference on @pwq.  The caller should guarantee that
+ * @pwq has positive refcnt and be holding the matching pool->lock.
+ */
+static void get_pwq(struct pool_workqueue *pwq)
+{
+   lockdep_assert_held(&pwq->pool->lock);
+   WARN_ON_ONCE(pwq->refcnt <= 0);
+   pwq->refcnt++;
+}
+
+/**
+ * put_pwq - put a pool_workqueue reference
+ * @pwq: pool_workqueue to put
+ *
+ * Drop a reference of @pwq.  If its refcnt reaches zero, schedule its
+ * destruction.  The caller should be holding the matching pool->lock.
+ */
+static void put_pwq(struct pool_workqueue *pwq)
+{
+   lockdep_assert_held(&pwq->pool->lock);
+   if (likely(--pwq->refcnt))
+   return;
+   if (WARN_ON_ONCE(!(pwq->wq->flags & WQ_UNBOUND)))
+   return;
+   /*
+* @pwq can't be released under pool->lock, bounce to
+* pwq_unbound_release_workfn().  This never recurses on the same
+* pool->lock as this path is taken only for unbound workqueues and
+* the release work item is scheduled on a per-cpu workqueue.  To
+* avoid lockdep warning, unbound pool->locks are given lockdep
+* subclass of 1 in get_unbound_pool().
+*/
+   schedule_work(&pwq->unbound_release_work);
+}
+
 static void pwq_activate_delayed_work(struct work_struct *work)
 {
struct pool_workqueue *pwq = get_work_pwq(work);
@@ -967,9 +1016,9 @@ static void pwq_activate_first_delayed(struct 
pool_workqueue *pwq)
  */
 static void pwq_dec_nr_in_flight(struct pool_workqueue *pwq, int color)
 {
-   /* ignore uncolored works */
+   /* uncolored work items don't participate in flushing or nr_active */
if (color == WORK_NO_COLOR)
-   return;
+   goto out_put;
 
pwq->nr_in_flight[color]--;
 
@@ -982,11 +1031,11 @@ static void pwq_dec_nr_in_flight(struct pool_workqueue 
*pwq, int color)
 
/* is flush in progress and are we at the flushing tip? */
if (likely(pwq->flush_color != color))
-   return;
+   goto out_put;
 
/* are there still in-flight works? */
if (pwq->nr_in_flight[color])
- 

[PATCH 24/31] workqueue: prepare flush_workqueue() for dynamic creation and destrucion of unbound pool_workqueues

2013-03-01 Thread Tejun Heo
Unbound pwqs (pool_workqueues) will be dynamically created and
destroyed with the scheduled unbound workqueue w/ custom attributes
support.  This patch synchronizes pwq linking and unlinking against
flush_workqueue() so that its operation isn't disturbed by pwqs coming
and going.

Linking and unlinking a pwq into wq->pwqs is now protected also by
wq->flush_mutex and a new pwq's work_color is initialized to
wq->work_color during linking.  This ensures that pwqs changes don't
disturb flush_workqueue() in progress and the new pwq's work coloring
stays in sync with the rest of the workqueue.

Taking flush_mutex during unlinking isn't strictly necessary, but it's
simpler to do it anyway.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e092cd5..0f0da59 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -122,6 +122,9 @@ enum {
  * W: workqueue_lock protected.
  *
  * R: workqueue_lock protected for writes.  Sched-RCU protected for reads.
+ *
+ * FR: wq->flush_mutex and workqueue_lock protected for writes.  Sched-RCU
+ * protected for reads.
  */
 
 /* struct worker is defined in workqueue_internal.h */
@@ -185,7 +188,7 @@ struct pool_workqueue {
int nr_active;  /* L: nr of active works */
int max_active; /* L: max active works */
struct list_head    delayed_works;  /* L: delayed works */
-   struct list_head    pwqs_node;  /* R: node on wq->pwqs */
+   struct list_head    pwqs_node;  /* FR: node on wq->pwqs */
struct list_head    mayday_node;    /* W: node on wq->maydays */
 
/*
@@ -214,7 +217,7 @@ struct wq_flusher {
 struct workqueue_struct {
unsigned int        flags;  /* W: WQ_* flags */
struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwq's */
-   struct list_head    pwqs;   /* R: all pwqs of this wq */
+   struct list_head    pwqs;   /* FR: all pwqs of this wq */
struct list_head    list;   /* W: list of all workqueues */

struct mutex        flush_mutex;    /* protects wq flushing */
@@ -3395,9 +3398,16 @@ static void pwq_unbound_release_workfn(struct 
work_struct *work)
if (WARN_ON_ONCE(!(wq->flags & WQ_UNBOUND)))
return;
 
+   /*
+* Unlink @pwq.  Synchronization against flush_mutex isn't strictly
+* necessary on release but do it anyway.  It's easier to verify
+* and consistent with the linking path.
+*/
+   mutex_lock(&wq->flush_mutex);
spin_lock_irq(&workqueue_lock);
list_del_rcu(&pwq->pwqs_node);
spin_unlock_irq(&workqueue_lock);
+   mutex_unlock(&wq->flush_mutex);
 
put_unbound_pool(pool);
call_rcu_sched(&pwq->rcu, rcu_free_pwq);
@@ -3425,7 +3435,18 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
INIT_LIST_HEAD(&pwq->mayday_node);
INIT_WORK(&pwq->unbound_release_work, pwq_unbound_release_workfn);
 
+   /*
+* Link @pwq and set the matching work_color.  This is synchronized
+* with flush_mutex to avoid confusing flush_workqueue().
+*/
+   mutex_lock(&wq->flush_mutex);
+   spin_lock_irq(&workqueue_lock);
+
+   pwq->work_color = wq->work_color;
list_add_tail_rcu(&pwq->pwqs_node, &wq->pwqs);
+
+   spin_unlock_irq(&workqueue_lock);
+   mutex_unlock(&wq->flush_mutex);
 }
 
 static int alloc_and_link_pwqs(struct workqueue_struct *wq)
-- 
1.8.1.2



[PATCH 25/31] workqueue: perform non-reentrancy test when queueing to unbound workqueues too

2013-03-01 Thread Tejun Heo
Because per-cpu workqueues have multiple pwqs (pool_workqueues) to
serve the CPUs, to guarantee that a single work item isn't queued on
one pwq while still executing another, __queue_work() takes a look at
the previous pool the target work item was on and if it's still
executing there, queue the work item on that pool.

To support changing workqueue_attrs on the fly, unbound workqueues too
will have multiple pwqs and thus need non-reentrancy test when
queueing.  This patch modifies __queue_work() such that the reentrancy
test is performed regardless of the workqueue type.

per_cpu_ptr(wq->cpu_pwqs, cpu) used to be used to determine the
matching pwq for the last pool.  This can't be used for unbound
workqueues and is replaced with worker->current_pwq which also happens
to be simpler.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 42 +++---
 1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0f0da59..4c67967 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1206,6 +1206,7 @@ static void __queue_work(int cpu, struct workqueue_struct 
*wq,
 struct work_struct *work)
 {
struct pool_workqueue *pwq;
+   struct worker_pool *last_pool;
struct list_head *worklist;
unsigned int work_flags;
unsigned int req_cpu = cpu;
@@ -1225,41 +1226,36 @@ static void __queue_work(int cpu, struct 
workqueue_struct *wq,
WARN_ON_ONCE(!is_chained_work(wq)))
return;
 
-   /* determine the pwq to use */
+   /* pwq which will be used unless @work is executing elsewhere */
if (!(wq->flags & WQ_UNBOUND)) {
-   struct worker_pool *last_pool;
-
if (cpu == WORK_CPU_UNBOUND)
cpu = raw_smp_processor_id();
-
-   /*
-* It's multi cpu.  If @work was previously on a different
-* cpu, it might still be running there, in which case the
-* work needs to be queued on that cpu to guarantee
-* non-reentrancy.
-*/
pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
-   last_pool = get_work_pool(work);
+   } else {
+   pwq = first_pwq(wq);
+   }
 
-   if (last_pool && last_pool != pwq->pool) {
-   struct worker *worker;
+   /*
+* If @work was previously on a different pool, it might still be
+* running there, in which case the work needs to be queued on that
+* pool to guarantee non-reentrancy.
+*/
+   last_pool = get_work_pool(work);
+   if (last_pool && last_pool != pwq->pool) {
+   struct worker *worker;
 
-   spin_lock(&last_pool->lock);
+   spin_lock(&last_pool->lock);
 
-   worker = find_worker_executing_work(last_pool, work);
+   worker = find_worker_executing_work(last_pool, work);
 
-   if (worker && worker->current_pwq->wq == wq) {
-   pwq = per_cpu_ptr(wq->cpu_pwqs, last_pool->cpu);
-   } else {
-   /* meh... not running there, queue here */
-   spin_unlock(&last_pool->lock);
-   spin_lock(&pwq->pool->lock);
-   }
+   if (worker && worker->current_pwq->wq == wq) {
+   pwq = worker->current_pwq;
} else {
+   /* meh... not running there, queue here */
+   spin_unlock(&last_pool->lock);
spin_lock(&pwq->pool->lock);
}
} else {
-   pwq = first_pwq(wq);
spin_lock(&pwq->pool->lock);
}
 
-- 
1.8.1.2



[PATCH 26/31] workqueue: implement apply_workqueue_attrs()

2013-03-01 Thread Tejun Heo
Implement apply_workqueue_attrs() which applies workqueue_attrs to the
specified unbound workqueue by creating a new pwq (pool_workqueue)
linked to worker_pool with the specified attributes.

A new pwq is linked at the head of wq->pwqs instead of tail and
__queue_work() verifies that the first unbound pwq has positive refcnt
before choosing it for the actual queueing.  This is to cover the case
where creation of a new pwq races with queueing.  As base ref on a pwq
won't be dropped without making another pwq the first one,
__queue_work() is guaranteed to make progress and not add work item to
a dead pwq.

init_and_link_pwq() is updated to return the old first pwq that the new
pwq replaced, which is then put by apply_workqueue_attrs().

Note that apply_workqueue_attrs() is almost identical to unbound pwq
part of alloc_and_link_pwqs().  The only difference is that there is
no previous first pwq.  apply_workqueue_attrs() is implemented to
handle such cases and replaces unbound pwq handling in
alloc_and_link_pwqs().
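A usage sketch with the interface added by this series (the nice value
and cpumask are illustrative; @wq is an existing WQ_UNBOUND workqueue
and error handling is elided):

	struct workqueue_attrs *attrs;
	int ret;

	attrs = alloc_workqueue_attrs(GFP_KERNEL);
	if (!attrs)
		return -ENOMEM;

	attrs->nice = -5;				/* workers run at nice -5 */
	cpumask_copy(attrs->cpumask, cpumask_of(0));	/* workers pinned to CPU 0 */

	ret = apply_workqueue_attrs(wq, attrs);		/* installs a new first pwq */
	free_workqueue_attrs(attrs);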

Signed-off-by: Tejun Heo 
---
 include/linux/workqueue.h |  2 ++
 kernel/workqueue.c| 91 ---
 2 files changed, 73 insertions(+), 20 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 0341403..c8c3bf4 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -409,6 +409,8 @@ extern void destroy_workqueue(struct workqueue_struct *wq);
 
 struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask);
 void free_workqueue_attrs(struct workqueue_attrs *attrs);
+int apply_workqueue_attrs(struct workqueue_struct *wq,
+ const struct workqueue_attrs *attrs);
 
 extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
struct work_struct *work);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4c67967..36fcf9c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1225,7 +1225,7 @@ static void __queue_work(int cpu, struct workqueue_struct 
*wq,
if (unlikely(wq->flags & WQ_DRAINING) &&
WARN_ON_ONCE(!is_chained_work(wq)))
return;
-
+retry:
/* pwq which will be used unless @work is executing elsewhere */
if (!(wq->flags & WQ_UNBOUND)) {
if (cpu == WORK_CPU_UNBOUND)
@@ -1259,6 +1259,25 @@ static void __queue_work(int cpu, struct 
workqueue_struct *wq,
spin_lock(&pwq->pool->lock);
}
 
+   /*
+* pwq is determined and locked.  For unbound pools, we could have
+* raced with pwq release and it could already be dead.  If its
+* refcnt is zero, repeat pwq selection.  Note that pwqs never die
+* without another pwq replacing it as the first pwq or while a
+* work item is executing on it, so the retrying is guaranteed to
+* make forward-progress.
+*/
+   if (unlikely(!pwq->refcnt)) {
+   if (wq->flags & WQ_UNBOUND) {
+   spin_unlock(&pwq->pool->lock);
+   cpu_relax();
+   goto retry;
+   }
+   /* oops */
+   WARN_ONCE(true, "workqueue: per-cpu pwq for %s on cpu%d has 0 
refcnt",
+ wq->name, cpu);
+   }
+
/* pwq determined, queue */
trace_workqueue_queue_work(req_cpu, pwq, work);
 
@@ -3418,7 +3437,8 @@ static void pwq_unbound_release_workfn(struct work_struct 
*work)
 
 static void init_and_link_pwq(struct pool_workqueue *pwq,
  struct workqueue_struct *wq,
- struct worker_pool *pool)
+ struct worker_pool *pool,
+ struct pool_workqueue **p_last_pwq)
 {
BUG_ON((unsigned long)pwq & WORK_STRUCT_FLAG_MASK);
 
@@ -3438,13 +3458,58 @@ static void init_and_link_pwq(struct pool_workqueue 
*pwq,
mutex_lock(&wq->flush_mutex);
spin_lock_irq(&workqueue_lock);
 
+   if (p_last_pwq)
+   *p_last_pwq = first_pwq(wq);
pwq->work_color = wq->work_color;
-   list_add_tail_rcu(&pwq->pwqs_node, &wq->pwqs);
+   list_add_rcu(&pwq->pwqs_node, &wq->pwqs);
 
spin_unlock_irq(&workqueue_lock);
mutex_unlock(&wq->flush_mutex);
 }
 
+/**
+ * apply_workqueue_attrs - apply new workqueue_attrs to an unbound workqueue
+ * @wq: the target workqueue
+ * @attrs: the workqueue_attrs to apply, allocated with alloc_workqueue_attrs()
+ *
+ * Apply @attrs to an unbound workqueue @wq.  If @attrs doesn't match the
+ * current attributes, a new pwq is created and made the first pwq which
+ * will serve all new work items.  Older pwqs are released as in-flight
+ * work items finish.  Note that a work item which repeatedly requeues
+ * itself back-to-back will stay on its current pwq.
+ *
+ * Performs GFP_KERNEL allocations.  Returns 0 on success and -errno on
+ * failure.
+ */
+int apply_workqueue_attrs(struct workqueue_struct *wq,
+ const struct 

[PATCH 11/31] workqueue: replace get_pwq() with explicit per_cpu_ptr() accesses and first_pwq()

2013-03-01 Thread Tejun Heo
get_pwq() takes @cpu, which can also be WORK_CPU_UNBOUND, and @wq and
returns the matching pwq (pool_workqueue).  We want to move away from
using @cpu for identifying pools and pwqs for unbound pools with
custom attributes and there is only one user - workqueue_congested() -
which makes use of the WQ_UNBOUND conditional in get_pwq().  All other
users already know whether they're dealing with a per-cpu or unbound
workqueue.

Replace get_pwq() with explicit per_cpu_ptr(wq->cpu_pwqs, cpu) for
per-cpu workqueues and first_pwq() for unbound ones, and open-code
WQ_UNBOUND conditional in workqueue_congested().

Note that this makes workqueue_congested() behave slightly differently
when @cpu other than WORK_CPU_UNBOUND is specified.  It ignores @cpu
for unbound workqueues and always uses the first pwq instead of
oopsing.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 79840b9..02f51b8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -463,16 +463,9 @@ static struct worker_pool *get_std_worker_pool(int cpu, 
bool highpri)
return &pools[highpri];
 }
 
-static struct pool_workqueue *get_pwq(int cpu, struct workqueue_struct *wq)
+static struct pool_workqueue *first_pwq(struct workqueue_struct *wq)
 {
-   if (!(wq->flags & WQ_UNBOUND)) {
-   if (likely(cpu < nr_cpu_ids))
-   return per_cpu_ptr(wq->cpu_pwqs, cpu);
-   } else if (likely(cpu == WORK_CPU_UNBOUND)) {
-   return list_first_entry(&wq->pwqs, struct pool_workqueue,
-   pwqs_node);
-   }
-   return NULL;
+   return list_first_entry(&wq->pwqs, struct pool_workqueue, pwqs_node);
 }
 
 static unsigned int work_color_to_flags(int color)
@@ -1192,7 +1185,7 @@ static void __queue_work(int cpu, struct workqueue_struct 
*wq,
 * work needs to be queued on that cpu to guarantee
 * non-reentrancy.
 */
-   pwq = get_pwq(cpu, wq);
+   pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
last_pool = get_work_pool(work);
 
if (last_pool && last_pool != pwq->pool) {
@@ -1203,7 +1196,7 @@ static void __queue_work(int cpu, struct workqueue_struct 
*wq,
worker = find_worker_executing_work(last_pool, work);
 
if (worker && worker->current_pwq->wq == wq) {
-   pwq = get_pwq(last_pool->cpu, wq);
+   pwq = per_cpu_ptr(wq->cpu_pwqs, last_pool->cpu);
} else {
/* meh... not running there, queue here */
spin_unlock(&last_pool->lock);
@@ -1213,7 +1206,7 @@ static void __queue_work(int cpu, struct workqueue_struct 
*wq,
spin_lock(&pwq->pool->lock);
}
} else {
-   pwq = get_pwq(WORK_CPU_UNBOUND, wq);
+   pwq = first_pwq(wq);
spin_lock(&pwq->pool->lock);
}
 
@@ -1652,7 +1645,7 @@ static void rebind_workers(struct worker_pool *pool)
else
wq = system_wq;
 
-   insert_work(get_pwq(pool->cpu, wq), rebind_work,
+   insert_work(per_cpu_ptr(wq->cpu_pwqs, pool->cpu), rebind_work,
worker->scheduled.next,
work_color_to_flags(WORK_NO_COLOR));
}
@@ -3090,7 +3083,8 @@ static int alloc_and_link_pwqs(struct workqueue_struct 
*wq)
return -ENOMEM;
 
for_each_possible_cpu(cpu) {
-   struct pool_workqueue *pwq = get_pwq(cpu, wq);
+   struct pool_workqueue *pwq =
+   per_cpu_ptr(wq->cpu_pwqs, cpu);
 
pwq->pool = get_std_worker_pool(cpu, highpri);
list_add_tail(&pwq->pwqs_node, &wq->pwqs);
@@ -3345,7 +3339,12 @@ EXPORT_SYMBOL_GPL(workqueue_set_max_active);
  */
 bool workqueue_congested(int cpu, struct workqueue_struct *wq)
 {
-   struct pool_workqueue *pwq = get_pwq(cpu, wq);
+   struct pool_workqueue *pwq;
+
+   if (!(wq->flags & WQ_UNBOUND))
+   pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
+   else
+   pwq = first_pwq(wq);
 
return !list_empty(&pwq->delayed_works);
 }
-- 
1.8.1.2



[PATCH 10/31] workqueue: remove workqueue_struct->pool_wq.single

2013-03-01 Thread Tejun Heo
workqueue->pool_wq union is used to point either to percpu pwqs
(pool_workqueues) or single unbound pwq.  As the first pwq can be
accessed via workqueue->pwqs list, there's no reason for the single
pointer anymore.

Use list_first_entry(workqueue->pwqs) to access the unbound pwq and
drop workqueue->pool_wq.single pointer and the pool_wq union.  It
simplifies the code and eases implementing multiple unbound pools w/
custom attributes.

This patch doesn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index cbdc2ac..79840b9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -188,11 +188,7 @@ struct wq_flusher {
  */
 struct workqueue_struct {
unsigned int        flags;  /* W: WQ_* flags */
-   union {
-   struct pool_workqueue __percpu  *pcpu;
-   struct pool_workqueue   *single;
-   unsigned long   v;
-   } pool_wq;  /* I: pwq's */
+   struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwq's */
struct list_head    pwqs;   /* I: all pwqs of this wq */
struct list_head    list;   /* W: list of all workqueues */
 
@@ -471,9 +467,11 @@ static struct pool_workqueue *get_pwq(int cpu, struct 
workqueue_struct *wq)
 {
if (!(wq->flags & WQ_UNBOUND)) {
if (likely(cpu < nr_cpu_ids))
-   return per_cpu_ptr(wq->pool_wq.pcpu, cpu);
-   } else if (likely(cpu == WORK_CPU_UNBOUND))
-   return wq->pool_wq.single;
+   return per_cpu_ptr(wq->cpu_pwqs, cpu);
+   } else if (likely(cpu == WORK_CPU_UNBOUND)) {
+   return list_first_entry(&wq->pwqs, struct pool_workqueue,
+   pwqs_node);
+   }
return NULL;
 }
 
@@ -3087,8 +3085,8 @@ static int alloc_and_link_pwqs(struct workqueue_struct 
*wq)
int cpu;
 
if (!(wq->flags & WQ_UNBOUND)) {
-   wq->pool_wq.pcpu = alloc_percpu(struct pool_workqueue);
-   if (!wq->pool_wq.pcpu)
+   wq->cpu_pwqs = alloc_percpu(struct pool_workqueue);
+   if (!wq->cpu_pwqs)
return -ENOMEM;
 
for_each_possible_cpu(cpu) {
@@ -3104,7 +3102,6 @@ static int alloc_and_link_pwqs(struct workqueue_struct 
*wq)
if (!pwq)
return -ENOMEM;
 
-   wq->pool_wq.single = pwq;
pwq->pool = get_std_worker_pool(WORK_CPU_UNBOUND, highpri);
list_add_tail(&pwq->pwqs_node, &wq->pwqs);
}
@@ -3115,9 +3112,10 @@ static int alloc_and_link_pwqs(struct workqueue_struct 
*wq)
 static void free_pwqs(struct workqueue_struct *wq)
 {
if (!(wq->flags & WQ_UNBOUND))
-   free_percpu(wq->pool_wq.pcpu);
-   else
-   kmem_cache_free(pwq_cache, wq->pool_wq.single);
+   free_percpu(wq->cpu_pwqs);
+   else if (!list_empty(>pwqs))
+   kmem_cache_free(pwq_cache, list_first_entry(&wq->pwqs,
+   struct pool_workqueue, pwqs_node));
 }
 
 static int wq_clamp_max_active(int max_active, unsigned int flags,
-- 
1.8.1.2



[PATCH 09/31] workqueue: consistently use int for @cpu variables

2013-03-01 Thread Tejun Heo
Workqueue is mixing unsigned int and int for @cpu variables.  There's
no point in using unsigned int for cpus - many cpu-related APIs
take int anyway.  Consistently use int for @cpu variables so that we
can use negative values to mark special ones.

This patch doesn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo 
---
 include/linux/workqueue.h   |  6 +++---
 kernel/workqueue.c  | 24 +++-
 kernel/workqueue_internal.h |  5 ++---
 3 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 5bd030f..899be66 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -435,7 +435,7 @@ extern bool cancel_delayed_work_sync(struct delayed_work 
*dwork);
 
 extern void workqueue_set_max_active(struct workqueue_struct *wq,
 int max_active);
-extern bool workqueue_congested(unsigned int cpu, struct workqueue_struct *wq);
+extern bool workqueue_congested(int cpu, struct workqueue_struct *wq);
 extern unsigned int work_busy(struct work_struct *work);
 
 /*
@@ -466,12 +466,12 @@ static inline bool __deprecated 
flush_delayed_work_sync(struct delayed_work *dwo
 }
 
 #ifndef CONFIG_SMP
-static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
+static inline long work_on_cpu(int cpu, long (*fn)(void *), void *arg)
 {
return fn(arg);
 }
 #else
-long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
+long work_on_cpu(int cpu, long (*fn)(void *), void *arg);
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_FREEZER
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8b38d1c..cbdc2ac 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -124,7 +124,7 @@ enum {
 
 struct worker_pool {
spinlock_t  lock;   /* the pool lock */
-   unsigned int        cpu;    /* I: the associated cpu */
+   int                 cpu;    /* I: the associated cpu */
int                 id;     /* I: pool ID */
unsigned int        flags;  /* X: flags */
 
@@ -467,8 +467,7 @@ static struct worker_pool *get_std_worker_pool(int cpu, 
bool highpri)
return &pools[highpri];
 }
 
-static struct pool_workqueue *get_pwq(unsigned int cpu,
- struct workqueue_struct *wq)
+static struct pool_workqueue *get_pwq(int cpu, struct workqueue_struct *wq)
 {
if (!(wq->flags & WQ_UNBOUND)) {
if (likely(cpu < nr_cpu_ids))
@@ -730,7 +729,7 @@ static void wake_up_worker(struct worker_pool *pool)
  * CONTEXT:
  * spin_lock_irq(rq->lock)
  */
-void wq_worker_waking_up(struct task_struct *task, unsigned int cpu)
+void wq_worker_waking_up(struct task_struct *task, int cpu)
 {
struct worker *worker = kthread_data(task);
 
@@ -755,8 +754,7 @@ void wq_worker_waking_up(struct task_struct *task, unsigned 
int cpu)
  * RETURNS:
  * Worker task on @cpu to wake up, %NULL if none.
  */
-struct task_struct *wq_worker_sleeping(struct task_struct *task,
-  unsigned int cpu)
+struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu)
 {
struct worker *worker = kthread_data(task), *to_wakeup = NULL;
struct worker_pool *pool;
@@ -1160,7 +1158,7 @@ static bool is_chained_work(struct workqueue_struct *wq)
return worker && worker->current_pwq->wq == wq;
 }
 
-static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
+static void __queue_work(int cpu, struct workqueue_struct *wq,
 struct work_struct *work)
 {
struct pool_workqueue *pwq;
@@ -1716,7 +1714,7 @@ static struct worker *create_worker(struct worker_pool 
*pool)
if (pool->cpu != WORK_CPU_UNBOUND)
worker->task = kthread_create_on_node(worker_thread,
worker, cpu_to_node(pool->cpu),
-   "kworker/%u:%d%s", pool->cpu, id, pri);
+   "kworker/%d:%d%s", pool->cpu, id, pri);
else
worker->task = kthread_create(worker_thread, worker,
  "kworker/u:%d%s", id, pri);
@@ -3347,7 +3345,7 @@ EXPORT_SYMBOL_GPL(workqueue_set_max_active);
  * RETURNS:
  * %true if congested, %false otherwise.
  */
-bool workqueue_congested(unsigned int cpu, struct workqueue_struct *wq)
+bool workqueue_congested(int cpu, struct workqueue_struct *wq)
 {
struct pool_workqueue *pwq = get_pwq(cpu, wq);
 
@@ -3464,7 +3462,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct 
notifier_block *nfb,
   unsigned long action,
   void *hcpu)
 {
-   unsigned int cpu = (unsigned long)hcpu;
+   int cpu = (unsigned long)hcpu;
struct worker_pool *pool;
 
switch (action & 

[PATCH 29/31] cpumask: implement cpumask_parse()

2013-03-01 Thread Tejun Heo
We have cpulist_parse() but not cpumask_parse().  Implement it using
bitmap_parse().

bitmap_parse() is weird in that it takes @len even for a string in
kernel memory, which is also inconsistent with bitmap_parselist().
Make cpumask_parse() calculate the length and don't expose the
inconsistency to cpumask users.  Maybe we can fix up bitmap_parse()
later.

This will be used to expose workqueue cpumask knobs to userland via
sysfs.
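A usage sketch contrasting the two parsers (mask values illustrative):

	cpumask_var_t mask;
	int ret;

	if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
		return -ENOMEM;

	ret = cpumask_parse("f\n", mask);	/* hex bitmap: CPUs 0-3, trailing '\n' ok */
	ret = cpulist_parse("0-3", mask);	/* range list: also CPUs 0-3 */

	free_cpumask_var(mask);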

Signed-off-by: Tejun Heo 
Cc: Rusty Russell 
---
Rusty, if this looks okay to you, would it be okay for me to route it
together with the rest of workqueue changes?

Thanks.

 include/linux/cpumask.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 0325602..d08e4d2 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -591,6 +591,21 @@ static inline int cpulist_scnprintf(char *buf, int len,
 }
 
 /**
+ * cpumask_parse - extract a cpumask from a string
+ * @buf: the buffer to extract from
+ * @dstp: the cpumask to set.
+ *
+ * Returns -errno, or 0 for success.
+ */
+static inline int cpumask_parse(const char *buf, struct cpumask *dstp)
+{
+   char *nl = strchr(buf, '\n');
+   int len = nl ? nl - buf : strlen(buf);
+
+   return bitmap_parse(buf, len, cpumask_bits(dstp), nr_cpumask_bits);
+}
+
+/**
  * cpulist_parse - extract a cpumask from a user string of ranges
  * @buf: the buffer to extract from
  * @dstp: the cpumask to set.
-- 
1.8.1.2



[PATCH 28/31] workqueue: reject increasing max_active for ordered workqueues

2013-03-01 Thread Tejun Heo
Workqueue will soon allow exposing control knobs to userland via
sysfs.  Increasing max_active for an ordered workqueue breaks
correctness.  Tag ordered workqueues with __WQ_ORDERED and always
limit max_active at 1.
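The effect, sketched - workqueue_set_max_active() also funnels through
wq_clamp_max_active(), so later attempts to raise the limit get the
same treatment:

	struct workqueue_struct *wq;

	wq = alloc_ordered_workqueue("example", 0);	/* max_active pinned at 1 */

	workqueue_set_max_active(wq, 16);	/* clamped back to 1, with a pr_warn() */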

Signed-off-by: Tejun Heo 
---
 include/linux/workqueue.h |  3 ++-
 kernel/workqueue.c| 11 ++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index fc7f882..e1e5748 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -294,6 +294,7 @@ enum {
WQ_CPU_INTENSIVE    = 1 << 5, /* cpu instensive workqueue */
 
__WQ_DRAINING   = 1 << 16, /* internal: workqueue is draining */
+   __WQ_ORDERED    = 1 << 17, /* internal: workqueue is ordered */
 
WQ_MAX_ACTIVE   = 512,/* I like 512, better ideas? */
WQ_MAX_UNBOUND_PER_CPU  = 4,  /* 4 * #cpus for unbound wq */
@@ -396,7 +397,7 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, 
int max_active,
  * Pointer to the allocated workqueue on success, %NULL on failure.
  */
 #define alloc_ordered_workqueue(fmt, flags, args...)   \
-   alloc_workqueue(fmt, WQ_UNBOUND | (flags), 1, ##args)
+   alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED | (flags), 1, ##args)
 
 #define create_workqueue(name) \
alloc_workqueue((name), WQ_MEM_RECLAIM, 1)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2016c9e..8d487f6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3537,7 +3537,16 @@ static int alloc_and_link_pwqs(struct workqueue_struct 
*wq)
 static int wq_clamp_max_active(int max_active, unsigned int flags,
   const char *name)
 {
-   int lim = flags & WQ_UNBOUND ? WQ_UNBOUND_MAX_ACTIVE : WQ_MAX_ACTIVE;
+   int lim;
+
+   if (flags & WQ_UNBOUND) {
+   if (flags & __WQ_ORDERED)
+   lim = 1;
+   else
+   lim = WQ_UNBOUND_MAX_ACTIVE;
+   } else {
+   lim = WQ_MAX_ACTIVE;
+   }
 
if (max_active < 1 || max_active > lim)
pr_warn("workqueue: max_active %d requested for %s is out of 
range, clamping between %d and %d\n",
-- 
1.8.1.2



[PATCH 30/31] driver/base: implement subsys_virtual_register()

2013-03-01 Thread Tejun Heo
Kay tells me the most appropriate place to expose workqueues to
userland would be /sys/devices/virtual/workqueues/WQ_NAME which is
symlinked to /sys/bus/workqueue/devices/WQ_NAME and that we're lacking
a way to do that outside of driver core as virtual_device_parent()
isn't exported and there's no interface to conveniently create a
virtual subsystem.

This patch implements subsys_virtual_register() by factoring out
subsys_register() from subsys_system_register() and using it with
virtual_device_parent() as the origin directory.  It's identical to
subsys_system_register() other than the origin directory but we aren't
gonna restrict the device names which should be used under it.

This will be used to expose workqueue attributes to userland.
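A usage sketch along the lines of how the workqueue patch later in this
series uses it (the bus name matches that patch; treat the rest as
illustrative):

static struct bus_type wq_subsys = {
	.name		= "workqueue",
	.dev_name	= "workqueue",
};

static int __init wq_sysfs_init(void)
{
	return subsys_virtual_register(&wq_subsys, NULL);
}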

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Greg Kroah-Hartman 
---
Kay, does this look okay?  If so, how should this be routed?

Thanks.

 drivers/base/base.h|  2 ++
 drivers/base/bus.c | 73 +++---
 drivers/base/core.c|  2 +-
 include/linux/device.h |  2 ++
 4 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 6ee17bb..b8bdfe6 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -101,6 +101,8 @@ static inline int hypervisor_init(void) { return 0; }
 extern int platform_bus_init(void);
 extern void cpu_dev_init(void);
 
+struct kobject *virtual_device_parent(struct device *dev);
+
 extern int bus_add_device(struct device *dev);
 extern void bus_probe_device(struct device *dev);
 extern void bus_remove_device(struct device *dev);
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 24eb078..d229858 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -1205,26 +1205,10 @@ static void system_root_device_release(struct device 
*dev)
 {
kfree(dev);
 }
-/**
- * subsys_system_register - register a subsystem at /sys/devices/system/
- * @subsys: system subsystem
- * @groups: default attributes for the root device
- *
- * All 'system' subsystems have a /sys/devices/system/ root device
- * with the name of the subsystem. The root device can carry subsystem-
- * wide attributes. All registered devices are below this single root
- * device and are named after the subsystem with a simple enumeration
- * number appended. The registered devices are not explicitely named;
- * only 'id' in the device needs to be set.
- *
- * Do not use this interface for anything new, it exists for compatibility
- * with bad ideas only. New subsystems should use plain subsystems; and
- * add the subsystem-wide attributes should be added to the subsystem
- * directory itself and not some create fake root-device placed in
- * /sys/devices/system/.
- */
-int subsys_system_register(struct bus_type *subsys,
-  const struct attribute_group **groups)
+
+static int subsys_register(struct bus_type *subsys,
+  const struct attribute_group **groups,
+  struct kobject *parent_of_root)
 {
struct device *dev;
int err;
@@ -1243,7 +1227,7 @@ int subsys_system_register(struct bus_type *subsys,
if (err < 0)
goto err_name;
 
-   dev->kobj.parent = &system_kset->kobj;
+   dev->kobj.parent = parent_of_root;
dev->groups = groups;
dev->release = system_root_device_release;
 
@@ -1263,8 +1247,55 @@ err_dev:
bus_unregister(subsys);
return err;
 }
+
+/**
+ * subsys_system_register - register a subsystem at /sys/devices/system/
+ * @subsys: system subsystem
+ * @groups: default attributes for the root device
+ *
+ * All 'system' subsystems have a /sys/devices/system/ root device
+ * with the name of the subsystem. The root device can carry subsystem-
+ * wide attributes. All registered devices are below this single root
+ * device and are named after the subsystem with a simple enumeration
+ * number appended. The registered devices are not explicitely named;
+ * only 'id' in the device needs to be set.
+ *
+ * Do not use this interface for anything new, it exists for compatibility
+ * with bad ideas only. New subsystems should use plain subsystems; and
+ * add the subsystem-wide attributes should be added to the subsystem
+ * directory itself and not some create fake root-device placed in
+ * /sys/devices/system/.
+ */
+int subsys_system_register(struct bus_type *subsys,
+  const struct attribute_group **groups)
+{
+   return subsys_register(subsys, groups, &system_kset->kobj);
+}
 EXPORT_SYMBOL_GPL(subsys_system_register);
 
+/**
+ * subsys_virtual_register - register a subsystem at /sys/devices/virtual/
+ * @subsys: virtual subsystem
+ * @groups: default attributes for the root device
+ *
+ * All 'virtual' subsystems have a /sys/devices/virtual/ root device
+ * with the name of the subsystem.  The root device can carry subsystem-wide
+ * attributes.  All registered devices are below this single root device.
+ * There's 

[PATCH 15/31] workqueue: separate out init_worker_pool() from init_workqueues()

2013-03-01 Thread Tejun Heo
This will be used to implement unbound pools with custom attributes.

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 68b3443..f97539b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3121,6 +3121,26 @@ int keventd_up(void)
return system_wq != NULL;
 }
 
+static void init_worker_pool(struct worker_pool *pool)
+{
+   spin_lock_init(&pool->lock);
+   pool->flags |= POOL_DISASSOCIATED;
+   INIT_LIST_HEAD(&pool->worklist);
+   INIT_LIST_HEAD(&pool->idle_list);
+   hash_init(pool->busy_hash);
+
+   init_timer_deferrable(&pool->idle_timer);
+   pool->idle_timer.function = idle_worker_timeout;
+   pool->idle_timer.data = (unsigned long)pool;
+
+   setup_timer(&pool->mayday_timer, pool_mayday_timeout,
+   (unsigned long)pool);
+
+   mutex_init(&pool->manager_mutex);
+   mutex_init(&pool->assoc_mutex);
+   ida_init(&pool->worker_ida);
+}
+
 static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 {
bool highpri = wq->flags & WQ_HIGHPRI;
@@ -3789,23 +3809,8 @@ static int __init init_workqueues(void)
struct worker_pool *pool;
 
for_each_std_worker_pool(pool, cpu) {
-   spin_lock_init(&pool->lock);
+   init_worker_pool(pool);
pool->cpu = cpu;
-   pool->flags |= POOL_DISASSOCIATED;
-   INIT_LIST_HEAD(&pool->worklist);
-   INIT_LIST_HEAD(&pool->idle_list);
-   hash_init(pool->busy_hash);
-
-   init_timer_deferrable(&pool->idle_timer);
-   pool->idle_timer.function = idle_worker_timeout;
-   pool->idle_timer.data = (unsigned long)pool;
-
-   setup_timer(&pool->mayday_timer, pool_mayday_timeout,
-   (unsigned long)pool);
-
-   mutex_init(&pool->manager_mutex);
-   mutex_init(&pool->assoc_mutex);
-   ida_init(&pool->worker_ida);
 
/* alloc pool ID */
BUG_ON(worker_pool_assign_id(pool));
-- 
1.8.1.2



[PATCH 31/31] workqueue: implement sysfs interface for workqueues

2013-03-01 Thread Tejun Heo
There are cases where workqueue users want to expose control knobs to
userland.  e.g. Unbound workqueues with custom attributes are
scheduled to be used for writeback workers and depending on
configuration it can be useful to allow admins to tinker with the
priority or allowed CPUs.

This patch implements workqueue_sysfs_register(), which makes the
workqueue visible under /sys/bus/workqueue/devices/WQ_NAME.  There
currently are two attributes common to both per-cpu and unbound pools
and extra attributes for unbound pools including nice level and
cpumask.

If alloc_workqueue*() is called with WQ_SYSFS,
workqueue_sysfs_register() is called automatically as part of
workqueue creation.  This is the preferred method unless the workqueue
user wants to apply workqueue_attrs before making the workqueue
visible to userland.
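The two registration styles, sketched (names illustrative; @attrs
prepared with alloc_workqueue_attrs() as in the earlier patches):

	/* style 1: knobs visible from creation */
	wq = alloc_workqueue("wb_example", WQ_UNBOUND | WQ_SYSFS, 0);

	/* style 2: apply attrs first, then expose the knobs */
	wq = alloc_workqueue("wb_example", WQ_UNBOUND, 0);
	apply_workqueue_attrs(wq, attrs);
	workqueue_sysfs_register(wq);
	/* knobs now appear under /sys/bus/workqueue/devices/wb_example/ */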

Signed-off-by: Tejun Heo 
---
 include/linux/workqueue.h |   8 ++
 kernel/workqueue.c| 280 ++
 2 files changed, 288 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index e1e5748..9764841 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -292,6 +292,7 @@ enum {
WQ_MEM_RECLAIM  = 1 << 3, /* may be used for memory reclaim */
WQ_HIGHPRI  = 1 << 4, /* high priority */
WQ_CPU_INTENSIVE    = 1 << 5, /* cpu instensive workqueue */
+   WQ_SYSFS    = 1 << 6, /* visible in sysfs, see
wq_sysfs_register() */
 
__WQ_DRAINING   = 1 << 16, /* internal: workqueue is draining */
__WQ_ORDERED= 1 << 17, /* internal: workqueue is ordered */
@@ -494,4 +495,11 @@ extern bool freeze_workqueues_busy(void);
 extern void thaw_workqueues(void);
 #endif /* CONFIG_FREEZER */
 
+#ifdef CONFIG_SYSFS
+int workqueue_sysfs_register(struct workqueue_struct *wq);
+#else  /* CONFIG_SYSFS */
+static inline int workqueue_sysfs_register(struct workqueue_struct *wq)
+{ return 0; }
+#endif /* CONFIG_SYSFS */
+
 #endif
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8d487f6..a618df4 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -210,6 +210,8 @@ struct wq_flusher {
struct completion   done;   /* flush completion */
 };
 
+struct wq_device;
+
 /*
  * The externally visible workqueue abstraction is an array of
  * per-CPU workqueues:
@@ -233,6 +235,10 @@ struct workqueue_struct {
 
int nr_drainers;/* W: drain in progress */
int saved_max_active; /* W: saved pwq max_active */
+
+#ifdef CONFIG_SYSFS
+   struct wq_device    *wq_dev;    /* I: for sysfs interface */
+#endif
 #ifdef CONFIG_LOCKDEP
struct lockdep_map  lockdep_map;
 #endif
@@ -438,6 +444,8 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool 
[NR_STD_WORKER_POOLS],
 static DEFINE_IDR(worker_pool_idr);
 
 static int worker_thread(void *__worker);
+static void copy_workqueue_attrs(struct workqueue_attrs *to,
+const struct workqueue_attrs *from);
 
 /* allocate ID and assign it to @pool */
 static int worker_pool_assign_id(struct worker_pool *pool)
@@ -3151,6 +3159,273 @@ int keventd_up(void)
return system_wq != NULL;
 }
 
+#ifdef CONFIG_SYSFS
+/*
+ * Workqueues with WQ_SYSFS flag set is visible to userland via
+ * /sys/bus/workqueue/devices/WQ_NAME.  All visible workqueues have the
+ * following attributes.
+ *
+ *  per_cpu    RO bool : whether the workqueue is per-cpu or unbound
+ *  max_active RW int  : maximum number of in-flight work items
+ *
+ * Unbound workqueues have the following extra attributes.
+ *
+ *  id RO int  : the associated pool ID
+ *  nice   RW int  : nice value of the workers
+ *  cpumask    RW mask : bitmask of allowed CPUs for the workers
+ */
+struct wq_device {
+   struct workqueue_struct *wq;
+   struct device   dev;
+};
+
+static struct workqueue_struct *dev_to_wq(struct device *dev)
+{
+   struct wq_device *wq_dev = container_of(dev, struct wq_device, dev);
+
+   return wq_dev->wq;
+}
+
+static ssize_t wq_per_cpu_show(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   struct workqueue_struct *wq = dev_to_wq(dev);
+
+   return scnprintf(buf, PAGE_SIZE, "%d\n", (bool)!(wq->flags & 
WQ_UNBOUND));
+}
+
+static ssize_t wq_max_active_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+   struct workqueue_struct *wq = dev_to_wq(dev);
+
+   return scnprintf(buf, PAGE_SIZE, "%d\n", wq->saved_max_active);
+}
+
+static ssize_t wq_max_active_store(struct device *dev,
+  struct device_attribute *attr,
+  const char *buf, size_t count)
+{
+   struct workqueue_struct *wq = dev_to_wq(dev);
+   int val;
+
+   if (sscanf(buf, "%d", &val) != 1 || val <= 0)

[PATCH 12/31] workqueue: update synchronization rules on workqueue->pwqs

2013-03-01 Thread Tejun Heo
Make workqueue->pwqs protected by workqueue_lock for writes and
sched-RCU protected for reads.  Lockdep assertions are added to
for_each_pwq() and first_pwq() and all their users are converted to
either hold workqueue_lock or disable preemption/irq.

alloc_and_link_pwqs() is updated to use list_add_tail_rcu() for
consistency which isn't strictly necessary as the workqueue isn't
visible.  destroy_workqueue() isn't updated to sched-RCU release pwqs.
This is okay as the workqueue should have no users left by that point.

The locking is superfluous at this point.  It is there to help the
coming implementation of unbound pools/pwqs with custom attributes.

This patch doesn't introduce any behavior changes.
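Reader-side usage, sketched - either hold workqueue_lock or wrap the
iteration in a sched-RCU read section:

	struct pool_workqueue *pwq;

	rcu_read_lock_sched();
	for_each_pwq(pwq, wq) {
		/* @pwq must not be used once the read section ends */
	}
	rcu_read_unlock_sched();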

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 85 +++---
 1 file changed, 68 insertions(+), 17 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 02f51b8..ff51c59 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "workqueue_internal.h"
 
@@ -118,6 +119,8 @@ enum {
  * F: wq->flush_mutex protected.
  *
  * W: workqueue_lock protected.
+ *
+ * R: workqueue_lock protected for writes.  Sched-RCU protected for reads.
  */
 
 /* struct worker is defined in workqueue_internal.h */
@@ -169,7 +172,7 @@ struct pool_workqueue {
int nr_active;  /* L: nr of active works */
int max_active; /* L: max active works */
struct list_head    delayed_works;  /* L: delayed works */
-   struct list_head    pwqs_node;  /* I: node on wq->pwqs */
+   struct list_head    pwqs_node;  /* R: node on wq->pwqs */
struct list_head    mayday_node;    /* W: node on wq->maydays */
 } __aligned(1 << WORK_STRUCT_FLAG_BITS);
 
@@ -189,7 +192,7 @@ struct wq_flusher {
 struct workqueue_struct {
unsigned int        flags;  /* W: WQ_* flags */
struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwq's */
-   struct list_head    pwqs;   /* I: all pwqs of this wq */
+   struct list_head    pwqs;   /* R: all pwqs of this wq */
struct list_head    list;   /* W: list of all workqueues */

struct mutex        flush_mutex;    /* protects wq flushing */
@@ -227,6 +230,11 @@ EXPORT_SYMBOL_GPL(system_freezable_wq);
 #define CREATE_TRACE_POINTS
 #include 
 
+#define assert_rcu_or_wq_lock()\
+   rcu_lockdep_assert(rcu_read_lock_sched_held() ||\
+  lockdep_is_held(&workqueue_lock),\
+  "sched RCU or workqueue lock should be held")
+
 #define for_each_std_worker_pool(pool, cpu)\
for ((pool) = &std_worker_pools(cpu)[0];\
 (pool) < &std_worker_pools(cpu)[NR_STD_WORKER_POOLS]; (pool)++)
@@ -282,9 +290,16 @@ static inline int __next_wq_cpu(int cpu, const struct 
cpumask *mask,
  * for_each_pwq - iterate through all pool_workqueues of the specified 
workqueue
  * @pwq: iteration cursor
  * @wq: the target workqueue
+ *
+ * This must be called either with workqueue_lock held or sched RCU read
+ * locked.  If the pwq needs to be used beyond the locking in effect, the
+ * caller is responsible for guaranteeing that the pwq stays online.
+ *
+ * The if clause exists only for the lockdep assertion and can be ignored.
  */
 #define for_each_pwq(pwq, wq)  \
-   list_for_each_entry((pwq), &(wq)->pwqs, pwqs_node)
+   list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node)  \
+   if (({ assert_rcu_or_wq_lock(); true; }))
 
 #ifdef CONFIG_DEBUG_OBJECTS_WORK
 
@@ -463,9 +478,19 @@ static struct worker_pool *get_std_worker_pool(int cpu, 
bool highpri)
return [highpri];
 }
 
+/**
+ * first_pwq - return the first pool_workqueue of the specified workqueue
+ * @wq: the target workqueue
+ *
+ * This must be called either with workqueue_lock held or sched RCU read
+ * locked.  If the pwq needs to be used beyond the locking in effect, the
+ * caller is responsible for guaranteeing that the pwq stays online.
+ */
 static struct pool_workqueue *first_pwq(struct workqueue_struct *wq)
 {
-   return list_first_entry(&wq->pwqs, struct pool_workqueue, pwqs_node);
+   assert_rcu_or_wq_lock();
+   return list_first_or_null_rcu(&wq->pwqs, struct pool_workqueue,
+ pwqs_node);
 }
 
 static unsigned int work_color_to_flags(int color)
@@ -2488,10 +2513,12 @@ static bool flush_workqueue_prep_pwqs(struct 
workqueue_struct *wq,
atomic_set(&wq->nr_pwqs_to_flush, 1);
}
 
+   local_irq_disable();
+
for_each_pwq(pwq, wq) {
struct worker_pool *pool = pwq->pool;
 
-   spin_lock_irq(&pool->lock);
+   

[PATCH 01/31] workqueue: make sanity checks less punishing using WARN_ON[_ONCE]()s

2013-03-01 Thread Tejun Heo
Workqueue has been using mostly BUG_ON()s for sanity checks, which
fail unnecessarily harshly when the assertion doesn't hold.  Most
assertions can be converted to be less drastic such that things can limp
along instead of dying completely.  Convert BUG_ON()s to
WARN_ON[_ONCE]()s with softer failure behaviors - e.g. if assertion
check fails in destroy_worker(), trigger WARN and silently ignore
destruction request.

Most conversions are trivial.  Note that sanity checks in
destroy_workqueue() are moved above removal from workqueues list so
that it can bail out without side-effects if assertion checks fail.

This patch doesn't introduce any visible behavior changes during
normal operation.

Signed-off-by: Tejun Heo 
---
 kernel/workqueue.c | 85 +-
 1 file changed, 46 insertions(+), 39 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0b1e6f2..a533e77 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -530,7 +530,7 @@ static int work_next_color(int color)
 static inline void set_work_data(struct work_struct *work, unsigned long data,
 unsigned long flags)
 {
-   BUG_ON(!work_pending(work));
+   WARN_ON_ONCE(!work_pending(work));
atomic_long_set(>data, data | flags | work_static(work));
 }
 
@@ -785,7 +785,8 @@ struct task_struct *wq_worker_sleeping(struct task_struct 
*task,
pool = worker->pool;
 
/* this can only happen on the local cpu */
-   BUG_ON(cpu != raw_smp_processor_id());
+   if (WARN_ON_ONCE(cpu != raw_smp_processor_id()))
+   return NULL;
 
/*
 * The counterpart of the following dec_and_test, implied mb,
@@ -1459,9 +1460,10 @@ static void worker_enter_idle(struct worker *worker)
 {
struct worker_pool *pool = worker->pool;
 
-   BUG_ON(worker->flags & WORKER_IDLE);
-   BUG_ON(!list_empty(&worker->entry) &&
-  (worker->hentry.next || worker->hentry.pprev));
+   if (WARN_ON_ONCE(worker->flags & WORKER_IDLE) ||
+   WARN_ON_ONCE(!list_empty(&worker->entry) &&
+(worker->hentry.next || worker->hentry.pprev)))
+   return;
 
/* can't use worker_set_flags(), also called from start_worker() */
worker->flags |= WORKER_IDLE;
@@ -1498,7 +1500,8 @@ static void worker_leave_idle(struct worker *worker)
 {
struct worker_pool *pool = worker->pool;
 
-   BUG_ON(!(worker->flags & WORKER_IDLE));
+   if (WARN_ON_ONCE(!(worker->flags & WORKER_IDLE)))
+   return;
worker_clr_flags(worker, WORKER_IDLE);
pool->nr_idle--;
list_del_init(>entry);
@@ -1795,8 +1798,9 @@ static void destroy_worker(struct worker *worker)
int id = worker->id;
 
/* sanity check frenzy */
-   BUG_ON(worker->current_work);
-   BUG_ON(!list_empty(&worker->scheduled));
+   if (WARN_ON(worker->current_work) ||
+   WARN_ON(!list_empty(&worker->scheduled)))
+   return;
 
if (worker->flags & WORKER_STARTED)
pool->nr_workers--;
@@ -1925,7 +1929,8 @@ restart:
del_timer_sync(&pool->mayday_timer);
spin_lock_irq(&pool->lock);
start_worker(worker);
-   BUG_ON(need_to_create_worker(pool));
+   if (WARN_ON_ONCE(need_to_create_worker(pool)))
+   goto restart;
return true;
}
 
@@ -2258,7 +2263,7 @@ recheck:
 * preparing to process a work or actually processing it.
 * Make sure nobody diddled with it while I was sleeping.
 */
-   BUG_ON(!list_empty(&worker->scheduled));
+   WARN_ON_ONCE(!list_empty(&worker->scheduled));
 
/*
 * When control reaches this point, we're guaranteed to have
@@ -2366,7 +2371,7 @@ repeat:
 * Slurp in all works issued via this workqueue and
 * process'em.
 */
-   BUG_ON(!list_empty(&worker->scheduled));
+   WARN_ON_ONCE(!list_empty(&worker->scheduled));
list_for_each_entry_safe(work, n, &pool->worklist, entry)
if (get_work_pwq(work) == pwq)
move_linked_works(work, scheduled, &n);
@@ -2501,7 +2506,7 @@ static bool flush_workqueue_prep_pwqs(struct 
workqueue_struct *wq,
unsigned int cpu;
 
if (flush_color >= 0) {
-   BUG_ON(atomic_read(&wq->nr_pwqs_to_flush));
+   WARN_ON_ONCE(atomic_read(&wq->nr_pwqs_to_flush));
atomic_set(&wq->nr_pwqs_to_flush, 1);
}
 
@@ -2512,7 +2517,7 @@ static bool flush_workqueue_prep_pwqs(struct 
workqueue_struct *wq,
spin_lock_irq(&pool->lock);
 
if (flush_color >= 0) {
-   BUG_ON(pwq->flush_color != -1);
+   WARN_ON_ONCE(pwq->flush_color != -1);
 
if (pwq->nr_in_flight[flush_color]) {
   

[PATCH 27/31] workqueue: make it clear that WQ_DRAINING is an internal flag

2013-03-01 Thread Tejun Heo
We're gonna add another internal WQ flag.  Let's make the distinction
clear.  Prefix WQ_DRAINING with __ and move it to bit 16.

Signed-off-by: Tejun Heo 
---
 include/linux/workqueue.h | 2 +-
 kernel/workqueue.c| 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index c8c3bf4..fc7f882 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -293,7 +293,7 @@ enum {
WQ_HIGHPRI  = 1 << 4, /* high priority */
WQ_CPU_INTENSIVE    = 1 << 5, /* cpu instensive workqueue */
 
-   WQ_DRAINING = 1 << 6, /* internal: workqueue is draining */
+   __WQ_DRAINING   = 1 << 16, /* internal: workqueue is draining */
 
WQ_MAX_ACTIVE   = 512,/* I like 512, better ideas? */
WQ_MAX_UNBOUND_PER_CPU  = 4,  /* 4 * #cpus for unbound wq */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 36fcf9c..2016c9e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1222,7 +1222,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
debug_work_activate(work);
 
/* if dying, only works from the same workqueue are allowed */
-   if (unlikely(wq->flags & WQ_DRAINING) &&
+   if (unlikely(wq->flags & __WQ_DRAINING) &&
WARN_ON_ONCE(!is_chained_work(wq)))
return;
 retry:
@@ -2761,11 +2761,11 @@ void drain_workqueue(struct workqueue_struct *wq)
/*
 * __queue_work() needs to test whether there are drainers, is much
 * hotter than drain_workqueue() and already looks at @wq->flags.
-* Use WQ_DRAINING so that queue doesn't have to check nr_drainers.
+* Use __WQ_DRAINING so that queue doesn't have to check nr_drainers.
 */
spin_lock_irq(&workqueue_lock);
if (!wq->nr_drainers++)
-   wq->flags |= WQ_DRAINING;
+   wq->flags |= __WQ_DRAINING;
spin_unlock_irq(&workqueue_lock);
 reflush:
flush_workqueue(wq);
@@ -2793,7 +2793,7 @@ reflush:
 
spin_lock(&workqueue_lock);
if (!--wq->nr_drainers)
-   wq->flags &= ~WQ_DRAINING;
+   wq->flags &= ~__WQ_DRAINING;
spin_unlock(&workqueue_lock);
 
local_irq_enable();
-- 
1.8.1.2
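
The convention this establishes, public WQ_* flags in the low bits and
internal __WQ_* flags from bit 16 up, can be sketched as follows (only the
flags visible in the diff are shown; the mask is a hypothetical
illustration of the split):

enum {
	/* public flags, settable by alloc_workqueue() callers */
	WQ_HIGHPRI		= 1 << 4,	/* high priority */
	WQ_CPU_INTENSIVE	= 1 << 5,	/* cpu intensive workqueue */

	/* internal flags, reserved for the workqueue code itself */
	__WQ_DRAINING		= 1 << 16,	/* workqueue is draining */
};

/* hypothetical: everything below bit 16 belongs to callers */
#define WQ_USER_FLAG_MASK	((1 << 16) - 1)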



[PATCHSET wq/for-3.10-tmp] workqueue: implement workqueue with custom worker attributes

2013-03-01 Thread Tejun Heo
Subject: [PATCHSET wq/for-3.10-tmp] workqueue: implement workqueue with custom worker attributes

Hello,

Finally, here's the unbound workqueue with custom worker attributes
patchset I've been talking about.  The goal is simple: we want
unbound workqueues with custom worker attributes, plus a mechanism to
expose the knobs to userland.

Currently, the supported attributes are nice level and allowed
cpumask.  It's likely that cgroup association will be added in future.
Attributes are specified via struct workqueue_attrs.

 struct workqueue_attrs {
int nice;   /* nice level */
cpumask_var_t   cpumask;/* allowed CPUs */
 };

which is allocated, applied and freed using the following functions.

 struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask);
 void free_workqueue_attrs(struct workqueue_attrs *attrs);
 int apply_workqueue_attrs(struct workqueue_struct *wq,
   const struct workqueue_attrs *attrs);

If the workqueue's knobs should be visible to userland, WQ_SYSFS can
be specified during alloc_workqueue() or workqueue_sysfs_register()
can be called.  The knobs will be accessible under
/sys/bus/workqueue/devices/NAME/.  max_active, nice and cpumask are
all adjustable from userland.
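
Going by the API quoted above, a caller would look roughly like this (a
hypothetical sketch: the workqueue name and nice value are made up, and it
assumes apply_workqueue_attrs() copies @attrs, as the alloc/apply/free
triplet suggests):

static struct workqueue_struct *example_wq;

static int __init example_init(void)
{
	struct workqueue_attrs *attrs;
	int ret;

	/* WQ_SYSFS exposes the knobs under
	 * /sys/bus/workqueue/devices/example/ */
	example_wq = alloc_workqueue("example", WQ_UNBOUND | WQ_SYSFS, 0);
	if (!example_wq)
		return -ENOMEM;

	attrs = alloc_workqueue_attrs(GFP_KERNEL);
	if (!attrs) {
		destroy_workqueue(example_wq);
		return -ENOMEM;
	}

	attrs->nice = -5;			/* slightly boosted workers */
	cpumask_copy(attrs->cpumask, cpu_online_mask);

	ret = apply_workqueue_attrs(example_wq, attrs);
	free_workqueue_attrs(attrs);
	return ret;
}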

Whenever a new set of attrs is applied, workqueue tries to find the
worker_pool with matching attributes.  If there's one, its refcnt is
bumped and used; otherwise, a new one is created.  A new
pool_workqueue is created to interface with the found or created
worker_pool and the old pwqs (pool_workqueues) stick around until all
in-flight work items finish.  As pwqs retire, the associated
worker_pools are put too.  As a result, workqueue will make all
workqueues with the same attributes share the same pool and only keep
around the pools which are in use.
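
The find-or-create step described here amounts to hashing the attrs into a
table of unbound pools, roughly as in this sketch (approximate names;
locking and error handling are omitted, and the pool-creation helper is
assumed):

static DEFINE_HASHTABLE(unbound_pool_hash, 6);

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
	u32 hash = wqattrs_hash(attrs);		/* hash nice + cpumask */
	struct worker_pool *pool;

	hash_for_each_possible(unbound_pool_hash, pool, hash_node, hash) {
		if (wqattrs_equal(pool->attrs, attrs)) {
			pool->refcnt++;		/* reuse the matching pool */
			return pool;
		}
	}

	/* no match: create a new pool carrying a copy of @attrs */
	pool = create_unbound_pool(attrs);	/* assumed helper */
	if (pool)
		hash_add(unbound_pool_hash, &pool->hash_node, hash);
	return pool;
}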

The interface is simple but the implementation is quite involved,
because the per-cpu assumption is still strongly entrenched in the
existing workqueue code, with the unbound workqueue implementation
thrown on top as a hacky extension of the per-cpu model.  A lot of
this patchset deals with decoupling per-cpu assumptions from various
parts.

After per-cpu assumption is removed, unbound workqueue handling is
updated so that it can deal with multiple pwqs.  With the pwq and pool
iterators updated to handle per-cpu and unbound ones equally, it
usually boils down to traveling the same path used by per-cpu
workqueues to deal with multiple per-cpu pwqs.  For example,
non-reentrancy test while queueing and multiple pwq handling in
flush_workqueue() are now shared by both per-cpu and unbound
workqueues.

The result is pretty nice as per-cpu and unbound workqueues behave
almost the same with the only difference being per-cpu's pwqs are
per-cpu and unbound's are for different attributes.  The handling
deviates only in creation and destruction paths.

This patchset doesn't introduce any uses of workqueue_attrs or
WQ_SYSFS.  Writeback and btrfs IO workers are candidates for
conversion and will be done in separate patchsets.

This patchset contains the following 31 patches.

 0001-workqueue-make-sanity-checks-less-punshing-using-WAR.patch
 0002-workqueue-make-workqueue_lock-irq-safe.patch
 0003-workqueue-introduce-kmem_cache-for-pool_workqueues.patch
 0004-workqueue-add-workqueue_struct-pwqs-list.patch
 0005-workqueue-replace-for_each_pwq_cpu-with-for_each_pwq.patch
 0006-workqueue-introduce-for_each_pool.patch
 0007-workqueue-restructure-pool-pool_workqueue-iterations.patch
 0008-workqueue-add-wokrqueue_struct-maydays-list-to-repla.patch
 0009-workqueue-consistently-use-int-for-cpu-variables.patch
 0010-workqueue-remove-workqueue_struct-pool_wq.single.patch
 0011-workqueue-replace-get_pwq-with-explicit-per_cpu_ptr-.patch
 0012-workqueue-update-synchronization-rules-on-workqueue-.patch
 0013-workqueue-update-synchronization-rules-on-worker_poo.patch
 0014-workqueue-replace-POOL_MANAGING_WORKERS-flag-with-wo.patch
 0015-workqueue-separate-out-init_worker_pool-from-init_wo.patch
 0016-workqueue-introduce-workqueue_attrs.patch
 0017-workqueue-implement-attribute-based-unbound-worker_p.patch
 0018-workqueue-remove-unbound_std_worker_pools-and-relate.patch
 0019-workqueue-drop-std-from-cpu_std_worker_pools-and-for.patch
 0020-workqueue-add-pool-ID-to-the-names-of-unbound-kworke.patch
 0021-workqueue-drop-WQ_RESCUER-and-test-workqueue-rescuer.patch
 0022-workqueue-restructure-__alloc_workqueue_key.patch
 0023-workqueue-implement-get-put_pwq.patch
 0024-workqueue-prepare-flush_workqueue-for-dynamic-creati.patch
 0025-workqueue-perform-non-reentrancy-test-when-queueing-.patch
 0026-workqueue-implement-apply_workqueue_attrs.patch
 0027-workqueue-make-it-clear-that-WQ_DRAINING-is-an-inter.patch
 0028-workqueue-reject-increasing-max_active-for-ordered-w.patch
 0029-cpumask-implement-cpumask_parse.patch
 0030-driver-base-implement-subsys_virtual_register.patch
 

Re: [PATCH 1/4] tracing/syscalls: Annotate some functions static

2013-03-01 Thread Steven Rostedt
This has been fixed already in mainline:

commit 6aea49cb5f3001a8275bf9c9f586ec3eb39af194
Author: Fengguang Wu 
tracing/syscalls: Make local functions static

-- Steve


On Thu, 2013-02-21 at 10:32 +0800, Li Zefan wrote:
> Signed-off-by: Li Zefan 
> ---
>  kernel/trace/trace_syscalls.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 7609dd6..5329e13 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -77,7 +77,7 @@ static struct syscall_metadata *syscall_nr_to_meta(int nr)
>   return syscalls_metadata[nr];
>  }
>  
> -enum print_line_t
> +static enum print_line_t
>  print_syscall_enter(struct trace_iterator *iter, int flags,
>   struct trace_event *event)
>  {
> @@ -130,7 +130,7 @@ end:
>   return TRACE_TYPE_HANDLED;
>  }
>  




Re: [PATCH] add extra free kbytes tunable

2013-03-01 Thread Hugh Dickins
On Sat, 2 Mar 2013, Simon Jeons wrote:
> On 03/02/2013 09:42 AM, Hugh Dickins wrote:
> > On Sat, 2 Mar 2013, Simon Jeons wrote:
> > > In __add_to_swap_cache(), why does a successful insertion into
> > > the radix tree increase NR_FILE_PAGES? This is an anonymous page,
> > > not a file-backed page.
> > Right, that's hard to understand without historical background.
> > 
> > I think the quick answer would be that we used to (and still do) think
> > of file-cache and swap-cache as two halves of page-cache.  And then when
> 
> Should a shmem page be treated as file-cache or as swap-cache? It seems
> strange, since it consists of anonymous pages, yet those pages back files.

A shmem page is swap-backed file-cache, and it may get transferred to or
from swap-cache: yes, it's a difficult and confusing case, as I said below.

I would never call it "anonymous", but it is counted in /proc/meminfo's
Active(anon) or Inactive(anon) rather than in (file), because "anon"
there is shorthand for "swap-backed".
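
For reference, the accounting Simon asked about is a single statement in
the swap-cache insertion path; paraphrased from mm/swap_state.c of that
era (abridged, not an exact quote):

	/* __add_to_swap_cache(), abridged: a page entering swap-cache is
	 * accounted as page-cache, hence the NR_FILE_PAGES bump even for
	 * anonymous and shmem pages. */
	error = radix_tree_insert(&address_space->page_tree, entry.val, page);
	if (likely(!error)) {
		address_space->nrpages++;
		__inc_zone_page_state(page, NR_FILE_PAGES);
	}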

> > So you'll find that shmem and swap are counted as file in some places
> > and anon in others, and it's hard to grasp which is where and why,
> > without remembering the history.

Hugh


Re: [for-next][PATCH 0/7] tracing: fixups, memory savings, and block on splice

2013-03-01 Thread Steven Rostedt
On Fri, 2013-03-01 at 21:48 -0500, Steven Rostedt wrote:

> By converting two common structures to slab caches, I was able to
> save 36K of memory!

Correction... 36K was for only one slab conversion. The total was 42K in
savings ;-)

-- Steve




Re: mmotm 2013-03-01-15-50 uploaded (early_printk)

2013-03-01 Thread Randy Dunlap
On 03/01/13 15:51, a...@linux-foundation.org wrote:
> The mm-of-the-moment snapshot 2013-03-01-15-50 has been uploaded to
> 
>http://www.ozlabs.org/~akpm/mmotm/
> 

on i386:

arch/x86/built-in.o: In function `finish_e820_parsing':
(.init.text+0x225e): undefined reference to `early_printk'
arch/x86/built-in.o: In function `setup_early_printk':
early_printk.c:(.init.text+0x82ae): undefined reference to `early_console'
early_printk.c:(.init.text+0x8304): undefined reference to `early_console'
early_printk.c:(.init.text+0x836f): undefined reference to `early_console'
early_printk.c:(.init.text+0x83e6): undefined reference to `early_console'


Full randconfig file is attached.


-- 
~Randy
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 3.8.0-mm1 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_GPIO=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CPU_AUTOPROBE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_XZ=y
# CONFIG_KERNEL_LZO is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_FHANDLE is not set
CONFIG_AUDIT=y
# CONFIG_AUDITSYSCALL is not set
# CONFIG_AUDIT_LOGINUID_IMMUTABLE is not set
CONFIG_HAVE_GENERIC_HARDIRQS=y

#
# IRQ subsystem
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ALWAYS_USE_PERSISTENT_CLOCK=y
CONFIG_KTIME_SCALAR=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set

#
# CPU/Task time and stats accounting
#
# CONFIG_TICK_CPU_ACCOUNTING is not set
CONFIG_IRQ_TIME_ACCOUNTING=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
# CONFIG_TASK_DELAY_ACCT is not set
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
CONFIG_RCU_FANOUT_EXACT=y
CONFIG_TREE_RCU_TRACE=y
# CONFIG_RCU_NOCB_CPU is not set
CONFIG_IKCONFIG=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANTS_PROT_NUMA_PROT_NONE=y
CONFIG_CHECKPOINT_RESTORE=y
# CONFIG_NAMESPACES is not set
# CONFIG_SCHED_AUTOGROUP is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_ANON_INODES=y
CONFIG_EXPERT=y
CONFIG_HAVE_UID16=y
CONFIG_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_HOTPLUG=y
# CONFIG_PRINTK is not set
# CONFIG_BUG is not set
# CONFIG_ELF_CORE is not set
CONFIG_PCSPKR_PLATFORM=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
# CONFIG_BASE_FULL is not set
CONFIG_FUTEX=y
# CONFIG_EPOLL is not set
# CONFIG_SIGNALFD is not set
CONFIG_TIMERFD=y
# CONFIG_EVENTFD is not set
# CONFIG_SHMEM is not set
# CONFIG_AIO is not set
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# CONFIG_VM_EVENT_COUNTERS is not set
CONFIG_SLUB_DEBUG=y
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_OPROFILE_EVENT_MULTIPLEX=y

[for-next][PATCH 0/7] tracing: fixups, memory savings, and block on splice

2013-03-01 Thread Steven Rostedt
This is some more updates coming for v3.10.

First, I had to rebase from my last patch set because it broke
kgdb tracing as well as the new snapshot feature. The end of this
email contains the difference between my last patch set and what I
pushed to next in my rebase.

The first patch contains a fix to the kernel command line setting
of trace_events, as the multi buffers change broke it.

In his ELC presentation, Ezequiel Garcia pointed out the waste in the
kernel from subsystems abusing kmalloc() when kmem_cache_alloc()
would be better. The trace system was one of the problem areas.
By converting two common structures to slab caches, I was able to
save 36K of memory!
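
The conversion is the usual kmalloc-to-dedicated-cache pattern; a sketch
using one of the trace structures (approximate names, not the actual
patch):

/* A cache sized exactly for the object avoids kmalloc's round-up to
 * the next power-of-two bucket. */
static struct kmem_cache *field_cachep;

static int __init init_event_caches(void)
{
	field_cachep = KMEM_CACHE(ftrace_event_field, 0);
	return field_cachep ? 0 : -ENOMEM;
}

static struct ftrace_event_field *alloc_field(void)
{
	/* was: kmalloc(sizeof(struct ftrace_event_field), GFP_KERNEL) */
	return kmem_cache_alloc(field_cachep, GFP_KERNEL);
}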

I also noticed that the field and event names in the format files
and filtering logic were being strdup'd from strings that happen to
be constant. I originally did this to be safe with modules, but a
module that adds an event must also remove it before unloading, so
the original string is guaranteed to outlive the event anyway.

By not doing the strdup() and just pointing to the original string,
I was able to save over 100K of memory!!! This also makes each
TRACE_EVENT() less expensive.
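
The strdup() removal relies on TRACE_EVENT() names and types being string
literals with static storage duration, so storing the pointer directly is
safe. A sketch of the shape (hypothetical helper, reusing field_cachep
from the sketch above):

static int define_field_sketch(struct list_head *head, const char *type,
			       const char *name)
{
	struct ftrace_event_field *field;

	field = kmem_cache_alloc(field_cachep, GFP_KERNEL);
	if (!field)
		return -ENOMEM;

	/* was: field->name = kstrdup(name, GFP_KERNEL); plus a matching
	 * kfree() on teardown.  The literal outlives the event, so just
	 * point at it. */
	field->name = name;
	field->type = type;

	list_add(&field->link, head);
	return 0;
}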

I finally got around to fixing a long standing bug in the splice
logic. That is, it never blocked when there was no data and required
the caller to poll. Now with irq_work(), splice and reads from
trace_pipe_raw() can block and wait for data in the buffer before
returning.

Also, since we have multiple buffers, instead of waking up all
waiters on all buffers when a single buffer receives data, I moved
the wake-up logic into the ring buffer code itself. Now all users of
the ring buffer can block until data is present.
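
The plumbing amounts to a waitqueue per buffer plus an irq_work to issue
the wakeup from contexts where wake_up() is not safe; a sketch with names
approximating the ring-buffer code (rb_has_data() is an assumed
predicate):

struct rb_irq_work {
	struct irq_work		work;
	wait_queue_head_t	waiters;
};

/* runs later in a safe context, kicked by irq_work_queue() */
static void rb_wake_up_waiters(struct irq_work *work)
{
	struct rb_irq_work *rbwork =
		container_of(work, struct rb_irq_work, work);

	wake_up_all(&rbwork->waiters);
}

static void rb_init_irq_work(struct rb_irq_work *rbwork)
{
	init_irq_work(&rbwork->work, rb_wake_up_waiters);
	init_waitqueue_head(&rbwork->waiters);
}

/* producer side: after committing an event, poke the waiters */
static void rb_signal_waiters(struct rb_irq_work *rbwork)
{
	irq_work_queue(&rbwork->work);	/* safe from any context */
}

/* consumer side: read/splice of trace_pipe_raw sleeps here */
static int rb_wait_for_data(struct rb_irq_work *rbwork, int cpu)
{
	return wait_event_interruptible(rbwork->waiters, rb_has_data(cpu));
}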

Enjoy,

-- Steve



Steven Rostedt (5):
  tracing: Get trace_events kernel command line working again
  tracing: Use kmem_cache_alloc instead of kmalloc in trace_events.c
  tracing: Use direct field, type and system names
  tracing: Fix polling on trace_pipe_raw
  tracing: Fix read blocking on trace_pipe_raw

Steven Rostedt (Red Hat) (2):
  tracing: Do not block on splice if either file or splice NONBLOCK flag is set
  tracing/ring-buffer: Move poll wake ups into ring buffer code


 include/linux/ring_buffer.h |6 ++
 kernel/trace/ring_buffer.c  |  146 +
 kernel/trace/trace.c|  171 +--
 kernel/trace/trace.h|4 +-
 kernel/trace/trace_events.c |  188 ---
 5 files changed, 386 insertions(+), 129 deletions(-)

[ rebase changes from last for-next patch set ]

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index af7be82..b36befa 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4133,14 +4133,30 @@ static int tracing_clock_open(struct inode *inode, struct file *file)
 #ifdef CONFIG_TRACER_SNAPSHOT
 static int tracing_snapshot_open(struct inode *inode, struct file *file)
 {
+   struct trace_cpu *tc = inode->i_private;
struct trace_iterator *iter;
+   struct seq_file *m;
int ret = 0;
 
if (file->f_mode & FMODE_READ) {
iter = __tracing_open(inode, file, true);
if (IS_ERR(iter))
ret = PTR_ERR(iter);
+   } else {
+   /* Writes still need the seq_file to hold the private data */
+   m = kzalloc(sizeof(*m), GFP_KERNEL);
+   if (!m)
+   return -ENOMEM;
+   iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+   if (!iter) {
+   kfree(m);
+   return -ENOMEM;
+   }
+   iter->tr = tc->tr;
+   m->private = iter;
+   file->private_data = m;
}
+
return ret;
 }
 
@@ -4148,7 +4164,9 @@ static ssize_t
 tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt,
   loff_t *ppos)
 {
-   struct trace_array *tr = filp->private_data;
+   struct seq_file *m = filp->private_data;
+   struct trace_iterator *iter = m->private;
+   struct trace_array *tr = iter->tr;
unsigned long val;
int ret;
 
@@ -4209,6 +4227,22 @@ out:
mutex_unlock(&trace_types_lock);
return ret;
 }
+
+static int tracing_snapshot_release(struct inode *inode, struct file *file)
+{
+   struct seq_file *m = file->private_data;
+
+   if (file->f_mode & FMODE_READ)
+   return tracing_release(inode, file);
+
+   /* If write only, the seq_file is just a stub */
+   if (m)
+   kfree(m->private);
+   kfree(m);
+
+   return 0;
+}
+
 #endif /* CONFIG_TRACER_SNAPSHOT */
 
 
@@ -4273,7 +4307,7 @@ static const struct file_operations snapshot_fops = {
.read   = seq_read,
.write  = tracing_snapshot_write,
.llseek = tracing_seek,
-   .release= tracing_release,
+   .release= tracing_snapshot_release,

  1   2   3   4   5   6   7   8   9   10   >