[GIT PULL] Power management fixes for 3.6-rc2

2012-08-10 Thread Rafael J. Wysocki
Hi Linus,

Please pull from the git repository at

  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git pm-for-3.6-rc2

to receive power management fixes for v3.6-rc2 with top-most commit
07368d32f1a67e797def08cf2ee3ea1647b204b6

  tpm_tis / PM: Fix unused function warning for CONFIG_PM_SLEEP

on top of commit 0d7614f09c1ebdbaa1599a5aba7593f147bf96ee

  Linux 3.6-rc1

Included are:

* Fix for two recent regressions in the generic PM domains framework.

* Revert of a commit that introduced a resume regression and is conceptually
  incorrect in my opinion.

* Fix for a return value in pcc-cpufreq.c from Julia Lawall.

* RTC wakeup signaling fix from Neil Brown.

* Suppression of compiler warnings for CONFIG_PM_SLEEP unset in ACPI,
  platform/x86 and TPM drivers.
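
For illustration, a minimal userspace sketch (not code from this pull) of why callbacks become "unused" when CONFIG_PM_SLEEP is unset, and how an #ifdef guard avoids the warning. It assumes the usual shape of the problem: a helper macro drops its callback arguments when sleep support is compiled out. All toy_* and TOY_* names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical switch; uncomment to model a build with sleep support. */
/* #define CONFIG_PM_SLEEP 1 */

struct toy_pm_ops {
    int (*suspend)(void *dev);
    int (*resume)(void *dev);
};

#ifdef CONFIG_PM_SLEEP
/* Only referenced when sleep support is built in, so only compiled then. */
static int toy_suspend(void *dev) { (void)dev; return 0; }
static int toy_resume(void *dev)  { (void)dev; return 0; }
#define TOY_PM_OPS(name, s, r) \
    static const struct toy_pm_ops name = { .suspend = (s), .resume = (r) }
#else
/*
 * The macro ignores its callback arguments here, so unguarded static
 * callbacks would be defined but never used.
 */
#define TOY_PM_OPS(name, s, r) \
    static const struct toy_pm_ops name = { NULL, NULL }
#endif

TOY_PM_OPS(toy_pm, toy_suspend, toy_resume);

/* Reports whether the driver ended up with a resume callback. */
int toy_has_resume(void)
{
    return toy_pm.resume != NULL;
}
```

With the switch left commented out, toy_has_resume() returns 0 and the build stays free of unused-function warnings because the callbacks are never compiled.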

Thanks!


 drivers/acpi/ac.c                        |  4
 drivers/acpi/battery.c                   |  2 ++
 drivers/acpi/button.c                    |  4
 drivers/acpi/fan.c                       |  4
 drivers/acpi/power.c                     |  4
 drivers/acpi/sbs.c                       |  2 ++
 drivers/acpi/thermal.c                   |  4
 drivers/base/power/clock_ops.c           |  3 +--
 drivers/base/power/common.c              |  4 +---
 drivers/char/tpm/tpm_tis.c               |  2 ++
 drivers/cpufreq/pcc-cpufreq.c            |  1 +
 drivers/platform/x86/classmate-laptop.c  |  4
 drivers/platform/x86/fujitsu-tablet.c    |  2 ++
 drivers/platform/x86/hdaps.c             |  2 ++
 drivers/platform/x86/hp_accel.c          |  2 +-
 drivers/platform/x86/msi-laptop.c        |  4
 drivers/platform/x86/panasonic-laptop.c  |  4
 drivers/platform/x86/sony-laptop.c       | 12 +++-
 drivers/platform/x86/thinkpad_acpi.c     |  2 ++
 drivers/platform/x86/toshiba_acpi.c      |  2 ++
 drivers/platform/x86/toshiba_bluetooth.c |  4
 drivers/platform/x86/xo15-ebook.c        |  2 ++
 drivers/rtc/interface.c                  |  2 ++
 drivers/rtc/rtc-cmos.c                   |  1 -
 include/linux/sched.h                    |  8
 kernel/power/suspend.c                   |  3 ---
 kernel/watchdog.c                        | 21 ++---
 27 files changed, 71 insertions(+), 38 deletions(-)

---

Julia Lawall (1):
  drivers/cpufreq/pcc-cpufreq.c: fix error return code

NeilBrown (1):
  RTC: Avoid races between RTC alarm wakeup and suspend.

Rafael J. Wysocki (5):
  PM: Make dev_pm_get_subsys_data() always return 0 on success
  Revert "NMI watchdog: fix for lockup detector breakage on resume"
  ACPI / PM: Fix unused function warnings for CONFIG_PM_SLEEP
  platform / x86 / PM: Fix unused function warnings for CONFIG_PM_SLEEP
  tpm_tis / PM: Fix unused function warning for CONFIG_PM_SLEEP

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Linaro-mm-sig] [PATCH 2/4] dma-fence: dma-buf synchronization (v8)

2012-08-10 Thread Daniel Vetter
On Fri, Aug 10, 2012 at 04:57:52PM +0200, Maarten Lankhorst wrote:
> A dma-fence can be attached to a buffer which is being filled or consumed
> by hw, to allow userspace to pass the buffer to another device without
> waiting.  For example, userspace can call page_flip ioctl to display the
> next frame of graphics after kicking the GPU but while the GPU is still
> rendering.  The display device sharing the buffer with the GPU would
> attach a callback to get notified when the GPU's rendering-complete IRQ
> fires, to update the scan-out address of the display, without having to
> wake up userspace.
> 
> A dma-fence is a transient, one-shot deal.  It is allocated and attached
> to one or more dma-buf's.  When the one that attached it is done with
> the pending operation, it can signal the fence.
> 
>   + dma_fence_signal()
> 
> The dma-buf-mgr handles tracking, and waiting on, the fences associated
> with a dma-buf.
> 
> TODO maybe need some helper fxn for simple devices, like a display-
> only drm/kms device which simply wants to wait for exclusive fence to
> be signaled, and then attach a non-exclusive fence while scanout is in
> progress.
> 
> The one pending on the fence can add an async callback:
>   + dma_fence_add_callback()
> The callback can optionally be cancelled with remove_wait_queue()
> 
> Or wait synchronously (optionally with timeout or interruptible):
>   + dma_fence_wait()
> 
> A default software-only implementation is provided, which can be used
> by drivers attaching a fence to a buffer when they have no other means
> for hw sync.  But a memory backed fence is also envisioned, because it
> is common that GPU's can write to, or poll on some memory location for
> synchronization.  For example:
> 
>   fence = dma_buf_get_fence(dmabuf);
>   if (fence->ops == _fence_ops) {
>     dma_buf *fence_buf;
>     dma_bikeshed_fence_get_buf(fence, &fence_buf, );
>     ... tell the hw the memory location to wait on ...
>   } else {
>     /* fall-back to sw sync */
>     dma_fence_add_callback(fence, my_cb);
>   }
> 
> On SoC platforms, if some other hw mechanism is provided for synchronizing
> between IP blocks, it could be supported as an alternate implementation
> with its own fence ops in a similar way.
> 
> To facilitate other non-sw implementations, the enable_signaling callback
> can be used to keep track if a device not supporting hw sync is waiting
> on the fence, and in this case should arrange to call dma_fence_signal()
> at some point after the condition has changed, to notify other devices
> waiting on the fence.  If there are no sw waiters, this can be skipped to
> avoid waking the CPU unnecessarily. The handler of the enable_signaling
> op should take a refcount until the fence is signaled, then release its ref.
> 
> The intention is to provide a userspace interface (presumably via eventfd)
> later, to be used in conjunction with dma-buf's mmap support for sw access
> to buffers (or for userspace apps that would prefer to do their own
> synchronization).

I think the commit message should be cleaned up: Kill the TODO, rip out
the bikeshed_fence and otherwise update it to the latest code.
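
For illustration, the one-shot signal/callback behaviour described in the quoted text can be modelled in plain userspace C. This is a toy sketch under stated assumptions, not the dma-fence code from the patch: the toy_* names, the fixed-size callback array, the return values and the lack of locking are all hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CBS 4

typedef void (*toy_fence_func_t)(void *priv);

struct toy_fence {
    bool signaled;
    size_t ncbs;
    struct {
        toy_fence_func_t func;
        void *priv;
    } cbs[TOY_MAX_CBS];
};

void toy_fence_init(struct toy_fence *f)
{
    f->signaled = false;
    f->ncbs = 0;
}

/*
 * Returns 0 if the callback was queued, 1 if the fence had already fired
 * (the caller can proceed at once), -1 if the callback table is full.
 */
int toy_fence_add_callback(struct toy_fence *f, toy_fence_func_t func,
                           void *priv)
{
    if (f->signaled)
        return 1;
    if (f->ncbs == TOY_MAX_CBS)
        return -1;
    f->cbs[f->ncbs].func = func;
    f->cbs[f->ncbs].priv = priv;
    f->ncbs++;
    return 0;
}

/* One-shot: the first call runs every queued callback, later calls do nothing. */
void toy_fence_signal(struct toy_fence *f)
{
    if (f->signaled)
        return;
    f->signaled = true;
    for (size_t i = 0; i < f->ncbs; i++)
        f->cbs[i].func(f->cbs[i].priv);
    f->ncbs = 0;
}

/* Example callback: a display driver would update its scan-out address here. */
void toy_count_cb(void *priv)
{
    ++*(int *)priv;
}
```

A waiter that queues toy_count_cb() sees it run exactly once, on the first toy_fence_signal(); a real implementation would additionally need locking and reference counting.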

> 
> v1: Original
> v2: After discussion w/ danvet and mlankhorst on #dri-devel, we decided
> that dma-fence didn't need to care about the sw->hw signaling path
> (it can be handled same as sw->sw case), and therefore the fence->ops
> can be simplified and more handled in the core.  So remove the signal,
> add_callback, cancel_callback, and wait ops, and replace with a simple
> enable_signaling() op which can be used to inform a fence supporting
> hw->hw signaling that one or more devices which do not support hw
> signaling are waiting (and therefore it should enable an irq or do
> whatever is necessary in order that the CPU is notified when the
> fence is passed).
> v3: Fix locking fail in attach_fence() and get_fence()
> v4: Remove tie-in w/ dma-buf..  after discussion w/ danvet and mlankhorst
> we decided that we need to be able to attach one fence to N dma-buf's,
> so using the list_head in dma-fence struct would be problematic.
> v5: [ Maarten Lankhorst ] Updated for dma-bikeshed-fence and dma-buf-manager.
> v6: [ Maarten Lankhorst ] I removed dma_fence_cancel_callback and some
> comments about checking if fence fired or not. This is broken by design.
> waitqueue_active during destruction is now fatal, since the signaller
> should be holding a reference in enable_signalling until it signalled
> the fence. Pass the original dma_fence_cb along, and call __remove_wait
> in the dma_fence_callback handler, so that no cleanup needs to be
> performed.
> v7: [ Maarten Lankhorst ] Set cb->func and only enable sw signaling if
> fence wasn't signaled yet, for example for hardware fences that may
> choose to signal blindly.
> v8: [ Maarten Lankhorst ] Tons of tiny fixes, moved __dma_fence_init to
> header and fixed include mess. 

Re: [PATCH 1/2] radio-shark: Only compile led support when CONFIG_LED_CLASS is set

2012-08-10 Thread Mauro Carvalho Chehab
On 10-08-2012 16:58, Hans de Goede wrote:
> Reported-by: David Rientjes
> Signed-off-by: Hans de Goede 
> ---
>  drivers/media/radio/radio-shark.c | 26 --
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/radio/radio-shark.c b/drivers/media/radio/radio-shark.c
> index c2ead23..f746ed0 100644
> --- a/drivers/media/radio/radio-shark.c
> +++ b/drivers/media/radio/radio-shark.c
> @@ -27,7 +27,6 @@
>  
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -35,6 +34,12 @@
>  #include 
>  #include 
>  
> +#if defined(CONFIG_LEDS_CLASS) || \
> +(defined(CONFIG_LEDS_CLASS_MODULE) && defined(CONFIG_RADIO_SHARK_MODULE))
> +#include 

Conditionally including headers is not a good thing.

...
>  static void usb_shark_disconnect(struct usb_interface *intf)
>  {
>   struct v4l2_device *v4l2_dev = usb_get_intfdata(intf);
>   struct shark_device *shark = v4l2_dev_to_shark(v4l2_dev);
> +#ifdef SHARK_USE_LEDS
>   int i;
> +#endif
>  
>   mutex_lock(&shark->tea.mutex);
>   v4l2_device_disconnect(&shark->v4l2_dev);
>   snd_tea575x_exit(&shark->tea);
>   mutex_unlock(&shark->tea.mutex);
>  
> +#ifdef SHARK_USE_LEDS
>   for (i = 0; i < NO_LEDS; i++)
>       led_classdev_unregister(&shark->leds[i]);
> +#endif
>  
>   v4l2_device_put(&shark->v4l2_dev);
>  }

That looks ugly. Maybe you could code it in a different way.

You could move all the SHARK_USE_LEDS code together into the same place in
the code, like:

#if defined(CONFIG_LEDS_CLASS) || \
(defined(CONFIG_LEDS_CLASS_MODULE) && defined(CONFIG_RADIO_SHARK_MODULE))

 static void shark_led_set_blue(struct led_classdev *led_cdev,
...
.brightness_set = shark_led_set_red,
},
 };

static void shark_led_disconnect(...)
{
...
}

static void shark_led_release(...)
{
...
}

static void shark_led_register(...)
{
...
}
#else
static inline void shark_led_disconnect(...) { }
static inline void shark_led_release(...) { }
static inline void shark_led_register(...)
{
        printk(KERN_WARNING "radio-shark: CONFIG_LEDS_CLASS not enabled. LEDs won't work\n");
}
#endif

And let the rest of the code call the shark_led functions unconditionally.
If LEDs aren't enabled, the function stubs won't produce any code (well,
except for the above notice).
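
For illustration, a self-contained userspace sketch of the stub pattern suggested above. It is not the driver code: TOY_USE_LEDS, struct toy_dev and the toy_* functions are hypothetical stand-ins, and fprintf() stands in for printk().

```c
#include <stdio.h>

/* Hypothetical switch; in the driver this would be derived from Kconfig. */
/* #define TOY_USE_LEDS 1 */

struct toy_dev {
    int leds_registered;
};

#ifdef TOY_USE_LEDS
/* Real implementations, all kept together in one conditional block. */
static int toy_led_register(struct toy_dev *dev)
{
    dev->leds_registered = 1;
    return 0;
}

static void toy_led_disconnect(struct toy_dev *dev)
{
    dev->leds_registered = 0;
}
#else
/* Stubs with the same signatures; the compiler can drop the empty one. */
static inline int toy_led_register(struct toy_dev *dev)
{
    (void)dev;
    fprintf(stderr, "toy: LED support not built in, LEDs won't work\n");
    return 0;
}

static inline void toy_led_disconnect(struct toy_dev *dev)
{
    (void)dev;
}
#endif

/* The callers below carry no #ifdef at all. */
int toy_probe(struct toy_dev *dev)
{
    dev->leds_registered = 0;
    return toy_led_register(dev);
}

void toy_disconnect(struct toy_dev *dev)
{
    toy_led_disconnect(dev);
}
```

The design point is that the conditional compilation lives in one place, so probe and disconnect paths read the same whether or not LED support is built.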

The same comment also applies to patch 2.

Regards,
Mauro


Re: Q: how to control the TTY output queue in real time?

2012-08-10 Thread Alan Cox

> If they do quite fine with the fifo, then maybe the new
> function will do too? It's basically a tcdrain(), just with
> a controllable watermark, I guess.

I guess, provided you account for the fifo and any hardware flow control,
it would work.



Re: [PATCH v3 0/6] omap-am33xx rtc dt support

2012-08-10 Thread Sekhar Nori
On 7/27/2012 5:53 PM, Afzal Mohammed wrote:
> Hi,
> 
> This series makes rtc-omap driver DT capable, adds AM33xx
> RTC DT support along with a few enhancements to the driver.
> 
> rtc-omap driver is made intelligent enough to handle kicker
> mechanism. This helps in removing kicker mechanism support
> done for DaVinci at platform level.
> 
> This has been tested on Beaglebone (AM33xx platform) and on
> DaVinci DA850 EVM.
> 
> This series is based over linux-omap master and can be
> directly applied over linux-next, except for
> [PATCH 6/6] arm/dts: am33xx rtc node.
> 
> PATCH 6/6 should go through linux-omap tree and needs
> http://www.mail-archive.com/linux-omap@vger.kernel.org/msg71644.html
> (arm/dts: am33xx wdt node) to get applied cleanly

I tested patches 1-5 on AM18x EVM using rtcwake and hwclock commands.
Also tested on DT enabled AM18x EVM using hwclock.

For patches 1-5:

Acked-by: Sekhar Nori 

Alessandro,

I assume you would want me to queue 2/6 through DaVinci tree. That patch
depends on 1/6 being accepted and merged by you. Let me know how you
want to move forward here.

Thanks,
Sekhar


[PATCH 1/2] radio-shark: Only compile led support when CONFIG_LED_CLASS is set

2012-08-10 Thread Hans de Goede
Reported-by: David Rientjes
Signed-off-by: Hans de Goede 
---
 drivers/media/radio/radio-shark.c | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/media/radio/radio-shark.c b/drivers/media/radio/radio-shark.c
index c2ead23..f746ed0 100644
--- a/drivers/media/radio/radio-shark.c
+++ b/drivers/media/radio/radio-shark.c
@@ -27,7 +27,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -35,6 +34,12 @@
 #include 
 #include 
 
+#if defined(CONFIG_LEDS_CLASS) || \
+(defined(CONFIG_LEDS_CLASS_MODULE) && defined(CONFIG_RADIO_SHARK_MODULE))
+#include 
+#define SHARK_USE_LEDS 1
+#endif
+
 /*
  * Version Information
  */
@@ -54,6 +59,7 @@ MODULE_LICENSE("GPL");
 
 #define v4l2_dev_to_shark(d) container_of(d, struct shark_device, v4l2_dev)
 
+#ifdef SHARK_USE_LEDS
 enum { BLUE_LED, BLUE_PULSE_LED, RED_LED, NO_LEDS };
 
 static void shark_led_set_blue(struct led_classdev *led_cdev,
@@ -83,17 +89,20 @@ static const struct led_classdev shark_led_templates[NO_LEDS] = {
        .brightness_set = shark_led_set_red,
    },
 };
+#endif
 
 struct shark_device {
    struct usb_device *usbdev;
    struct v4l2_device v4l2_dev;
    struct snd_tea575x tea;
 
+#ifdef SHARK_USE_LEDS
    struct work_struct led_work;
    struct led_classdev leds[NO_LEDS];
    char led_names[NO_LEDS][32];
    atomic_t brightness[NO_LEDS];
    unsigned long brightness_new;
+#endif
 
    u8 *transfer_buffer;
    u32 last_val;
@@ -175,6 +184,7 @@ static struct snd_tea575x_ops shark_tea_ops = {
    .read_val  = shark_read_val,
 };
 
+#ifdef SHARK_USE_LEDS
 static void shark_led_work(struct work_struct *work)
 {
    struct shark_device *shark =
@@ -244,20 +254,25 @@ static void shark_led_set_red(struct led_classdev *led_cdev,
    set_bit(RED_LED, &shark->brightness_new);
    schedule_work(&shark->led_work);
 }
+#endif
 
 static void usb_shark_disconnect(struct usb_interface *intf)
 {
    struct v4l2_device *v4l2_dev = usb_get_intfdata(intf);
    struct shark_device *shark = v4l2_dev_to_shark(v4l2_dev);
+#ifdef SHARK_USE_LEDS
    int i;
+#endif
 
    mutex_lock(&shark->tea.mutex);
    v4l2_device_disconnect(&shark->v4l2_dev);
    snd_tea575x_exit(&shark->tea);
    mutex_unlock(&shark->tea.mutex);
 
+#ifdef SHARK_USE_LEDS
    for (i = 0; i < NO_LEDS; i++)
        led_classdev_unregister(&shark->leds[i]);
+#endif
 
    v4l2_device_put(&shark->v4l2_dev);
 }
@@ -266,7 +281,9 @@ static void usb_shark_release(struct v4l2_device *v4l2_dev)
 {
    struct shark_device *shark = v4l2_dev_to_shark(v4l2_dev);
 
+#ifdef SHARK_USE_LEDS
    cancel_work_sync(&shark->led_work);
+#endif
    v4l2_device_unregister(&shark->v4l2_dev);
    kfree(shark->transfer_buffer);
    kfree(shark);
@@ -276,7 +293,10 @@ static int usb_shark_probe(struct usb_interface *intf,
   const struct usb_device_id *id)
 {
    struct shark_device *shark;
-   int i, retval = -ENOMEM;
+   int retval = -ENOMEM;
+#ifdef SHARK_USE_LEDS
+   int i;
+#endif
 
    shark = kzalloc(sizeof(struct shark_device), GFP_KERNEL);
    if (!shark)
@@ -321,6 +341,7 @@ static int usb_shark_probe(struct usb_interface *intf,
        goto err_init_tea;
    }
 
+#ifdef SHARK_USE_LEDS
    INIT_WORK(&shark->led_work, shark_led_work);
    for (i = 0; i < NO_LEDS; i++) {
        shark->leds[i] = shark_led_templates[i];
@@ -341,6 +362,7 @@ static int usb_shark_probe(struct usb_interface *intf,
                 "couldn't register led: %s\n",
                 shark->led_names[i]);
    }
+#endif
 
    return 0;
 
-- 
1.7.11.4



Re: Q: how to control the TTY output queue in real time?

2012-08-10 Thread Stas Sergeev

Hi Alan, thanks, clear enough now. :)

10.08.2012 23:33, Alan Cox wrote:

> if (bytes_left < constant)
>         write_wakeup
>
> and I suspect if you made that adjustable and turned off the fifo and any
> other funnies you'd at least make it work for a sufficiently rigged demo.

You suggest turning off the fifo, which sounds worrisome.
Does this mean that tcdrain() and TIOCOUTQ do not
account for the fifo either?
If they do quite fine with the fifo, then maybe the new
function will do too? It's basically a tcdrain(), just with
a controllable watermark, I guess.


[PATCH 2/2] radio-shark2: Only compile led support when CONFIG_LED_CLASS is set

2012-08-10 Thread Hans de Goede
Reported-by: David Rientjes
Signed-off-by: Hans de Goede 
---
 drivers/media/radio/radio-shark2.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/media/radio/radio-shark2.c b/drivers/media/radio/radio-shark2.c
index b9575de..e593d5a 100644
--- a/drivers/media/radio/radio-shark2.c
+++ b/drivers/media/radio/radio-shark2.c
@@ -27,7 +27,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -35,6 +34,12 @@
 #include 
 #include "radio-tea5777.h"
 
+#if defined(CONFIG_LEDS_CLASS) || \
+(defined(CONFIG_LEDS_CLASS_MODULE) && defined(CONFIG_RADIO_SHARK2_MODULE))
+#include 
+#define SHARK_USE_LEDS 1
+#endif
+
 MODULE_AUTHOR("Hans de Goede ");
 MODULE_DESCRIPTION("Griffin radioSHARK2, USB radio receiver driver");
 MODULE_LICENSE("GPL");
@@ -43,7 +48,6 @@ static int debug;
 module_param(debug, int, 0);
 MODULE_PARM_DESC(debug, "Debug level (0-1)");
 
-
 #define SHARK_IN_EP    0x83
 #define SHARK_OUT_EP   0x05
 
@@ -52,6 +56,7 @@ MODULE_PARM_DESC(debug, "Debug level (0-1)");
 
 #define v4l2_dev_to_shark(d) container_of(d, struct shark_device, v4l2_dev)
 
+#ifdef SHARK_USE_LEDS
 enum { BLUE_LED, RED_LED, NO_LEDS };
 
 static void shark_led_set_blue(struct led_classdev *led_cdev,
@@ -73,17 +78,20 @@ static const struct led_classdev shark_led_templates[NO_LEDS] = {
        .brightness_set = shark_led_set_red,
    },
 };
+#endif
 
 struct shark_device {
    struct usb_device *usbdev;
    struct v4l2_device v4l2_dev;
    struct radio_tea5777 tea;
 
+#ifdef SHARK_USE_LEDS
    struct work_struct led_work;
    struct led_classdev leds[NO_LEDS];
    char led_names[NO_LEDS][32];
    atomic_t brightness[NO_LEDS];
    unsigned long brightness_new;
+#endif
 
    u8 *transfer_buffer;
 };
@@ -161,6 +169,7 @@ static struct radio_tea5777_ops shark_tea_ops = {
    .read_reg  = shark_read_reg,
 };
 
+#ifdef SHARK_USE_LEDS
 static void shark_led_work(struct work_struct *work)
 {
    struct shark_device *shark =
@@ -216,20 +225,25 @@ static void shark_led_set_red(struct led_classdev *led_cdev,
    set_bit(RED_LED, &shark->brightness_new);
    schedule_work(&shark->led_work);
 }
+#endif
 
 static void usb_shark_disconnect(struct usb_interface *intf)
 {
    struct v4l2_device *v4l2_dev = usb_get_intfdata(intf);
    struct shark_device *shark = v4l2_dev_to_shark(v4l2_dev);
+#ifdef SHARK_USE_LEDS
    int i;
+#endif
 
    mutex_lock(&shark->tea.mutex);
    v4l2_device_disconnect(&shark->v4l2_dev);
    radio_tea5777_exit(&shark->tea);
    mutex_unlock(&shark->tea.mutex);
 
+#ifdef SHARK_USE_LEDS
    for (i = 0; i < NO_LEDS; i++)
        led_classdev_unregister(&shark->leds[i]);
+#endif
 
    v4l2_device_put(&shark->v4l2_dev);
 }
@@ -238,7 +252,9 @@ static void usb_shark_release(struct v4l2_device *v4l2_dev)
 {
    struct shark_device *shark = v4l2_dev_to_shark(v4l2_dev);
 
+#ifdef SHARK_USE_LEDS
    cancel_work_sync(&shark->led_work);
+#endif
    v4l2_device_unregister(&shark->v4l2_dev);
    kfree(shark->transfer_buffer);
    kfree(shark);
@@ -248,7 +264,10 @@ static int usb_shark_probe(struct usb_interface *intf,
   const struct usb_device_id *id)
 {
    struct shark_device *shark;
-   int i, retval = -ENOMEM;
+   int retval = -ENOMEM;
+#ifdef SHARK_USE_LEDS
+   int i;
+#endif
 
    shark = kzalloc(sizeof(struct shark_device), GFP_KERNEL);
    if (!shark)
@@ -292,6 +311,7 @@ static int usb_shark_probe(struct usb_interface *intf,
        goto err_init_tea;
    }
 
+#ifdef SHARK_USE_LEDS
    INIT_WORK(&shark->led_work, shark_led_work);
    for (i = 0; i < NO_LEDS; i++) {
        shark->leds[i] = shark_led_templates[i];
@@ -312,6 +332,7 @@ static int usb_shark_probe(struct usb_interface *intf,
                 "couldn't register led: %s\n",
                 shark->led_names[i]);
    }
+#endif
 
    return 0;
 
-- 
1.7.11.4



Re: [Linaro-mm-sig] [PATCH 3/4] dma-seqno-fence: Hardware dma-buf implementation of fencing (v2)

2012-08-10 Thread Daniel Vetter
On Fri, Aug 10, 2012 at 04:57:58PM +0200, Maarten Lankhorst wrote:
> This type of fence can be used with hardware synchronization for simple
> hardware that can block execution until the condition
> (dma_buf[offset] - value) >= 0 has been met.
> 
> A software fallback still has to be provided in case the fence is used
> with a device that doesn't support this mechanism. It is useful to expose
> this for graphics cards that have an op to support this.
> 
> Some cards like i915 can export those, but don't have an option to wait,
> so they need the software fallback.
> 
> I extended the original patch by Rob Clark.
> 
> v1: Original
> v2: Renamed from bikeshed to seqno, moved into dma-fence.c since
> not much was left of the file. Lots of documentation added.
> 
> Signed-off-by: Maarten Lankhorst 

Patch looks good, two bikesheds inline. Either way
Reviewed-by: Daniel Vetter 
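
For illustration, the condition (dma_buf[offset] - value) >= 0 from the quoted description can be written as a wraparound-safe comparison. This is a minimal sketch, not code from the patch; toy_seqno_passed is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True once the value read from the sync buffer has reached the target.
 * The subtraction is done in unsigned arithmetic and the result is then
 * viewed as signed, so the test keeps working after the 32-bit counter
 * wraps around. (The unsigned-to-signed conversion is
 * implementation-defined before C23, but behaves as two's complement on
 * common compilers.)
 */
bool toy_seqno_passed(uint32_t cur, uint32_t target)
{
    return (int32_t)(cur - target) >= 0;
}
```

The comparison is only meaningful while the two values are less than 2^31 apart, which is the usual assumption for sequence numbers of this kind.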

> ---
>  drivers/base/dma-fence.c  |   21 +++
>  include/linux/dma-fence.h |   61 +
>  2 files changed, 82 insertions(+)
> 
> diff --git a/drivers/base/dma-fence.c b/drivers/base/dma-fence.c
> index 93448e4..4092a58 100644
> --- a/drivers/base/dma-fence.c
> +++ b/drivers/base/dma-fence.c
> @@ -266,3 +266,24 @@ struct dma_fence *dma_fence_create(void *priv)
>   return fence;
>  }
>  EXPORT_SYMBOL_GPL(dma_fence_create);
> +
> +static int seqno_enable_signaling(struct dma_fence *fence)
> +{
> + struct dma_seqno_fence *seqno_fence = to_seqno_fence(fence);
> + return seqno_fence->enable_signaling(seqno_fence);
> +}
> +
> +static void seqno_release(struct dma_fence *fence)
> +{
> + struct dma_seqno_fence *f = to_seqno_fence(fence);
> +
> + if (f->release)
> +         f->release(f);
> + dma_buf_put(f->sync_buf);
> +}
> +
> +const struct dma_fence_ops dma_seqno_fence_ops = {
> + .enable_signaling = seqno_enable_signaling,
> + .release = seqno_release
> +};
> +EXPORT_SYMBOL_GPL(dma_seqno_fence_ops);
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index e0ceddd..3ef0da0 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -91,6 +91,19 @@ struct dma_fence_ops {
>   void (*release)(struct dma_fence *fence);
>  };
>  
> +struct dma_seqno_fence {
> + struct dma_fence base;
> +
> + struct dma_buf *sync_buf;
> + uint32_t seqno_ofs;
> + uint32_t seqno;
> +
> + int (*enable_signaling)(struct dma_seqno_fence *fence);
> + void (*release)(struct dma_seqno_fence *fence);

I think using dma_fence_ops here is the better color. We lose type-safety
at compile-time, but still keep type-safety at runtime (thanks to
to_dma_seqno_fence). In addition people seem to like to constify function
pointers, we'd save a pointer and if we extend the sw dma_fence interface.

> +};
> +
> +extern const struct dma_fence_ops dma_seqno_fence_ops;
> +
>  struct dma_fence *dma_fence_create(void *priv);
>  
>  /**
> @@ -121,4 +134,52 @@ int dma_fence_wait(struct dma_fence *fence, bool intr, unsigned long timeout);
>  int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
>  dma_fence_func_t func, void *priv);
>  
> +/**
> + * to_seqno_fence - cast a dma_fence to a dma_seqno_fence
> + * @fence: dma_fence to cast to a dma_seqno_fence
> + *
> + * Returns NULL if the dma_fence is not a dma_seqno_fence,
> + * or the dma_seqno_fence otherwise.
> + */
> +static inline struct dma_seqno_fence *
> +to_seqno_fence(struct dma_fence *fence)
> +{
> + if (fence->ops != &dma_seqno_fence_ops)
> +         return NULL;
> + return container_of(fence, struct dma_seqno_fence, base);
> +}

I think adding an is_dma_seqno_fence would be nice ...
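
For illustration, a userspace sketch of how such a predicate and the checked cast could fit together. This is an assumption about the shape of the helper, not code from the patch; every toy_* name is hypothetical and toy_container_of is a simplified stand-in for the kernel's container_of().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_fence_ops {
    const char *name;
};

struct toy_fence {
    const struct toy_fence_ops *ops;
};

struct toy_seqno_fence {
    struct toy_fence base;
    uint32_t seqno;
};

const struct toy_fence_ops toy_seqno_fence_ops = { "seqno" };
const struct toy_fence_ops toy_sw_fence_ops = { "sw" };

/* Predicate: the ops pointer acts as a runtime type tag. */
static inline bool toy_is_seqno_fence(const struct toy_fence *fence)
{
    return fence->ops == &toy_seqno_fence_ops;
}

/* Checked downcast built on the predicate; NULL for any other fence type. */
static inline struct toy_seqno_fence *
toy_to_seqno_fence(struct toy_fence *fence)
{
    if (!toy_is_seqno_fence(fence))
        return NULL;
    return toy_container_of(fence, struct toy_seqno_fence, base);
}
```

Callers that only need a yes/no answer can use the predicate, while callers that need the derived struct use the cast and check for NULL.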

> +
> +/**
> + * dma_seqno_fence_init - initialize a seqno fence
> + * @fence: dma_seqno_fence to initialize
> + * @sync_buf: buffer containing the memory location to signal on
> + * @seqno_ofs: the offset within @sync_buf
> + * @seqno: the sequence # to signal on
> + * @priv: value of priv member
> + * @enable_signaling: callback which is called when some other device is
> + *                    waiting for sw notification of fence
> + * @release: callback called during destruction before object is freed.
> + *
> + * This function initializes a struct dma_seqno_fence with passed parameters,
> + * and takes a reference on sync_buf which is released on fence destruction.
> + */
> +static inline void
> +dma_seqno_fence_init(struct dma_seqno_fence *fence,
> + struct dma_buf *sync_buf,
> + uint32_t seqno_ofs, uint32_t seqno, void *priv,
> + int (*enable_signaling)(struct dma_seqno_fence *),
> + void (*release)(struct dma_seqno_fence *))
> +{
> + BUG_ON(!fence || !sync_buf || !enable_signaling);
> +
> + __dma_fence_init(&fence->base, &dma_seqno_fence_ops, priv);
> +
> + get_dma_buf(sync_buf);
> + fence->sync_buf = sync_buf;
> + fence->seqno_ofs = seqno_ofs;
> + 

Re: [Linaro-mm-sig] [PATCH 1/4] dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER

2012-08-10 Thread Daniel Vetter
On Fri, Aug 10, 2012 at 04:57:43PM +0200, Maarten Lankhorst wrote:
> Documentation says that code requiring dma-buf should add it to
> select, so inline fallbacks are not going to be used. A link error
> will make it obvious what went wrong, instead of silently doing
> nothing at runtime.
> 
> Signed-off-by: Maarten Lankhorst 

More than once I've forgotten to update these when creating new dma-buf
code. Hence

Reviewed-by: Daniel Vetter 
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


Q: how to control the TTY output queue in real time?

2012-08-10 Thread Stas Sergeev

Hello.

I am writing an app that needs to control the
serial xmit in real-time. What I need is a notification
that the TTY output queue fillup (returned by TIOCOUTQ
ioctl) has dropped below the specified value.
I haven't found anything that can help implement
this. If I can't get an async notification, the sync
notification will do too, like, for instance, the tcdrain()
call, but with the argument to specify the needed fillup,
below which the function will return.
If there is nothing like this, then even the notification
on every transmitted char will do.
But I've found none of the above. :(

Any suggestions how the real-time control can be
implemented?

(please CC me the replies)
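
For illustration only: absent a notification mechanism, a userspace fallback is to poll TIOCOUTQ until the count drops to a watermark. This is a sketch of that workaround, not the real-time notification asked for above; toy_wait_outq_below and its parameters are hypothetical, and the reported count may not account for bytes held in a hardware FIFO.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Polls the TIOCOUTQ count on fd until it is at or below `watermark` bytes.
 * Returns the last count read, or -1 if the ioctl fails or `max_polls`
 * polls pass without reaching the watermark. Sleeps `poll_us` microseconds
 * between polls.
 */
int toy_wait_outq_below(int fd, int watermark, unsigned int poll_us,
                        int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        int queued;

        if (ioctl(fd, TIOCOUTQ, &queued) < 0)
            return -1;
        if (queued <= watermark)
            return queued;
        usleep(poll_us);
    }
    return -1;
}
```

The polling interval bounds how late the caller learns that the queue has drained, so this trades CPU wakeups against latency.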


Re: Q: how to control the TTY output queue in real time?

2012-08-10 Thread Alan Cox
> I am writing an app that needs to control the
> serial xmit in real-time. What I need is a notification
> that the TTY output queue fillup (returned by TIOCOUTQ
> ioctl) has dropped below the specified value.

Not a supported feature basically.

> I haven't found anything that can help implement
> this. If I can't get an async notification, the sync
> notification will do too, like, for instance, the tcdrain()
> call, but with the argument to specify the needed fillup,
> below which the function will return.
> If there is nothing like this, then even the notification
> on every transmitted char will do.
> But I've found none of the above. :(
> 
> Any suggestions how the real-time control can be
> implemented?

Basically, even on hardware that knows this with that degree of granularity,
we don't propagate the information back in the manner you want.

I'm not sure it's a total loss, however. Currently all the code basically
does stuff in the tx path or tx irq handler along the lines of


if (bytes_left < constant)
        write_wakeup


and I suspect if you made that adjustable and turned off the fifo and any
other funnies you'd at least make it work for a sufficiently rigged demo.

We could in theory put it in the tty_port in future too if it's generally
useful.
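
For illustration, a toy userspace model of the check above with the constant made adjustable. It is not tty or serial driver code: struct toy_tx_queue, the wakeup counter and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_tx_queue {
    size_t bytes_left;     /* bytes still waiting to be transmitted */
    size_t wake_watermark; /* adjustable replacement for the fixed constant */
    unsigned int wakeups;  /* how many times the writer was woken */
};

/* Stand-in for the driver's write-wakeup call. */
static void toy_write_wakeup(struct toy_tx_queue *q)
{
    q->wakeups++;
}

/* Models one transmitted byte; returns true if the writer was woken. */
bool toy_tx_one_byte(struct toy_tx_queue *q)
{
    if (q->bytes_left > 0)
        q->bytes_left--;
    if (q->bytes_left < q->wake_watermark) {
        toy_write_wakeup(q);
        return true;
    }
    return false;
}
```

With bytes_left = 4 and wake_watermark = 3, the first transmitted byte leaves 3 queued and does not wake the writer; the second leaves 2 and does.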

Alan


Re: [Regression] "x86-64/efi: Use EFI to deal with platform wall clock" prevents my machine from booting

2012-08-10 Thread Matthew Garrett
On Fri, Aug 10, 2012 at 12:22:12PM -0700, Yinghai Lu wrote:

> What is the solution for this regression?

Revert the patch for now, we'll add it back once we've got the UEFI 
pagetable set up.

-- 
Matthew Garrett | mj...@srcf.ucam.org


Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread Alan Cox
On Fri, 10 Aug 2012 15:11:50 -0400
"J. Bruce Fields"  wrote:

> On Fri, Aug 10, 2012 at 07:26:28PM +0100, Alan Cox wrote:
> > > On that whole subject...
> > > 
> > > Do we need a Unix domain socket equivalent to openat()?
> > 
> > I don't think so. The name is just a file system indexing trick, it's not
> > really the socket proper. It's little more than "ascii string with
> > permissions attached"
> 
> That's overstating the case.  As I understand it the address is resolved
> by a pathname lookup like any other--it can follow symlinks, is relative
> to the current working directory and filesystem namespace, etc. 

Explicitly for Linux yes - this is not generally true of the AF_UNIX
socket domain and even the permissions aspect isn't guaranteed to be
supported on some BSD environments!

The name is however just a proxy for the socket itself. You don't even
get a device node in the usual sense or the same inode in the file system
space.

Alan


Re: [Regression] "x86-64/efi: Use EFI to deal with platform wall clock" prevents my machine from booting

2012-08-10 Thread Yinghai Lu
On Thu, Aug 9, 2012 at 1:51 AM, Matt Fleming  wrote:
> On Tue, 2012-08-07 at 11:50 +0100, Jan Beulich wrote:
>> >
>> > I managed to find a machine to reproduce this on and it looks like the
>> > ASUS firmware engineers are up to their old tricks of referencing
>> > physical addresses after we've taken control of the memory map,
>>
>> Yippie. On such systems we simply can't do any runtime calls.
>> Should we add a command line option forcing efi_native to false,
>> thus suppressing all runtime calls? Or would the "noefi" one be
>> enough already?
>
> I think a better solution for this, seeing as there appear to be *so*
> many ASUS machines in the wild with this inability to do virtual EFI
> calls, is to provide a 1:1 mapping as well as our regular virt->phys
> mapping for the benefit of the firmware. We can load our special page
> table in efi_call_*, etc.
>
> One thing to note is that because of breakage seen on Apple machines
> last time Matthew tried this approach, we (the kernel) can't actually
> access the 1:1 mapping, it would exist purely for the benefit of
> firmware that was broken enough to reference physical addresses after
> SetVirtualAddressMap().
>

What is the solution for this regression?

It seems Jan's commit broke our setup with UEFI too.

I assume other systems with an AMI code base would have the same problem.

Thanks

Yinghai


Re: [PATCH 02/15] Declaring udp protocols has its own proc entry

2012-08-10 Thread Jan Ceuleers
On 08/10/2012 08:31 PM, Jan Ceuleers wrote:
> Two points:
> 
> - I haven't seen patch 01/15;

Correction: I have.

> - these patches should go to netdev rather than lkml

But this is still the case; MAINTAINERS would have told you that.


Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread J. Bruce Fields
On Fri, Aug 10, 2012 at 07:26:28PM +0100, Alan Cox wrote:
> > On that whole subject...
> > 
> > Do we need a Unix domain socket equivalent to openat()?
> 
> I don't think so. The name is just a file system indexing trick, it's not
> really the socket proper. It's little more than "ascii string with
> permissions attached"

That's overstating the case.  As I understand it the address is resolved
by a pathname lookup like any other--it can follow symlinks, is relative
to the current working directory and filesystem namespace, etc.  So a
unix-domain socket equivalent to openat() would at least be
well-defined--whether it's needed or not, I don't know.

--b.
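
To illustrate the lookup behaviour described above, here is a minimal
userspace sketch in C. It assumes Linux semantics; the function name
connect_via_relative_symlink() and the /tmp directory template are made up
for this example. It binds a listener to a relative name after chdir() and
then connects through a symlink to it, which can only work if the address
goes through an ordinary pathname lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill a sockaddr_un with a (short) pathname address. */
static void fill_addr(struct sockaddr_un *sa, const char *path)
{
	memset(sa, 0, sizeof(*sa));
	sa->sun_family = AF_UNIX;
	strncpy(sa->sun_path, path, sizeof(sa->sun_path) - 1);
}

/*
 * Bind a listener to the relative name "sock" inside a fresh temporary
 * directory, then connect to it through the symlink "link".
 * Returns 0 if the connect succeeded, -1 on any failure.
 */
int connect_via_relative_symlink(void)
{
	char dir[] = "/tmp/unix-lookup-XXXXXX";	/* hypothetical location */
	char oldcwd[4096];
	struct sockaddr_un sa;
	int srv = -1, cli = -1, ret = -1;

	if (!getcwd(oldcwd, sizeof(oldcwd)) || !mkdtemp(dir))
		return -1;
	if (chdir(dir) != 0)
		goto out_rmdir;

	srv = socket(AF_UNIX, SOCK_STREAM, 0);
	cli = socket(AF_UNIX, SOCK_STREAM, 0);
	if (srv < 0 || cli < 0)
		goto out;

	fill_addr(&sa, "sock");		/* resolved relative to the cwd */
	if (bind(srv, (struct sockaddr *)&sa, sizeof(sa)) != 0 ||
	    listen(srv, 1) != 0)
		goto out;

	if (symlink("sock", "link") != 0)
		goto out;

	fill_addr(&sa, "link");		/* the lookup follows the symlink */
	if (connect(cli, (struct sockaddr *)&sa, sizeof(sa)) == 0)
		ret = 0;
out:
	if (cli >= 0)
		close(cli);
	if (srv >= 0)
		close(srv);
	unlink("link");
	unlink("sock");
	if (chdir(oldcwd) != 0)
		ret = -1;
out_rmdir:
	rmdir(dir);
	return ret;
}
```

The function returns 0 when the connect through the symlink succeeds. Other
AF_UNIX implementations may behave differently, as pointed out elsewhere in
this thread.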


[ANNOUNCE] 3.0.40-rt60

2012-08-10 Thread Steven Rostedt

Dear RT Folks,

I'm pleased to announce the 3.0.40-rt60 stable release.


This release is just an update to the new stable 3.0.40 version
and no RT specific changes have been made.


You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  Head SHA1: 810bfb094d8b827b9748a0e237b83675467d94e1


Or to build 3.0.40-rt60 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.0.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.0/patch-3.0.40.xz

  
  http://www.kernel.org/pub/linux/kernel/projects/rt/3.0/patch-3.0.40-rt60.patch.xz



Enjoy,

-- Steve





[PATCH] printk: Fix calculation of length used to discard records

2012-08-10 Thread Jeff Mahoney
While tracking down a weird buffer overflow issue in a program that
looked to be sane, I started double checking the length returned
by syslog(SYSLOG_ACTION_READ_ALL, ...) to make sure it wasn't overflowing
the buffer. Sure enough, it was.  I saw this in strace:

11339 syslog(SYSLOG_ACTION_READ_ALL, "<5>[244017.708129] REISERFS (dev"..., 8192) = 8279

It turns out that the loops that calculate how much space the entries
will take when they're copied don't include the newlines and
prefixes that will be included in the final output since prev flags
is passed as 0. 

This patch properly accounts for it and fixes the overflow.

CC: sta...@kernel.org
Signed-off-by: Jeff Mahoney 
---
 kernel/printk.c |2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1034,6 +1034,7 @@ static int syslog_print_all(char __user
struct log *msg = log_from_idx(idx);
 
len += msg_print_text(msg, prev, true, NULL, 0);
+   prev = msg->flags;
idx = log_next(idx);
seq++;
}
@@ -1046,6 +1047,7 @@ static int syslog_print_all(char __user
struct log *msg = log_from_idx(idx);
 
len -= msg_print_text(msg, prev, true, NULL, 0);
+   prev = msg->flags;
idx = log_next(idx);
seq++;
}

-- 
Jeff Mahoney
SUSE Labs
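
The class of bug fixed here can be shown with a small userspace model. This
is a deliberately simplified sketch, not the kernel's msg_print_text(): the
REC_OPEN flag, struct rec and all function names are hypothetical. The point
is only that when the rendered size of a record depends on the flags of the
record before it, a sizing pass that never updates prev can disagree with
the pass that actually copies the data.

```c
#include <stddef.h>
#include <string.h>

#define REC_OPEN 0x1	/* hypothetical flag: record does not end its own line */

struct rec {
	const char *text;
	unsigned int flags;
};

/*
 * Bytes needed to render one record, given the flags of the record
 * before it.  If the previous record left its line open, this record
 * first has to emit the '\n' that closes it.
 */
static size_t rec_len(const struct rec *r, unsigned int prev)
{
	size_t len = strlen(r->text);

	if (prev & REC_OPEN)
		len += 1;	/* newline closing the previous line */
	if (!(r->flags & REC_OPEN))
		len += 1;	/* newline ending this record */
	return len;
}

/* Sizing pass that never updates prev: undercounts after open records. */
size_t total_len_stale_prev(const struct rec *recs, size_t n)
{
	size_t len = 0, i;

	for (i = 0; i < n; i++)
		len += rec_len(&recs[i], 0);
	return len;
}

/* Sizing pass that carries prev forward, as the copy pass does. */
size_t total_len_tracked_prev(const struct rec *recs, size_t n)
{
	unsigned int prev = 0;
	size_t len = 0, i;

	for (i = 0; i < n; i++) {
		len += rec_len(&recs[i], prev);
		prev = recs[i].flags;
	}
	return len;
}
```

For an open record "abc" followed by a normal record "de", the stale variant
reports 6 bytes while the rendered text "abc\nde\n" needs 7, so a buffer
sized from the first number would be overrun by one byte.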


[GIT PULL 0/3] arm-soc updates

2012-08-10 Thread Arnd Bergmann
Hi Linus,

Here are three pull requests for you to consider for the next -rc.

The first one is our regular bug fix series, this time with a lot
of patches fixing build regressions. Please pull at least this one.

The other two are things that fell through the cracks in the v3.6
merge window because of communication problems between the arm-soc
team and our downstream maintainers. Neither of them is important,
but applying them should also be harmless. If you feel like making
an exception, please have a look and consider merging them as well,
but don't worry about them otherwise.

Arnd


[GIT PULL 3/3] arm-soc: make of_device_id->data constant

2012-08-10 Thread Arnd Bergmann
The following changes since commit 28a33cbc24e4256c143dce96c7d93bf423229f92:

  Linux 3.5 (2012-07-21 13:58:29 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/late-warnings

for you to fetch changes up to 100e8f5fda756fe71d21c0ea68d67e56f5f05167:

  Merge branch 'ofdeviceiddata' of git://git.pengutronix.de/git/ukl/linux into late/warnings (2012-07-24 16:58:51 +0200)



arm-soc: make of_device_id->data constant

This patch series from Uwe Kleine-König was meant to go into
the v3.6 merge window, but got lost in a small miscommunication
between Olof and me. It gets rid of a (harmless) gcc warning
by making the of_device_id->data pointer constant, which is
generally considered to be a good idea.

Cc: "Uwe Kleine-König" 

Arnd Bergmann (7):
  watchdog/mpc8xxx: add a const qualifier
  powerpc/fsl_msi: drop unneeded cast to non-const pointer
  mfd/da9052: make i2c_device_id array const
  i2c/mpc: make data used as *of_device_id.data const
  macintosh/mediabay: make data used as *of_device_id.data const
  can: mpc5xxx_can: make data used as *of_device_id.data const
  Merge branch 'ofdeviceiddata' of git://git.pengutronix.de/git/ukl/linux into late/warnings

Marc Kleine-Budde (1):
  can: mpc5xxx_can: make data in mpc5xxx_can_probe const

Uwe Kleine-König (18):
  spi/imx: make spi_imx_data.devtype_data member point to const data
  spi/spi-omap2-mcspi: add a const qualifier
  serial/imx: make imx_port.devdata member point to const data
  serial/mpc52xx_uart: add a const qualifier
  ARM: cache-l2x0: add a const qualifier
  misc/atmel_tc: make atmel_tc.tcb_config member point to const data
  gpio/gpio-omap.c: add a const qualifier
  gpio/mpc8xxx: add a const qualifier
  i2c/i2c-omap: add a const qualifier
  i2c/mpc: add a const qualifier
  dmaengine: at_hdmac: add a few const qualifiers
  mmc/omap_hsmmc: add a const qualifier
  macintosh/mediabay: add a const qualifier
  powerpc/83xx: add a const qualifier
  powerpc/fsl_msi: add a const qualifier
  powerpc/celleb_pci: add a const qualifier
  of: add const to struct *of_device_id.data
  gpio/gpio-omap: make platformdata used as *of_device_id.data const

 arch/arm/mm/cache-l2x0.c |2 +-
 arch/powerpc/platforms/83xx/suspend.c|2 +-
 arch/powerpc/platforms/cell/celleb_pci.c |2 +-
 arch/powerpc/sysdev/fsl_msi.c|8 
 drivers/dma/at_hdmac.c   |4 ++--
 drivers/gpio/gpio-mpc8xxx.c  |2 +-
 drivers/gpio/gpio-omap.c |8 
 drivers/i2c/busses/i2c-mpc.c |   12 ++--
 drivers/i2c/busses/i2c-omap.c|3 ++-
 drivers/macintosh/mediabay.c |8 
 drivers/mfd/da9052-i2c.c |4 ++--
 drivers/mmc/host/omap_hsmmc.c|2 +-
 drivers/net/can/mscan/mpc5xxx_can.c  |6 +++---
 drivers/spi/spi-imx.c|2 +-
 drivers/spi/spi-omap2-mcspi.c|2 +-
 drivers/tty/serial/imx.c |2 +-
 drivers/tty/serial/mpc52xx_uart.c|2 +-
 drivers/watchdog/mpc8xxx_wdt.c   |2 +-
 include/linux/atmel_tc.h |2 +-
 include/linux/mod_devicetable.h  |2 +-
 20 files changed, 39 insertions(+), 38 deletions(-)
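
As a rough illustration of why a const data pointer in a match table helps,
here is a small userspace sketch. struct match_entry is a hypothetical
stand-in for struct of_device_id, and the chip names and the fifo_depth
field are invented for the example. With a plain void * member, pointing the
table at const objects would make gcc warn about a discarded qualifier; with
const void * the table can reference read-only data and callers get a const
pointer back without any cast.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct of_device_id. */
struct match_entry {
	const char *compatible;
	const void *data;	/* const: tables may point at read-only data */
};

struct chip_info {
	unsigned int fifo_depth;
};

static const struct chip_info chip_a = { .fifo_depth = 16 };
static const struct chip_info chip_b = { .fifo_depth = 64 };

static const struct match_entry match_table[] = {
	{ .compatible = "vendor,chip-a", .data = &chip_a },
	{ .compatible = "vendor,chip-b", .data = &chip_b },
	{ NULL, NULL }	/* sentinel */
};

/* Look up per-chip data; the result stays const all the way to the caller. */
const struct chip_info *lookup_chip(const char *compatible)
{
	const struct match_entry *m;

	for (m = match_table; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;	/* const void * to const T *, no cast */
	return NULL;
}
```

A caller would use lookup_chip("vendor,chip-b")->fifo_depth after checking
the result for NULL; any attempt to write through the returned pointer is
rejected at compile time.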


[GIT PULL 2/3] arm-soc: late at91 changes

2012-08-10 Thread Arnd Bergmann
The following changes since commit 0d7614f09c1ebdbaa1599a5aba7593f147bf96ee:

  Linux 3.6-rc1 (2012-08-02 16:38:10 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/late-at91-mci

for you to fetch changes up to b3ce167791c02da98f6e527be7d19213b3abddf0:

  Merge tag 'at91-for-next-soc' of git://github.com/at91linux/linux-at91 into at91/mci (2012-08-10 12:34:18 +0200)



arm-soc: late at91 changes

This series was originally sent for inclusion in v3.5 but missed out
twice, first time because it was late for v3.5 and we decided not to
push it then, and this time because the maintainer was on a long
vacation before the merge window and nobody noticed that it was missing
from the v3.6 list.

The main purpose of this series is to move over the at91 board files
to use the atmel-mci driver instead of the old at91_mci that is
scheduled for removal in v3.7.

Cc: Nicolas Ferre 
Cc: Ludovic Desroches 

Arnd Bergmann (1):
  Merge tag 'at91-for-next-soc' of git://github.com/at91linux/linux-at91 into at91/mci

Ludovic Desroches (1):
  ARM: at91: add atmel-mci support for chips and boards which can use it

Nicolas Ferre (1):
  ARM: at91/defconfig: change the MCI driver to use in defconfigs

Paul Bolle (1):
  ARM: at91: set i2c_board_info.type to "ds1339" directly

Richard Genoud (1):
  ARM: at91/defconfig: Remove unaffected config option

 arch/arm/configs/afeb9260_defconfig  |1 -
 arch/arm/configs/at91rm9200_defconfig|2 +-
 arch/arm/configs/at91sam9261_defconfig   |2 +-
 arch/arm/configs/at91sam9263_defconfig   |3 +-
 arch/arm/configs/at91sam9g20_defconfig   |2 +-
 arch/arm/configs/at91sam9rl_defconfig|2 +-
 arch/arm/configs/cpu9260_defconfig   |2 +-
 arch/arm/configs/cpu9g20_defconfig   |2 +-
 arch/arm/configs/qil-a9260_defconfig |3 +-
 arch/arm/configs/stamp9g20_defconfig |1 -
 arch/arm/configs/usb-a9260_defconfig |1 -
 arch/arm/mach-at91/at91rm9200_devices.c  |   92 ++---
 arch/arm/mach-at91/at91sam9260_devices.c |   84 +---
 arch/arm/mach-at91/at91sam9261_devices.c |   60 +--
 arch/arm/mach-at91/at91sam9263.c |4 +-
 arch/arm/mach-at91/at91sam9263_devices.c |  161 +-
 arch/arm/mach-at91/at91sam9rl_devices.c  |   60 +--
 arch/arm/mach-at91/board-afeb-9260v1.c   |   14 +--
 arch/arm/mach-at91/board-carmeva.c   |   14 +--
 arch/arm/mach-at91/board-cpu9krea.c  |   17 ++--
 arch/arm/mach-at91/board-cpuat91.c   |   13 +--
 arch/arm/mach-at91/board-csb337.c|   14 +--
 arch/arm/mach-at91/board-eb9200.c|   14 +--
 arch/arm/mach-at91/board-ecbat91.c   |   14 +--
 arch/arm/mach-at91/board-eco920.c|   14 +--
 arch/arm/mach-at91/board-flexibity.c |   14 +--
 arch/arm/mach-at91/board-foxg20.c|   16 +--
 arch/arm/mach-at91/board-kb9202.c|   14 +--
 arch/arm/mach-at91/board-neocore926.c|   13 +--
 arch/arm/mach-at91/board-picotux200.c|   14 +--
 arch/arm/mach-at91/board-qil-a9260.c |   14 +--
 arch/arm/mach-at91/board-rm9200dk.c  |   14 +--
 arch/arm/mach-at91/board-rm9200ek.c  |   14 +--
 arch/arm/mach-at91/board-rsi-ews.c   |   13 +--
 arch/arm/mach-at91/board-sam9-l9260.c|   16 +--
 arch/arm/mach-at91/board-sam9260ek.c |   16 +--
 arch/arm/mach-at91/board-sam9261ek.c |   13 +--
 arch/arm/mach-at91/board-sam9263ek.c |   13 +--
 arch/arm/mach-at91/board-sam9g20ek.c |   16 +--
 arch/arm/mach-at91/board-sam9rlek.c  |   13 +--
 arch/arm/mach-at91/board-stamp9g20.c |   14 ---
 arch/arm/mach-at91/board-usb-a926x.c |2 -
 arch/arm/mach-at91/board-yl-9200.c   |   13 +--
 drivers/mtd/nand/Kconfig |   40 
 44 files changed, 384 insertions(+), 494 deletions(-)


[GIT PULL 1/3] arm-soc: bug fixes for v3.6-rc2

2012-08-10 Thread Arnd Bergmann
The following changes since commit 0d7614f09c1ebdbaa1599a5aba7593f147bf96ee:

  Linux 3.6-rc1 (2012-08-02 16:38:10 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/fixes-for-linus

for you to fetch changes up to de9234306bb28fe6c8c3bb908e3f9956f5276a02:

  ARM: davinci: remove broken ntosd2_init_i2c (2012-08-10 13:14:36 +0200)


arm-soc: bug fixes for v3.6-rc2

These are a bunch of bug fixes that came in after the merge window and
one update for the MAINTAINERS file. The largest part of the fixes
are patches that address bugs found by building all the ARM defconfig
files. There are a lot more warnings that we have patches for, but
the others are either still under discussion or are harmless and
do not cause actual problems besides making the build slightly noisy.


Arnd Bergmann (19):
  Merge branch 'imx/fixes-for-3.6' of git://git.linaro.org/people/shawnguo/linux-2.6 into fixes
  Merge branch 'mxs/fixes-for-3.6' of git://git.linaro.org/people/shawnguo/linux-2.6 into fixes
  Merge tag 'imx-fixes' of git://git.pengutronix.de/git/imx/linux-2.6 into fixes
  mfd/asic3: fix asic3_mfd_probe return value
  usb/ohci-omap: remove unused variable
  ARM: pxa: remove irq_to_gpio from ezx-pcap driver
  Input: eeti_ts: pass gpio value instead of IRQ
  ARM: sa1100: include linux/io.h in hackkit leds code
  ARM: s3c24xx: use new PWM driver
  ARM: integrator: include 
  ARM: imx: gpmi-nand depends on mxs-dma
  ARM: exynos: exynos_pm_add_dev_to_genpd may be unused
  gpio: em: do not discard em_gio_irq_domain_cleanup
  mtd/omap2: fix dmaengine_slave_config error handling
  spi/s3c64xx: improve error handling
  omap-rng: fix use of SIMPLE_DEV_PM_OPS
  Merge branch 'testing/new-warnings' into fixes
  ARM: s3c24xx: enable CONFIG_BUG for tct_hammer
  ARM: davinci: remove broken ntosd2_init_i2c

Fabio Estevam (5):
  ARM: dts: imx27-3ds.dts: Fix serial console node
  ARM: imx6q-sabrelite: Setup CLKO IOMUX
  ARM: mx23: Fix registers range
  ARM: mx28: Fix registers range
  ARM: clk-imx31: Fix the keypad clock name

Javier Martin (1):
  i.MX27: Fix emma-prp and csi clocks.

Linus Walleij (2):
  MAINTAINERS: update entry for Linus Walleij
  ARM: integrator: use clk_prepare_enable() for timer

Marc Kleine-Budde (1):
  ARM: mxs: always build ocotp

Marek Vasut (1):
  ARM: mxs: Remove MMAP_MIN_ADDR setting from mxs_defconfig

Sebastian Hesselbarth (1):
  ARM: kirkwood: fix typo in Makefile.boot

Shawn Guo (3):
  ARM: imx: enable emi_slow_gate clock for imx5
  ARM: dts: imx53-ard: add regulators for lan9220
  ARM: dts: imx: fix gpio interrupts property

Stephen Warren (1):
  ARM: tegra: more regulator fixes for Harmony

 MAINTAINERS   |   17 ---
 arch/arm/boot/dts/imx23.dtsi  |   52 ++--
 arch/arm/boot/dts/imx27-3ds.dts   |2 +-
 arch/arm/boot/dts/imx28.dtsi  |   74 ++---
 arch/arm/boot/dts/imx51-babbage.dts   |2 +-
 arch/arm/boot/dts/imx53-ard.dts   |   22 -
 arch/arm/boot/dts/imx6q-sabrelite.dts |1 +
 arch/arm/configs/imx_v6_v7_defconfig  |1 +
 arch/arm/configs/mxs_defconfig|1 -
 arch/arm/configs/tct_hammer_defconfig |2 +-
 arch/arm/mach-davinci/board-neuros-osd2.c |   39 ---
 arch/arm/mach-exynos/pm_domains.c |2 +-
 arch/arm/mach-imx/clk-imx27.c |8 ++--
 arch/arm/mach-imx/clk-imx31.c |2 +-
 arch/arm/mach-imx/clk-imx51-imx53.c   |1 +
 arch/arm/mach-integrator/core.c   |1 +
 arch/arm/mach-integrator/integrator_ap.c  |2 +-
 arch/arm/mach-kirkwood/Makefile.boot  |4 +-
 arch/arm/mach-mxs/Kconfig |6 ---
 arch/arm/mach-mxs/Makefile|3 +-
 arch/arm/mach-pxa/raumfeld.c  |2 +-
 arch/arm/mach-s3c24xx/Kconfig |4 +-
 arch/arm/mach-sa1100/leds-hackkit.c   |1 +
 arch/arm/mach-tegra/board-harmony-power.c |   32 +++--
 arch/arm/plat-samsung/Kconfig |3 +-
 drivers/char/hw_random/omap-rng.c |2 +-
 drivers/gpio/gpio-em.c|2 +-
 drivers/input/touchscreen/eeti_ts.c   |   21 
 drivers/mfd/asic3.c   |1 +
 drivers/mfd/ezx-pcap.c|2 +-
 drivers/mtd/nand/Kconfig  |2 +-
 drivers/mtd/nand/omap2.c  |7 ++-
 drivers/spi/spi-s3c64xx.c |2 +-
 drivers/usb/host/ohci-omap.c  |2 -
 include/linux/input/eeti_ts.h |1 +
 include/linux/mfd/ezx-pcap.h  |1 +
 36 files changed, 160 

Re: i915 regression on 3.6-rc1: lid blanks screen

2012-08-10 Thread Hugh Dickins
On Fri, 10 Aug 2012, Takashi Iwai wrote:
> At Fri, 10 Aug 2012 14:35:13 +0200,
> Daniel Vetter wrote:
> > 
> > On Fri, Aug 10, 2012 at 1:59 PM, Takashi Iwai  wrote:
> > > At Mon, 6 Aug 2012 11:25:30 -0700 (PDT),
> > > Hugh Dickins wrote:
> > >>
> > >> On Mon, 6 Aug 2012, Daniel Vetter wrote:
> > >> > On Mon, Aug 6, 2012 at 6:21 AM, Hugh Dickins  wrote:
> > >> > > On Sun, 5 Aug 2012, Takashi Iwai wrote:
> > >> > >> At Sat, 4 Aug 2012 10:01:13 -0700 (PDT),
> > >> > >> Hugh Dickins wrote:
> > >> > >> >
> > >> > >> > Sorry to report that with 3.6-rc1, closing and opening the lid on
> > >> > >> > this ThinkPad T420s leaves the screen blank, and I have to reboot.
> > >> > >> >
> > >> > >> > Bisection led to this commit, and reverting indeed gets my screen 
> > >> > >> > back:
> > >> > >> >
> > >> > >> > commit 520c41cf2fa029d1e8b923ac2026f96664f17c4b
> > >> > >> > Author: Daniel Vetter 
> > >> > >> > Date:   Wed Jul 11 16:27:52 2012 +0200
> > >> > >> >
> > >> > >> > drm/i915/lvds: ditch ->prepare special case
> > >> > > ...
> > >> > >>
> > >> > >> Hm, it's surprising.
> > >> > >>
> > >> > >> Could you check whether the counter-part intel_lvds_enable() is
> > >> > >> called?  If the prepare callback affects, it must be from the mode
> > >> > >> setting (drm_crtc_helper_set_mode()).
> > >> > >
> > >> > > Yes, I put a dump_stack() in both, and intel_lvds_enable() gets 
> > >> > > called
> > >> > > about 0.28 seconds after the intel_lvds_disable() when I lift the 
> > >> > > lid;
> > >> > > but with no video display until I revert that commit.
> > >> >
> > >> > Can you please boot with drm.debug=0xe added to your kernel cmdline,
> > >> > reproduce the issue (with the two dump_stack calls added) and then
> > >> > attach the full dmesg?
> > >>
> > >> Collected, I'll send it to you both privately in a moment.
> > >>
> > >> >
> > >> > Also a few other things to try: What happens if you do a modeset on
> > >> > the LVDS while it's still working, e.g.
> > >>
> > >> In the dmesg, I've only gone to runlevel 3, simply working on the
> > >> console without startx.  For these xrandrs to work, I did startx
> > >> and used the graphics screen.
> > >
> > > OK, now I can see the problem here, too.  The key is that it happens
> > > only on Linux console, not on X.  That's why no one else reported.
> > > I guess the problem can be seen on many laptops with LVDS on PCH.

A correction there: for me it was happening both on X and on console;
but once I found that it happened even on the simple console, I mostly
stuck to bisecting and testing on that.

> > >
> > > Looking at intel_reg_dumper output, BLC_PWM_CPU_CTL is 0 while other
> > > registers are set correctly.  This seems coming from the rewrite of
> > > backlight control code by commit
> > >   24ded204: drm/i915: properly enable the blc controller on the right pipe
> > > and
> > >   a4f32fc3: drm/i915: don't forget the PCH backlight registers
> > >
> > > While the latter fixes the regression by the former commit, it still
> > > doesn't cover this regression.
> > >
> > > I don't know the exact hardware behavior, but it looks like that
> > > resetting BLC_PWM_PCH_CTL2 and BLC_PWM_PCH_CTL1 clears the
> > > BLC_PWM_CPU_CTL (oh what confusing reg names).
> > >
> > > FWIW, the commit 520c41cf you mentioned is no direct cause.  This
> > > patch works fine on the top of 3.5 kernel.  But it's like a bad drug,
> > > the combination of this and other two commits break things.
> > >
> > > The patch below is my quick fix.  It worked on an HP laptop.
> > > Hugh, could you give it a try?
> > 
> > Hm, this sounds eerily familiar to the backlight bug you've recently fixed 
> > in
> > 
> > commit 6db65cbb941f9d433659bdad02b307f6d94465df
> > Author: Takashi Iwai 
> > Date:   Thu Jun 21 15:30:41 2012 +0200
> > 
> > drm/i915: Fix eDP blank screen after S3 resume on HP desktops
> > 
> > Have you checked other code-paths for such issues?
> 
> Not yet, I'd leave such a joy rather to you guys :)
> 
> > The resume code
> > seems to follow this order already ... also, when you submit this
> > patch, can you please add a small comment to explain the ordering
> > constraint, like in the resume register restore function?
> 
> Sure, I'll add a comment and resubmit once when I hear it really fixes
> on Hugh's machine, too.

Indeed, your patch really fixes it on my machine: many thanks!

Hugh

> 
> 
> thanks,
> 
> Takashi
> 
> > 
> > Thanks, Daniel
> > 
> > >
> > >
> > > thanks,
> > >
> > > Takashi
> > >
> > > ===
> > > From: Takashi Iwai 
> > > Subject: [PATCH] drm/i915: Fix blank panel at reopening lid on Linux 
> > > console
> > >
> > > When you reopen the lid on Linux console on a laptop with PCH, the
> > > panel suddenly goes blank.  It seems because BLC_PWM_CPU_CTL register
> > > is cleared when BLC_PWM_PCH_CTL1 and BLC_PWM_PCH_CTL2 registers are
> > > played.
> > >
> > > This patch fixes the problem by setting BLC_PWM_CPU_CTL after enabling
> > > BLC_PWM_PCH_CTL_1 and _2 registers.
> > >
> > > 

Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Michal Hocko
On Thu 09-08-12 17:01:10, Glauber Costa wrote:
> From: Suleiman Souhlal 
> 
> mem_cgroup_do_charge() was written before kmem accounting, and expects
> three cases: being called for 1 page, being called for a stock of 32
> pages, or being called for a hugepage.  If we call for 2 or 3 pages (and
> both the stack and several slabs used in process creation are such, at
> least with the debug options I had), it assumed it's being called for
> stock and just retried without reclaiming.
> 
> Fix that by passing down a minsize argument in addition to the csize.
> 
> And what to do about that (csize == PAGE_SIZE && ret) retry?  If it's
> needed at all (and presumably is since it's there, perhaps to handle
> races), then it should be extended to more than PAGE_SIZE, yet how far?
> And should there be a retry count limit, of what?  For now retry up to
> COSTLY_ORDER (as page_alloc.c does) and make sure not to do it if
> __GFP_NORETRY.
> 
> [v4: fixed nr pages calculation pointed out by Christoph Lameter ]
> 
> Signed-off-by: Suleiman Souhlal 
> Signed-off-by: Glauber Costa 
> Reviewed-by: Kamezawa Hiroyuki 

I am not happy with the min_pages argument, but we can do something more
clever later.

Acked-by: Michal Hocko 

> ---
>  mm/memcontrol.c | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index bc7bfa7..2cef99a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2294,7 +2294,8 @@ enum {
>  };
>  
>  static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> - unsigned int nr_pages, bool oom_check)
> + unsigned int nr_pages, unsigned int min_pages,
> + bool oom_check)
>  {
>   unsigned long csize = nr_pages * PAGE_SIZE;
>   struct mem_cgroup *mem_over_limit;
> @@ -2317,18 +2318,18 @@ static int mem_cgroup_do_charge(struct mem_cgroup 
> *memcg, gfp_t gfp_mask,
>   } else
>   mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
>   /*
> -  * nr_pages can be either a huge page (HPAGE_PMD_NR), a batch
> -  * of regular pages (CHARGE_BATCH), or a single regular page (1).
> -  *
>* Never reclaim on behalf of optional batching, retry with a
>* single page instead.
>*/
> - if (nr_pages == CHARGE_BATCH)
> + if (nr_pages > min_pages)
>   return CHARGE_RETRY;
>  
>   if (!(gfp_mask & __GFP_WAIT))
>   return CHARGE_WOULDBLOCK;
>  
> + if (gfp_mask & __GFP_NORETRY)
> + return CHARGE_NOMEM;
> +
>   ret = mem_cgroup_reclaim(mem_over_limit, gfp_mask, flags);
>   if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
>   return CHARGE_RETRY;
> @@ -2341,7 +2342,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup 
> *memcg, gfp_t gfp_mask,
>* unlikely to succeed so close to the limit, and we fall back
>* to regular pages anyway in case of failure.
>*/
> - if (nr_pages == 1 && ret)
> + if (nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER) && ret)
>   return CHARGE_RETRY;
>  
>   /*
> @@ -2476,7 +2477,8 @@ again:
>   nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES;
>   }
>  
> - ret = mem_cgroup_do_charge(memcg, gfp_mask, batch, oom_check);
> + ret = mem_cgroup_do_charge(memcg, gfp_mask, batch, nr_pages,
> + oom_check);
>   switch (ret) {
>   case CHARGE_OK:
>   break;
> -- 
> 1.7.11.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs
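
The decision flow in the quoted hunk can be modelled in userspace roughly as
follows. This is a sketch under stated assumptions, not the kernel function:
struct charge_ctx and after_failed_charge() are hypothetical, the reclaim
result and margin are passed in as plain inputs, and the OOM handling that
follows in the real mem_cgroup_do_charge() is left out. COSTLY_ORDER mirrors
PAGE_ALLOC_COSTLY_ORDER, which is 3 in mainline.

```c
#include <stdbool.h>

#define COSTLY_ORDER 3	/* mirrors PAGE_ALLOC_COSTLY_ORDER */

enum charge_result { CHARGE_RETRY, CHARGE_WOULDBLOCK, CHARGE_NOMEM };

/* Hypothetical bundle of the inputs the real code reads from its context. */
struct charge_ctx {
	bool can_wait;		/* stands in for __GFP_WAIT */
	bool noretry;		/* stands in for __GFP_NORETRY */
	unsigned int reclaimed;	/* pages freed by reclaim */
	unsigned int margin;	/* room under the limit after reclaim */
};

/* What to do once charging nr_pages has hit the limit. */
enum charge_result after_failed_charge(unsigned int nr_pages,
				       unsigned int min_pages,
				       const struct charge_ctx *c)
{
	/* More than strictly needed was requested: drop the batching. */
	if (nr_pages > min_pages)
		return CHARGE_RETRY;

	if (!c->can_wait)
		return CHARGE_WOULDBLOCK;

	if (c->noretry)
		return CHARGE_NOMEM;

	/* Reclaim would run here; its effect is given by c. */
	if (c->margin >= nr_pages)
		return CHARGE_RETRY;

	/* Small requests are worth retrying if reclaim made progress. */
	if (nr_pages <= (1u << COSTLY_ORDER) && c->reclaimed)
		return CHARGE_RETRY;

	return CHARGE_NOMEM;	/* OOM handling of the real code omitted */
}
```

With the old nr_pages == CHARGE_BATCH and nr_pages == 1 checks, a 2-page
request would never have reached the retry-after-reclaim branch; in this
model after_failed_charge(2, 2, ...) does, as long as reclaim freed
something.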


Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Michal Hocko
On Fri 10-08-12 19:30:00, Michal Hocko wrote:
> On Thu 09-08-12 17:01:10, Glauber Costa wrote:
> [...]
> > For now retry up to COSTLY_ORDER (as page_alloc.c does) and make sure
> > not to do it if __GFP_NORETRY.
> 
> Who is using __GFP_NORETRY for user backed memory (except for hugetlb
> which has its own controller)?

Bahh, friday brain... GFP_THISNODE used by slab. Sorry for noise.
-- 
Michal Hocko
SUSE Labs


Re: mellanox mlx4_core and SR-IOV

2012-08-10 Thread Chris Friesen

On 08/03/2012 02:33 AM, Lukas Hejtmanek wrote:
> I also tried the OFED package from Mellanox, which seems to have better SR-IOV
> support (at least mlx4_ib does not complain that SR-IOV is not supported).
> However, it does not work when SR-IOV is enabled:

Last I heard they were not officially providing support for SR-IOV.  Has
anyone heard otherwise from the Mellanox folks?


Chris




Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread Stanislav Kinsbursky

10.08.2012 22:15, H. Peter Anvin wrote:
> On 08/10/2012 05:57 AM, Stanislav Kinsbursky wrote:
>> Today, there is a problem with connecting local SUNRPC transports. These
>> transports use UNIX sockets, and the connection itself is done by the rpciod
>> workqueue.
>> But UNIX socket lookup is done in the context of the process's file system
>> root, i.e. all local transports are connected in rpciod context.
>> This works nicely until we try to mount NFS from a process with a different
>> root, for example in a container. This container can have its own (nested)
>> root and an rpcbind process listening on its own UNIX sockets. But an NFS
>> mount attempt in this container will register a new service (Lockd, for
>> example) in the global rpcbind, not the container's one.
>>
>> This patch set introduces a kernel connect helper for UNIX stream sockets and
>> modifies unix_find_other() to be able to search from a specified root.
>> It also replaces the generic socket connect call for local transports with
>> the new helper in the SUNRPC layer.
>>
>> The following series implements...
>
> On that whole subject...
>
> Do we need a Unix domain socket equivalent to openat()?

It looks like sys_connectat() and sys_bindat() could be an organic part
of the openat() family of syscalls.
But currently I don't have any usage example for them at hand.  And the
main problem here is that these syscalls could be used only for UNIX sockets.




pull request: wireless 2012-08-10

2012-08-10 Thread John W. Linville
commit 039aafba1b57ed39acb3abc290c11be37402feb2

Dave,

Here is a handful of fixes intended for 3.6.

Daniel Drake offers a cfg80211 fix to consume pending events before
taking a wireless device down.  This prevents a resource leak.

Stanislaw Gruszka gives us a fix for a NULL pointer dereference in
rt61pci.

Johannes Berg provides an iwlwifi patch to disable "greenfield" mode.
Use of that mode was causing a rate scaling problem in iwlwifi.

Please let me know if there are problems!

Thanks,

John

---

The following changes since commit 63d02d157ec4124990258d66517b6c11fd6df0cf:

  net: tcp: ipv6_mapped needs sk_rx_dst_set method (2012-08-09 20:56:09 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless.git for-davem

for you to fetch changes up to 039aafba1b57ed39acb3abc290c11be37402feb2:

  Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem (2012-08-10 14:05:38 -0400)



Daniel Drake (1):
  cfg80211: process pending events when unregistering net device

Johannes Berg (1):
  iwlwifi: disable greenfield transmissions as a workaround

John W. Linville (1):
  Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem

Stanislaw Gruszka (1):
  rt61pci: fix NULL pointer dereference in config_lna_gain

 drivers/net/wireless/iwlwifi/dvm/rs.c |   13 -
 drivers/net/wireless/rt2x00/rt61pci.c |3 +--
 net/wireless/core.c   |5 +
 net/wireless/core.h   |1 +
 net/wireless/util.c   |2 +-
 5 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/dvm/rs.c b/drivers/net/wireless/iwlwifi/dvm/rs.c
index 6fddd27..a82f46c 100644
--- a/drivers/net/wireless/iwlwifi/dvm/rs.c
+++ b/drivers/net/wireless/iwlwifi/dvm/rs.c
@@ -707,11 +707,14 @@ static int rs_toggle_antenna(u32 valid_ant, u32 *rate_n_flags,
  */
 static bool rs_use_green(struct ieee80211_sta *sta)
 {
-   struct iwl_station_priv *sta_priv = (void *)sta->drv_priv;
-   struct iwl_rxon_context *ctx = sta_priv->ctx;
-
-   return (sta->ht_cap.cap & IEEE80211_HT_CAP_GRN_FLD) &&
-   !(ctx->ht.non_gf_sta_present);
+   /*
+* There's a bug somewhere in this code that causes the
+* scaling to get stuck because GF+SGI can't be combined
+* in SISO rates. Until we find that bug, disable GF, it
+* has only limited benefit and we still interoperate with
+* GF APs since we can always receive GF transmissions.
+*/
+   return false;
 }
 
 /**
diff --git a/drivers/net/wireless/rt2x00/rt61pci.c b/drivers/net/wireless/rt2x00/rt61pci.c
index f322596..3f7bc5c 100644
--- a/drivers/net/wireless/rt2x00/rt61pci.c
+++ b/drivers/net/wireless/rt2x00/rt61pci.c
@@ -2243,8 +2243,7 @@ static void rt61pci_txdone(struct rt2x00_dev *rt2x00dev)
 
 static void rt61pci_wakeup(struct rt2x00_dev *rt2x00dev)
 {
-   struct ieee80211_conf conf = { .flags = 0 };
-   struct rt2x00lib_conf libconf = { .conf = &conf };
+   struct rt2x00lib_conf libconf = { .conf = &rt2x00dev->hw->conf };
 
    rt61pci_config(rt2x00dev, &libconf, IEEE80211_CONF_CHANGE_PS);
 }
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 31b40cc..dcd64d5 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -952,6 +952,11 @@ static int cfg80211_netdev_notifier_call(struct 
notifier_block *nb,
 */
synchronize_rcu();
    INIT_LIST_HEAD(&wdev->list);
+   /*
+* Ensure that all events have been processed and
+* freed.
+*/
+   cfg80211_process_wdev_events(wdev);
break;
case NETDEV_PRE_UP:
if (!(wdev->wiphy->interface_modes & BIT(wdev->iftype)))
diff --git a/net/wireless/core.h b/net/wireless/core.h
index 5206c68..bc7430b 100644
--- a/net/wireless/core.h
+++ b/net/wireless/core.h
@@ -426,6 +426,7 @@ int cfg80211_change_iface(struct cfg80211_registered_device 
*rdev,
  struct net_device *dev, enum nl80211_iftype ntype,
  u32 *flags, struct vif_params *params);
 void cfg80211_process_rdev_events(struct cfg80211_registered_device *rdev);
+void cfg80211_process_wdev_events(struct wireless_dev *wdev);
 
 int cfg80211_can_use_iftype_chan(struct cfg80211_registered_device *rdev,
 struct wireless_dev *wdev,
diff --git a/net/wireless/util.c b/net/wireless/util.c
index 26f8cd3..994e2f0 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -735,7 +735,7 @@ void cfg80211_upload_connect_keys(struct wireless_dev *wdev)
wdev->connect_keys = NULL;
 }
 
-static void cfg80211_process_wdev_events(struct wireless_dev *wdev)
+void cfg80211_process_wdev_events(struct wireless_dev *wdev)
 {
struct 

[PATCH] staging: csr: Fix up version.h includes

2012-08-10 Thread Jesper Juhl
Include version.h where actually needed, remove where unneeded.

Signed-off-by: Jesper Juhl 
---
 drivers/staging/csr/csr_panic.c            | 1 -
 drivers/staging/csr/drv.c                  | 3 +--
 drivers/staging/csr/io.c                   | 2 +-
 drivers/staging/csr/monitor.c              | 3 +--
 drivers/staging/csr/netdev.c               | 3 +--
 drivers/staging/csr/sdio_mmc.c             | 2 +-
 drivers/staging/csr/sme_native.c           | 2 +-
 drivers/staging/csr/sme_sys.c              | 2 +-
 drivers/staging/csr/ul_int.c               | 1 +
 drivers/staging/csr/unifi_pdu_processing.c | 2 +-
 drivers/staging/csr/unifi_wext.h           | 1 +
 11 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/csr/csr_panic.c b/drivers/staging/csr/csr_panic.c
index 353a829..095f7fa 100644
--- a/drivers/staging/csr/csr_panic.c
+++ b/drivers/staging/csr/csr_panic.c
@@ -9,7 +9,6 @@
 */
 
 #include 
-#include 
 #include 
 
 #include "csr_panic.h"
diff --git a/drivers/staging/csr/drv.c b/drivers/staging/csr/drv.c
index b2c27f4..9834d92 100644
--- a/drivers/staging/csr/drv.c
+++ b/drivers/staging/csr/drv.c
@@ -15,8 +15,6 @@
  * ---
  */
 
-
-
 /*
  * Porting Notes:
  * Part of this file contains an example for how to glue the OS layer
@@ -37,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "csr_wifi_hip_unifiversion.h"
 #include "unifi_priv.h"
diff --git a/drivers/staging/csr/io.c b/drivers/staging/csr/io.c
index e6503d96..deaff25 100644
--- a/drivers/staging/csr/io.c
+++ b/drivers/staging/csr/io.c
@@ -31,6 +31,7 @@
  * ---
  */
 #include 
+#include 
 
 #include "csr_wifi_hip_unifi.h"
 #include "csr_wifi_hip_unifiversion.h"
@@ -38,7 +39,6 @@
 #include "unifiio.h"
 #include "unifi_priv.h"
 
-
 /*
  * Array of pointers to context structs for unifi devices that are present.
  * The index in the array corresponds to the wlan interface number
diff --git a/drivers/staging/csr/monitor.c b/drivers/staging/csr/monitor.c
index 628782a..ca7559b 100644
--- a/drivers/staging/csr/monitor.c
+++ b/drivers/staging/csr/monitor.c
@@ -10,6 +10,7 @@
  * ---
  */
 
+#include 
 #include "unifi_priv.h"
 
 #ifdef UNIFI_SNIFF_ARPHRD
@@ -23,8 +24,6 @@
 #define ETH_P_80211_RAW ETH_P_ALL
 #endif
 
-
-
 /*
  * ---
  *  uf_start_sniff
diff --git a/drivers/staging/csr/netdev.c b/drivers/staging/csr/netdev.c
index 1e6e111..0e34020 100644
--- a/drivers/staging/csr/netdev.c
+++ b/drivers/staging/csr/netdev.c
@@ -15,7 +15,6 @@
  * ---
  */
 
-
 /*
  * Porting Notes:
  * This file implements the data plane of the UniFi linux driver.
@@ -48,7 +47,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 #include "csr_wifi_hip_unifi.h"
 #include "csr_wifi_hip_conversions.h"
diff --git a/drivers/staging/csr/sdio_mmc.c b/drivers/staging/csr/sdio_mmc.c
index d3fd57c..713d2a4 100644
--- a/drivers/staging/csr/sdio_mmc.c
+++ b/drivers/staging/csr/sdio_mmc.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/staging/csr/sme_native.c b/drivers/staging/csr/sme_native.c
index 229268f..845b654 100644
--- a/drivers/staging/csr/sme_native.c
+++ b/drivers/staging/csr/sme_native.c
@@ -12,7 +12,7 @@
  */
 
 #include 
-
+#include 
 #include "unifi_priv.h"
 #include "csr_wifi_hip_unifi.h"
 #include "csr_wifi_hip_conversions.h"
diff --git a/drivers/staging/csr/sme_sys.c b/drivers/staging/csr/sme_sys.c
index 99de27e..7ff3f43 100644
--- a/drivers/staging/csr/sme_sys.c
+++ b/drivers/staging/csr/sme_sys.c
@@ -14,6 +14,7 @@
  * ---
  */
 
+#include 
 #include "csr_wifi_hip_unifiversion.h"
 #include "unifi_priv.h"
 #include "csr_wifi_hip_conversions.h"
@@ -21,7 +22,6 @@
 #include "csr_wifi_sme_sef.h"
 #endif
 
-
 /*
  * This file implements the SME SYS API and contains the following functions:
  * CsrWifiRouterCtrlMediaStatusReqHandler()
diff --git a/drivers/staging/csr/ul_int.c b/drivers/staging/csr/ul_int.c
index 46d3507..819690d 100644
--- a/drivers/staging/csr/ul_int.c
+++ b/drivers/staging/csr/ul_int.c
@@ -12,6 +12,7 @@
  *
  * ***
  */
+#include 
 #include "csr_wifi_hip_unifi.h"
 #include "csr_wifi_hip_conversions.h"
 #include "unifi_priv.h"
diff --git a/drivers/staging/csr/unifi_pdu_processing.c 
b/drivers/staging/csr/unifi_pdu_processing.c
index 7c7e8d4..c28f4dd 100644
--- a/drivers/staging/csr/unifi_pdu_processing.c
+++ b/drivers/staging/csr/unifi_pdu_processing.c
@@ -14,7 +14,7 @@
  * 

Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread H. Peter Anvin
On 08/10/2012 11:40 AM, Alan Cox wrote:
> 
> Agreed on open() for sockets.. the lack of open is a Berklix derived
> peculiarity of the interface. It would equally be useful to be able to
> open "/dev/socket/ipv4/1.2.3.4/1135" and the like for scripts and stuff
> 
> That needs VFS changes however so you can pass the remainder of a path to
> a device node. It also lets you do a lot of other sane stuff like
> 
>   open /dev/ttyS0/9600/8n1
> 

Well, supporting device node subpaths would be nice, but I don't think
that it is a requirement either for being able to open() a socket (as
a Linux extension) or for supporting something like your above
/dev/socket/... since that could be done with a filesystem rather than
just a device node.

-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 1/2] unix sockets: add ability for search for peer from passed root

2012-08-10 Thread Stanislav Kinsbursky

10.08.2012 22:10, J. Bruce Fields wrote:
> On Fri, Aug 10, 2012 at 04:57:30PM +0400, Stanislav Kinsbursky wrote:
>> This helper is used stream sockets yet.
>> All is simple: if non-NULL struct path was passed to unix_find_other(), then
>> vfs_path_lookup() is called instead of kern_path().
>
> I'm having some trouble parsing the changelog.  Maybe something like?:
>
> 	unix sockets: add ability to look up using passed-in root
>
> 	Export a unix_stream_connect_root() helper that allows a caller
> 	to optionally pass in a root path, in which case the lookup will
> 	be done relative to the given path instead of the current
> 	working directory.

Yep, your variant is much better. Thanks.

> I guess this is a question for the networking people, but: will it cause
> problems to have sunrpc calling directly into the unix socket code?
>
> (And if so, what would be the alternative: define some variant of
> sockaddr_un that includes the root path?  Something better?)

That was my first idea. But there are problems with this solution (adding
the root path to sockaddr_un):
1) The size of sockaddr_un will change. I don't know how this will affect
user space. Of course, we can introduce something like:

struct sockaddr_un_kern {
	struct sockaddr_un un;
	struct path *path;
};

But even in this case we need to color this structure somehow (for
example, set path to NULL for a simple connect or bind call). And to add
this color, we have to separate sys_connect() from our
sock->ops->connect() call. And I don't really know how to do that, since
sys_connect() has no information about the socket type at hand. I.e.
we do have it, but then we would have to add UNIX-socket-specific logic to
the completely generic sys_connect() and sys_bind().

> --b.


Signed-off-by: Stanislav Kinsbursky 
---
  include/net/af_unix.h |2 ++
  net/unix/af_unix.c|   25 ++---
  2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 2ee33da..559467e 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -67,6 +67,8 @@ struct unix_sock {
  
  long unix_inq_len(struct sock *sk);

  long unix_outq_len(struct sock *sk);
+int unix_stream_connect_root(struct path *root, struct socket *sock,
+struct sockaddr *uaddr, int addr_len, int flags);
  
  #ifdef CONFIG_SYSCTL

  extern int unix_sysctl_register(struct net *net);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 641f2e4..a790ebc 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -759,7 +759,7 @@ out:mutex_unlock(>readlock);
return err;
  }
  
-static struct sock *unix_find_other(struct net *net,

+static struct sock *unix_find_other(struct net *net, struct path *root,
struct sockaddr_un *sunname, int len,
int type, unsigned int hash, int *error)
  {
@@ -769,7 +769,11 @@ static struct sock *unix_find_other(struct net *net,
  
  	if (sunname->sun_path[0]) {

struct inode *inode;
-   err = kern_path(sunname->sun_path, LOOKUP_FOLLOW, &path);
+
+   if (root)
+   err = vfs_path_lookup(root->dentry, root->mnt, sunname->sun_path, LOOKUP_FOLLOW, &path);
+   else
+   err = kern_path(sunname->sun_path, LOOKUP_FOLLOW, &path);
if (err)
goto fail;
inode = path.dentry->d_inode;
@@ -979,7 +983,7 @@ static int unix_dgram_connect(struct socket *sock, struct 
sockaddr *addr,
goto out;
  
  restart:

-   other = unix_find_other(net, sunaddr, alen, sock->type, hash, &err);
+   other = unix_find_other(net, NULL, sunaddr, alen, sock->type, hash, &err);
if (!other)
goto out;
  
@@ -1053,8 +1057,8 @@ static long unix_wait_for_peer(struct sock *other, long timeo)

return timeo;
  }
  
-static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,

-  int addr_len, int flags)
+int unix_stream_connect_root(struct path *root, struct socket *sock,
+struct sockaddr *uaddr, int addr_len, int flags)
  {
struct sockaddr_un *sunaddr = (struct sockaddr_un *)uaddr;
struct sock *sk = sock->sk;
@@ -1098,7 +1102,7 @@ static int unix_stream_connect(struct socket *sock, 
struct sockaddr *uaddr,
  
  restart:

/*  Find listening sock. */
-   other = unix_find_other(net, sunaddr, addr_len, sk->sk_type, hash, &err);
+   other = unix_find_other(net, root, sunaddr, addr_len, sk->sk_type, hash, &err);
if (!other)
goto out;
  
@@ -1227,6 +1231,13 @@ out:

sock_put(other);
return err;
  }
+EXPORT_SYMBOL_GPL(unix_stream_connect_root);
+
+static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
+   int addr_len, int 

Re: [lm-sensors] hwmon : raw reading -> temperature conversion

2012-08-10 Thread Guenter Roeck
On Fri, Aug 10, 2012 at 07:48:14PM +0530, Bitan Biswas wrote:
> Hi,
> 
> I have a question related to hwmon driver and need suggestions.
> 
> I am working on a temperature sensor driver that is hwmon driver.
> - The temperature is calculated from raw sensor reading and
> certain initialization parameters.
> - Raw reading obtained from 2 different sensor instances under
> same conditions can differ. Further, initialization parameters
> are specific to each hardware instance.
> - Expressions with floating point operands are used to compute
> the temperature value.
> 
> In our platform there are multiple kernel level clients to the
> temperature sensor driver.
> Hence I am planning to present temperature to these clients
> from kernel driver itself.
> 
> But looking at the hwmon linux documentation, seems the sensor
> kernel drivers should report only raw readings.
> The raw readings can be converted into required output,
> e.g. temperature in this case, by respective user space implementation.
> 
"raw" means the value as reported to the sensor. For example, for an ADC, the
raw value means the voltage in mV as seen on the sensor's input pins. This
voltage is the voltage to be reported. Converting it to a "real" voltage as,
typically, determined by a set of voltage divider resistors should be done in
user space.

For temperature sensors this is a bit more tricky. Presumably you get readings
from a thermistor or similar. The hwmon subsystem includes a driver for NTC
thermistors; maybe you can get some ideas from it. Maybe you can even use
it and/or extend it to support your hardware.

> However because of my driver clients being in kernel space, I am
> thinking of doing fixed point calculations in the sensor driver
> and get the temperature corresponding to raw sensor readings.
> 
> Please let me know if this a correct approach?
> 
The question is really what chip you are using, and what exactly your hardware
looks like. Do you use a generic ADC? If so, is it already supported in the
kernel? How are the thermistor readings converted and reported to SW?

Thanks,
Guenter
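
A minimal sketch of the fixed-point approach discussed above, assuming a
per-device calibration table with linear interpolation between its entries.
The table values, struct cal_point and raw_to_mdeg_c() are hypothetical and
are not taken from any driver mentioned in this thread. The result is in
millidegrees Celsius, the unit hwmon uses for temperature values, so no
floating point is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One calibration point: raw sensor reading -> temperature in millidegrees C. */
struct cal_point {
	int32_t raw;
	int32_t mdeg_c;
};

/* Hypothetical per-device calibration table, sorted by ascending raw value. */
static const struct cal_point cal_table[] = {
	{ 100, -20000 },
	{ 300,      0 },
	{ 600,  25000 },
	{ 900,  60000 },
};

#define CAL_POINTS (sizeof(cal_table) / sizeof(cal_table[0]))

/* Convert a raw reading using integer-only linear interpolation. */
static int32_t raw_to_mdeg_c(int32_t raw)
{
	size_t i;

	assert(CAL_POINTS >= 2);

	/* Clamp readings that fall outside the calibrated range. */
	if (raw <= cal_table[0].raw)
		return cal_table[0].mdeg_c;
	if (raw >= cal_table[CAL_POINTS - 1].raw)
		return cal_table[CAL_POINTS - 1].mdeg_c;

	for (i = 1; i < CAL_POINTS; i++) {
		if (raw <= cal_table[i].raw) {
			const struct cal_point *lo = &cal_table[i - 1];
			const struct cal_point *hi = &cal_table[i];
			/* 64-bit intermediate avoids overflow in the product. */
			int64_t num = (int64_t)(raw - lo->raw) *
				      (hi->mdeg_c - lo->mdeg_c);

			return lo->mdeg_c + (int32_t)(num / (hi->raw - lo->raw));
		}
	}

	return cal_table[CAL_POINTS - 1].mdeg_c; /* not reached */
}
```

Whether a table like this or a closed-form expression fits better depends on
the sensor. A table keeps per-instance initialization parameters as plain
data.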


Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread Alan Cox
> > AF_UNIX between roots raises some interesting semantic questions when
> > you begin passing file descriptors down them as well.
> 
> Why is that?  A file descriptor carries all that information with it...

Things like fchdir(). It's not a machine breaking problem but for
containers as opposed to chroot we need to be clear what the expected
isolation semantics are.

Agreed on open() for sockets.. the lack of open is a Berklix derived
peculiarity of the interface. It would equally be useful to be able to
open "/dev/socket/ipv4/1.2.3.4/1135" and the like for scripts and stuff

That needs VFS changes however so you can pass the remainder of a path to
a device node. It also lets you do a lot of other sane stuff like

open /dev/ttyS0/9600/8n1




Re: [PATCH v5 11/12] KVM: x86: introduce set_mmio_exit_info

2012-08-10 Thread Marcelo Tosatti
On Tue, Aug 07, 2012 at 05:54:42PM +0800, Xiao Guangrong wrote:
> Introduce set_mmio_exit_info to cleanup the common code
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/x86.c |   33 +
>  1 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4c86239..8cde327 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3761,9 +3761,6 @@ static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t 
> gpa,
>  static int write_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
>  void *val, int bytes)
>  {
> - struct kvm_mmio_fragment *frag = &vcpu->mmio_fragments[0];
> -
> - memcpy(vcpu->run->mmio.data, frag->data, frag->len);
>   return X86EMUL_CONTINUE;
>  }
> 
> @@ -3831,6 +3828,20 @@ mmio:
>   return X86EMUL_CONTINUE;
>  }
> 
> +static void set_mmio_exit_info(struct kvm_vcpu *vcpu,
> +struct kvm_mmio_fragment *frag, bool write)
> +{
> + struct kvm_run *run = vcpu->run;
> +
> + run->exit_reason = KVM_EXIT_MMIO;
> + run->mmio.phys_addr = frag->gpa;
> + run->mmio.len = frag->len;
> + run->mmio.is_write = vcpu->mmio_is_write = write;
> +
> + if (write)
> + memcpy(run->mmio.data, frag->data, frag->len);
> +}
> +
>  int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
>   void *val, unsigned int bytes,
>   struct x86_exception *exception,
> @@ -3870,14 +3881,10 @@ int emulator_read_write(struct x86_emulate_ctxt 
> *ctxt, unsigned long addr,
>   return rc;
> 
>   gpa = vcpu->mmio_fragments[0].gpa;
> -
>   vcpu->mmio_needed = 1;
>   vcpu->mmio_cur_fragment = 0;
> 
> - vcpu->run->mmio.len = vcpu->mmio_fragments[0].len;
> - vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
> - vcpu->run->exit_reason = KVM_EXIT_MMIO;
> - vcpu->run->mmio.phys_addr = gpa;
> + set_mmio_exit_info(vcpu, &vcpu->mmio_fragments[0], ops->write);
> 
>   return ops->read_write_exit_mmio(vcpu, gpa, val, bytes);
>  }
> @@ -5486,7 +5493,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>   */
>  static int complete_mmio(struct kvm_vcpu *vcpu)
>  {
> - struct kvm_run *run = vcpu->run;
>   struct kvm_mmio_fragment *frag;
>   int r;
> 
> @@ -5497,7 +5503,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
>   /* Complete previous fragment */
>   frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
>   if (!vcpu->mmio_is_write)
> - memcpy(frag->data, run->mmio.data, frag->len);
> + memcpy(frag->data, vcpu->run->mmio.data, frag->len);
>   if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) {
>   vcpu->mmio_needed = 0;
>   if (vcpu->mmio_is_write)
> @@ -5507,12 +5513,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
>   }
>   /* Initiate next fragment */
>   ++frag;
> - run->exit_reason = KVM_EXIT_MMIO;
> - run->mmio.phys_addr = frag->gpa;
> - if (vcpu->mmio_is_write)
> - memcpy(run->mmio.data, frag->data, frag->len);
> - run->mmio.len = frag->len;
> - run->mmio.is_write = vcpu->mmio_is_write;
> + set_mmio_exit_info(vcpu, frag, vcpu->mmio_is_write);
>   return 0;
> 
>   }
> -- 
> 1.7.7.6

IMO having a function is unnecessary (it makes the code harder to read).


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-10 Thread Marcelo Tosatti
On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
> Changelog:
> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> 
> The test case can be found at:
> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> 
> In current code, if we map a readonly memory space from host to guest
> and the page is not currently mapped in the host, we will get a fault-pfn
> and async is not allowed, then the vm will crash.
> 
> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
> to the guest, read access is happy for readonly memslot, write access on
> readonly memslot will cause KVM_EXIT_MMIO exit.

Memory slots whose QEMU mapping is write protected are supported
today, as long as there are no write faults.

What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
again?

The initial objective was to fix a vm crash, can you explain that
initial problem?



Re: [PATCH v5 05/12] KVM: reorganize hva_to_pfn

2012-08-10 Thread Marcelo Tosatti
On Tue, Aug 07, 2012 at 05:51:05PM +0800, Xiao Guangrong wrote:
> We do too many things in hva_to_pfn, this patch reorganize the code,
> let it be better readable
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  virt/kvm/kvm_main.c |  159 
> +++
>  1 files changed, 97 insertions(+), 62 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 26ffc87..dd01bcb 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1043,83 +1043,118 @@ static inline int check_user_page_hwpoison(unsigned 
> long addr)
>   return rc == -EHWPOISON;
>  }
> 
> -static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
> - bool write_fault, bool *writable)
> +/*
> + * The atomic path to get the writable pfn which will be stored in @pfn,
> + * true indicates success, otherwise false is returned.
> + */
> +static bool hva_to_pfn_fast(unsigned long addr, bool atomic, bool *async,
> + bool write_fault, bool *writable, pfn_t *pfn)
>  {
>   struct page *page[1];
> - int npages = 0;
> - pfn_t pfn;
> + int npages;
> 
> - /* we can do it either atomically or asynchronously, not both */
> - BUG_ON(atomic && async);
> + if (!(async || atomic))
> + return false;
> 
> - BUG_ON(!write_fault && !writable);
> + npages = __get_user_pages_fast(addr, 1, 1, page);
> + if (npages == 1) {
> + *pfn = page_to_pfn(page[0]);
> 
> - if (writable)
> - *writable = true;
> + if (writable)
> + *writable = true;
> + return true;
> + }
> +
> + return false;
> +}
> 
> - if (atomic || async)
> - npages = __get_user_pages_fast(addr, 1, 1, page);
> +/*
> + * The slow path to get the pfn of the specified host virtual address,
> + * 1 indicates success, -errno is returned if error is detected.
> + */
> +static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
> +bool *writable, pfn_t *pfn)
> +{
> + struct page *page[1];
> + int npages = 0;
> 
> - if (unlikely(npages != 1) && !atomic) {
> - might_sleep();
> + might_sleep();
> 
> - if (writable)
> - *writable = write_fault;
> -
> - if (async) {
> - down_read(&current->mm->mmap_sem);
> - npages = get_user_page_nowait(current, current->mm,
> -  addr, write_fault, page);
> - up_read(&current->mm->mmap_sem);
> - } else
> - npages = get_user_pages_fast(addr, 1, write_fault,
> -  page);
> -
> - /* map read fault as writable if possible */
> - if (unlikely(!write_fault) && npages == 1) {
> - struct page *wpage[1];
> -
> - npages = __get_user_pages_fast(addr, 1, 1, wpage);
> - if (npages == 1) {
> - *writable = true;
> - put_page(page[0]);
> - page[0] = wpage[0];
> - }
> - npages = 1;
> + if (writable)
> + *writable = write_fault;
> +
> + if (async) {
> + down_read(&current->mm->mmap_sem);
> + npages = get_user_page_nowait(current, current->mm,
> +   addr, write_fault, page);
> + up_read(&current->mm->mmap_sem);
> + } else
> + npages = get_user_pages_fast(addr, 1, write_fault,
> +  page);
> + if (npages != 1)
> + return npages;

 * Returns number of pages pinned. This may be fewer than the number
 * requested. If nr_pages is 0 or negative, returns 0. If no pages
 * were pinned, returns -errno.
 */
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages)


Current behaviour is

if (atomic || async)
npages = __get_user_pages_fast(addr, 1, 1, page);

if (npages != 1) 
slow path retry;

The changes above change this, don't they?
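
The current control flow described above can be sketched in isolation as
follows. try_fast() and try_slow() are hypothetical stand-ins for
__get_user_pages_fast() and the sleeping lookup. The counters and the
fast_succeeds knob exist only so the fallback can be observed. This is not
the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

static int fast_calls;
static int slow_calls;
static bool fast_succeeds;	/* demo knob: does the fast path find the page? */

/* Stand-in for the non-sleeping lookup: returns the number of pages found. */
static int try_fast(void)
{
	fast_calls++;
	return fast_succeeds ? 1 : 0;
}

/* Stand-in for the lookup that may sleep. */
static int try_slow(void)
{
	slow_calls++;
	return 1;
}

/*
 * Fast attempt only for atomic or async callers; if it did not yield
 * exactly one page, retry on the slow path unless the caller is atomic.
 */
static int lookup(bool atomic, bool async)
{
	int npages = 0;

	assert(!(atomic && async));

	if (atomic || async)
		npages = try_fast();

	if (npages != 1 && !atomic)
		npages = try_slow();

	return npages;
}
```

The point of the question is the second condition: a failed fast attempt
falls through to the slow path for every non-atomic caller.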



Re: [PATCH 02/15] Declaring udp protocols has its own proc entry

2012-08-10 Thread Jan Ceuleers
On 08/10/2012 04:05 PM, Masatake YAMATO wrote:
> Declaring udp protocols has its own proc entry.
> 
> Signed-off-by: Masatake YAMATO 
> ---
>  net/ipv4/udp.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index b4c3582..2b822ac 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1963,6 +1963,9 @@ struct proto udp_prot = {
>   .compat_getsockopt = compat_udp_getsockopt,
>  #endif
>   .clear_sk  = sk_prot_clear_portaddr_nulls,
> +#ifdef CONFIG_PROC_FS
> + .has_own_proc_entry= 1,
> +#endif
>  };
>  EXPORT_SYMBOL(udp_prot);
>  
> 

Two points:

- I haven't seen patch 01/15;
- these patches should go to netdev rather than lkml


Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread H. Peter Anvin
On 08/10/2012 11:26 AM, Alan Cox wrote:
>> On that whole subject...
>>
>> Do we need a Unix domain socket equivalent to openat()?
> 
> I don't think so. The name is just a file system indexing trick, it's not
> really the socket proper. It's little more than "ascii string with
> permissions attached" - indeed we also support an abstract name space
> which for a lot of uses is actually more convenient.
> 

I don't really understand why Unix domain sockets are different from any
other pathname users in this sense.  (Actually, I have never understood
why open() on a Unix domain socket doesn't give the equivalent of a
socket() + connect() -- it would make logical sense and would provide
additional functionality).

It would be different if the Unix domain sockets simply required an
absolute pathname (it is not just about the root, it is also about the
cwd, which is where the -at() functions come into play), but that is not
the case.

The abstract namespace is irrelevant for this, obviously.

> AF_UNIX between roots raises some interesting semantic questions when
> you begin passing file descriptors down them as well.

Why is that?  A file descriptor carries all that information with it...

-hpa
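
For reference, the socket() + connect() sequence mentioned above looks
roughly like this in user space. unix_connect() is a hypothetical helper
name, not an existing API. A relative path in sun_path is resolved against
the caller's current working directory, and an absolute one against the
caller's root, which is the lookup context this thread is about.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Connect a stream socket to the Unix domain socket bound at @path.
 * Returns a connected file descriptor, or -1 on any failure.
 */
static int unix_connect(const char *path)
{
	struct sockaddr_un addr;
	int fd;

	assert(path != NULL);

	/* sun_path is a fixed-size array; reject names that do not fit. */
	if (strlen(path) >= sizeof(addr.sun_path))
		return -1;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

An open() that behaved this way would collapse the two calls into one and
let ordinary path-based tools reach the socket.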


Re: 3.5.1 ext4_ sleeping while atomic bug.

2012-08-10 Thread Theodore Ts'o
Hi Dave,

Thanks for the bug report!  The following should address the bug which
you found.

- Ted

From 05ca87aa00121756b5d41f3d71eb8b51bed3bc92 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o 
Date: Fri, 10 Aug 2012 13:57:52 -0400
Subject: [PATCH] ext4: don't call ext4_error while block group is locked

While in ext4_validate_block_bitmap(), if a block allocation bitmap
is found to be invalid, we call ext4_error() while the block group is
still locked.  This causes ext4_commit_super() to call a function
which might sleep while in an atomic context.

There's no need to keep the block group locked at this point, so hoist
the ext4_error() call up to ext4_validate_block_bitmap() and release
the block group spinlock before calling ext4_error().

The reported stack trace can be found at:

http://article.gmane.org/gmane.comp.file-systems.ext4/33731

Reported-by: Dave Jones 
Signed-off-by: "Theodore Ts'o" 
Cc: sta...@vger.kernel.org
---
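
The pattern described in the commit message above (do the check under the
lock, drop the lock, then call the reporting function that may sleep) can
be sketched in user space as follows. This is an illustration only: struct
group, check_bitmap(), report_error() and validate_group() are hypothetical
names, and a pthread mutex stands in for the block group lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for a block group protected by a lock. */
struct group {
	pthread_mutex_t lock;
	unsigned long long bad_block;	/* 0 means the bitmap is valid */
	int verified;
};

static int errors_reported;

/* Stand-in for an error path that may block; must not run under the lock. */
static void report_error(unsigned long long blk)
{
	fprintf(stderr, "invalid block bitmap, block = %llu\n", blk);
	errors_reported++;
}

/* Pure check, called with the lock held: returns the bad block or 0. */
static unsigned long long check_bitmap(const struct group *g)
{
	return g->bad_block;
}

/* Returns 0 if the group validated, -1 if an error was reported. */
static int validate_group(struct group *g)
{
	unsigned long long blk;

	assert(g != NULL);

	pthread_mutex_lock(&g->lock);
	blk = check_bitmap(g);
	if (blk != 0) {
		/* Drop the lock first, then report. */
		pthread_mutex_unlock(&g->lock);
		report_error(blk);
		return -1;
	}
	g->verified = 1;
	pthread_mutex_unlock(&g->lock);
	return 0;
}
```

Returning the offending block number from the checker, rather than
reporting from inside it, is what lets the caller choose when to report.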
 fs/ext4/balloc.c | 62 +---
 fs/ext4/bitmap.c |  1 -
 2 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index d23b31c..1b50890 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -280,14 +280,18 @@ struct ext4_group_desc * ext4_get_group_desc(struct 
super_block *sb,
return desc;
 }
 
-static int ext4_valid_block_bitmap(struct super_block *sb,
-  struct ext4_group_desc *desc,
-  unsigned int block_group,
-  struct buffer_head *bh)
+/*
+ * Return the block number which was discovered to be invalid, or 0 if
+ * the block bitmap is valid.
+ */
+static ext4_fsblk_t ext4_valid_block_bitmap(struct super_block *sb,
+   struct ext4_group_desc *desc,
+   unsigned int block_group,
+   struct buffer_head *bh)
 {
ext4_grpblk_t offset;
ext4_grpblk_t next_zero_bit;
-   ext4_fsblk_t bitmap_blk;
+   ext4_fsblk_t blk;
ext4_fsblk_t group_first_block;
 
if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG)) {
@@ -297,37 +301,33 @@ static int ext4_valid_block_bitmap(struct super_block *sb,
 * or it has to also read the block group where the bitmaps
 * are located to verify they are set.
 */
-   return 1;
+   return 0;
}
group_first_block = ext4_group_first_block_no(sb, block_group);
 
/* check whether block bitmap block number is set */
-   bitmap_blk = ext4_block_bitmap(sb, desc);
-   offset = bitmap_blk - group_first_block;
+   blk = ext4_block_bitmap(sb, desc);
+   offset = blk - group_first_block;
if (!ext4_test_bit(offset, bh->b_data))
/* bad block bitmap */
-   goto err_out;
+   return blk;
 
/* check whether the inode bitmap block number is set */
-   bitmap_blk = ext4_inode_bitmap(sb, desc);
-   offset = bitmap_blk - group_first_block;
+   blk = ext4_inode_bitmap(sb, desc);
+   offset = blk - group_first_block;
if (!ext4_test_bit(offset, bh->b_data))
/* bad block bitmap */
-   goto err_out;
+   return blk;
 
/* check whether the inode table block number is set */
-   bitmap_blk = ext4_inode_table(sb, desc);
-   offset = bitmap_blk - group_first_block;
+   blk = ext4_inode_table(sb, desc);
+   offset = blk - group_first_block;
next_zero_bit = ext4_find_next_zero_bit(bh->b_data,
offset + EXT4_SB(sb)->s_itb_per_group,
offset);
-   if (next_zero_bit >= offset + EXT4_SB(sb)->s_itb_per_group)
-   /* good bitmap for inode tables */
-   return 1;
-
-err_out:
-   ext4_error(sb, "Invalid block bitmap - block_group = %d, block = %llu",
-   block_group, bitmap_blk);
+   if (next_zero_bit < offset + EXT4_SB(sb)->s_itb_per_group)
+   /* bad bitmap for inode tables */
+   return blk;
return 0;
 }
 
@@ -336,14 +336,26 @@ void ext4_validate_block_bitmap(struct super_block *sb,
   unsigned int block_group,
   struct buffer_head *bh)
 {
+   ext4_fsblk_t blk;
+
if (buffer_verified(bh))
return;
 
ext4_lock_group(sb, block_group);
-   if (ext4_valid_block_bitmap(sb, desc, block_group, bh) &&
-   ext4_block_bitmap_csum_verify(sb, block_group, desc, bh,
- EXT4_BLOCKS_PER_GROUP(sb) / 8))
-   set_buffer_verified(bh);
+   blk = ext4_valid_block_bitmap(sb, desc, block_group, bh);
+   if (unlikely(blk != 0)) {
+

Re: [PATCH][trivial] ASoC: isabelle: Remove unneeded include of version.h

2012-08-10 Thread Mark Brown
On Fri, Aug 10, 2012 at 08:21:51PM +0200, Jesper Juhl wrote:
> On Fri, 10 Aug 2012, Mark Brown wrote:

> > Not sure what this patch is against, there appears to be no include of
> > version.h in current code...

> It's against Linus's tree. I created a branch off of master at 
> f4ba394c1b02e7fc2179fda8d3941a5b3b65efb6 and did the patch.

Ah, the fix is only in -next.


Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread Alan Cox
> On that whole subject...
> 
> Do we need a Unix domain socket equivalent to openat()?

I don't think so. The name is just a file system indexing trick, it's not
really the socket proper. It's little more than "ascii string with
permissions attached" - indeed we also support an abstract name space
which for a lot of uses is actually more convenient.

AF_UNIX between roots raises some interesting semantic questions when you
begin passing file descriptors down them as well.

Alan


Re: [PATCH][trivial] ASoC: isabelle: Remove unneeded include of version.h

2012-08-10 Thread Jesper Juhl
On Fri, 10 Aug 2012, Mark Brown wrote:

> On Fri, Aug 10, 2012 at 08:12:57PM +0200, Jesper Juhl wrote:
> > There is no need to include version.h in sound/soc/codecs/isabelle.c -
> > this patch removes the pointless include.
> 
> Not sure what this patch is against, there appears to be no include of
> version.h in current code...

It's against Linus's tree. I created a branch off of master at 
f4ba394c1b02e7fc2179fda8d3941a5b3b65efb6 and did the patch.

-- 
Jesper Juhl       http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.



Re: [PATCH][trivial] ASoC: isabelle: Remove unneeded include of version.h

2012-08-10 Thread Mark Brown
On Fri, Aug 10, 2012 at 08:12:57PM +0200, Jesper Juhl wrote:
> There is no need to include version.h in sound/soc/codecs/isabelle.c -
> this patch removes the pointless include.

Not sure what this patch is against, there appears to be no include of
version.h in current code...


Re: [RFC PATCH 0/2] net: connect to UNIX sockets from specified root

2012-08-10 Thread H. Peter Anvin
On 08/10/2012 05:57 AM, Stanislav Kinsbursky wrote:
> Today, there is a problem in connecting of local SUNRPC transports. These
> transports use UNIX sockets and connection itself is done by rpciod
> workqueue.
> But UNIX sockets lookup is done in context of process file system root. I.e.
> all local transports are connecting in rpciod context.
> This works nice until we will try to mount NFS from process with other root -
> for example in container. This container can have its own (nested) root and
> rpcbind process, listening on its own unix sockets. But NFS mount attempt in
> this container will register new service (Lockd for example) in global rpcbind
> - not container's one.
> 
> This patch set introduces kernel connect helper for UNIX stream sockets and
> modifies unix_find_other() to be able to search from specified root.
> It also replaces generic socket connect call for local transports by new
> helper in SUNRPC layer.
> 
> The following series implements...

On that whole subject...

Do we need a Unix domain socket equivalent to openat()?

-hpa



Re: [PATCH 0/4] promote zcache from staging

2012-08-10 Thread Seth Jennings
On 08/09/2012 03:20 PM, Dan Magenheimer wrote:
> I also wonder if you have anything else unusual in your
> test setup, such as a fast swap disk (mine is a partition
> on the same rotating disk as source and target of the kernel build,
> the default install for a RHEL6 system)?

I'm using a normal SATA HDD with two partitions, one for
swap and the other an ext3 filesystem with the kernel source.

> Or have you disabled cleancache?

Yes, I _did_ disable cleancache.  I could see where having
cleancache enabled could explain the difference in results.

> Or have you changed any sysfs parameters or
> other kernel files?

No.

> And are you using 512M of physical memory or relying on
> kernel boot parameters to reduce visible memory

Limited with mem=512M boot parameter.

> ... and
> if the latter have you confirmed with /proc/meminfo?

Yes, confirmed.

Seth



[PATCH][trivial] ASoC: isabelle: Remove unneeded include of version.h

2012-08-10 Thread Jesper Juhl
There is no need to include version.h in sound/soc/codecs/isabelle.c -
this patch removes the pointless include.

Signed-off-by: Jesper Juhl 
---
 sound/soc/codecs/isabelle.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sound/soc/codecs/isabelle.c b/sound/soc/codecs/isabelle.c
index 5d8f39e..1bf5560 100644
--- a/sound/soc/codecs/isabelle.c
+++ b/sound/soc/codecs/isabelle.c
@@ -13,7 +13,6 @@
  */
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
1.7.11.4


-- 
Jesper Juhl       http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.



Re: [RFC PATCH 1/2] unix sockets: add ability for search for peer from passed root

2012-08-10 Thread J. Bruce Fields
On Fri, Aug 10, 2012 at 04:57:30PM +0400, Stanislav Kinsbursky wrote:
> This helper is used stream sockets yet.
> All is simple: if non-NULL struct path was passed to unix_find_other(), then
> vfs_path_lookup() is called instead of kern_path().

I'm having some trouble parsing the changelog.  Maybe something like?:

unix sockets: add ability to look up using passed-in root

Export a unix_stream_connect_root() helper that allows a caller
to optionally pass in a root path, in which case the lookup will
be done relative to the given path instead of the current
working directory.

I guess this is a question for the networking people, but: will it cause
problems to have sunrpc calling directly into the unix socket code?

(And if so, what would be the alternative: define some variant of
sockaddr_un that includes the root path?  Something better?)

--b.

> 
> Signed-off-by: Stanislav Kinsbursky 
> ---
>  include/net/af_unix.h |2 ++
>  net/unix/af_unix.c|   25 ++---
>  2 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index 2ee33da..559467e 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -67,6 +67,8 @@ struct unix_sock {
>  
>  long unix_inq_len(struct sock *sk);
>  long unix_outq_len(struct sock *sk);
> +int unix_stream_connect_root(struct path *root, struct socket *sock,
> +  struct sockaddr *uaddr, int addr_len, int flags);
>  
>  #ifdef CONFIG_SYSCTL
>  extern int unix_sysctl_register(struct net *net);
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 641f2e4..a790ebc 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -759,7 +759,7 @@ out:  mutex_unlock(>readlock);
>   return err;
>  }
>  
> -static struct sock *unix_find_other(struct net *net,
> +static struct sock *unix_find_other(struct net *net, struct path *root,
>   struct sockaddr_un *sunname, int len,
>   int type, unsigned int hash, int *error)
>  {
> @@ -769,7 +769,11 @@ static struct sock *unix_find_other(struct net *net,
>  
>   if (sunname->sun_path[0]) {
>   struct inode *inode;
> - err = kern_path(sunname->sun_path, LOOKUP_FOLLOW, );
> +
> + if (root)
> + err = vfs_path_lookup(root->dentry, root->mnt, 
> sunname->sun_path, LOOKUP_FOLLOW, );
> + else
> + err = kern_path(sunname->sun_path, LOOKUP_FOLLOW, 
> );
>   if (err)
>   goto fail;
>   inode = path.dentry->d_inode;
> @@ -979,7 +983,7 @@ static int unix_dgram_connect(struct socket *sock, struct 
> sockaddr *addr,
>   goto out;
>  
>  restart:
> - other = unix_find_other(net, sunaddr, alen, sock->type, hash, 
> );
> + other = unix_find_other(net, NULL, sunaddr, alen, sock->type, 
> hash, );
>   if (!other)
>   goto out;
>  
> @@ -1053,8 +1057,8 @@ static long unix_wait_for_peer(struct sock *other, long 
> timeo)
>   return timeo;
>  }
>  
> -static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> -int addr_len, int flags)
> +int unix_stream_connect_root(struct path *root, struct socket *sock,
> +  struct sockaddr *uaddr, int addr_len, int flags)
>  {
>   struct sockaddr_un *sunaddr = (struct sockaddr_un *)uaddr;
>   struct sock *sk = sock->sk;
> @@ -1098,7 +1102,7 @@ static int unix_stream_connect(struct socket *sock, 
> struct sockaddr *uaddr,
>  
>  restart:
>   /*  Find listening sock. */
> - other = unix_find_other(net, sunaddr, addr_len, sk->sk_type, hash, 
> );
> + other = unix_find_other(net, root, sunaddr, addr_len, sk->sk_type, 
> hash, );
>   if (!other)
>   goto out;
>  
> @@ -1227,6 +1231,13 @@ out:
>   sock_put(other);
>   return err;
>  }
> +EXPORT_SYMBOL_GPL(unix_stream_connect_root);
> +
> +static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> + int addr_len, int flags)
> +{
> + return unix_stream_connect_root(NULL, sock, uaddr, addr_len, flags);
> +}
>  
>  static int unix_socketpair(struct socket *socka, struct socket *sockb)
>  {
> @@ -1508,7 +1519,7 @@ restart:
>   if (sunaddr == NULL)
>   goto out_free;
>  
> - other = unix_find_other(net, sunaddr, namelen, sk->sk_type,
> + other = unix_find_other(net, NULL, sunaddr, namelen, 
> sk->sk_type,
>   hash, );
>   if (other == NULL)
>   goto out_free;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  

Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Kamezawa Hiroyuki

(2012/08/11 2:28), Michal Hocko wrote:
> On Sat 11-08-12 01:49:25, KAMEZAWA Hiroyuki wrote:
> > (2012/08/11 0:42), Michal Hocko wrote:
> > > On Thu 09-08-12 17:01:10, Glauber Costa wrote:
> > > [...]
> > > > @@ -2317,18 +2318,18 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > > > } else
> > > > mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
> > > > /*
> > > > -* nr_pages can be either a huge page (HPAGE_PMD_NR), a batch
> > > > -* of regular pages (CHARGE_BATCH), or a single regular page (1).
> > > > -*
> > > >  * Never reclaim on behalf of optional batching, retry with a
> > > >  * single page instead.
> > > >  */
> > > > -   if (nr_pages == CHARGE_BATCH)
> > > > +   if (nr_pages > min_pages)
> > > > return CHARGE_RETRY;
> > >
> > > This is dangerous because THP charges will be retried now while they
> > > previously failed with CHARGE_NOMEM which means that we will keep
> > > attempting potentially endlessly.
> >
> > with THP, I thought nr_pages == min_pages, and no retry.
>
> right you are.
>
> > > Why cannot we simply do if (nr_pages < CHARGE_BATCH) and get rid of the
> > > min_pages altogether?
> >
> > Hm, I think a slab can be larger than CHARGE_BATCH.
> >
> > > Also the comment doesn't seem to be valid anymore.
> >
> > I agree it's not clean. Because our assumption on nr_pages are changed,
> > I think this behavior should not depend on nr_pages value..
> > Shouldn't we have a flag to indicate "trial-for-batched charge" ?
>
> dunno, it would require a new parameter anyway (because abusing gfp
> doesn't seem great idea).

ok, agreed.

-Kame
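
The two conditions debated in this thread can be compared with a small userspace model. This is only a sketch: the values of CHARGE_BATCH and HPAGE_PMD_NR below are assumptions for illustration (32 pages per batch, 512 pages per huge page as on x86-64 with 4K pages), and the function names are made up here rather than taken from memcontrol.c. A return value of true stands for "return CHARGE_RETRY without reclaiming".

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only. */
#define CHARGE_BATCH 32U
#define HPAGE_PMD_NR 512U

/* Old check: only an exactly batch-sized charge counts as optional batching. */
static bool old_retry_without_reclaim(unsigned int nr_pages)
{
	return nr_pages == CHARGE_BATCH;
}

/* Check from the quoted hunk: anything above what the caller needs is batching. */
static bool new_retry_without_reclaim(unsigned int nr_pages,
				      unsigned int min_pages)
{
	assert(min_pages > 0 && nr_pages >= min_pages);
	return nr_pages > min_pages;
}
```

Under these assumptions a THP charge has nr_pages == min_pages == HPAGE_PMD_NR, so the new check does not take the retry path, which matches the reading confirmed above. A batched single-page charge (nr_pages == CHARGE_BATCH, min_pages == 1) still retries with the smaller size under both checks.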




[PATCH v7 1/4] mm: introduce compaction and migration for virtio ballooned pages

2012-08-10 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces the helper functions as well as the necessary changes
to teach compaction and migration bits how to cope with pages which are
part of a guest memory balloon, in order to make them movable by memory
compaction procedures.

Signed-off-by: Rafael Aquini 
---
 include/linux/mm.h |  17 
 mm/compaction.c| 125 +
 mm/migrate.c   |  30 -
 3 files changed, 152 insertions(+), 20 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..56cc553 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1662,5 +1662,22 @@ static inline unsigned int debug_guardpage_minorder(void) { return 0; }
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
+#if (defined(CONFIG_VIRTIO_BALLOON) || \
+   defined(CONFIG_VIRTIO_BALLOON_MODULE)) && defined(CONFIG_COMPACTION)
+extern bool isolate_balloon_page(struct page *);
+extern void putback_balloon_page(struct page *);
+extern struct address_space *balloon_mapping;
+
+static inline bool movable_balloon_page(struct page *page)
+{
+   return (page->mapping && page->mapping == balloon_mapping);
+}
+
+#else
+static inline bool isolate_balloon_page(struct page *page) { return false; }
+static inline void putback_balloon_page(struct page *page) { }
+static inline bool movable_balloon_page(struct page *page) { return false; }
+#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && CONFIG_COMPACTION */
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index e78cb96..e4e871b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -21,6 +22,84 @@
 #define CREATE_TRACE_POINTS
 #include 
 
+#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
+/*
+ * Balloon pages special page->mapping.
+ * Users must properly allocate and initialize an instance of balloon_mapping,
+ * and set it as the page->mapping for balloon enlisted page instances.
+ * There is no need on utilizing struct address_space locking schemes for
+ * balloon_mapping as, once it gets initialized at balloon driver, it will
+ * remain just like a static reference that helps us on identifying a guest
+ * ballooned page by its mapping, as well as it will keep the 'a_ops' callback
+ * pointers to the functions that will execute the balloon page mobility tasks.
+ *
+ * address_space_operations necessary methods for ballooned pages:
+ *   .migratepage- used to perform balloon's page migration (as is)
+ *   .invalidatepage - used to isolate a page from balloon's page list
+ *   .freepage   - used to reinsert an isolated page to balloon's page list
+ */
+struct address_space *balloon_mapping;
+EXPORT_SYMBOL_GPL(balloon_mapping);
+
+static inline void __isolate_balloon_page(struct page *page)
+{
+   page->mapping->a_ops->invalidatepage(page, 0);
+}
+
+static inline void __putback_balloon_page(struct page *page)
+{
+   page->mapping->a_ops->freepage(page);
+}
+
+/* __isolate_lru_page() counterpart for a ballooned page */
+bool isolate_balloon_page(struct page *page)
+{
+   if (WARN_ON(!movable_balloon_page(page)))
+   return false;
+
+   if (likely(get_page_unless_zero(page))) {
+   /*
+* As balloon pages are not isolated from LRU lists, concurrent
+* compaction threads can race against page migration functions
+* move_to_new_page() & __unmap_and_move().
+* In order to avoid having an already isolated balloon page
+* being (wrongly) re-isolated while it is under migration,
+* lets be sure we have the page lock before proceeding with
+* the balloon page isolation steps.
+*/
+   if (likely(trylock_page(page))) {
+   /*
+* A ballooned page, by default, has just one refcount.
+* Prevent concurrent compaction threads from isolating
+* an already isolated balloon page.
+*/
+   if (movable_balloon_page(page) &&
+   (page_count(page) == 2)) {
+   __isolate_balloon_page(page);
+   unlock_page(page);
+   return true;
+   }
+   unlock_page(page);
+   }
+   /* Drop 

[PATCH v7 2/4] virtio_balloon: introduce migration primitives to balloon pages

2012-08-10 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme to provide the
proper synchronization and protection for struct virtio_balloon elements
against concurrent accesses due to parallel operations introduced by
memory compaction / page migration.
 - balloon_lock (mutex) : synchronizes the access demand to elements of
  struct virtio_balloon and its queue operations;
 - pages_lock (spinlock): special protection to balloon pages list against
  concurrent list handling operations;

Signed-off-by: Rafael Aquini 
---
 drivers/virtio/virtio_balloon.c | 138 +---
 include/linux/virtio_balloon.h  |   4 ++
 2 files changed, 134 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0908e60..7c937a0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -35,6 +36,12 @@
  */
 #define VIRTIO_BALLOON_PAGES_PER_PAGE (PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT)
 
+/* Synchronizes accesses/updates to the struct virtio_balloon elements */
+DEFINE_MUTEX(balloon_lock);
+
+/* Protects 'virtio_balloon->pages' list against concurrent handling */
+DEFINE_SPINLOCK(pages_lock);
+
 struct virtio_balloon
 {
struct virtio_device *vdev;
@@ -51,6 +58,7 @@ struct virtio_balloon
 
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
+
/*
 * The pages we've told the Host we're not using.
 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
@@ -125,10 +133,12 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
 
+   mutex_lock(_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
 vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-   struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-   __GFP_NOMEMALLOC | __GFP_NOWARN);
+   struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
+   __GFP_NORETRY | __GFP_NOWARN |
+   __GFP_NOMEMALLOC);
if (!page) {
if (printk_ratelimit())
dev_printk(KERN_INFO, >vdev->dev,
@@ -141,7 +151,10 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
totalram_pages--;
+   spin_lock(_lock);
list_add(>lru, >pages);
+   page->mapping = balloon_mapping;
+   spin_unlock(_lock);
}
 
/* Didn't get any?  Oh well. */
@@ -149,6 +162,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
return;
 
tell_host(vb, vb->inflate_vq);
+   mutex_unlock(_lock);
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -169,10 +183,22 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
 
+   mutex_lock(_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
 vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+   /*
+* We can race against virtballoon_isolatepage() and end up
+* stumbling across a _temporarily_ empty 'pages' list.
+*/
+   spin_lock(_lock);
+   if (unlikely(list_empty(>pages))) {
+   spin_unlock(_lock);
+   break;
+   }
page = list_first_entry(>pages, struct page, lru);
+   page->mapping = NULL;
list_del(>lru);
+   spin_unlock(_lock);
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
@@ -182,8 +208,11 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 * is true, we *have* to do it in this order
 */
-   tell_host(vb, vb->deflate_vq);
-   

[PATCH v7 3/4] mm: introduce putback_movable_pages()

2012-08-10 Thread Rafael Aquini
The PATCH "mm: introduce compaction and migration for virtio ballooned pages"
hacks around putback_lru_pages() in order to allow ballooned pages to be
re-inserted on balloon page list as if a ballooned page was like a LRU page.

As ballooned pages are not legitimate LRU pages, this patch introduces
putback_movable_pages() to properly cope with cases where the isolated
pageset contains ballooned pages and LRU pages, thus fixing the mentioned
inelegant hack around putback_lru_pages().

Signed-off-by: Rafael Aquini 
---
 include/linux/migrate.h |  2 ++
 mm/compaction.c |  4 ++--
 mm/migrate.c| 20 
 mm/page_alloc.c |  2 +-
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ce7e667..ff103a1 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -10,6 +10,7 @@ typedef struct page *new_page_t(struct page *, unsigned long private, int **);
 #ifdef CONFIG_MIGRATION
 
 extern void putback_lru_pages(struct list_head *l);
+extern void putback_movable_pages(struct list_head *l);
 extern int migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
 extern int migrate_pages(struct list_head *l, new_page_t x,
@@ -33,6 +34,7 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 #else
 
 static inline void putback_lru_pages(struct list_head *l) {}
+static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode) { return -ENOSYS; }
diff --git a/mm/compaction.c b/mm/compaction.c
index e4e871b..8567bb8 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -837,9 +837,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
trace_mm_compaction_migratepages(nr_migrate - nr_remaining,
nr_remaining);
 
-   /* Release LRU pages not migrated */
+   /* Release isolated pages not migrated */
if (err) {
-   putback_lru_pages(>migratepages);
+   putback_movable_pages(>migratepages);
cc->nr_migratepages = 0;
if (err == -ENOMEM) {
ret = COMPACT_PARTIAL;
diff --git a/mm/migrate.c b/mm/migrate.c
index 80f22bb..1165134 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -79,6 +79,26 @@ void putback_lru_pages(struct list_head *l)
list_del(>lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
+   putback_lru_page(page);
+   }
+}
+
+/*
+ * Put previously isolated pages back onto the appropriate lists
+ * from where they were once taken off for compaction/migration.
+ *
+ * This function shall be used instead of putback_lru_pages(),
+ * whenever the isolated pageset has been built by isolate_migratepages_range()
+ */
+void putback_movable_pages(struct list_head *l)
+{
+   struct page *page;
+   struct page *page2;
+
+   list_for_each_entry_safe(page, page2, l, lru) {
+   list_del(>lru);
+   dec_zone_page_state(page, NR_ISOLATED_ANON +
+   page_is_file_cache(page));
if (unlikely(movable_balloon_page(page)))
putback_balloon_page(page);
else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 009ac28..78b7663 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5669,7 +5669,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
0, false, MIGRATE_SYNC);
}
 
-   putback_lru_pages();
+   putback_movable_pages();
return ret > 0 ? 0 : ret;
 }
 
-- 
1.7.11.2



[PATCH v7 4/4] mm: add vm event counters for balloon pages compaction

2012-08-10 Thread Rafael Aquini
This patch introduces a new set of vm event counters to keep track of
ballooned pages compaction activity.

Signed-off-by: Rafael Aquini 
---
 drivers/virtio/virtio_balloon.c |  1 +
 include/linux/vm_event_item.h   |  8 +++-
 mm/compaction.c |  2 ++
 mm/migrate.c|  1 +
 mm/vmstat.c | 10 +-
 5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7c937a0..b8f7ea5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -414,6 +414,7 @@ int virtballoon_migratepage(struct address_space *mapping,
 
mutex_unlock(_lock);
 
+   count_vm_event(COMPACTBALLOONMIGRATED);
return 0;
 }
 
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 57f7b10..b1841a2 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,7 +41,13 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_COMPACTION
COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
-#endif
+#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
+   COMPACTBALLOONISOLATED, /* isolated from balloon pagelist */
+   COMPACTBALLOONMIGRATED, /* balloon page successfully migrated */
+   COMPACTBALLOONRETURNED, /* putback to pagelist, not-migrated */
+   COMPACTBALLOONRELEASED, /* old-page released after migration */
+#endif /* CONFIG_VIRTIO_BALLOON || CONFIG_VIRTIO_BALLOON_MODULE */
+#endif /* CONFIG_COMPACTION */
 #ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
 #endif
diff --git a/mm/compaction.c b/mm/compaction.c
index 8567bb8..ff0f9ac 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -77,6 +77,7 @@ bool isolate_balloon_page(struct page *page)
(page_count(page) == 2)) {
__isolate_balloon_page(page);
unlock_page(page);
+   count_vm_event(COMPACTBALLOONISOLATED);
return true;
}
unlock_page(page);
@@ -97,6 +98,7 @@ void putback_balloon_page(struct page *page)
__putback_balloon_page(page);
put_page(page);
unlock_page(page);
+   count_vm_event(COMPACTBALLOONRETURNED);
 }
 #endif /* CONFIG_VIRTIO_BALLOON || CONFIG_VIRTIO_BALLOON_MODULE */
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 1165134..024566f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -892,6 +892,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
page_is_file_cache(page));
put_page(page);
__free_page(page);
+   count_vm_event(COMPACTBALLOONRELEASED);
return rc;
}
 out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index df7a674..ad5c4f1 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -768,7 +768,15 @@ const char * const vmstat_text[] = {
"compact_stall",
"compact_fail",
"compact_success",
-#endif
+
+#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
+   "compact_balloon_isolated",
+   "compact_balloon_migrated",
+   "compact_balloon_returned",
+   "compact_balloon_released",
+#endif /* CONFIG_VIRTIO_BALLOON || CONFIG_VIRTIO_BALLOON_MODULE */
+
+#endif /* CONFIG_COMPACTION */
 
 #ifdef CONFIG_HUGETLB_PAGE
"htlb_buddy_alloc_success",
-- 
1.7.11.2



[PATCH v7 0/4] make balloon pages movable by compaction

2012-08-10 Thread Rafael Aquini
Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch-set follows the main idea discussed at 2012 LSFMMS session:
"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
to introduce the required changes to the virtio_balloon driver, as well as
the changes to the core compaction & migration bits, in order to make those
subsystems aware of ballooned pages and allow memory balloon pages to become
movable within a guest, thus avoiding the aforementioned fragmentation issue.

Rafael Aquini (4):
  mm: introduce compaction and migration for virtio ballooned pages
  virtio_balloon: introduce migration primitives to balloon pages
  mm: introduce putback_movable_pages()
  mm: add vm event counters for balloon pages compaction

 drivers/virtio/virtio_balloon.c | 139 +---
 include/linux/migrate.h |   2 +
 include/linux/mm.h  |  17 +
 include/linux/virtio_balloon.h  |   4 ++
 include/linux/vm_event_item.h   |   8 ++-
 mm/compaction.c | 131 +++--
 mm/migrate.c|  51 ++-
 mm/page_alloc.c |   2 +-
 mm/vmstat.c |  10 ++-
 9 files changed, 331 insertions(+), 33 deletions(-)

Change log:
v7:
 * fix a potential page leak case at 'putback_balloon_page' (Mel);
 * adjust vm-events-counter patch and remove its drop-on-merge message (Rik);
 * add 'putback_movable_pages' to avoid hacks on 'putback_lru_pages' (Minchan);
v6:
 * rename 'is_balloon_page()' to 'movable_balloon_page()' (Rik);
v5:
 * address Andrew Morton's review comments on the patch series;
 * address a couple extra nitpick suggestions on PATCH 01 (Minchan);
v4: 
 * address Rusty Russell's review comments on PATCH 02;
 * re-base virtio_balloon patch on 9c378abc5c0c6fc8e3acf5968924d274503819b3;
V3: 
 * address reviewers nitpick suggestions on PATCH 01 (Mel, Minchan);
V2: 
 * address Mel Gorman's review comments on PATCH 01;


Preliminary test results:
(2 VCPU 2048mB RAM KVM guest running 3.6.0_rc1+ -- after a reboot)

* 64mB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 0
compact_balloon_migrated 0
compact_balloon_returned 0
compact_balloon_released 0
[root@localhost ~]# 
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1]   Done                    echo 1 > /proc/sys/vm/compact_memory
[2]   Done                    echo 1 > /proc/sys/vm/compact_memory
[3]   Done                    echo 1 > /proc/sys/vm/compact_memory
[4]   Done                    echo 1 > /proc/sys/vm/compact_memory
[5]-  Done                    echo 1 > /proc/sys/vm/compact_memory
[6]+  Done                    echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]# 
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 6579
compact_pages_moved 50114
compact_pagemigrate_failed 111
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 18361
compact_balloon_migrated 18306
compact_balloon_returned 55
compact_balloon_released 18306


* 128 mB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 0
compact_balloon_migrated 0
compact_balloon_returned 0
compact_balloon_released 0
[root@localhost ~]# 
[root@localhost ~]# for i in $(seq 1 6); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1]   Done                    echo 1 > /proc/sys/vm/compact_memory
[2]   Done                    echo 1 > /proc/sys/vm/compact_memory
[3]   Done                    echo 1 > /proc/sys/vm/compact_memory
[4]   Done                    echo 1 > /proc/sys/vm/compact_memory
[5]-  Done                    echo 1 > /proc/sys/vm/compact_memory
[6]+  Done                    echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]# 
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 6789
compact_pages_moved 64479
compact_pagemigrate_failed 127
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_isolated 33937
compact_balloon_migrated 33869
compact_balloon_returned 68
compact_balloon_released 33869
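
The balloon counters in both runs above appear internally consistent: every isolated page was either migrated or returned, and each migration released one old page (18361 = 18306 + 55 and 33937 = 33869 + 68). The sketch below writes that check down in C. The relation is an inference from the counter comments in patch 4/4, not something the series states, and the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Balloon compaction counters as printed in /proc/vmstat by this series. */
struct balloon_counters {
	unsigned long isolated;
	unsigned long migrated;
	unsigned long returned;
	unsigned long released;
};

/*
 * Assumed relation: each isolated page is either migrated or put back,
 * and each successful migration releases exactly one old page.
 */
static bool balloon_counters_consistent(const struct balloon_counters *c)
{
	assert(c != NULL);
	return c->isolated == c->migrated + c->returned &&
	       c->released == c->migrated;
}
```

Feeding in the values from the 64 MB and 128 MB runs satisfies the relation in both cases; a mismatch on a quiescent system would hint at a leaked or double-counted balloon page.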

-- 
1.7.11.2



Re: [PATCH v2 11/11] protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote:
> Because those architectures will draw their stacks directly from the
> page allocator, rather than the slab cache, we can directly pass
> __GFP_KMEMCG flag, and issue the corresponding free_pages.
> 
> This code path is taken when the architecture doesn't define
> CONFIG_ARCH_THREAD_INFO_ALLOCATOR (only ia64 seems to), and has
> THREAD_SIZE >= PAGE_SIZE. Luckily, most - if not all - of the remaining
> architectures fall in this category.
> 
> This will guarantee that every stack page is accounted to the memcg the
> process currently lives on, and will cause the allocations to fail if
> they go over the limit.
> 
> For the time being, I am defining a new variant of THREADINFO_GFP, not
> to mess with the other path. Once the slab is also tracked by memcg, we
> can get rid of that flag.
> 
> Tested to successfully protect against :(){ :|:& };:
> 
> Signed-off-by: Glauber Costa 
> Acked-by: Frederic Weisbecker 
> CC: Christoph Lameter 
> CC: Pekka Enberg 
> CC: Michal Hocko 
> CC: Kamezawa Hiroyuki 
> CC: Johannes Weiner 
> CC: Suleiman Souhlal 

Acked-by: KAMEZAWA Hiroyuki 


> ---
>   include/linux/thread_info.h | 2 ++
>   kernel/fork.c   | 4 ++--
>   2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index ccc1899..e7e0473 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -61,6 +61,8 @@ extern long do_no_restart_syscall(struct restart_block *parm);
>   # define THREADINFO_GFP (GFP_KERNEL | __GFP_NOTRACK)
>   #endif
>   
> +#define THREADINFO_GFP_ACCOUNTED (THREADINFO_GFP | __GFP_KMEMCG)
> +
>   /*
>* flag set/clear/test wrappers
>* - pass TIF_ constants to these functions
> diff --git a/kernel/fork.c b/kernel/fork.c
> index dc3ff16..b0b90c3 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -142,7 +142,7 @@ void __weak arch_release_thread_info(struct thread_info *ti) { }
>   static struct thread_info *alloc_thread_info_node(struct task_struct *tsk,
> int node)
>   {
> - struct page *page = alloc_pages_node(node, THREADINFO_GFP,
> + struct page *page = alloc_pages_node(node, THREADINFO_GFP_ACCOUNTED,
>THREAD_SIZE_ORDER);
>   
>   return page ? page_address(page) : NULL;
> @@ -151,7 +151,7 @@ static struct thread_info *alloc_thread_info_node(struct task_struct *tsk,
>   static inline void free_thread_info(struct thread_info *ti)
>   {
>   arch_release_thread_info(ti);
> - free_pages((unsigned long)ti, THREAD_SIZE_ORDER);
> + free_accounted_pages((unsigned long)ti, THREAD_SIZE_ORDER);
>   }
>   # else
>   static struct kmem_cache *thread_info_cache;
> 




Re: Upgraded from 3.4 to 3.5.1 kernel: machine does not boot

2012-08-10 Thread Jesper Juhl
On Fri, 10 Aug 2012, Justin Piszcz wrote:

> Hello,
> 
> Motherboard: Supermicro X8DTH-6F
> Distro: Debian Testing x86_64
> 
> From 3.4 -> 3.5.1 on x86_64 make oldconfig and a few minor changes and the
> machine attempts to boot but hangs at the filesystem mounting part of the
> boot process.
> 
> Picture of where it stops working (a little blurry but readable)
> http://home.comcast.net/~jpiszcz/20120810/3.5-kernel-hangs.jpg
> 
> Kernel config 3.4 (working)
> http://home.comcast.net/~jpiszcz/20120810/config-3.4.txt
> 
> Kernel config 3.5.1 (hangs)
> http://home.comcast.net/~jpiszcz/20120810/config-3.5.1.txt
> 
> As you see towards the end the machine has been sitting there for 1 hour as
> that's the timeout I have the drives spindown on the 3ware card.
> 
> Any thoughts as what is wrong here?
> 
Not really, but some (rather obvious) ideas on what to try:

- Does 3.5 work? (could be that whatever broke things for you was 
introduced in 3.5.1).

- Does the latest 3.4.8 work?

- Does 3.6-rc1 (or even the latest snapshot of Linus' tree, post-rc1) 
work?

- If no one comes up with a good idea as to the cause of your troubles, you 
could try bisecting between your last working kernel and 3.5.1 to try and 
narrow it down to one (or a few) commits that are causing your trouble.
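
The bisection suggested in the last item is a binary search over the history
between the last working kernel and the first broken one. A rough C sketch of
the idea, assuming a linear history numbered by index and a hypothetical
predicate that reports whether a given revision boots (git bisect itself also
copes with merges, which this sketch does not):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test: returns nonzero if revision `rev` boots. */
typedef int (*is_good_fn)(int rev);

/* Example predicate for illustration: everything before rev 42 boots. */
static int boots_before_42(int rev)
{
	return rev < 42;
}

/*
 * Given a known-good revision and a known-bad one (good < bad), return
 * the first bad revision, assuming all revisions after it are bad too.
 * If `steps` is non-NULL it is incremented once per revision tested.
 */
static int first_bad_revision(int good, int bad, is_good_fn is_good,
			      int *steps)
{
	assert(good < bad);

	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_good(mid))
			good = mid;	/* breakage is later than mid */
		else
			bad = mid;	/* breakage is at or before mid */
		if (steps != NULL)
			(*steps)++;
	}
	return bad;
}
```

Each test boot halves the remaining range, so a span of N revisions needs
roughly log2(N) builds rather than N.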

-- 
Jesper Juhl       http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.



Re: [PATCH v2 09/11] memcg: propagate kmem limiting information to children

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote:
> The current memcg slab cache management fails to present satisfactory
> hierarchical behavior in the following scenario:
> 
> -> /cgroups/memory/A/B/C
> 
> * kmem limit set at A,
> * A and B have no tasks,
> * spawn a new task in C.
> 
> Because kmem_accounted is a boolean that was not set for C, no
> accounting would be done. This is, however, not what we expect.
> 
> The basic idea is that when a cgroup is limited, we walk the tree
> upwards (something Kame and I already thought about doing for other
> purposes), and make sure that we store the information about the parent
> being limited in kmem_accounted (that is turned into a bitmap: two
> booleans would not be space efficient). The code for that is taken from
> sched/core.c. My reasons for not putting it into a common place is to
> dodge the type issues that would arise from a common implementation
> between memcg and the scheduler - but I think that it should ultimately
> happen, so if you want me to do it now, let me know.
> 
> We do the reverse operation when a formerly limited cgroup becomes
> unlimited.
> 
> Signed-off-by: Glauber Costa 
> CC: Christoph Lameter 
> CC: Pekka Enberg 
> CC: Michal Hocko 
> CC: Kamezawa Hiroyuki 
> CC: Johannes Weiner 
> CC: Suleiman Souhlal 



> ---
>   mm/memcontrol.c | 88 +++--
>   1 file changed, 79 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3216292..3d30b79 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -295,7 +295,8 @@ struct mem_cgroup {
>* Should the accounting and control be hierarchical, per subtree?
>*/
>   bool use_hierarchy;
> - bool kmem_accounted;
> +
> + unsigned long kmem_accounted; /* See KMEM_ACCOUNTED_*, below */
>   
>   bool oom_lock;
>   atomic_t under_oom;
> @@ -348,6 +349,38 @@ struct mem_cgroup {
>   #endif
>   };
>   
> +enum {
> + KMEM_ACCOUNTED_THIS, /* accounted by this cgroup itself */
> + KMEM_ACCOUNTED_PARENT, /* accounted by any of its parents. */
> +};
> +
> +#ifdef CONFIG_MEMCG_KMEM
> +static bool memcg_kmem_account(struct mem_cgroup *memcg)
> +{
> + return !test_and_set_bit(KMEM_ACCOUNTED_THIS, &memcg->kmem_accounted);
> +}
> +
> +static bool memcg_kmem_clear_account(struct mem_cgroup *memcg)
> +{
> + return test_and_clear_bit(KMEM_ACCOUNTED_THIS, &memcg->kmem_accounted);
> +}
> +
> +static bool memcg_kmem_is_accounted(struct mem_cgroup *memcg)
> +{
> + return test_bit(KMEM_ACCOUNTED_THIS, &memcg->kmem_accounted);
> +}
> +
> +static void memcg_kmem_account_parent(struct mem_cgroup *memcg)
> +{
> + set_bit(KMEM_ACCOUNTED_PARENT, &memcg->kmem_accounted);
> +}
> +
> +static void memcg_kmem_clear_account_parent(struct mem_cgroup *memcg)
> +{
> + clear_bit(KMEM_ACCOUNTED_PARENT, &memcg->kmem_accounted);
> +}
> +#endif /* CONFIG_MEMCG_KMEM */
> +
>   /* Stuffs for move charges at task migration. */
>   /*
>* Types of charges to be moved. "move_charge_at_immitgrate" is treated as a
> @@ -614,7 +647,7 @@ EXPORT_SYMBOL(__memcg_kmem_free_page);
>   
>   static void disarm_kmem_keys(struct mem_cgroup *memcg)
>   {
> - if (memcg->kmem_accounted)
> + if (test_bit(KMEM_ACCOUNTED_THIS, &memcg->kmem_accounted))
>   static_key_slow_dec(&memcg_kmem_enabled_key);
>   }
>   #else
> @@ -4171,17 +4204,54 @@ static ssize_t mem_cgroup_read(struct cgroup *cont, struct cftype *cft,
>   static void memcg_update_kmem_limit(struct mem_cgroup *memcg, u64 val)
>   {
>   #ifdef CONFIG_MEMCG_KMEM
> - /*
> -  * Once enabled, can't be disabled. We could in theory disable it if we
> -  * haven't yet created any caches, or if we can shrink them all to
> -  * death. But it is not worth the trouble.
> -  */
> + struct mem_cgroup *iter;
> +
>   mutex_lock(&set_limit_mutex);
> - if (!memcg->kmem_accounted && val != RESOURCE_MAX) {
> + if ((val != RESOURCE_MAX) && memcg_kmem_account(memcg)) {
> +
> + /*
> +  * Once enabled, can't be disabled. We could in theory disable
> +  * it if we haven't yet created any caches, or if we can shrink
> +  * them all to death. But it is not worth the trouble
> +  */
>   static_key_slow_inc(&memcg_kmem_enabled_key);
> - memcg->kmem_accounted = true;
> +
> + if (!memcg->use_hierarchy)
> + goto out;
> +
> + for_each_mem_cgroup_tree(iter, memcg) {
> + if (iter == memcg)
> + continue;
> + memcg_kmem_account_parent(iter);
> + }

Could you add an explanatory comment?


> + } else if ((val == RESOURCE_MAX) && memcg_kmem_clear_account(memcg)) {
> +
> + if (!memcg->use_hierarchy)
> + goto out;
> +
ditto.

> + for_each_mem_cgroup_tree(iter, memcg) {
> + struct mem_cgroup *parent;
> 
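
The hierarchy handling described in the patch description above (a bitmap with
one bit for "limited here" and one for "limited by an ancestor") can be
sketched in userspace C. This is a simplification under stated assumptions,
not the code from the patch: the tree type, the helper names and the
recompute-from-parent walk are all hypothetical, C11 atomics stand in for the
kernel's test_and_set_bit()/test_and_clear_bit(), and the use_hierarchy check
is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum {
	ACCOUNTED_THIS,		/* limit set on this group itself */
	ACCOUNTED_PARENT,	/* limit set on some ancestor */
};

#define MAX_CHILDREN 4

struct group {
	atomic_ulong accounted;	/* bitmap of ACCOUNTED_* bits */
	struct group *children[MAX_CHILDREN];
	size_t nr_children;
};

static void group_init(struct group *g)
{
	atomic_init(&g->accounted, 0);
	g->nr_children = 0;
}

/* Stand-in for test_and_set_bit(): returns the previous bit value. */
static bool test_and_set(atomic_ulong *word, int bit)
{
	unsigned long mask = 1UL << bit;

	return atomic_fetch_or(word, mask) & mask;
}

/* Stand-in for test_and_clear_bit(): returns the previous bit value. */
static bool test_and_clear(atomic_ulong *word, int bit)
{
	unsigned long mask = 1UL << bit;

	return atomic_fetch_and(word, ~mask) & mask;
}

static bool group_is_accounted(struct group *g)
{
	return atomic_load(&g->accounted) != 0;
}

/* Recompute the PARENT bit of every descendant from its own parent. */
static void propagate(struct group *g)
{
	bool limited_above = group_is_accounted(g);

	for (size_t i = 0; i < g->nr_children; i++) {
		struct group *child = g->children[i];

		if (limited_above)
			atomic_fetch_or(&child->accounted,
					1UL << ACCOUNTED_PARENT);
		else
			atomic_fetch_and(&child->accounted,
					 ~(1UL << ACCOUNTED_PARENT));
		propagate(child);
	}
}

static void group_add_child(struct group *parent, struct group *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	parent->children[parent->nr_children++] = child;
	propagate(parent);
}

/* Set or clear the limit on g; walk the subtree only on a real change. */
static void group_set_limited(struct group *g, bool limited)
{
	bool changed = limited ?
		!test_and_set(&g->accounted, ACCOUNTED_THIS) :
		test_and_clear(&g->accounted, ACCOUNTED_THIS);

	if (changed)
		propagate(g);
}
```

With A/B/C as in the scenario above, limiting A marks C as accounted through
its parent bit, and removing the limit from A leaves C accounted only if B is
still limited.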

Re: [PATCH v2] staging: gdm72xx: fix reference counting in gdm_wimax_event_init

2012-08-10 Thread Ben Chan
Hi Dan,

I manually walked through the driver code and spotted the issue. But
this morning I was able to get an extra module to verify my patch on
hardware.

I tested the following patterns using two identical modules, and
checked the creation/destruction/ref_cnt of wm_event:
- insert module A, remove A
- insert A, insert B, remove A, remove B
- insert A, insert B, remove B, remove A
- insert A, insert B, remove A, remove B
- insert A, insert B, remove B, insert B, remove B, remove A

Thanks,
Ben

On Thu, Aug 9, 2012 at 11:28 PM, Dan Carpenter  wrote:
> Ben, I'm confused.  Do you have a way to test this, or are you just
> doing manual review?
>
> regards,
> dan carpenter
>


Re: null pointer dereference while loading i915

2012-08-10 Thread Mihai Moldovan
* On 10.08.2012 06:39 PM, Daniel Vetter wrote:
> On Fri, Aug 10, 2012 at 6:05 PM, Mihai Moldovan  wrote:
>> * On 10.08.2012 12:10 PM, Daniel Vetter wrote:
>>> On Wed, Aug 8, 2012 at 6:50 AM, Mihai Moldovan  wrote:
>>>> Hi Daniel, hi list
>>>>
>>>> ever since version 3.2.0 (maybe even earlier, but 3.0.2 is still working fine),
>>>> my box is crashing when loading the i915 driver (mode-setting enabled.)
>>>>
>>>> The current version I'm testing with is 3.5.0.
>>>>
>>>> I was able to get the BUG output (please forgive any errors/flips in the output,
>>>> I have had to transcribe the messages from the screen/images), however, I'm not
>>>> able to find out what's wrong.
>>>>
>>>> If I see it correctly, there's a null pointer dereference in a printk called
>>>> from inside gmbus_xfer. The only printk calls I can see in
>>>> drivers/gpu/drm/i915/intel_i2c.c gmbus_xfer() however are issued by the
>>>> DRM_DEBUG_KMS() and DRM_INFO() macros.
>>>> Neither call looks wrong to me, I even tried to swap adapter->name with
>>>> bus->adapter.name and make *sure* i < num is true, but haven't had any success.
>>>>
>>>> I'd really like to see this bug fixed, as it's preventing me from updating the
>>>> kernel for over a year now.
>>>>
>>>> Also, while 3.0.2 works, it *does* spew error/warning messages related to gmbus
>>>> and I've had corrupted VTs in the past (albeit after a long uptime with multiple
>>>> X restarting and DVI cable unplugging/reattaching events), so maybe there's a
>>>> lot more broken than "expected".
>>> Hm, this is rather strange. gmbus should not be enabled on 3.2 nor 3.0,
>>> since exactly this issue might happen. We've re-enabled gmbus again on
>>> 3.5 after having fixed this bug. Are you sure that this is plain 3.2
>>> you're running?
>> Sorry, I messed up the version numbers. Started bisecting yesterday and noticed
>> that 3.0 up to 3.2 still work "fine" (see below), instead I've had another
>> problem with 3.2 (complete lockup after the kernel is running for a few
>> minutes, but I have no idea where this issue is coming from. Seems to be
>> happening with 3.2.0 only, so... *shrug*)
>>
>> 3.0.2   => working, gmbus warnings as posted.
>> 3.1-09933/07170 => working, NO gmbus warnings, but render errors (see below)
>> 3.2-rc2 to rc4  => working, NO gmbus warnings, but render errors (see below)
>> --- (stopped bisecting 3.0 to 3.2 as this was pointless) ---
>> --- (restarted bisecting with 3.2 to 3.5) ---
>> 3.3.0-06109 => working, gmbus warnings just like with 3.0, render errors
>> (see below)
>> 3.4.0-07487 => working, gmbus warnings, hang errors (see below)
>> ...
>>
>> I've done more steps, but have not yet finished bisecting, so stay tuned.
>> All those render errors look like that:
>>
>> [drm] capturing error event; look for more information in
>> /debug/dri/0/i915_error_state
>> render error detected, EIR: 0x0010
>>   IPEIR: 0x
>>   IPEHR: 0x0200
>>   INSTDONE: 0x
>>   INSTPS: 0x8001e025
>>   INSTDONE1: 0xbfbb
>>   ACTHD: 0x00a4203c
>> page table error
>>   PGTBL_ER: 0x0010
>> [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x0010, masking
>>
>> I'll finish bisecting (and hope, that my guess was right, concerning the
>> variant I wasn't able to build) and will post the bisect log when done.
>>
>> Meanwhile: at least for 3.0.2 and even older versions, gmbus must have been
>> enabled as I'm pretty sure I always saw those errors when booting (just
>> confirmed via logs for 3.0.0, 2.6.38.6, 2.6.39). Doesn't come up with 2.6.34,
>> 2.6.36.1, 3.1-..., 3.2-... though.
> Yeah, we've enabled gmbus a few times and then disabled it again due
> to bugs. Also, the usual debug message says gmbus even when gmbus
> isn't on ... yeah, slightly confusing, but that should be fixed, too.

Hm, OK.

Well, I'm done now.

bisect log:

git bisect start
# good: [805a6af8dba5dfdd35ec35dc52ec0122400b2610] Linux 3.2
git bisect good 805a6af8dba5dfdd35ec35dc52ec0122400b2610
# bad: [28a33cbc24e4256c143dce96c7d93bf423229f92] Linux 3.5
git bisect bad 28a33cbc24e4256c143dce96c7d93bf423229f92
# good: [49d99a2f9c4d033cc3965958a1397b1fad573dd3] Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
git bisect good 49d99a2f9c4d033cc3965958a1397b1fad573dd3
# good: [813a95e5b4fa936bbde10ef89188932745dcd7f4] Merge tag 'pinctrl' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 813a95e5b4fa936bbde10ef89188932745dcd7f4
# bad: [9978306e31a8f89bd81fbc4c49fd9aefb1d30d10] Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
git bisect bad 9978306e31a8f89bd81fbc4c49fd9aefb1d30d10
# good: [927ad551031798d4cba49766549600bbb33872d7] Merge tag 'ktest-v3.5-spelling' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
git bisect good 927ad551031798d4cba49766549600bbb33872d7
# good: [2c01e7bc46f10e9190818437e564f7e0db875ae9] Merge branch 'for-linus' of

Re: [PATCH v2] SubmittingPatches: clarify SOB tag usage when evolving submissions

2012-08-10 Thread Randy Dunlap
On 08/09/2012 02:48 PM, Luis R. Rodriguez wrote:

> From: "Luis R. Rodriguez" 
> 
> Initial large code submissions typically are not accepted
> on their first patch submission. The developers are
> typically given feedback and at times some developers may
> even submit changes to the original authors for integration
> into their second submission attempt.
> 
> Developers wishing to contribute changes to the evolution
> of a second patch submission must supply their own Signed-off-by
> tag to the original authors and must submit their changes
> on a public mailing list or ensure that these submissions
> are recorded somewhere publicly.
> 
> To date a few of these types of contributors have expressed
> different preferences for whether or not their own SOB tag
> should be used for a second code submission. Let's keep things
> simple and only require the contributor's SOB tag if so desired
> explicitly. It is not technically required if there already
> is a public record of their contribution somewhere.
> 
> Document this in Documentation/SubmittingPatches
> 
> Signed-off-by: Luis R. Rodriguez 


Note:  I'm no longer maintaining Documentation/, so I'm cc-ing Rob.

> ---
> 
> This v2 has Singed/Signed typo fixes.
> 
>  Documentation/SubmittingPatches |   15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index c379a2a..3154565 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -366,6 +366,21 @@ and protect the submitter from complaints. Note that under no circumstances
>  can you change the author's identity (the From header), as it is the one
>  which appears in the changelog.
>  
> +If you are submitting a large change (for example a new driver) at times
> +you may be asked to make quite a lot of modifications prior to getting
> +your change accepted. At times you may even receive patches from developers
> +who not only wish to tell you what you should change to get your changes
> +upstream but actually send you patches. If those patches were made publicly
> +and they do contain a Signed-off-by tag you are not expected to provide


I would add a comma:   tag,

but for a patch that attempts to clarify, I don't find it very helpful.

> +their own Signed-off-by tag on the second iteration of the patch so long
> +as there is a public record somewhere that can be used to show the
> +contributor had sent their changes with their own Signed-off-by tag.

> +

> +If you receive patches privately during development you may want to
> +ask for these patches to be re-posted publicly or you can also decide
> +to merge the patches as part of a separate historical git tree that
> +will remain online for historical archiving.


I don't think it's a good idea to require a historical git archive for
(private) patches.  If I send a patch privately and it contains an SOB:
line, then the maintainer should be able to apply the patch and
use the SOB: from the patch (IMO).  Are you addressing some concern
about fraudulent emails/patches?

> +
>  Special note to back-porters: It seems to be a common and useful practise
>  to insert an indication of the origin of a patch at the top of the commit
>  message (just after the subject line) to facilitate tracking. For instance,



-- 
~Randy


Re: [PATCH 5/6] floppy: remove check for allocated queue on do_floppy_init error handling

2012-08-10 Thread Vivek Goyal
On Thu, Aug 09, 2012 at 04:59:50PM -0300, Herton Ronaldo Krzesinski wrote:
> The check "if (disks[dr]->queue)" is bogus: if we reach there, a queue
> should already be allocated for each dr (note that we decrement dr
> first on entering the loop).
> 
> Signed-off-by: Herton Ronaldo Krzesinski 

As mentioned in the second patch, I like going through the full array of drives
and do cleanup as needed instead of relying on "dr" variable. 

But if you don't like that, then I am not as such against this approach.
Was just trying to keep all put_disk() at one place.

Thanks
Vivek

> ---
>  drivers/block/floppy.c |   16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index 3eafe93..438ffc9 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -4332,15 +4332,13 @@ out_unreg_blkdev:
>  out_put_disk:
>   while (dr--) {
>   del_timer_sync(&motor_off_timer[dr]);
> - if (disks[dr]->queue) {
> - blk_cleanup_queue(disks[dr]->queue);
> - /*
> -  * put_disk() is not paired with add_disk() and
> -  * will put queue reference one extra time. fix it.
> -  */
> - if (!disk_registered[dr])
> - disks[dr]->queue = NULL;
> - }
> + blk_cleanup_queue(disks[dr]->queue);
> + /*
> +  * put_disk() is not paired with add_disk() and
> +  * will put queue reference one extra time. fix it.
> +  */
> + if (!disk_registered[dr])
> + disks[dr]->queue = NULL;
>   put_disk(disks[dr]);
>   }
>   destroy_workqueue(floppy_wq);
> -- 
> 1.7.9.5


Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-10 Thread Greg Thelen
On Thu, Aug 09 2012, Glauber Costa wrote:

> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
> page allocator will call the corresponding memcg functions to validate
> the allocation. Tasks in the root memcg can always proceed.
>
> To avoid adding markers to the page - and a kmem flag that would
> necessarily follow, as much as doing page_cgroup lookups for no reason,
> whoever is marking its allocations with __GFP_KMEMCG flag is responsible
> for telling the page allocator that this is such an allocation at
> free_pages() time. This is done by the invocation of
> __free_accounted_pages() and free_accounted_pages().
>
> Signed-off-by: Glauber Costa 
> CC: Christoph Lameter 
> CC: Pekka Enberg 
> CC: Michal Hocko 
> CC: Kamezawa Hiroyuki 
> CC: Johannes Weiner 
> CC: Suleiman Souhlal 
> ---
>  include/linux/gfp.h |  3 +++
>  mm/page_alloc.c | 38 ++
>  2 files changed, 41 insertions(+)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index d8eae4d..029570f 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -370,6 +370,9 @@ extern void free_pages(unsigned long addr, unsigned int order);
>  extern void free_hot_cold_page(struct page *page, int cold);
>  extern void free_hot_cold_page_list(struct list_head *list, int cold);
>  
> +extern void __free_accounted_pages(struct page *page, unsigned int order);
> +extern void free_accounted_pages(unsigned long addr, unsigned int order);
> +
>  #define __free_page(page) __free_pages((page), 0)
>  #define free_page(addr) free_pages((addr), 0)
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b956cec..da341dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>   struct page *page = NULL;
>   int migratetype = allocflags_to_migratetype(gfp_mask);
>   unsigned int cpuset_mems_cookie;
> + void *handle = NULL;
>  
>   gfp_mask &= gfp_allowed_mask;
>  
> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>   return NULL;
>  
>   /*
> +  * Will only have any effect when __GFP_KMEMCG is set.
> +  * This is verified in the (always inline) callee
> +  */
> + if (!memcg_kmem_new_page(gfp_mask, &handle, order))
> + return NULL;
> +
> + /*
>* Check the zones suitable for the gfp_mask contain at least one
>* valid zone. It's possible to have an empty zonelist as a result
>* of GFP_THISNODE and a memoryless node
> @@ -2583,6 +2591,8 @@ out:
>   if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
>   goto retry_cpuset;
>  
> + memcg_kmem_commit_page(page, handle, order);
> +
>   return page;
>  }
>  EXPORT_SYMBOL(__alloc_pages_nodemask);
> @@ -2635,6 +2645,34 @@ void free_pages(unsigned long addr, unsigned int order)
>  
>  EXPORT_SYMBOL(free_pages);
>  
> +/*
> + * __free_accounted_pages and free_accounted_pages will free pages allocated
> + * with __GFP_KMEMCG.
> + *
> + * Those pages are accounted to a particular memcg, embedded in the
> + * corresponding page_cgroup. To avoid adding a hit in the allocator to search
> + * for that information only to find out that it is NULL for users who have no
> + * interest in that whatsoever, we provide these functions.
> + *
> + * The caller knows better which flags it relies on.
> + */
> +void __free_accounted_pages(struct page *page, unsigned int order)
> +{
> + memcg_kmem_free_page(page, order);
> + __free_pages(page, order);
> +}
> +EXPORT_SYMBOL(__free_accounted_pages);
> +
> +void free_accounted_pages(unsigned long addr, unsigned int order)
> +{
> + if (addr != 0) {
> + VM_BUG_ON(!virt_addr_valid((void *)addr));
> + memcg_kmem_free_page(virt_to_page((void *)addr), order);
> + __free_pages(virt_to_page((void *)addr), order);

Nit.  Is there any reason not to replace the above two lines with:
__free_accounted_pages(virt_to_page((void *)addr), order);

> + }
> +}
> +EXPORT_SYMBOL(free_accounted_pages);
> +
>  static void *make_alloc_exact(unsigned long addr, unsigned order, size_t size)
>  {
>   if (addr) {
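
The allocation side of the quoted patch follows a charge-then-commit shape:
the memcg is consulted before the page allocator runs, and the result is
committed (or dropped) afterwards, with a dedicated free function undoing the
charge. A minimal userspace C sketch of that shape, assuming a plain byte
counter in place of a memcg and malloc() in place of the page allocator. All
names here are hypothetical, and the bodies of memcg_kmem_new_page() and
memcg_kmem_commit_page() are not shown in this mail, so their exact behaviour
is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memcg: usage tracked against a hard limit. */
struct counter {
	size_t usage;
	size_t limit;	/* invariant: usage <= limit */
};

/* Phase 1: reserve the charge before touching the allocator. */
static bool charge_try(struct counter *c, size_t bytes)
{
	assert(c && c->usage <= c->limit);
	if (bytes > c->limit - c->usage)
		return false;
	c->usage += bytes;
	return true;
}

/* Drop a charge, either on allocation failure or at free time. */
static void charge_undo(struct counter *c, size_t bytes)
{
	assert(c && c->usage >= bytes);
	c->usage -= bytes;
}

/* Charge, allocate, and roll the charge back if the allocation fails. */
static void *accounted_alloc(struct counter *c, size_t bytes)
{
	void *p;

	if (!charge_try(c, bytes))
		return NULL;		/* over limit: fail without allocating */

	p = malloc(bytes);
	if (p == NULL)
		charge_undo(c, bytes);	/* nothing to commit: drop reservation */
	return p;
}

/*
 * Paired free. As with free_accounted_pages(), the caller has to know the
 * allocation was accounted and must use this instead of plain free().
 */
static void accounted_free(struct counter *c, void *p, size_t bytes)
{
	if (p == NULL)
		return;
	charge_undo(c, bytes);
	free(p);
}
```

The point of the paired free is the same as in the patch description: the
allocator keeps no per-allocation marker, so the caller carries the knowledge
that the memory was charged.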


Re: [PATCH 6/6] floppy: use disk_registered for checking if a drive is present

2012-08-10 Thread Vivek Goyal
On Thu, Aug 09, 2012 at 04:59:51PM -0300, Herton Ronaldo Krzesinski wrote:
> Simplify/cleanup code, replacing remaining checks for drives present
> using disk_registered array.
> 
> Signed-off-by: Herton Ronaldo Krzesinski 
> ---

Looks good to me.

Acked-by: Vivek Goyal 

Vivek

>  drivers/block/floppy.c |   10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index 438ffc9..5fcc2a1 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -4114,9 +4114,7 @@ static struct platform_device floppy_device[N_DRIVE];
>  static struct kobject *floppy_find(dev_t dev, int *part, void *data)
>  {
>   int drive = (*part & 3) | ((*part & 0x80) >> 5);
> - if (drive >= N_DRIVE ||
> - !(allowed_drive_mask & (1 << drive)) ||
> - fdc_state[FDC(drive)].version == FDC_NONE)
> + if (drive >= N_DRIVE || !disk_registered[drive])
>   return NULL;
>   if (((*part >> 2) & 0x1f) >= ARRAY_SIZE(floppy_type))
>   return NULL;
> @@ -4559,8 +4557,7 @@ static void __exit floppy_module_exit(void)
>   for (drive = 0; drive < N_DRIVE; drive++) {
>   del_timer_sync(&motor_off_timer[drive]);
>  
> - if ((allowed_drive_mask & (1 << drive)) &&
> - fdc_state[FDC(drive)].version != FDC_NONE) {
> + if (disk_registered[drive]) {
>   del_gendisk(disks[drive]);
>   device_remove_file(&floppy_device[drive].dev, &dev_attr_cmos);
>   platform_device_unregister(&floppy_device[drive]);
> @@ -4571,8 +4568,7 @@ static void __exit floppy_module_exit(void)
>* These disks have not called add_disk().  Don't put down
>* queue reference in put_disk().
>*/
> - if (!(allowed_drive_mask & (1 << drive)) ||
> - fdc_state[FDC(drive)].version == FDC_NONE)
> + if (!disk_registered[drive])
>   disks[drive]->queue = NULL;
>  
>   put_disk(disks[drive]);
> -- 
> 1.7.9.5


Re: [PATCH 4/6] floppy: properly handle failure on add_disk loop

2012-08-10 Thread Vivek Goyal
On Thu, Aug 09, 2012 at 04:59:49PM -0300, Herton Ronaldo Krzesinski wrote:
> On do_floppy_init, if something failed inside the loop we call add_disk,
> there was no cleanup of previous iterations in the error handling.
> 
> Signed-off-by: Herton Ronaldo Krzesinski 
> Cc: sta...@vger.kernel.org
> ---

Looks good to me.

Acked-by: Vivek Goyal 

Vivek

>  drivers/block/floppy.c |   10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index 9272203..3eafe93 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -4294,7 +4294,7 @@ static int __init do_floppy_init(void)
>  
>   err = platform_device_register(&floppy_device[drive]);
>   if (err)
> - goto out_release_dma;
> + goto out_remove_drives;
>  
>   err = device_create_file(&floppy_device[drive].dev,
>    &dev_attr_cmos);
> @@ -4313,6 +4313,14 @@ static int __init do_floppy_init(void)
>  
>  out_unreg_platform_dev:
>   platform_device_unregister(&floppy_device[drive]);
> +out_remove_drives:
> + while (drive--) {
> + if (disk_registered[drive]) {
> + del_gendisk(disks[drive]);
> + device_remove_file(&floppy_device[drive].dev, &dev_attr_cmos);
> + platform_device_unregister(&floppy_device[drive]);
> + }
> + }
>  out_release_dma:
>   if (atomic_read(&usage_count))
>   floppy_release_irq_and_dma();
> -- 
> 1.7.9.5
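
The out_remove_drives label in the patch above undoes the iterations that had
already succeeded before the failing one. A small userspace C sketch of that
unwind pattern, with hypothetical register/unregister helpers standing in for
the platform device and gendisk calls, and a fail_at parameter added only so
the failure can be triggered:

```c
#include <assert.h>
#include <stdbool.h>

#define N_DRIVE 8

/* Mirrors the role of disk_registered[] in the driver. */
static bool registered[N_DRIVE];

/* Hypothetical per-drive setup; fails for the drive index `fail_at`. */
static int register_drive(int drive, int fail_at)
{
	if (drive == fail_at)
		return -1;
	registered[drive] = true;
	return 0;
}

static void unregister_drive(int drive)
{
	assert(registered[drive]);
	registered[drive] = false;
}

/* Register all drives; on failure undo the earlier iterations. */
static int init_drives(int fail_at)
{
	int drive;
	int err = 0;

	for (drive = 0; drive < N_DRIVE; drive++) {
		err = register_drive(drive, fail_at);
		if (err)
			goto out_remove_drives;
	}
	return 0;

out_remove_drives:
	/*
	 * `drive` is the index that failed, so while (drive--) visits
	 * drive-1 down to 0, i.e. only the iterations that completed.
	 */
	while (drive--) {
		if (registered[drive])
			unregister_drive(drive);
	}
	return err;
}

static int count_registered(void)
{
	int n = 0;

	for (int i = 0; i < N_DRIVE; i++)
		n += registered[i];
	return n;
}
```

The registered[] check matters in the driver because some drive slots are
skipped during the loop, so not every earlier index has something to undo.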


Re: [PATCH v2 07/11] mm: Allocate kernel pages to the right memcg

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote:
> When a process tries to allocate a page with the __GFP_KMEMCG flag, the
> page allocator will call the corresponding memcg functions to validate
> the allocation. Tasks in the root memcg can always proceed.
> 
> To avoid adding markers to the page - and a kmem flag that would
> necessarily follow, as much as doing page_cgroup lookups for no reason,
> whoever is marking its allocations with __GFP_KMEMCG flag is responsible
> for telling the page allocator that this is such an allocation at
> free_pages() time. This is done by the invocation of
> __free_accounted_pages() and free_accounted_pages().
> 
> Signed-off-by: Glauber Costa 
> CC: Christoph Lameter 
> CC: Pekka Enberg 
> CC: Michal Hocko 
> CC: Kamezawa Hiroyuki 
> CC: Johannes Weiner 
> CC: Suleiman Souhlal 

Ah, ok. free_accounted_page() seems good.

Acked-by: KAMEZAWA Hiroyuki 

I myself am okay with this. But...

Because you add a new hook to alloc_pages(), please get Ack from Mel
before requesting merge.

Thanks,
-Kame




> ---
>   include/linux/gfp.h |  3 +++
>   mm/page_alloc.c | 38 ++
>   2 files changed, 41 insertions(+)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index d8eae4d..029570f 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -370,6 +370,9 @@ extern void free_pages(unsigned long addr, unsigned int order);
>   extern void free_hot_cold_page(struct page *page, int cold);
>   extern void free_hot_cold_page_list(struct list_head *list, int cold);
>   
> +extern void __free_accounted_pages(struct page *page, unsigned int order);
> +extern void free_accounted_pages(unsigned long addr, unsigned int order);
> +
>   #define __free_page(page) __free_pages((page), 0)
>   #define free_page(addr) free_pages((addr), 0)
>   
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b956cec..da341dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2532,6 +2532,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>   struct page *page = NULL;
>   int migratetype = allocflags_to_migratetype(gfp_mask);
>   unsigned int cpuset_mems_cookie;
> + void *handle = NULL;
>   
>   gfp_mask &= gfp_allowed_mask;
>   
> @@ -2543,6 +2544,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>   return NULL;
>   
>   /*
> +  * Will only have any effect when __GFP_KMEMCG is set.
> +  * This is verified in the (always inline) callee
> +  */
> + if (!memcg_kmem_new_page(gfp_mask, &handle, order))
> + return NULL;
> +
> + /*
>* Check the zones suitable for the gfp_mask contain at least one
>* valid zone. It's possible to have an empty zonelist as a result
>* of GFP_THISNODE and a memoryless node
> @@ -2583,6 +2591,8 @@ out:
>   if (unlikely(!put_mems_allowed(cpuset_mems_cookie) && !page))
>   goto retry_cpuset;
>   
> + memcg_kmem_commit_page(page, handle, order);
> +
>   return page;
>   }
>   EXPORT_SYMBOL(__alloc_pages_nodemask);
> @@ -2635,6 +2645,34 @@ void free_pages(unsigned long addr, unsigned int order)
>   
>   EXPORT_SYMBOL(free_pages);
>   
> +/*
> + * __free_accounted_pages and free_accounted_pages will free pages allocated
> + * with __GFP_KMEMCG.
> + *
> + * Those pages are accounted to a particular memcg, embedded in the
> + * corresponding page_cgroup. To avoid adding a hit in the allocator to search
> + * for that information only to find out that it is NULL for users who have no
> + * interest in that whatsoever, we provide these functions.
> + *
> + * The caller knows better which flags it relies on.
> + */
> +void __free_accounted_pages(struct page *page, unsigned int order)
> +{
> + memcg_kmem_free_page(page, order);
> + __free_pages(page, order);
> +}
> +EXPORT_SYMBOL(__free_accounted_pages);
> +
> +void free_accounted_pages(unsigned long addr, unsigned int order)
> +{
> + if (addr != 0) {
> + VM_BUG_ON(!virt_addr_valid((void *)addr));
> + memcg_kmem_free_page(virt_to_page((void *)addr), order);
> + __free_pages(virt_to_page((void *)addr), order);
> + }
> +}
> +EXPORT_SYMBOL(free_accounted_pages);
> +
>   static void *make_alloc_exact(unsigned long addr, unsigned order, size_t 
> size)
>   {
>   if (addr) {
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Michal Hocko
On Thu 09-08-12 17:01:10, Glauber Costa wrote:
[...]
> For now retry up to COSTLY_ORDER (as page_alloc.c does) and make sure
> not to do it if __GFP_NORETRY.

Who is using __GFP_NORETRY for user backed memory (except for hugetlb
which has its own controller)?

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Michal Hocko
On Sat 11-08-12 01:49:25, KAMEZAWA Hiroyuki wrote:
> (2012/08/11 0:42), Michal Hocko wrote:
> >On Thu 09-08-12 17:01:10, Glauber Costa wrote:
> >[...]
> >>@@ -2317,18 +2318,18 @@ static int mem_cgroup_do_charge(struct mem_cgroup 
> >>*memcg, gfp_t gfp_mask,
> >>} else
> >>mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
> >>/*
> >>-* nr_pages can be either a huge page (HPAGE_PMD_NR), a batch
> >>-* of regular pages (CHARGE_BATCH), or a single regular page (1).
> >>-*
> >> * Never reclaim on behalf of optional batching, retry with a
> >> * single page instead.
> >> */
> >>-   if (nr_pages == CHARGE_BATCH)
> >>+   if (nr_pages > min_pages)
> >>return CHARGE_RETRY;
> >
> >This is dangerous because THP charges will be retried now while they
> >previously failed with CHARGE_NOMEM which means that we will keep
> >attempting potentially endlessly.
> 
> with THP, I thought nr_pages == min_pages, and no retry.

right you are.

> >Why cannot we simply do if (nr_pages < CHARGE_BATCH) and get rid of the
> >min_pages altogether?
> 
> Hm, I think a slab can be larger than CHARGE_BATCH.
>
> >Also the comment doesn't seem to be valid anymore.
> >
> I agree it's not clean. Because our assumptions about nr_pages have changed,
> I think this behavior should not depend on the nr_pages value.
> Shouldn't we have a flag to indicate a "trial-for-batched charge"?

Dunno, it would require a new parameter anyway (because abusing gfp
doesn't seem like a great idea).
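
The retry test under discussion can be modelled in user space as below. This
is only a sketch, not the kernel's mem_cgroup_do_charge(): the helper names
old_retries_smaller() and new_retries_smaller() are hypothetical, and the
constants (CHARGE_BATCH = 32, HPAGE_PMD_NR = 512 as on x86-64 with 4K pages)
are assumptions for illustration.

```c
#include <stdbool.h>

/* Assumed values, for illustration only. */
#define CHARGE_BATCH 32U
#define HPAGE_PMD_NR 512U

/* Old test: only an optional batch of exactly CHARGE_BATCH pages is retried. */
static bool old_retries_smaller(unsigned int nr_pages)
{
	return nr_pages == CHARGE_BATCH;
}

/*
 * Patched test: retry with fewer pages whenever more than the caller's
 * minimum was requested. min_pages is what the caller really needs.
 */
static bool new_retries_smaller(unsigned int nr_pages, unsigned int min_pages)
{
	return nr_pages > min_pages;
}
```

With a THP charge, nr_pages and min_pages are both HPAGE_PMD_NR, so
new_retries_smaller() returns false and the charge is not retried, which is
the point made in the reply above. A single page charged with batching
(nr_pages = CHARGE_BATCH, min_pages = 1) is retried under both tests.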

> 
> 
> Thanks,
> -Kame
> 
> 
> 
> 

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 06/11] memcg: kmem controller infrastructure

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote:
> This patch introduces infrastructure for tracking kernel memory pages to
> a given memcg. This will happen whenever the caller includes the
> __GFP_KMEMCG flag, and the task belongs to a memcg other than the root.
> 
> In memcontrol.h those functions are wrapped in inline accessors.  The
> idea is to patch those with static branches later on, so we don't incur
> any overhead when no mem cgroups with limited kmem are being used.
> 
> [ v2: improved comments and standardized function names ]
> 
> Signed-off-by: Glauber Costa 
> CC: Christoph Lameter 
> CC: Pekka Enberg 
> CC: Michal Hocko 
> CC: Kamezawa Hiroyuki 
> CC: Johannes Weiner 
> ---
>   include/linux/memcontrol.h |  79 +++
>   mm/memcontrol.c| 185 
> +
>   2 files changed, 264 insertions(+)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 8d9489f..75b247e 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -21,6 +21,7 @@
>   #define _LINUX_MEMCONTROL_H
>   #include 
>   #include 
> +#include 
>   
>   struct mem_cgroup;
>   struct page_cgroup;
> @@ -399,6 +400,11 @@ struct sock;
>   #ifdef CONFIG_MEMCG_KMEM
>   void sock_update_memcg(struct sock *sk);
>   void sock_release_memcg(struct sock *sk);
> +
> +#define memcg_kmem_on 1
> +bool __memcg_kmem_new_page(gfp_t gfp, void *handle, int order);
> +void __memcg_kmem_commit_page(struct page *page, void *handle, int order);
> +void __memcg_kmem_free_page(struct page *page, int order);
>   #else
>   static inline void sock_update_memcg(struct sock *sk)
>   {
> @@ -406,6 +412,79 @@ static inline void sock_update_memcg(struct sock *sk)
>   static inline void sock_release_memcg(struct sock *sk)
>   {
>   }
> +
> +#define memcg_kmem_on 0
> +static inline bool
> +__memcg_kmem_new_page(gfp_t gfp, void *handle, int order)
> +{
> + return false;
> +}
> +
> +static inline void  __memcg_kmem_free_page(struct page *page, int order)
> +{
> +}
> +
> +static inline void
> +__memcg_kmem_commit_page(struct page *page, struct mem_cgroup *handle, int 
> order)
> +{
> +}
>   #endif /* CONFIG_MEMCG_KMEM */
> +
> +/**
> + * memcg_kmem_new_page: verify if a new kmem allocation is allowed.
> + * @gfp: the gfp allocation flags.
> + * @handle: a pointer to the memcg this was charged against.
> + * @order: allocation order.
> + *
> + * returns true if the memcg where the current task belongs can hold this
> + * allocation.
> + *
> + * We return true automatically if this allocation is not to be accounted to
> + * any memcg.
> + */
> +static __always_inline bool
> +memcg_kmem_new_page(gfp_t gfp, void *handle, int order)
> +{
> + if (!memcg_kmem_on)
> + return true;
> + if (!(gfp & __GFP_KMEMCG) || (gfp & __GFP_NOFAIL))
> + return true;
> + if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD))
> + return true;
> + return __memcg_kmem_new_page(gfp, handle, order);
> +}
> +
> +/**
> + * memcg_kmem_free_page: uncharge pages from memcg
> + * @page: pointer to struct page being freed
> + * @order: allocation order.
> + *
> + * there is no need to specify memcg here, since it is embedded in 
> page_cgroup
> + */
> +static __always_inline void
> +memcg_kmem_free_page(struct page *page, int order)
> +{
> + if (memcg_kmem_on)
> + __memcg_kmem_free_page(page, order);
> +}
> +
> +/**
> + * memcg_kmem_commit_page: embeds correct memcg in a page
> + * @handle: a pointer to the memcg this was charged against.
> + * @page: pointer to struct page recently allocated
> + * @handle: the memcg structure we charged against
> + * @order: allocation order.
> + *
> + * Needs to be called after memcg_kmem_new_page, regardless of success or
> + * failure of the allocation. if @page is NULL, this function will revert the
> + * charges. Otherwise, it will commit the memcg given by @handle to the
> + * corresponding page_cgroup.
> + */
> +static __always_inline void
> +memcg_kmem_commit_page(struct page *page, struct mem_cgroup *handle, int 
> order)
> +{
> + if (memcg_kmem_on)
> + __memcg_kmem_commit_page(page, handle, order);
> +}

Don't these 2 functions have any short-cuts?

if (memcg_kmem_on && handle) ?

Maybe free() needs to access page_cgroup...
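
A minimal user-space sketch of the short-cut suggested above, assuming the
commit path can tell from a NULL handle that nothing was charged.
slow_commit() is a hypothetical stand-in for __memcg_kmem_commit_page(), and
the call counter exists only so the effect can be observed.

```c
#include <stddef.h>

#define memcg_kmem_on 1   /* mirrors the CONFIG_MEMCG_KMEM=y case in the patch */

static int slow_path_calls;   /* counts entries into the out-of-line function */

/* Hypothetical stand-in for __memcg_kmem_commit_page(). */
static void slow_commit(void *page, void *handle, int order)
{
	(void)page;
	(void)handle;
	(void)order;
	slow_path_calls++;
}

/* Inline wrapper with the short-cut: skip the call when no memcg was charged. */
static inline void commit_page(void *page, void *handle, int order)
{
	if (memcg_kmem_on && handle)
		slow_commit(page, handle, order);
}
```

The free path has no handle argument, so the same test would not carry over
directly; as noted above, it may have to look at page_cgroup.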



>   #endif /* _LINUX_MEMCONTROL_H */
>   
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 54e93de..e9824c1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -10,6 +10,10 @@
>* Copyright (C) 2009 Nokia Corporation
>* Author: Kirill A. Shutemov
>*
> + * Kernel Memory Controller
> + * Copyright (C) 2012 Parallels Inc. and Google Inc.
> + * Authors: Glauber Costa and Suleiman Souhlal
> + *
>* This program is free software; you can redistribute it and/or modify
>* it under the terms of the GNU General Public License as published by
>* the Free Software Foundation; either version 2 of the 

Re: [PATCH] lib/parser.c: avoid overflow in match_number()

2012-08-10 Thread Randy Dunlap
On 08/09/2012 01:03 PM, Alex Elder wrote:

> The result of converting an integer value to another signed integer
> type that's unable to represent the original value is implementation
> defined.  (See notes in section 6.3.1.3 of the C standard.)
> 
> In match_number(), the result of simple_strtol() (which returns type
> long) is assigned to a value of type int.
> 
> Instead, handle the result of simple_strtol() in a well-defined way,
> and return -ERANGE if the result won't fit in the int variable used
> to hold the parsed result.
> 
> No current callers pay attention to the particular error value
> returned, so this additional return code shouldn't do any harm.
> 
> Signed-off-by: Alex Elder 


Makes sense to me, but I wonder who will merge it.

I'm Cc-ing a couple of possibilities.

> ---
>  lib/parser.c |   10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> Index: b/lib/parser.c
> ===
> --- a/lib/parser.c
> +++ b/lib/parser.c
> @@ -122,13 +122,14 @@ int match_token(char *s, const match_tab
>   *
>   * Description: Given a &substring_t and a base, attempts to parse the substring
>   * as a number in that base. On success, sets @result to the integer represented
> - * by the string and returns 0. Returns either -ENOMEM or -EINVAL on failure.
> + * by the string and returns 0. Returns -ENOMEM, -EINVAL, or -ERANGE on failure.
>   */
>  static int match_number(substring_t *s, int *result, int base)
>  {
>  char *endp;
>  char *buf;
>  int ret;
> +long val;
>  size_t len = s->to - s->from;
> 
>  buf = kmalloc(len + 1, GFP_KERNEL);
> @@ -136,10 +137,15 @@ static int match_number(substring_t *s,
>  return -ENOMEM;
>  memcpy(buf, s->from, len);
>  buf[len] = '\0';
> -*result = simple_strtol(buf, &endp, base);
> +
>  ret = 0;
> +val = simple_strtol(buf, &endp, base);
>  if (endp == buf)
>  ret = -EINVAL;
> +else if (val < (long) INT_MIN || val > (long) INT_MAX)
> +ret = -ERANGE;
> +else
> +*result = (int) val;
>  kfree(buf);
>  return ret;
>  }
> -- 


-- 
~Randy
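
The range check in the patch can be tried out in user space. The sketch below
is an analogue rather than the kernel code: it uses the C library's strtol()
in place of simple_strtol(), the function name parse_int() is hypothetical,
and the errno test is an addition that only makes sense with strtol().

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * User-space analogue of the patched match_number(): parse into a long
 * first, then narrow to int only after an explicit range check.
 */
static int parse_int(const char *buf, int *result, int base)
{
	char *endp;
	long val;

	errno = 0;
	val = strtol(buf, &endp, base);
	if (endp == buf)
		return -EINVAL;		/* no digits were consumed */
	if (errno == ERANGE || val < (long)INT_MIN || val > (long)INT_MAX)
		return -ERANGE;		/* value does not fit in an int */
	*result = (int)val;
	return 0;
}
```

As in the patch, *result is left untouched on failure, so a caller that
ignores the specific error value sees no change in behavior.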


Re: [PATCH 3/6] floppy: avoid leaking extra reference to queue on do_floppy_init error handling

2012-08-10 Thread Vivek Goyal
On Thu, Aug 09, 2012 at 04:59:48PM -0300, Herton Ronaldo Krzesinski wrote:
> After commit 3f9a5aa ("floppy: Cleanup disk->queue before caling
> put_disk() if add_disk() was never called"), if something fails in the
> add_disk loop, we unconditionally set disks[dr]->queue to NULL. But
> that's wrong, since we may have successfully done an add_disk on some of
> the drives previously in the loop, and in this case we would end up with
> an extra reference to the disks[dr]->queue.
> 
> Add a new global array to mark "registered" disks, and use that to check
> if we did an add_disk on one of the disks already. Using an array to
> track added disks also will help to simplify/cleanup code later, as
> suggested by Vivek Goyal.
> 
> Signed-off-by: Herton Ronaldo Krzesinski 
> Cc: sta...@vger.kernel.org

Looks good to me.

Acked-by: Vivek Goyal 

Vivek

> ---
>  drivers/block/floppy.c |5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index 1e09e99..9272203 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -409,6 +409,7 @@ static struct floppy_drive_struct drive_state[N_DRIVE];
>  static struct floppy_write_errors write_errors[N_DRIVE];
>  static struct timer_list motor_off_timer[N_DRIVE];
>  static struct gendisk *disks[N_DRIVE];
> +static bool disk_registered[N_DRIVE];
>  static struct block_device *opened_bdev[N_DRIVE];
>  static DEFINE_MUTEX(open_lock);
>  static struct floppy_raw_cmd *raw_cmd, default_raw_cmd;
> @@ -4305,6 +4306,7 @@ static int __init do_floppy_init(void)
>   disks[drive]->flags |= GENHD_FL_REMOVABLE;
>   disks[drive]->driverfs_dev = &floppy_device[drive].dev;
>   add_disk(disks[drive]);
> + disk_registered[drive] = true;
>   }
>  
>   return 0;
> @@ -4328,7 +4330,8 @@ out_put_disk:
>* put_disk() is not paired with add_disk() and
>* will put queue reference one extra time. fix it.
>*/
> - disks[dr]->queue = NULL;
> + if (!disk_registered[dr])
> + disks[dr]->queue = NULL;
>   }
>   put_disk(disks[dr]);
>   }
> -- 
> 1.7.9.5


Re: [PATCH 2/6] floppy: do put_disk on current dr if blk_init_queue fails

2012-08-10 Thread Vivek Goyal
On Thu, Aug 09, 2012 at 04:59:47PM -0300, Herton Ronaldo Krzesinski wrote:
> If blk_init_queue fails, we do not call put_disk on the current dr
> (dr is decremented first in the error handling loop).
> 
> Signed-off-by: Herton Ronaldo Krzesinski 
> Cc: sta...@vger.kernel.org

Hi,

So for the current drive we do put_disk() here, and for the rest of the drives
we do it in out_put_disk:.

How about always going through all N_DRIVE drives and doing put_disk() as needed?

for (i = 0; i < N_DRIVE; i++) {
	if (!disks[i])
		continue;

	if (disks[i]->queue)
		blk_cleanup_queue(disks[i]->queue);

	if (!disk_registered[i])
		disks[i]->queue = NULL;

	put_disk(disks[i]);
}

It is a few more lines of code, but personally I find it easier to understand
and less error prone as future modifications take place.

Thanks
Vivek
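
The reference counting behind the loop suggested above can be checked with a
toy user-space model. Everything here is an assumption made for illustration:
the toy_* types and helpers are hypothetical, N_DRIVE is set to 4, and the
model only encodes the rule quoted in this series, namely that put_disk()
drops a queue reference which only add_disk() would have taken.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_DRIVE 4	/* assumed small value for the model */

struct toy_queue { int refs; };
struct toy_disk { struct toy_queue *queue; };

static struct toy_queue queues[N_DRIVE];
static struct toy_disk disk_store[N_DRIVE];
static struct toy_disk *disks[N_DRIVE];
static bool disk_registered[N_DRIVE];

static void toy_alloc_disk(int i)
{
	disk_store[i].queue = NULL;
	disks[i] = &disk_store[i];
}

/* Toy blk_init_queue(): the queue starts with one reference. */
static void toy_init_queue(int i)
{
	queues[i].refs = 1;
	disks[i]->queue = &queues[i];
}

/* Toy add_disk(): registration takes an extra queue reference. */
static void toy_add_disk(int i)
{
	disks[i]->queue->refs++;
	disk_registered[i] = true;
}

/* Toy blk_cleanup_queue(): drops the initial reference. */
static void toy_cleanup_queue(struct toy_queue *q)
{
	q->refs--;
}

/* Toy put_disk(): releasing the disk drops the queue reference, if any. */
static void toy_put_disk(struct toy_disk *d)
{
	if (d->queue)
		d->queue->refs--;
}

/* The single cleanup loop over all drives. */
static void cleanup_all(void)
{
	for (int i = 0; i < N_DRIVE; i++) {
		if (!disks[i])
			continue;
		if (disks[i]->queue)
			toy_cleanup_queue(disks[i]->queue);
		if (!disk_registered[i])
			disks[i]->queue = NULL;	/* no add_disk(), so no extra ref */
		toy_put_disk(disks[i]);
		disks[i] = NULL;
	}
}
```

In this model a registered disk goes 2 -> 1 -> 0 and an unregistered one goes
1 -> 0 without put_disk() touching the queue, so no count goes negative
whichever step of the init loop failed.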

> ---
>  drivers/block/floppy.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index c8d9e68..1e09e99 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -4151,6 +4151,7 @@ static int __init do_floppy_init(void)
>  
>   disks[dr]->queue = blk_init_queue(do_fd_request, &floppy_lock);
>   if (!disks[dr]->queue) {
> + put_disk(disks[dr]);
>   err = -ENOMEM;
>   goto out_put_disk;
>   }
> -- 
> 1.7.9.5


Re: [RFC PATCH 0/5] Improve hugepage allocation success rates under load V3

2012-08-10 Thread Jim Schutt

On 08/10/2012 05:02 AM, Mel Gorman wrote:

On Thu, Aug 09, 2012 at 04:38:24PM -0600, Jim Schutt wrote:




Ok, this is an untested hack and I expect it would drop allocation success
rates again under load (but not as much). Can you test again and see what
effect, if any, it has please?

---8<---
mm: compaction: back out if contended

---




Initial testing with this patch looks very good from
my perspective; CPU utilization stays reasonable,
write-out rate stays high, no signs of stress.
Here's an example after ~10 minutes under my test load:



Hmmm, I wonder if I should have tested this patch longer,
in view of the trouble I ran into testing the new patch?
See below.



Excellent, so it is contention that is the problem.



I'll continue testing tomorrow to be sure nothing
shows up after continued testing.

If this passes your allocation success rate testing,
I'm happy with this performance for 3.6 - if not, I'll
be happy to test any further patches.



It does impair allocation success rates as I expected (they're still ok
but not as high as I'd like) so I implemented the following instead. It
attempts to backoff when contention is detected or compaction is taking
too long. It does not backoff as quickly as the first prototype did so
I'd like to see if it addresses your problem or not.


I really appreciate getting the chance to test out
your patchset.



I appreciate that you have a workload that demonstrates the problem and
will test patches. I will not abuse this and hope to keep the revisions
to a minimum.

Thanks.

---8<---
mm: compaction: Abort async compaction if locks are contended or taking too long



Hmmm, while testing this patch, a couple of my servers got
stuck after ~30 minutes or so, like this:

[ 2515.869936] INFO: task ceph-osd:30375 blocked for more than 120 seconds.
[ 2515.876630] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2515.884447] ceph-osdD  0 30375  1 0x
[ 2515.891531]  8802e1a99e38 0082 88056b38e298 
8802e1a99fd8
[ 2515.899013]  8802e1a98010 8802e1a98000 8802e1a98000 
8802e1a98000
[ 2515.906482]  8802e1a99fd8 8802e1a98000 880697d31700 
8802e1a84500
[ 2515.913968] Call Trace:
[ 2515.916433]  [] schedule+0x5d/0x60
[ 2515.921417]  [] rwsem_down_failed_common+0x105/0x140
[ 2515.927938]  [] rwsem_down_write_failed+0x13/0x20
[ 2515.934195]  [] call_rwsem_down_write_failed+0x13/0x20
[ 2515.940934]  [] ? down_write+0x45/0x50
[ 2515.946244]  [] sys_mprotect+0xd2/0x240
[ 2515.951640]  [] system_call_fastpath+0x16/0x1b
[ 2515.957646] INFO: task ceph-osd:95698 blocked for more than 120 seconds.
[ 2515.964330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2515.972141] ceph-osdD  0 95698  1 0x
[ 2515.979223]  8802b049fe38 0082 88056b38e2a0 
8802b049ffd8
[ 2515.986700]  8802b049e010 8802b049e000 8802b049e000 
8802b049e000
[ 2515.994176]  8802b049ffd8 8802b049e000 8809832ddc00 
880611592e00
[ 2516.001653] Call Trace:
[ 2516.004111]  [] schedule+0x5d/0x60
[ 2516.009072]  [] rwsem_down_failed_common+0x105/0x140
[ 2516.015589]  [] rwsem_down_write_failed+0x13/0x20
[ 2516.021861]  [] call_rwsem_down_write_failed+0x13/0x20
[ 2516.028555]  [] ? down_write+0x45/0x50
[ 2516.033859]  [] sys_mprotect+0xd2/0x240
[ 2516.039248]  [] system_call_fastpath+0x16/0x1b
[ 2516.045248] INFO: task ceph-osd:95699 blocked for more than 120 seconds.
[ 2516.051934] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2516.059753] ceph-osdD  0 95699  1 0x
[ 2516.066832]  880c022d3dc8 0082 880c022d2000 
880c022d3fd8
[ 2516.074302]  880c022d2010 880c022d2000 880c022d2000 
880c022d2000
[ 2516.081784]  880c022d3fd8 880c022d2000 8806224cc500 
88096b64dc00
[ 2516.089254] Call Trace:
[ 2516.091702]  [] schedule+0x5d/0x60
[ 2516.096656]  [] rwsem_down_failed_common+0x105/0x140
[ 2516.103176]  [] rwsem_down_write_failed+0x13/0x20
[ 2516.109443]  [] call_rwsem_down_write_failed+0x13/0x20
[ 2516.116134]  [] ? down_write+0x45/0x50
[ 2516.121442]  [] vm_mmap_pgoff+0x6e/0xb0
[ 2516.126861]  [] sys_mmap_pgoff+0x18a/0x190
[ 2516.132552]  [] ? trace_hardirqs_on_thunk+0x3a/0x3c
[ 2516.138985]  [] sys_mmap+0x22/0x30
[ 2516.143945]  [] system_call_fastpath+0x16/0x1b
[ 2516.149949] INFO: task ceph-osd:95816 blocked for more than 120 seconds.
[ 2516.156632] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2516.16] ceph-osdD  0 95816  1 0x
[ 2516.171521]  880332991e38 0082 880332991de8 
880332991fd8
[ 2516.178992]  880332990010 88033299 88033299 
88033299
[ 2516.186466]  880332991fd8 88033299 880697d31700 
880a92c32e00
[ 

Re: [3.5 regression] DRM: Massive (EDID-probing?) X startup delay on ATI Radeon RV770 (HD4870)

2012-08-10 Thread Nix
On 6 Aug 2012, Alex Deucher verbalised:

> On Sat, Aug 4, 2012 at 12:13 PM, Nix  wrote:
>> Possibly-relevant info:
>>
>>  - Two DVI monitors, identical specs, one dual-head graphics card
>>(so no VGA switcheroo or awesome-yet-terrifying PRIME madness needed)
>>
>>  - KMS, Xserver 1.12.3, driver 6.14.6-28 (trunk current as of today),
>>Mesa 8.0.4, libdrm 2.4.37
>>
>> As of kernel 3.5 EDID probing of the older of my two monitors appears to
>> have subtly broken. The log shows that it appears to work -- KMS comes
>> up OK and I get a working console -- but then X stops during startup for
>> nearly a minute (with both monitors black) before coming back to life
>> again and EDID-probing the monitor a further six times for no obvious
>> reason. (Full log attached, and xorg.conf, for what little use it is.)

False alarm. Well, the massive number of EDID probes is consistent, but
that was present in earlier kernels too. The minute-long X start process
was what was unusual, and intermittent: even in 3.5, it starts in a
relatively 'normal' eight seconds normally. I suspect system load, or
i2c bus timeouts or something (it is a really crap bus, after all).
Anyway, it hasn't recurred, which makes it hard to track down further,
let alone bisect :(

I think we have to let this one rest. Dammit. I hate to let a bug slip
away.

-- 
NULL && (void)


Re: [PATCH 02/16] perf symbol: remove unused 'end' arg in kallsyms parse cb

2012-08-10 Thread Cody P Schafer

I guess that a length of 1 is effectively the same as zero length in this case,
since we end up calling symbols__fixup_end. The 'end - start + 1' part
looks like a leftover from a previous change and is not needed anymore
(nor is the KSYM_NAME_LEN check, IMHO), so I suggest using 0 length to make it
clear.


Got it.


And it seems you need to rebase the series onto Arnaldo's current
perf/core branch which separates out ELF bits to symbol-elf.c.


Will do. It apparently wasn't pushed out when I sent these patches; look
for v2 shortly.




Re: [PATCH 1/6] floppy: don't call alloc_ordered_workqueue inside the alloc_disk loop

2012-08-10 Thread Vivek Goyal
On Thu, Aug 09, 2012 at 04:59:46PM -0300, Herton Ronaldo Krzesinski wrote:
> Since commit 070ad7e ("floppy: convert to delayed work and single-thread
> wq"), we end up calling alloc_ordered_workqueue multiple times inside
> the loop, which is surely not intended. Besides the leak, another side
> effect in the current code is that if blk_init_queue fails, we end up
> calling unregister_blkdev even though register_blkdev was not called yet.
> 
> Just moved the allocation of floppy_wq before the loop, and adjusted the
> code accordingly.
> 
> Signed-off-by: Herton Ronaldo Krzesinski 
> Cc: sta...@vger.kernel.org # 3.5+

Looks good to me.

Acked-by: Vivek Goyal 

Vivek

> ---
>  drivers/block/floppy.c |   15 ++-
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
> index a7d6347..c8d9e68 100644
> --- a/drivers/block/floppy.c
> +++ b/drivers/block/floppy.c
> @@ -4138,6 +4138,10 @@ static int __init do_floppy_init(void)
>  
>   raw_cmd = NULL;
>  
> + floppy_wq = alloc_ordered_workqueue("floppy", 0);
> + if (!floppy_wq)
> + return -ENOMEM;
> +
>   for (dr = 0; dr < N_DRIVE; dr++) {
>   disks[dr] = alloc_disk(1);
>   if (!disks[dr]) {
> @@ -4145,16 +4149,10 @@ static int __init do_floppy_init(void)
>   goto out_put_disk;
>   }
>  
> - floppy_wq = alloc_ordered_workqueue("floppy", 0);
> - if (!floppy_wq) {
> - err = -ENOMEM;
> - goto out_put_disk;
> - }
> -
>   disks[dr]->queue = blk_init_queue(do_fd_request, &floppy_lock);
>   if (!disks[dr]->queue) {
>   err = -ENOMEM;
> - goto out_destroy_workq;
> + goto out_put_disk;
>   }
>  
>   blk_queue_max_hw_sectors(disks[dr]->queue, 64);
> @@ -4318,8 +4316,6 @@ out_release_dma:
>  out_unreg_region:
>   blk_unregister_region(MKDEV(FLOPPY_MAJOR, 0), 256);
>   platform_driver_unregister(&floppy_driver);
> -out_destroy_workq:
> - destroy_workqueue(floppy_wq);
>  out_unreg_blkdev:
>   unregister_blkdev(FLOPPY_MAJOR, "fd");
>  out_put_disk:
> @@ -4335,6 +4331,7 @@ out_put_disk:
>   }
>   put_disk(disks[dr]);
>   }
> + destroy_workqueue(floppy_wq);
>   return err;
>  }
>  
> -- 
> 1.7.9.5


Re: POI: nvidia forcedeth phy id not present

2012-08-10 Thread Janpieter Sollie
A primary concern seems to be solved (4k frame size).
However, it is not really a patch yet, just a dirty workaround: I replaced
ETH_DATA_LEN with the value 4000, since my switch is limited to 4k.

486c486
< #define NV_PKTLIMIT_1 ETH_DATA_LEN /* hard limit not known */
---
> #define NV_PKTLIMIT_1 4000 /* hard limit not known */

I will try to improve the MTU detection in the driver, but it may take some
time before I really understand what each function does. I do not know whether
this workaround forces a value onto the 'initiated' phy, or just forces a wrong
MTU to be used on standard nvidia cards.
The results are worth considering: dd if=//server/share of=/dev/null went from
91 to 112 mbps (and this is not the file cache: the PC has 2gb of RAM, more
than 50% of it used by the GUI, and the source file is 30 ...).
Any help developing a patch would be useful. Is there some documentation about
this driver module?


Re: [PATCH v2 05/11] Add a __GFP_KMEMCG flag

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote:
> This flag is used to indicate to the callees that this allocation is a
> kernel allocation in process context, and should be accounted to
> current's memcg. It takes the numerical place of the recently removed
> __GFP_NO_KSWAPD.
> 
> Signed-off-by: Glauber Costa 
> CC: Christoph Lameter 
> CC: Pekka Enberg 
> CC: Michal Hocko 
> CC: Kamezawa Hiroyuki 
> CC: Johannes Weiner 
> CC: Suleiman Souhlal 
> CC: Rik van Riel 
> CC: Mel Gorman 

Okay, so, only memcg-aware allocations are accounted.
It seems a safe way to go.

Acked-by: KAMEZAWA Hiroyuki 




Re: [PATCH v2 04/11] kmem accounting basic infrastructure

2012-08-10 Thread Kamezawa Hiroyuki
(2012/08/09 22:01), Glauber Costa wrote:
> This patch adds the basic infrastructure for the accounting of the slab
> caches. To control that, the following files are created:
> 
>   * memory.kmem.usage_in_bytes
>   * memory.kmem.limit_in_bytes
>   * memory.kmem.failcnt
>   * memory.kmem.max_usage_in_bytes
> 
> They have the same meaning as their user memory counterparts. They
> reflect the state of the "kmem" res_counter.
> 
> The code is not enabled until a limit is set. This can be tested by the
> flag "kmem_accounted". This means that after the patch is applied, no
> behavioral changes exist for whoever is still using memcg to control
> their memory usage.
> 
> We always account to both user and kernel resource_counters. This
> effectively means that an independent kernel limit is in place when the
> limit is set to a lower value than the user memory. An equal or higher
> value means that the user limit will always hit first, meaning that kmem
> is effectively unlimited.
> 
> People who want to track kernel memory but not limit it can set this
> limit to a very high number (like RESOURCE_MAX - 1 page, which no one
> will ever hit) or make it equal to the user memory limit.
> 
> Signed-off-by: Glauber Costa 
> CC: Michal Hocko 
> CC: Johannes Weiner 
> Reviewed-by: Kamezawa Hiroyuki 

Could you add a patch documenting this new interface, with text
explaining the behavior of "kmem_accounting"?

Hm, my concern is the difference in behavior between user page accounting and
kmem accounting... but this is how tcp-accounting works.

Once you add the Documentation, it's okay to add my Ack.

Thanks,
-Kame
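
The "account to both counters" rule from the quoted description can be
sketched in user space as follows. This is a toy model, not the res_counter
API: struct toy_counter and the toy_* helpers are hypothetical, and the order
in which the two counters are charged is an assumption.

```c
#include <stdbool.h>

/* Toy counter with the three values the new control files expose. */
struct toy_counter {
	unsigned long usage;
	unsigned long limit;
	unsigned long failcnt;
};

static bool toy_charge(struct toy_counter *c, unsigned long bytes)
{
	if (c->usage + bytes > c->limit) {
		c->failcnt++;
		return false;
	}
	c->usage += bytes;
	return true;
}

static void toy_uncharge(struct toy_counter *c, unsigned long bytes)
{
	c->usage -= bytes;
}

/* A kernel allocation is charged to both counters; roll back on failure. */
static bool charge_kmem(struct toy_counter *user, struct toy_counter *kmem,
			unsigned long bytes)
{
	if (!toy_charge(kmem, bytes))
		return false;
	if (!toy_charge(user, bytes)) {
		toy_uncharge(kmem, bytes);
		return false;
	}
	return true;
}
```

With the kmem limit below the user limit, the kmem counter fails first and
acts as an independent kernel limit. With it equal or higher, the user
counter always fails first, so kmem is tracked but never the limiting factor.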


> ---
>   mm/memcontrol.c | 69 
> -
>   1 file changed, 68 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b0e29f4..54e93de 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -273,6 +273,10 @@ struct mem_cgroup {
>   };
>   
>   /*
> +  * the counter to account for kernel memory usage.
> +  */
> + struct res_counter kmem;
> + /*
>* Per cgroup active and inactive list, similar to the
>* per zone LRU lists.
>*/
> @@ -287,6 +291,7 @@ struct mem_cgroup {
>* Should the accounting and control be hierarchical, per subtree?
>*/
>   bool use_hierarchy;
> + bool kmem_accounted;
>   
>   bool oom_lock;
>   atomic_t under_oom;
> @@ -397,6 +402,7 @@ enum res_type {
>   _MEM,
>   _MEMSWAP,
>   _OOM_TYPE,
> + _KMEM,
>   };
>   
>   #define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val))
> @@ -1499,6 +1505,10 @@ done:
>   res_counter_read_u64(&memcg->memsw, RES_USAGE) >> 10,
>   res_counter_read_u64(&memcg->memsw, RES_LIMIT) >> 10,
>   res_counter_read_u64(&memcg->memsw, RES_FAILCNT));
> + printk(KERN_INFO "kmem: usage %llukB, limit %llukB, failcnt %llu\n",
> + res_counter_read_u64(&memcg->kmem, RES_USAGE) >> 10,
> + res_counter_read_u64(&memcg->kmem, RES_LIMIT) >> 10,
> + res_counter_read_u64(&memcg->kmem, RES_FAILCNT));
>   
>   mem_cgroup_print_oom_stat(memcg);
>   }
> @@ -4008,6 +4018,9 @@ static ssize_t mem_cgroup_read(struct cgroup *cont, 
> struct cftype *cft,
>   else
>   val = res_counter_read_u64(&memcg->memsw, name);
>   break;
> + case _KMEM:
> + val = res_counter_read_u64(&memcg->kmem, name);
> + break;
>   default:
>   BUG();
>   }
> @@ -4046,8 +4059,23 @@ static int mem_cgroup_write(struct cgroup *cont, 
> struct cftype *cft,
>   break;
>   if (type == _MEM)
>   ret = mem_cgroup_resize_limit(memcg, val);
> - else
> + else if (type == _MEMSWAP)
>   ret = mem_cgroup_resize_memsw_limit(memcg, val);
> + else if (type == _KMEM) {
> + ret = res_counter_set_limit(&memcg->kmem, val);
> + if (ret)
> + break;
> + /*
> +  * Once enabled, can't be disabled. We could in theory
> +  * disable it if we haven't yet created any caches, or
> +  * if we can shrink them all to death.
> +  *
> +  * But it is not worth the trouble
> +  */
> + if (!memcg->kmem_accounted && val != RESOURCE_MAX)
> + memcg->kmem_accounted = true;
> + } else
> + return -EINVAL;
>   break;
>   case RES_SOFT_LIMIT:
>   ret = res_counter_memparse_write_strategy(buffer, &val);
> @@ -4113,12 +4141,16 @@ static int mem_cgroup_reset(struct cgroup *cont, 
> unsigned int event)
>   case RES_MAX_USAGE:
>   if (type == _MEM)
>   res_counter_reset_max(&memcg->res);
> + else if (type == _KMEM)

Re: [PATCH] ASoC: core: remove unused variable in soc_probe() in linux-next

2012-08-10 Thread Mark Brown
On Thu, Aug 09, 2012 at 11:16:26PM -0700, Jerry Snitselaar wrote:
> With commit 28d528c8 "ASoC: core: Remove pointless error on card
> registration failure", the variable ret is no longer used in
> soc_probe() and generates an unused variable warning during a build.

Applied, thanks.


[ANNOUNCE] 3.4.8-rt16

2012-08-10 Thread Steven Rostedt

Dear RT Folks,

I'm pleased to announce the 3.4.8-rt16 stable release.


This release is just an update to the new stable 3.4.8 version
and no RT specific changes have been made.


You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  Head SHA1: bb33f79d9568fd197c5e006a168f88ffbfd3dbe3


Or to build 3.4.8-rt16 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.4.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.4.8.xz

  
  http://www.kernel.org/pub/linux/kernel/projects/rt/3.4/patch-3.4.8-rt16.patch.xz



Enjoy,

-- Steve





Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Kamezawa Hiroyuki

(2012/08/11 0:42), Michal Hocko wrote:
> On Thu 09-08-12 17:01:10, Glauber Costa wrote:
> [...]
>> @@ -2317,18 +2318,18 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>>          } else
>>                  mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
>>          /*
>> -         * nr_pages can be either a huge page (HPAGE_PMD_NR), a batch
>> -         * of regular pages (CHARGE_BATCH), or a single regular page (1).
>> -         *
>>           * Never reclaim on behalf of optional batching, retry with a
>>           * single page instead.
>>           */
>> -        if (nr_pages == CHARGE_BATCH)
>> +        if (nr_pages > min_pages)
>>                  return CHARGE_RETRY;
>
> This is dangerous because THP charges will be retried now while they
> previously failed with CHARGE_NOMEM which means that we will keep
> attempting potentially endlessly.

with THP, I thought nr_pages == min_pages, and no retry.

> Why cannot we simply do if (nr_pages < CHARGE_BATCH) and get rid of the
> min_pages altogether?

Hm, I think a slab can be larger than CHARGE_BATCH.

> Also the comment doesn't seem to be valid anymore.


I agree it's not clean. Because our assumptions about nr_pages have changed,
I think this behavior should not depend on the value of nr_pages.
Shouldn't we have a flag to indicate a "trial-for-batched charge"?


Thanks,
-Kame
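
One way to picture the flag suggested above is the small userspace sketch below. It is only an illustration, not code from the patch series: the names charge_request, first_attempt and on_over_limit are hypothetical, and the CHARGE_BATCH value of 32 is an assumption. The point it tries to show is that the "retry without batching" decision is taken from an explicit flag set by the caller, so THP-sized or slab-sized requests larger than the batch are never mistaken for optional batching.

```c
#include <assert.h>
#include <stdbool.h>

#define CHARGE_BATCH 32U	/* assumed value, for illustration only */

/* Hypothetical outcomes when a charge hits the limit. */
enum over_limit_action {
	RETRY_WITHOUT_BATCH,	/* drop the optional batching and try again */
	TRY_RECLAIM,		/* the caller really needs these pages */
};

/* Hypothetical request descriptor carrying the explicit flag. */
struct charge_request {
	unsigned int nr_pages;
	bool batched_trial;	/* true only when we rounded up to a batch */
};

/* Build the first attempt: round small requests up to one batch. */
static struct charge_request first_attempt(unsigned int min_pages)
{
	struct charge_request req = { min_pages, false };

	if (min_pages < CHARGE_BATCH) {
		req.nr_pages = CHARGE_BATCH;
		req.batched_trial = true;
	}
	return req;
}

/* The decision depends on the flag, not on comparing page counts. */
static enum over_limit_action on_over_limit(const struct charge_request *req)
{
	return req->batched_trial ? RETRY_WITHOUT_BATCH : TRY_RECLAIM;
}
```

With this shape, a single-page charge is first tried as a batch and falls back to one page, while a request that is already larger than the batch goes straight to reclaim.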






Re: [PATCH] ARM: tegra: use IO_ADDRESS for getting virtual address

2012-08-10 Thread Stephen Warren
On 08/10/2012 07:03 AM, Laxman Dewangan wrote:
> Use macro IO_ADDRESS for getting virtual address of
> corresponding physical address to make the consistency
> with rest of Tegra code-base.
> This macro calls the IO_TO_VIRT() which is defined in
> arch/arm/mach-tegra/include/mach/iomap.h

Thanks, applied to Tegra's for-3.7/fixes.


Upgraded from 3.4 to 3.5.1 kernel: machine does not boot

2012-08-10 Thread Justin Piszcz
Hello,

Motherboard: Supermicro X8DTH-6F
Distro: Debian Testing x86_64

Going from 3.4 to 3.5.1 on x86_64 (make oldconfig plus a few minor changes), the
machine attempts to boot but hangs at the filesystem mounting part of the
boot process.

Picture of where it stops working (a little blurry but readable)
http://home.comcast.net/~jpiszcz/20120810/3.5-kernel-hangs.jpg

Kernel config 3.4 (working)
http://home.comcast.net/~jpiszcz/20120810/config-3.4.txt

Kernel config 3.5.1 (hangs)
http://home.comcast.net/~jpiszcz/20120810/config-3.5.1.txt

As you can see towards the end, the machine had been sitting there for 1 hour,
as that's the spindown timeout I have set for the drives on the 3ware card.

Any thoughts as to what is wrong here?

Diff between the two:

$ diff -u config-3.4.txt  config-3.5.1.txt  |grep '^+C'
+CONFIG_ARCH_SUPPORTS_UPROBES=y
+CONFIG_BUILDTIME_EXTABLE_SORT=y
+CONFIG_CLOCKSOURCE_WATCHDOG=y
+CONFIG_ARCH_CLOCKSOURCE_DATA=y
+CONFIG_GENERIC_TIME_VSYSCALL=y
+CONFIG_GENERIC_CLOCKEVENTS=y
+CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
+CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
+CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
+CONFIG_GENERIC_CMOS_UPDATE=y
+CONFIG_TICK_ONESHOT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_RCU_FANOUT_LEAF=16
+CONFIG_GENERIC_SMP_IDLE_THREAD=y
+CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
+CONFIG_SECCOMP_FILTER=y
+CONFIG_CROSS_MEMORY_ATTACH=y
+CONFIG_X86_DEV_DMA_OPS=y
+CONFIG_NETFILTER_NETLINK=y
+CONFIG_NF_CT_NETLINK=y
+CONFIG_HAVE_BPF_JIT=y
+CONFIG_E1000E=y
+CONFIG_IXGBE_HWMON=y
+CONFIG_NET_VENDOR_I825XX=y
+CONFIG_HID=y
+CONFIG_HIDRAW=y
+CONFIG_HID_GENERIC=y
+CONFIG_USB_HID=y
+CONFIG_HID_PID=y
+CONFIG_USB_HIDDEV=y
+CONFIG_NEW_LEDS=y
+CONFIG_LEDS_CLASS=y
+CONFIG_NFS_V2=y
+CONFIG_PANIC_ON_OOPS_VALUE=0
+CONFIG_RCU_CPU_STALL_INFO=y
+CONFIG_CRYPTO_CRC32C=y
+CONFIG_GENERIC_STRNCPY_FROM_USER=y
+CONFIG_GENERIC_STRNLEN_USER=y

Justin.




Re: null pointer dereference while loading i915

2012-08-10 Thread Daniel Vetter
On Fri, Aug 10, 2012 at 6:05 PM, Mihai Moldovan  wrote:
> * On 10.08.2012 12:10 PM, Daniel Vetter wrote:
>> On Wed, Aug 8, 2012 at 6:50 AM, Mihai Moldovan  wrote:
>>> Hi Daniel, hi list
>>>
>>> ever since version 3.2.0 (maybe even earlier, but 3.0.2 is still working 
>>> fine),
>>> my box is crashing when loading the i915 driver (mode-setting enabled.)
>>>
>>> The current version I'm testing with is 3.5.0.
>>>
>>> I was able to get the BUG output (please forgive any errors/flips in the 
>>> output,
>>> I have had to transcribe the messages from the screen/images), however, I'm 
>>> not
>>> able to find out what's wrong.
>>>
>>> If I see it correctly, there's a null pointer dereference in a printk called
>>> from inside gmbus_xfer. The only printk calls I can see in
>>> drivers/gpu/drm/i915/intel_i2c.c gmbus_xfer() however are issued by the
>>> DRM_DEBUG_KMS() and DRM_INFO() macros.
>>> Neither call looks wrong to me, I even tried to swap adapter->name with
>>> bus->adapter.name and make *sure* i < num is true, but haven't had any 
>>> success.
>>>
>>> I'd really like to see this bug fixed, as it's preventing me from updating 
>>> the
>>> kernel for over a year now.
>>>
>>> Also, while 3.0.2 works, it *does* spew error/warning messages related to 
>>> gmbus
>>> and I've had corrupted VTs in the past (albeit after a long uptime with 
>>> multiple
>>> X restarting and DVI cable unplugging/reattaching events), so maybe there's 
>>> a
>>> lot more broken than "expected".
>>
>> Hm, this is rather strange. gmbus should not be enabled on 3.2 nor 3.0,
>> since exactly this issue might happen. We've re-enabled gmbus again on
>> 3.5 after having fixed this bug. Are you sure that this is plain 3.2
>> you're running?
>
> Sorry, I messed up the version numbers. Started bisecting yesterday and 
> noticed,
> that 3.0 up to 3.2 still work "fine" (see below), instead I've had another
> problem with 3.2 (completely lockup after the kernel is running for a few
> minutes, but I have no idea where this issue is coming from. Seems to be
> happening with 3.2.0 only, so... *shrug*)
>
> 3.0.2   => working, gmbus warnings as posted.
> 3.1-09933/07170 => working, NO gmbus warnings, but render errors (see below)
> 3.2-rc2 to rc4  => working, NO gmbus warnings, but render errors (see below)
> --- (stopped bisecting 3.0 to 3.2 as this was pointless) ---
> --- (restarted bisecting with 3.2 to 3.5) ---
> 3.3.0-06109 => working, gmbus warnings just like with 3.0, render errors
> (see below)
> 3.4.0-07487 => working, gmbus warnings, hang errors (see below)
> ...
>
> I've done more steps, but have not yet finished bisecting, so stay tuned.
> All those render errors look like that:
>
> [drm] capturing error event; look for more information in
> /debug/dri/0/i915_error_state
> render error detected, EIR: 0x0010
>   IPEIR: 0x
>   IPEHR: 0x0200
>   INSTDONE: 0x
>   INSTPS: 0x8001e025
>   INSTDONE1: 0xbfbb
>   ACTHD: 0x00a4203c
> page table error
>   PGTBL_ER: 0x0010
> [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x0010, masking
>
> I'll finish bisecting (and hope, that my guess was right, concerning the
> variant I wasn't able to build) and will post the bisect log when done.
>
> Meanwhile: at least for 3.0.2 and even older versions, gmbus must have been
> enabled as I'm pretty sure I always saw those errors when booting (just
> confirmed via logs for 3.0.0, 2.6.38.6, 2.6.39). Doesn't come up with 2.6.34,
> 2.6.36.1, 3.1-..., 3.2-... though.

Yeah, we've enabled gmbus a few times and then disabled it again due
to bugs. Also, the usual debug message says gmbus even when gmbus
isn't on ... yeah, slightly confusing, but that should be fixed, too.

For the gpu hang, please ensure that you're running the latest stable
release of everything (to avoid hunting down already known issues and
also because recent kernels dump more useful stuff), grab the entire
i915_error_state from debugfs and file a bug report with the usual
details at bugs.freedesktop.org against dri -> drm/intel.

Thanks,

Daniel
-- 
Daniel Vetter
daniel.vet...@ffwll.ch - +41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: rcu stalls seen with numasched_v2 patches applied.

2012-08-10 Thread Srikar Dronamraju
> ---
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1539,6 +1539,7 @@ struct task_struct {
>  #ifdef CONFIG_SMP
>   u64 node_stamp; /* migration stamp  */
>   unsigned long numa_contrib;
> + struct callback_head numa_work;
>  #endif /* CONFIG_SMP  */
>  #endif /* CONFIG_NUMA */
>   struct rcu_head rcu;
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -816,7 +816,7 @@ void task_numa_work(struct callback_head
>   struct task_struct *t, *p = current;
>   int node = p->node_last;
> 
> - WARN_ON_ONCE(p != container_of(work, struct task_struct, rcu));
> + WARN_ON_ONCE(p != container_of(work, struct task_struct, numa_work));
> 
>   /*
>* Who cares about NUMA placement when they're dying.
> @@ -891,8 +891,8 @@ void task_tick_numa(struct rq *rq, struc
>* yet and exit_task_work() is called before
>* exit_notify().
>*/
> - init_task_work(&curr->rcu, task_numa_work);
> - task_work_add(curr, &curr->rcu, true);
> + init_task_work(&curr->numa_work, task_numa_work);
> + task_work_add(curr, &curr->numa_work, true);
>   }
>   curr->node_last = node;
>   }
> 

This change worked well on the 2-node machine,
but on the 8-node machine it hangs with repeated messages:

Pid: 60935, comm: numa01 Tainted: G        W    3.5.0-numasched_v2_020812+ #4
Call Trace:
  [] ? rcu_check_callbacks+0x632/0x650
[] ? update_process_times+0x48/0x90
[] ? tick_sched_timer+0x6e/0xe0
[] ? __run_hrtimer+0x75/0x1a0
[] ? tick_setup_sched_timer+0x100/0x100
[] ? hrtimer_interrupt+0xf6/0x250
[] ? smp_apic_timer_interrupt+0x69/0x99
[] ? apic_timer_interrupt+0x6a/0x70
  [] ? wait_on_page_bit+0x73/0x80
[] ? _raw_spin_lock+0x22/0x30
[] ? handle_pte_fault+0x1b3/0xca0
[] ? __schedule+0x2e7/0x710
[] ? up_read+0x18/0x30
[] ? do_page_fault+0x13e/0x460
[] ? __switch_to+0x1aa/0x460
[] ? __schedule+0x2e7/0x710
[] ? page_fault+0x25/0x30
{ 3}  (t=62998 jiffies)
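
For readers following the quoted patch: it gives the NUMA work its own embedded callback_head (numa_work) instead of borrowing the rcu member, and the handler then recovers the task with container_of() on that member. The userspace sketch below only illustrates that pattern. The struct task type, the helper names and the simplified callback_head are hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified callback head: just a function pointer. */
struct callback_head {
	void (*func)(struct callback_head *);
};

/* Hypothetical, heavily reduced task structure. */
struct task {
	int pid;
	struct callback_head rcu;	/* left alone by the NUMA code */
	struct callback_head numa_work;	/* dedicated head for NUMA work */
};

static struct task *last_numa_task;

/* Handler: map the embedded head back to its owning task. */
static void numa_work_fn(struct callback_head *work)
{
	last_numa_task = container_of(work, struct task, numa_work);
}

/* Queue the work using only the dedicated member. */
static void queue_numa_work(struct task *t)
{
	t->numa_work.func = numa_work_fn;
}

/* Run the queued work, as a task-work runner would. */
static void run_numa_work(struct task *t)
{
	t->numa_work.func(&t->numa_work);
}
```

Because queue_numa_work() touches only numa_work, whatever another user has stored in rcu stays intact, which is presumably the motivation for the separate member.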



Re: [36/36] AArch64: MAINTAINERS update

2012-08-10 Thread Christopher Covington
Hi Catalin,

On 07/06/2012 05:06 PM, Catalin Marinas wrote:
> This patch updates the MAINTAINERS file for the AArch64 Linux kernel
> port.
> 
> Signed-off-by: Catalin Marinas 
> 
> ---
> MAINTAINERS |6 ++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index eb22272..50699f5 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -192,6 +192,12 @@ S:   Supported
>  F:   Documentation/scsi/aacraid.txt
>  F:   drivers/scsi/aacraid/
>  
> +AARCH64 ARCHITECTURE
> +M:   Catalin Marinas 
> +L:   linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)

I think it's important to include the Linux ARM kernel mailing list as a 
recipient for these changes and I hope you'll do so in all future revisions of 
the 64-bit patches.

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum


Re: [PATCHv4 1/3] fs: Move core dump functionality into its own file

2012-08-10 Thread Kees Cook
On Fri, Aug 10, 2012 at 1:26 AM, Alex Kelly  wrote:
> This prepares for making core dump functionality optional.
>
> The variable "suid_dumpable" and associated functions are left in fs/exec.c
> because they're used elsewhere, such as in ptrace.
>
> Signed-off-by: Alex Kelly 
> Reviewed-by: Josh Triplett 
> ---
> v2: This patch set is a second revision that follows some suggestions from
> Ingo Molnar and Josh Triplett. Specifically, authorship of commits is
> revised for consistency, and an additional two patches cleaning up artifacts
> and making headers more sane are added.
>
> v3: This version fixes a few more authorship issues and some problems caused
> by a bad git send-email config. Sorry about the extra mails
>
> v4: This version fixes some ordering issues pointed out by Kees Cook and Josh
> Triplett, such that the order of the functions moved to fs/coredump.c is now
> consistent with their original order in fs/exec.c. v4 also drops some extra
> blank lines unintentionally introduced in fs/coredump.c, to avoid the need to
> clean them up later. That left the cleanup patch just reformatting a comment,
> so I dropped that patch. Some of the functions moved to coredump.c need a lot
> of cleaning up, but I'm not sure that those formatting changes should be
> folded into this patch series.

Thanks for the cleanups! This looks great now.

For all three patches:
Acked-by: Kees Cook 

-Kees

-- 
Kees Cook
Chrome OS Security


Re: [PATCH] mutex: place lock in contended state after fastpath_lock failure

2012-08-10 Thread Nicolas Pitre
On Fri, 10 Aug 2012, Will Deacon wrote:

> ARM recently moved to asm-generic/mutex-xchg.h for its mutex
> implementation after the previous implementation was found to be missing
> some crucial memory barriers. However, this has revealed some problems
> running hackbench on SMP platforms due to the way in which the
> MUTEX_SPIN_ON_OWNER code operates.
> 
> The symptoms are that a bunch of hackbench tasks are left waiting on an
> unlocked mutex and therefore never get woken up to claim it. This boils
> down to the following sequence of events:
> 
>         Task A          Task B          Task C          Lock value
> 0                                                        1
> 1       lock()                                           0
> 2                       lock()                           0
> 3                       spin(A)                          0
> 4       unlock()                                         1
> 5                                       lock()           0
> 6                       cmpxchg(1,0)                     0
> 7                       contended()                     -1
> 8       lock()                                           0
> 9       spin(C)                                          0
> 10                                      unlock()         1
> 11      cmpxchg(1,0)                                     0
> 12      unlock()                                         1
> 
> At this point, the lock is unlocked, but Task B is in an uninterruptible
> sleep with nobody to wake it up.
> 
> This patch fixes the problem by ensuring we put the lock into the
> contended state if we fail to acquire it on the fastpath, ensuring that
> any blocked waiters are woken up when the mutex is released.
> 
> Cc: Arnd Bergmann 
> Cc: Thomas Gleixner 
> Cc: Chris Mason 
> Cc: Ingo Molnar 
> Cc: Nicolas Pitre 
> Cc: 
> Signed-off-by: Will Deacon 

Reviewed-by: Nicolas Pitre 

> ---
> 
> Nico: Can I add your S-o-B to this please? Also, preliminary benchmarks
>   are now showing a slight performance improvement on A15 if I use
>   the -dec variant rather than -xchg. I'll follow up with a patch
>   once I've got more numbers.

Good.


> 
>  include/asm-generic/mutex-xchg.h |   11 +--
>  1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/asm-generic/mutex-xchg.h 
> b/include/asm-generic/mutex-xchg.h
> index 580a6d3..c04e0db 100644
> --- a/include/asm-generic/mutex-xchg.h
> +++ b/include/asm-generic/mutex-xchg.h
> @@ -26,7 +26,13 @@ static inline void
>  __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
>  {
>   if (unlikely(atomic_xchg(count, 0) != 1))
> - fail_fn(count);
> + /*
> +  * We failed to acquire the lock, so mark it contended
> +  * to ensure that any waiting tasks are woken up by the
> +  * unlock slow path.
> +  */
> + if (likely(atomic_xchg(count, -1) != 1))
> + fail_fn(count);
>  }
>  
>  /**
> @@ -43,7 +49,8 @@ static inline int
>  __mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
>  {
>   if (unlikely(atomic_xchg(count, 0) != 1))
> - return fail_fn(count);
> + if (likely(atomic_xchg(count, -1) != 1))
> + return fail_fn(count);
>   return 0;
>  }
>  
> -- 
> 1.7.4.1
> 
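
The effect of the fastpath change quoted above can be tried out in userspace with C11 atomics. This is a rough sketch under stated assumptions, not the kernel implementation: fake_slowpath and the two fastpath_lock_* names are hypothetical, and the slow path is reduced to a call counter. Lock values follow the thread: 1 is unlocked, 0 is locked, -1 is locked and marked contended.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the real slow path: it only counts calls. */
static int slowpath_calls;

static void fake_slowpath(atomic_int *count)
{
	(void)count;
	slowpath_calls++;
}

/*
 * Fastpath before the patch: a failed xchg leaves the count at 0,
 * even if it was -1 (contended) a moment ago.
 */
static void fastpath_lock_old(atomic_int *count, void (*fail_fn)(atomic_int *))
{
	if (atomic_exchange(count, 0) != 1)
		fail_fn(count);
}

/*
 * Fastpath after the patch: on failure, a second xchg puts the lock
 * into the contended state. If that second xchg observes 1, the lock
 * was released in between and is now held by us, so the slow path
 * is skipped.
 */
static void fastpath_lock_new(atomic_int *count, void (*fail_fn)(atomic_int *))
{
	if (atomic_exchange(count, 0) != 1) {
		if (atomic_exchange(count, -1) != 1)
			fail_fn(count);
	}
}
```

Starting from a count of -1, the old variant leaves 0 behind (the situation at step 8 of the table), while the new variant keeps -1 so that a later unlock still goes through the waking path.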


Re: [PATCH] overlayfs: copy up i_uid/i_gid from the underlying inode

2012-08-10 Thread Miklos Szeredi
Andy Whitcroft  writes:
>   After a long hiatus I have had time to look into the issues
>   highlighted by the i_uid/i_gid requirements from the VFS.
>   I have identified a number of places which definitely did need
>   the ids copying up and those are reflected in the patch below.
>   I am not 100% convinced I have hit all of the places this might
>   be needed but it certainly helps with the issues I was seeing
>   with link and YAMA (which given YAMA is now gaining the link
>   constraints in mainline in v3.6 we will see more issues here).
>   were seeing and identify the places where
>
>   Please consider for overlayfs.

ovl_setattr() also needs this, I think.

Updated patch below.  Also pushed overlayfs.v14 with this patch to:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v14


Thanks,
Miklos



From 1182ebc7718994c66f3a966a90ed861aea2a5d01 Mon Sep 17 00:00:00 2001
From: Andy Whitcroft 
Date: Thu, 9 Aug 2012 16:47:21 +0100
Subject: [PATCH] overlayfs: copy up i_uid/i_gid from the underlying inode

YAMA et al rely on i_uid/i_gid to be populated in order to perform
their checks.  While these really cannot be guaranteed as the underlying
filesystem may not even have the concept, they are expected to be filled
when possible.  To quote Al Viro:

"Ideally, yes, we'd want to have ->i_uid used only by fs-specific
 code and helpers used by that fs (including those that are
 implicit defaults). [...]   In practice we have enough places
 where uid/gid is used directly to make setting them practically
 a requirement - places like /proc// can get away with
 not doing that, but only because shitloads of syscalls are
 not allowed on those anyway, permissions or no permissions.
 In anything general-purpose you really need to set it."

Copy up the underlying filesystem information into the overlayfs inode
when we create it.

BugLink: http://bugs.launchpad.net/bugs/944386
Signed-off-by: Andy Whitcroft 
Signed-off-by: Miklos Szeredi 
---
 fs/overlayfs/dir.c   |2 ++
 fs/overlayfs/inode.c |2 ++
 fs/overlayfs/overlayfs.h |6 ++
 fs/overlayfs/super.c |1 +
 4 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 40650c4..c4446c4 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -304,6 +304,7 @@ static int ovl_create_object(struct dentry *dentry, int 
mode, dev_t rdev,
}
}
ovl_dentry_update(dentry, newdentry);
+   ovl_copyattr(newdentry->d_inode, inode);
d_instantiate(dentry, inode);
inode = NULL;
newdentry = NULL;
@@ -446,6 +447,7 @@ static int ovl_link(struct dentry *old, struct inode 
*newdir,
new->d_fsdata);
if (!newinode)
goto link_fail;
+   ovl_copyattr(upperdir->d_inode, newinode);
 
ovl_dentry_version_inc(new->d_parent);
ovl_dentry_update(new, newdentry);
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index f3a534f..e7ab09b 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -31,6 +31,8 @@ int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 
        mutex_lock(&upperdentry->d_inode->i_mutex);
        err = notify_change(upperdentry, attr);
+       if (!err)
+               ovl_copyattr(upperdentry->d_inode, dentry->d_inode);
        mutex_unlock(&upperdentry->d_inode->i_mutex);
 
return err;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index fe1241d..1cba38f 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -56,6 +56,12 @@ int ovl_removexattr(struct dentry *dentry, const char *name);
 
 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode,
struct ovl_entry *oe);
+static inline void ovl_copyattr(struct inode *from, struct inode *to)
+{
+   to->i_uid = from->i_uid;
+   to->i_gid = from->i_gid;
+}
+
 /* dir.c */
 extern const struct inode_operations ovl_dir_inode_operations;
 
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 64d2695..9808408 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -343,6 +343,7 @@ static int ovl_do_lookup(struct dentry *dentry)
  oe);
if (!inode)
goto out_dput;
+   ovl_copyattr(realdentry->d_inode, inode);
}
 
if (upperdentry)
-- 
1.7.7



Re: null pointer dereference while loading i915

2012-08-10 Thread Mihai Moldovan
* On 10.08.2012 12:10 PM, Daniel Vetter wrote:
> On Wed, Aug 8, 2012 at 6:50 AM, Mihai Moldovan  wrote:
>> Hi Daniel, hi list
>>
>> ever since version 3.2.0 (maybe even earlier, but 3.0.2 is still working 
>> fine),
>> my box is crashing when loading the i915 driver (mode-setting enabled.)
>>
>> The current version I'm testing with is 3.5.0.
>>
>> I was able to get the BUG output (please forgive any errors/flips in the 
>> output,
>> I have had to transcribe the messages from the screen/images), however, I'm 
>> not
>> able to find out what's wrong.
>>
>> If I see it correctly, there's a null pointer dereference in a printk called
>> from inside gmbus_xfer. The only printk calls I can see in
>> drivers/gpu/drm/i915/intel_i2c.c gmbus_xfer() however are issued by the
>> DRM_DEBUG_KMS() and DRM_INFO() macros.
>> Neither call looks wrong to me, I even tried to swap adapter->name with
>> bus->adapter.name and make *sure* i < num is true, but haven't had any 
>> success.
>>
>> I'd really like to see this bug fixed, as it's preventing me from updating 
>> the
>> kernel for over a year now.
>>
>> Also, while 3.0.2 works, it *does* spew error/warning messages related to 
>> gmbus
>> and I've had corrupted VTs in the past (albeit after a long uptime with 
>> multiple
>> X restarting and DVI cable unplugging/reattaching events), so maybe there's a
>> lot more broken than "expected".
>
> Hm, this is rather strange. gmbus should not be enabled on 3.2 nor 3.0,
> since exactly this issue might happen. We've re-enabled gmbus again on
> 3.5 after having fixed this bug. Are you sure that this is plain 3.2
> you're running?

Sorry, I messed up the version numbers. Started bisecting yesterday and noticed,
that 3.0 up to 3.2 still work "fine" (see below), instead I've had another
problem with 3.2 (completely lockup after the kernel is running for a few
minutes, but I have no idea where this issue is coming from. Seems to be
happening with 3.2.0 only, so... *shrug*)

3.0.2   => working, gmbus warnings as posted.
3.1-09933/07170 => working, NO gmbus warnings, but render errors (see below)
3.2-rc2 to rc4  => working, NO gmbus warnings, but render errors (see below)
--- (stopped bisecting 3.0 to 3.2 as this was pointless) ---
--- (restarted bisecting with 3.2 to 3.5) ---
3.3.0-06109 => working, gmbus warnings just like with 3.0, render errors
(see below)
3.4.0-07487 => working, gmbus warnings, hang errors (see below)
...

I've done more steps, but have not yet finished bisecting, so stay tuned.
All those render errors look like that:

[drm] capturing error event; look for more information in
/debug/dri/0/i915_error_state
render error detected, EIR: 0x0010
  IPEIR: 0x
  IPEHR: 0x0200
  INSTDONE: 0x
  INSTPS: 0x8001e025
  INSTDONE1: 0xbfbb
  ACTHD: 0x00a4203c
page table error
  PGTBL_ER: 0x0010
[drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x0010, masking

I'll finish bisecting (and hope, that my guess was right, concerning the
variant I wasn't able to build) and will post the bisect log when done.

Meanwhile: at least for 3.0.2 and even older versions, gmbus must have been
enabled as I'm pretty sure I always saw those errors when booting (just
confirmed via logs for 3.0.0, 2.6.38.6, 2.6.39). Doesn't come up with 2.6.34,
2.6.36.1, 3.1-..., 3.2-... though.

Best regards,


Mihai






Re: [PATCH v2 03/11] memcg: change defines to an enum

2012-08-10 Thread Michal Hocko
On Thu 09-08-12 17:01:11, Glauber Costa wrote:
> This is just a cleanup patch for clarity of expression.  In earlier
> submissions, people asked it to be in a separate patch, so here it is.
> 
> [ v2: use named enum as type throughout the file as well ]
> 
> Signed-off-by: Glauber Costa 
> CC: Michal Hocko 
> CC: Johannes Weiner 
> Acked-by: Kamezawa Hiroyuki 

Acked-by: Michal Hocko 

> ---
>  mm/memcontrol.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2cef99a..b0e29f4 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -393,9 +393,12 @@ enum charge_type {
>  };
>  
>  /* for encoding cft->private value on file */
> -#define _MEM (0)
> -#define _MEMSWAP (1)
> -#define _OOM_TYPE(2)
> +enum res_type {
> + _MEM,
> + _MEMSWAP,
> + _OOM_TYPE,
> +};
> +
>  #define MEMFILE_PRIVATE(x, val)  ((x) << 16 | (val))
>  #define MEMFILE_TYPE(val)((val) >> 16 & 0x)
>  #define MEMFILE_ATTR(val)((val) & 0x)
> @@ -3983,7 +3986,8 @@ static ssize_t mem_cgroup_read(struct cgroup *cont, 
> struct cftype *cft,
>   struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
>   char str[64];
>   u64 val;
> - int type, name, len;
> + int name, len;
> + enum res_type type;
>  
>   type = MEMFILE_TYPE(cft->private);
>   name = MEMFILE_ATTR(cft->private);
> @@ -4019,7 +4023,8 @@ static int mem_cgroup_write(struct cgroup *cont, struct 
> cftype *cft,
>   const char *buffer)
>  {
>   struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> - int type, name;
> + enum res_type type;
> + int name;
>   unsigned long long val;
>   int ret;
>  
> @@ -4095,7 +4100,8 @@ out:
>  static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
>  {
>   struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> - int type, name;
> + int name;
> + enum res_type type;
>  
>   type = MEMFILE_TYPE(event);
>   name = MEMFILE_ATTR(event);
> @@ -4423,7 +4429,7 @@ static int mem_cgroup_usage_register_event(struct 
> cgroup *cgrp,
>   struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
>   struct mem_cgroup_thresholds *thresholds;
>   struct mem_cgroup_threshold_ary *new;
> - int type = MEMFILE_TYPE(cft->private);
> + enum res_type type = MEMFILE_TYPE(cft->private);
>   u64 threshold, usage;
>   int i, size, ret;
>  
> @@ -4506,7 +4512,7 @@ static void mem_cgroup_usage_unregister_event(struct 
> cgroup *cgrp,
>   struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
>   struct mem_cgroup_thresholds *thresholds;
>   struct mem_cgroup_threshold_ary *new;
> - int type = MEMFILE_TYPE(cft->private);
> + enum res_type type = MEMFILE_TYPE(cft->private);
>   u64 usage;
>   int i, j, size;
>  
> @@ -4584,7 +4590,7 @@ static int mem_cgroup_oom_register_event(struct cgroup 
> *cgrp,
>  {
>   struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
>   struct mem_cgroup_eventfd_list *event;
> - int type = MEMFILE_TYPE(cft->private);
> + enum res_type type = MEMFILE_TYPE(cft->private);
>  
>   BUG_ON(type != _OOM_TYPE);
>   event = kmalloc(sizeof(*event), GFP_KERNEL);
> @@ -4609,7 +4615,7 @@ static void mem_cgroup_oom_unregister_event(struct 
> cgroup *cgrp,
>  {
>   struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
>   struct mem_cgroup_eventfd_list *ev, *tmp;
> - int type = MEMFILE_TYPE(cft->private);
> + enum res_type type = MEMFILE_TYPE(cft->private);
>  
>   BUG_ON(type != _OOM_TYPE);
>  
> -- 
> 1.7.11.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs
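
The cft->private encoding that the new enum feeds into can be illustrated in userspace as follows. This is a sketch under assumptions: the hex masks are cut off in the quoted patch, so the 16-bit mask 0xffff used here is an assumption, the enumerators are renamed (TYPE_MEM and so on) because identifiers starting with an underscore and a capital letter are reserved outside the kernel, and res_type_name is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Renamed stand-ins for _MEM, _MEMSWAP and _OOM_TYPE. */
enum res_type {
	TYPE_MEM,
	TYPE_MEMSWAP,
	TYPE_OOM,
};

/* Pack a resource type into the high half and an attribute into the low half. */
#define MEMFILE_PRIVATE(x, val)	((x) << 16 | (val))
/* Assumed 16-bit masks; the original values are truncated in the quote. */
#define MEMFILE_TYPE(val)	((enum res_type)((val) >> 16 & 0xffff))
#define MEMFILE_ATTR(val)	((val) & 0xffff)

/* Hypothetical helper: a switch over the enum lets the compiler warn
 * about unhandled resource types, which plain ints would not. */
static const char *res_type_name(enum res_type type)
{
	switch (type) {
	case TYPE_MEM:
		return "mem";
	case TYPE_MEMSWAP:
		return "memsw";
	case TYPE_OOM:
		return "oom";
	}
	return "unknown";
}
```

For example, MEMFILE_PRIVATE(TYPE_MEMSWAP, 3) packs to 0x10003, and the two accessor macros split it back into the type and the attribute.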


Re: [lm-sensors] NULL dereference BUG in sch56xx_init()

2012-08-10 Thread Guenter Roeck
On Fri, Aug 10, 2012 at 10:35:59AM +0200, Hans de Goede wrote:
> Hi,
> 
> On 08/09/2012 04:42 PM, Guenter Roeck wrote:
> >On Thu, Aug 09, 2012 at 08:55:26PM +0800, Fengguang Wu wrote:
> >>Hi Guenter,
> >>
> >>This commit triggered an oops which can be fixed by the attached diff.
> >>Should it be folded into the original one (preferable for me), or be
> >>resent as a standalone patch?
> >>
> >I folded it into the original commit.
> >
> >Thanks a lot for the test and feedback!
> 
> Fengguang, good catch, thanks!
> 
> Guenter, 2 remarks:
> 
> 1) The changing of the type of the address parameter of sch56xx_device_add is
>not necessary

Yes, I know. I took it in anyway because it reduces code size by another 8
bytes.

> 2) A similar change is needed for the f71882fg, there the type of the address
>variable in f71882fg_init() needs to be changed to int too.
> 
Noticed, and fixed. I merged the fix for both into the original patch for
simplicity.

Thanks,
Guenter


Re: [PATCH v2 02/11] memcg: Reclaim when more than one page needed.

2012-08-10 Thread Michal Hocko
On Thu 09-08-12 17:01:10, Glauber Costa wrote:
[...]
> @@ -2317,18 +2318,18 @@ static int mem_cgroup_do_charge(struct mem_cgroup 
> *memcg, gfp_t gfp_mask,
>   } else
>   mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
>   /*
> -  * nr_pages can be either a huge page (HPAGE_PMD_NR), a batch
> -  * of regular pages (CHARGE_BATCH), or a single regular page (1).
> -  *
>* Never reclaim on behalf of optional batching, retry with a
>* single page instead.
>*/
> - if (nr_pages == CHARGE_BATCH)
> + if (nr_pages > min_pages)
>   return CHARGE_RETRY;

This is dangerous because THP charges will be retried now while they
previously failed with CHARGE_NOMEM which means that we will keep
attempting potentially endlessly.
Why cannot we simply do if (nr_pages < CHARGE_BATCH) and get rid of the
min_pages altogether?
Also the comment doesn't seem to be valid anymore.

>  
>   if (!(gfp_mask & __GFP_WAIT))
>   return CHARGE_WOULDBLOCK;
>  
> + if (gfp_mask & __GFP_NORETRY)
> + return CHARGE_NOMEM;
> +
>   ret = mem_cgroup_reclaim(mem_over_limit, gfp_mask, flags);
>   if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
>   return CHARGE_RETRY;
> @@ -2341,7 +2342,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup 
> *memcg, gfp_t gfp_mask,
>* unlikely to succeed so close to the limit, and we fall back
>* to regular pages anyway in case of failure.
>*/
> - if (nr_pages == 1 && ret)
> + if (nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER) && ret)
>   return CHARGE_RETRY;
>  
>   /*
> @@ -2476,7 +2477,8 @@ again:
>   nr_oom_retries = MEM_CGROUP_RECLAIM_RETRIES;
>   }
>  
> - ret = mem_cgroup_do_charge(memcg, gfp_mask, batch, oom_check);
> + ret = mem_cgroup_do_charge(memcg, gfp_mask, batch, nr_pages,
> + oom_check);
>   switch (ret) {
>   case CHARGE_OK:
>   break;
> -- 
> 1.7.11.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs


Re: [PATCHv4 3/3] fs: Update coredump-related headers

2012-08-10 Thread Serge E. Hallyn
Quoting Alex Kelly (alex.page.ke...@gmail.com):
> This patch creates a new header file, fs/coredump.h, which contains
> functions only used by the new coredump.c. It also moves do_coredump
> to the include/linux/coredump.h header file, for consistency.
> 
> Signed-off-by: Alex Kelly 
> Reviewed-by: Josh Triplett 

Acked-by: Serge Hallyn 

> ---
>  fs/coredump.c| 2 ++
>  fs/coredump.h| 6 ++
>  fs/exec.c| 1 +
>  include/linux/binfmts.h  | 5 -
>  include/linux/coredump.h | 5 +
>  include/linux/sched.h| 1 -
>  kernel/signal.c  | 1 +
>  7 files changed, 15 insertions(+), 6 deletions(-)
>  create mode 100644 fs/coredump.h
> 
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 9692329..1935b4d 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -39,6 +40,7 @@
>  
>  #include 
>  #include "internal.h"
> +#include "coredump.h"
>  
>  #include 
>  
> diff --git a/fs/coredump.h b/fs/coredump.h
> new file mode 100644
> index 000..e39ff07
> --- /dev/null
> +++ b/fs/coredump.h
> @@ -0,0 +1,6 @@
> +#ifndef _FS_COREDUMP_H
> +#define _FS_COREDUMP_H
> +
> +extern int __get_dumpable(unsigned long mm_flags);
> +
> +#endif
> diff --git a/fs/exec.c b/fs/exec.c
> index b604050..a0ad3a2 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -63,6 +63,7 @@
>  
>  #include 
>  #include "internal.h"
> +#include "coredump.h"
>  
>  #include 
>  
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 00e2e89..c7b16ee 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -132,11 +132,6 @@ extern int copy_strings_kernel(int argc, const char 
> *const *argv,
>  struct linux_binprm *bprm);
>  extern int prepare_bprm_creds(struct linux_binprm *bprm);
>  extern void install_exec_creds(struct linux_binprm *bprm);
> -#ifdef CONFIG_COREDUMP
> -extern void do_coredump(long signr, int exit_code, struct pt_regs *regs);
> -#else
> -static inline void do_coredump(long signr, int exit_code, struct pt_regs 
> *regs) {}
> -#endif
>  extern void set_binfmt(struct linux_binfmt *new);
>  extern void free_bprm(struct linux_binprm *);
>  
> diff --git a/include/linux/coredump.h b/include/linux/coredump.h
> index ba4b85a..42f9752 100644
> --- a/include/linux/coredump.h
> +++ b/include/linux/coredump.h
> @@ -11,5 +11,10 @@
>   */
>  extern int dump_write(struct file *file, const void *addr, int nr);
>  extern int dump_seek(struct file *file, loff_t off);
> +#ifdef CONFIG_COREDUMP
> +extern void do_coredump(long signr, int exit_code, struct pt_regs *regs);
> +#else
> +static inline void do_coredump(long signr, int exit_code, struct pt_regs 
> *regs) {}
> +#endif
>  
>  #endif /* _LINUX_COREDUMP_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 7bb5047..c147e70 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -413,7 +413,6 @@ static inline void arch_pick_mmap_layout(struct mm_struct 
> *mm) {}
>  
>  extern void set_dumpable(struct mm_struct *mm, int value);
>  extern int get_dumpable(struct mm_struct *mm);
> -extern int __get_dumpable(unsigned long mm_flags);
>  
>  /* get/set_dumpable() values */
>  #define SUID_DUMPABLE_DISABLED   0
> diff --git a/kernel/signal.c b/kernel/signal.c
> index be4f856..fb4fd72 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> -- 
> 1.7.11.2
> 


Re: [PATCHv4 2/3] fs: Make core dump functionality optional

2012-08-10 Thread Serge E. Hallyn
Quoting Alex Kelly (alex.page.ke...@gmail.com):
> Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of
> core dump. This saves approximately 2.6k in the compiled kernel, and
> complements CONFIG_ELF_CORE, which now depends on it.
> 
> CONFIG_COREDUMP also disables coredump-related sysctls, except for
> suid_dumpable and related functions, which are necessary for ptrace.
> 
> Signed-off-by: Alex Kelly 
> Reviewed-by: Josh Triplett 

Acked-by: Serge Hallyn 
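
For readers unfamiliar with the header change in this patch, here is a
minimal userspace sketch of the compile-time stub pattern it relies on. It is
not kernel code: MODEL_COREDUMP, model_do_coredump(), model_dump_count and
model_handle_fatal_signal() are hypothetical stand-ins for CONFIG_COREDUMP,
do_coredump() and its callers.

```c
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_COREDUMP. Remove this line to
 * build the empty stub below instead of the real function. */
#define MODEL_COREDUMP 1

static int model_dump_count; /* counts how often a "dump" really ran */

#ifdef MODEL_COREDUMP
/* Feature enabled: the real implementation is compiled in. */
static void model_do_coredump(long signr, int exit_code)
{
	printf("dumping core: signal %ld, exit code %d\n", signr, exit_code);
	model_dump_count++;
}
#else
/* Feature disabled: an empty inline stub keeps callers compiling
 * without scattering #ifdefs through them. */
static inline void model_do_coredump(long signr, int exit_code)
{
	(void)signr;
	(void)exit_code;
}
#endif

/* A caller that never needs to know whether the feature is built in. */
static void model_handle_fatal_signal(long signr)
{
	model_do_coredump(signr, 128 + (int)signr);
}
```

With the define present, calling model_handle_fatal_signal(11) runs the real
function once; with it removed, the same call compiles to nothing.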

> ---
>  fs/Kconfig.binfmt   | 8 
>  fs/Makefile | 3 ++-
>  include/linux/binfmts.h | 4 
>  init/Kconfig| 1 +
>  kernel/sysctl.c | 6 +-
>  5 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/Kconfig.binfmt b/fs/Kconfig.binfmt
> index 0225742..0efd152 100644
> --- a/fs/Kconfig.binfmt
> +++ b/fs/Kconfig.binfmt
> @@ -164,3 +164,11 @@ config BINFMT_MISC
> You may say M here for module support and later load the module when
> you have use for it; the module is called binfmt_misc. If you
> don't know what to answer at this point, say Y.
> +
> +config COREDUMP
> + bool "Enable core dump support" if EXPERT
> + default y
> + help
> +   This option enables support for performing core dumps. You almost
> +   certainly want to say Y here. Not necessary on systems that never
> +   need debugging or only ever run flawless code.
> diff --git a/fs/Makefile b/fs/Makefile
> index 8938f82..1d7af79 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -11,7 +11,7 @@ obj-y :=open.o read_write.o file_table.o super.o \
>   attr.o bad_inode.o file.o filesystems.o namespace.o \
>   seq_file.o xattr.o libfs.o fs-writeback.o \
>   pnode.o drop_caches.o splice.o sync.o utimes.o \
> - stack.o fs_struct.o statfs.o coredump.o
> + stack.o fs_struct.o statfs.o
>  
>  ifeq ($(CONFIG_BLOCK),y)
>  obj-y += buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
> @@ -48,6 +48,7 @@ obj-$(CONFIG_FS_MBCACHE)+= mbcache.o
>  obj-$(CONFIG_FS_POSIX_ACL)   += posix_acl.o xattr_acl.o
>  obj-$(CONFIG_NFS_COMMON) += nfs_common/
>  obj-$(CONFIG_GENERIC_ACL)+= generic_acl.o
> +obj-$(CONFIG_COREDUMP)   += coredump.o
>  
>  obj-$(CONFIG_FHANDLE)+= fhandle.o
>  
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 366422b..00e2e89 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -132,7 +132,11 @@ extern int copy_strings_kernel(int argc, const char 
> *const *argv,
>  struct linux_binprm *bprm);
>  extern int prepare_bprm_creds(struct linux_binprm *bprm);
>  extern void install_exec_creds(struct linux_binprm *bprm);
> +#ifdef CONFIG_COREDUMP
>  extern void do_coredump(long signr, int exit_code, struct pt_regs *regs);
> +#else
> +static inline void do_coredump(long signr, int exit_code, struct pt_regs 
> *regs) {}
> +#endif
>  extern void set_binfmt(struct linux_binfmt *new);
>  extern void free_bprm(struct linux_binprm *);
>  
> diff --git a/init/Kconfig b/init/Kconfig
> index af6c7f8..0e75056 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1230,6 +1230,7 @@ config BUG
>Just say Y.
>  
>  config ELF_CORE
> + depends on COREDUMP
>   default y
>   bool "Enable ELF core dumps" if EXPERT
>   help
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 87174ef..af57e84 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -97,10 +97,12 @@
>  extern int sysctl_overcommit_memory;
>  extern int sysctl_overcommit_ratio;
>  extern int max_threads;
> -extern int core_uses_pid;
>  extern int suid_dumpable;
> +#ifdef CONFIG_COREDUMP
> +extern int core_uses_pid;
>  extern char core_pattern[];
>  extern unsigned int core_pipe_limit;
> +#endif
>  extern int pid_max;
>  extern int min_free_kbytes;
>  extern int pid_max_min, pid_max_max;
> @@ -404,6 +406,7 @@ static struct ctl_table kern_table[] = {
>   .mode   = 0644,
>   .proc_handler   = proc_dointvec,
>   },
> +#ifdef CONFIG_COREDUMP
>   {
>   .procname   = "core_uses_pid",
>   .data   = &core_uses_pid,
> @@ -425,6 +428,7 @@ static struct ctl_table kern_table[] = {
>   .mode   = 0644,
>   .proc_handler   = proc_dointvec,
>   },
> +#endif
>  #ifdef CONFIG_PROC_SYSCTL
>   {
>   .procname   = "tainted",
> -- 
> 1.7.11.2
> 

Re: [PATCHv4 1/3] fs: Move core dump functionality into its own file

2012-08-10 Thread Serge E. Hallyn
Quoting Alex Kelly (alex.page.ke...@gmail.com):
> This prepares for making core dump functionality optional.
> 
> The variable "suid_dumpable" and associated functions are left in fs/exec.c
> because they're used elsewhere, such as in ptrace.
> 
> Signed-off-by: Alex Kelly 
> Reviewed-by: Josh Triplett 

Acked-by: Serge Hallyn 

> ---
> v2: This patch set is a second revision that follows some suggestions from
> Ingo Molnar and Josh Triplett. Specifically, authorship of commits is
> revised for consistency, and an additional two patches cleaning up artifacts
> and making headers more sane are added.
> 
> v3: This version fixes a few more authorship issues and some problems caused
> by a bad git send-email config. Sorry about the extra mails
> 
> v4: This version fixes some ordering issues pointed out by Kees Cook and Josh
> Triplett, such that the order of the functions moved to fs/coredump.c is now
> consistent with their original order in fs/exec.c. v4 also drops some extra
> blank lines unintentionally introduced in fs/coredump.c, to avoid the need to
> clean them up later. That left the cleanup patch just reformatting a comment,
> so I dropped that patch. Some of the functions moved to coredump.c need a lot 
> of cleaning up, but I'm not sure that those formatting changes should be 
> folded into this patch series.
> 
>  fs/Makefile           |   2 +-
>  fs/coredump.c         | 689 ++
>  fs/exec.c             | 647 +--
>  include/linux/sched.h |   1 +
>  4 files changed, 692 insertions(+), 647 deletions(-)
>  create mode 100644 fs/coredump.c
> 
> diff --git a/fs/Makefile b/fs/Makefile
> index 2fb9779..8938f82 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -11,7 +11,7 @@ obj-y :=open.o read_write.o file_table.o super.o \
>   attr.o bad_inode.o file.o filesystems.o namespace.o \
>   seq_file.o xattr.o libfs.o fs-writeback.o \
>   pnode.o drop_caches.o splice.o sync.o utimes.o \
> - stack.o fs_struct.o statfs.o
> + stack.o fs_struct.o statfs.o coredump.o
>  
>  ifeq ($(CONFIG_BLOCK),y)
>  obj-y += buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
> diff --git a/fs/coredump.c b/fs/coredump.c
> new file mode 100644
> index 000..9692329
> --- /dev/null
> +++ b/fs/coredump.c
> @@ -0,0 +1,689 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include "internal.h"
> +
> +#include 
> +
> +int core_uses_pid;
> +char core_pattern[CORENAME_MAX_SIZE] = "core";
> +unsigned int core_pipe_limit;
> +
> +struct core_name {
> + char *corename;
> + int used, size;
> +};
> +static atomic_t call_count = ATOMIC_INIT(1);
> +
> +/* The maximal length of core_pattern is also specified in sysctl.c */
> +
> +static int expand_corename(struct core_name *cn)
> +{
> + char *old_corename = cn->corename;
> +
> + cn->size = CORENAME_MAX_SIZE * atomic_inc_return(&call_count);
> + cn->corename = krealloc(old_corename, cn->size, GFP_KERNEL);
> +
> + if (!cn->corename) {
> + kfree(old_corename);
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +static int cn_printf(struct core_name *cn, const char *fmt, ...)
> +{
> + char *cur;
> + int need;
> + int ret;
> + va_list arg;
> +
> + va_start(arg, fmt);
> + need = vsnprintf(NULL, 0, fmt, arg);
> + va_end(arg);
> +
> + if (likely(need < cn->size - cn->used - 1))
> + goto out_printf;
> +
> + ret = expand_corename(cn);
> + if (ret)
> + goto expand_fail;
> +
> +out_printf:
> + cur = cn->corename + cn->used;
> + va_start(arg, fmt);
> + vsnprintf(cur, need + 1, fmt, arg);
> + va_end(arg);
> + cn->used += need;
> + return 0;
> +
> +expand_fail:
> + return ret;
> +}
> +
> +static void cn_escape(char *str)
> +{
> + for (; *str; str++)
> + if (*str == '/')
> + *str = '!';
> +}
> +
> +static int cn_print_exe_file(struct core_name *cn)
> +{
> + struct file *exe_file;
> + char *pathbuf, *path;
> + int ret;
> +
> + exe_file = get_mm_exe_file(current->mm);
> + if (!exe_file) {
> + char *commstart = cn->corename + cn->used;
> + ret = cn_printf(cn, "%s (path unknown)", current->comm);
> + cn_escape(commstart);
> + return ret;
> + }
> +
> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY);
> + if (!pathbuf) {
> + ret = 

Re: [RFC 1/4] remoteproc: Bugfix assign device address to carveout (noiommu)

2012-08-10 Thread Ohad Ben-Cohen
Hi Sjur,

On Thu, Aug 9, 2012 at 11:35 PM, Sjur Brændeland  wrote:
> Any thoughts on how to go about to fix this?

The general direction I have in mind is to put the resource table in
its final location while we do the first pass of fw parsing.

This will solve all sorts of open issues we have (or are going to have soon):

1. dynamically-allocated address of the vrings can be communicated
2. vdev statuses can be communicated
3. virtio config space will finally become bi-directional as it should
4. dynamically probed rproc-to-rproc IPC could then take place

It's the real deal :)

The only problem with this approach is that the resource table isn't
reloaded throughout cycles of power up/down, and that is insecure.
We'll have to manually reload it somewhere after the rproc is powered
down (or before it is powered up again).

This change will break existing firmwares, but it looks required and inevitable.

Thanks,
Ohad.


Re: Re: [PATCH 0/8] Set bi_rw when alloc bio before call bio_add_page.

2012-08-10 Thread Muthu Kumar
[ Resending in plain text... sorry for the duplicate ]

Hi,

On Mon, Jul 30, 2012 at 6:14 PM, Dave Chinner  wrote:
>
> On Tue, Jul 31, 2012 at 08:55:59AM +0800, majianpeng wrote:
> > On 2012-07-31 05:42 Dave Chinner  Wrote:
> > >On Mon, Jul 30, 2012 at 03:14:28PM +0800, majianpeng wrote:
> > >> When exec bio_alloc, the bi_rw is zero. But after calling
> > >> bio_add_page,
> > >> it will use bi_rw.
> > >> For example, in function __bio_add_page, it will call
> > >> merge_bvec_fn().
> > >> The merge_bvec_fn of raid456 will use the bi_rw to judge the merge.
> > >> >> if ((bvm->bi_rw & 1) == WRITE)
> > >> >> return biovec->bv_len; /* always allow writes to be mergeable */
> > >
> > >So if bio_add_page() requires bi_rw to be set, then shouldn't it be
> > >set up for every caller? I noticed there are about 50 call sites for
> > >bio_add_page(), and you've only touched about 10 of them. Indeed, I
> > >notice that the RAID0/1 code uses bio_add_page, and as that can be
> > >stacked on top of RAID456, it also needs to set bi_rw correctly.
> > >As a result, your patch set is nowhere near complete, nor does it
> > >document that bio_add_page requires that bi_rw be set before calling
> > >(which is the new API requirement, AFAICT).
> > There are many places that call bio_add_page and I sent patches for some
> > of those. Because of my limited ability, I only sent
> > the patches which I understand clearly.
>
> Sure, but my point is that there is no point changing only a few and
> ignoring the great majority of callers. Either fix them all, fix it
> some other way (e.g. API change), or remove the code from the RAID5
> function that requires it.

A while back, we tried to address this by changing the alloc functions to
take an rw argument and set it (as per Jens' suggestion). I guess the patch did
not make it in. Please check:

https://lkml.org/lkml/2011/7/11/275

and the follow-ups. If needed, I can dust off that patch and resend it.
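
To make the problem concrete, here is a small userspace model of the check
quoted above. It is only a sketch under stated assumptions: the struct, the
MODEL_* constants and both functions are hypothetical simplifications, the
read path of the real merge function is reduced to "return 0", and
model_alloc_with_rw() merely illustrates the idea of passing rw at
allocation time.

```c
#include <stddef.h>

#define MODEL_READ  0
#define MODEL_WRITE 1

/* Simplified stand-in for struct bvec_merge_data; only bi_rw is modeled. */
struct model_merge_data {
	unsigned long bi_rw;
};

/* Mirrors the quoted raid456 test: writes are always mergeable.
 * The read path is reduced to "return 0" purely for illustration. */
static size_t model_merge_bvec(const struct model_merge_data *bvm,
			       size_t bv_len)
{
	if ((bvm->bi_rw & 1) == MODEL_WRITE)
		return bv_len;
	return 0;
}

/* Hypothetical helper in the spirit of "alloc takes an rw argument". */
static struct model_merge_data model_alloc_with_rw(unsigned long rw)
{
	struct model_merge_data bvm = { .bi_rw = rw };
	return bvm;
}
```

A zero-initialized bi_rw takes the read path even when the caller intends to
write, while a value created through model_alloc_with_rw(MODEL_WRITE) takes
the write path.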

>
> It's entirely possible that when bi_rw was added to struct
> bvec_merge_data, the person who added it was mistaken that bi_rw was
> set at this point in time when in fact it never has been. Hence it's
> presence and reliance on it would be a bug.
>
> That's what I'm asking - is this actually beneficial, or should it
> simply be removed from struct bvec_merge_data? Data is needed to
> answer that question.

There are cases where we found it really beneficial to know the rw
field to decide if they can really be merged or not.


Regards,
Muthu


>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com


Re: [PATCHv4 2/3] fs: Make core dump functionality optional

2012-08-10 Thread Serge Hallyn
Quoting Josh Triplett (j...@joshtriplett.org):
> On Fri, Aug 10, 2012 at 08:23:23AM -0500, Serge Hallyn wrote:
> > Quoting Alex Kelly (alex.page.ke...@gmail.com):
> > > Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of
> > > core dump. This saves approximately 2.6k in the compiled kernel, and
> > > complements CONFIG_ELF_CORE, which now depends on it.
> > 
> > Is there another reason than the 2.6k to do this?  My kernels range
> > between 4.8 and 5M, so that's .05% size savings?
> 
> A kitchen-sink kernel might take up that much space, but you can build a
> minimal embedded kernel that only takes up ~200k, at which point 2.6k
> represents a >1% decrease.  Add a few more changes like this, and those
> decreases start to add up.  At this point, no one thing you can chop out
> of the kernel will give you a 100k decrease by itself; you need a pile
> of changes like this one to do that.
> 
> - Josh Triplett

I see.  That's an order of magnitude smaller than what I figured you'd
get with a reasonable kernel :)
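
The percentages in this exchange can be checked with a few lines of C. This
is just a sketch: saving_percent() is a hypothetical helper, and reading "k"
as KiB and "M" as MiB is an assumption.

```c
/* Percentage of a kernel image that a given saving represents.
 * Both arguments are in KiB. */
static double saving_percent(double saved_kib, double image_kib)
{
	return 100.0 * saved_kib / image_kib;
}
```

saving_percent(2.6, 200.0) comes out at about 1.3, matching the ">1%" figure
for a ~200k kernel, while saving_percent(2.6, 5.0 * 1024.0) is roughly 0.05
for a 5M kernel.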


Re: [PATCH] trivial: pinctrl core: remove extraneous code lines

2012-08-10 Thread Stephen Warren
On 08/10/2012 08:53 AM, Richard Genoud wrote:
> In function pinctrl_get_locked, pointer p is returned on error, and also
> returned when there is no error.
> So, we just return it with no error test.
> 
> It's pretty much the same in function pinctrl_lookup_state_locked: state is
> returned in every case, so we drop the error test and just return state.
> 
> Signed-off-by: Richard Genoud 
> ---
>  drivers/pinctrl/core.c |   10 ++
>  1 files changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pinctrl/core.c b/drivers/pinctrl/core.c
> index fb7f3be..7365d46 100644
> --- a/drivers/pinctrl/core.c
> +++ b/drivers/pinctrl/core.c
> @@ -657,11 +657,7 @@ static struct pinctrl *pinctrl_get_locked(struct device 
> *dev)
>   if (p != NULL)
>   return ERR_PTR(-EBUSY);
>  
> - p = create_pinctrl(dev);
> - if (IS_ERR(p))
> - return p;
> -
> - return p;
> + return create_pinctrl(dev);
>  }

This makes sense.
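
The equivalence can be illustrated with a small userspace sketch. Everything
here is an assumption made for illustration: model_err_ptr() and
model_is_err() only approximate the kernel's ERR_PTR() and IS_ERR(), and
model_create() is a hypothetical stand-in for create_pinctrl().

```c
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace approximations of the kernel's ERR_PTR()/IS_ERR() helpers. */
static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical factory: fails with -12 (ENOMEM) when asked to. */
static void *model_create(int fail, void *ok_value)
{
	return fail ? model_err_ptr(-12) : ok_value;
}

/* Shape of the code before the patch: both branches return p. */
static void *model_get_before(int fail, void *ok_value)
{
	void *p = model_create(fail, ok_value);

	if (model_is_err(p))
		return p;

	return p;
}

/* Shape of the code after the patch: return the result directly. */
static void *model_get_after(int fail, void *ok_value)
{
	return model_create(fail, ok_value);
}
```

Both variants hand back the same pointer on success and on failure, so the
caller's own IS_ERR() check behaves identically either way.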

>  /**
> @@ -738,10 +734,8 @@ static struct pinctrl_state 
> *pinctrl_lookup_state_locked(struct pinctrl *p,
>   dev_dbg(p->dev, "using pinctrl dummy state (%s)\n",
>   name);
>   state = create_state(p, name);
> - if (IS_ERR(state))
> - return state;
>   } else {
> - return ERR_PTR(-ENODEV);
> + state = ERR_PTR(-ENODEV);
>   }
>   }

Personally I find the code much clearer as it is, but the result of this
patch still looks correct.


